Creating a Prototype Data Project on Google Cloud Platform

If you are working on a more complex project (as opposed to ad-hoc or one-off analysis) that you intend to run in production at some point, it may be worthwhile to provision a separate prototype GCP project for it with access to our datasets. From the Google Cloud Console, you may then:

  • Provision service accounts for querying BigQuery (including our production tables) or accessing other GCP resources from the command-line or inside Docker containers
  • Write data to, and query, private BigQuery tables without worrying about interfering with what we have in production
  • Make Docker images available via the Google Container Registry (see the cookbook on deploying containers)
  • Create Google Cloud Storage buckets for storing temporary data
  • Create Google Compute Engine instances for test-running software in the cloud
  • Create a temporary Kubernetes cluster for test-running a scheduled job with telemetry-airflow
  • Create static dashboards with protosaur (see Creating Static Dashboards with Protosaur)
  • Track the costs for all of the above using the Google Cost Dashboard feature of the GCP console
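As an illustration of the first two items, here is a minimal Python sketch that runs a query under a service account and writes the result to a private table in the prototype project. It assumes the google-cloud-bigquery client library is installed and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points at a service account key file. The function names are hypothetical, as are any project, dataset, and table names you pass in; nothing here is prescribed by our tooling.

```python
def table_id(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified BigQuery table ID: project.dataset.table."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)


def query_into_private_table(sql: str, project: str, destination: str) -> None:
    """Run `sql` and store the result in `destination`, replacing old contents.

    `project` is the prototype project that is billed for the query;
    `destination` is a table ID as returned by table_id().
    """
    # Imported lazily so table_id() works without the client library installed.
    from google.cloud import bigquery

    # Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS
    # (Application Default Credentials).
    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        # Overwrite the private table on every run.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    # result() blocks until the query job has finished.
    client.query(sql, job_config=job_config).result()
```

A call might look like query_into_private_table(sql, "my-prototype", table_id("my-prototype", "analysis", "daily_counts")), where all three names are placeholders. Because the destination lives in the prototype project, repeated runs overwrite only your own table and leave production data untouched.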

This has a number of advantages over our traditional approach of creating bulk "sandbox" projects for larger teams:

  • You can easily track the costs of individual components
  • You can self-serve short-lived administrative credentials that exist only for the lifespan of the project
  • You can easily spin down projects and resources that have run their course

Note that these prototype GCP projects are not intended for projects that are already in production; those should be maintained on operations-supported projects, presumably after a development phase. Nor are they meant for ad-hoc analysis or experimentation; for that, just file a request as outlined in the Accessing BigQuery cookbook.

Each prototype project has a data engineering contact associated with it: this is the person who will create the project for you. They are also a resource you can freely ask for advice on how to query or use GCP, and on how to build software that lends itself to productionization. If you are a data engineer, the data engineering contact may be yourself, but you should still follow the procedure below for tracking purposes.

To request the creation of a prototype GCP project, file a bug using the provided template. Not sure if you need a project like this? Don't know who to specify as a data engineering contact? Not sure what your project budget might be? Get in touch with the data platform team.

We are currently tracking these projects on mana (link requires Mozilla LDAP).