BigQuery Artifact Deployment
Artifacts that get deployed automatically, usually during nightly Airflow runs, to BigQuery include:
- user-defined functions
- datasets
- tables
- views
Dataset Deployment
Terraform will deploy datasets defined in bigquery-etl and private-bigquery-etl and datasets that are configured via cloudops-infra. The dataset deployment is triggered via Jenkins whenever bigquery-etl publishes a new container or after schema deployment.
User-defined Function (UDF) Deployment
There are two categories of user-defined functions:
- UDFs for internal use only: These UDFs are published to the
udf
andudf_js
datasets in themoz-fx-data-shared-prod
project and managed via bigquery-etl - public UDFs that can be used even outside of Mozilla projects: These UDFs are published to the
mozfun
project and managed via bigquery-etl
The UDF deploy is triggered through the publish_public_udfs
and publish_persistent_udfs
Airflow task in the bqetl_artifact_deployment
Airflow DAG which run nightly, but can be triggered manually by clearing the tasks.
Table Deployment
Tables get deployed through the publish_new_tables
Airflow task which runs nightly. This task will run all SQL generators, generate schemas for each query and will deploy schemas for tables and apply any schema-compatible changes for existing tables. The task will fail if changes that are incompatible with existing schemas (such as removing fields or changing field types) are to be applied.
View Deployment
View deployment runs after new tables have been published through the publish_views
Airflow task. This task will run all SQL generators, which ensures that SQL for generated views will be available. Only views with either changes to their SQL definition, schema, metadata or newly created views will be deployed.
Views that have been defined in bigquery-etl will be tagged with a managed
label. This label is used to automatically remove views from BigQuery that have been deleted in bigquery-etl. Having this label ensures that manually created views or views created through other tooling won't get deleted as part of this process.
Views get published to moz-fx-data-shared-prod
and publicly-facing views get published to mozdata
.