Artifacts that get deployed automatically, usually during nightly Airflow runs, to BigQuery include:
- user-defined functions
Terraform will deploy datasets defined in bigquery-etl and private-bigquery-etl and datasets that are configured via cloudops-infra. The dataset deployment is triggered via Jenkins whenever bigquery-etl publishes a new container or after schema deployment.
There are two categories of user-defined functions:
- UDFs for internal use only: These UDFs are published to the
udf_jsdatasets in the
moz-fx-data-shared-prodproject and managed via bigquery-etl
- public UDFs that can be used even outside of Mozilla projects: These UDFs are published to the
mozfunproject and managed via bigquery-etl
The UDF deploy is triggered through the
publish_persistent_udfs Airflow task in the
bqetl_artifact_deployment Airflow DAG which run nightly, but can be triggered manually by clearing the tasks.
Tables get deployed through the
publish_new_tables Airflow task which runs nightly. This task will run all SQL generators, generate schemas for each query and will deploy schemas for tables and apply any schema-compatible changes for existing tables. The task will fail if changes that are incompatible with existing schemas (such as removing fields or changing field types) are to be applied.
View deployment runs after new tables have been published through the
publish_views Airflow task. This task will run all SQL generators, which ensures that SQL for generated views will be available. Only views with either changes to their SQL definition, schema, metadata or newly created views will be deployed.
Views that have been defined in bigquery-etl will be tagged with a
managed label. This label is used to automatically remove views from BigQuery that have been deleted in bigquery-etl. Having this label ensures that manually created views or views created through other tooling won't get deleted as part of this process.
Views get published to
moz-fx-data-shared-prod and publicly-facing views get published to