Creating Your Own Dataset to Query in re:dash
- Create a Spark notebook that performs the transformations you need, either on raw data (using the Dataset API) or on Parquet data
- Output the results to an S3 location, usually partitioned by submission_date, so that each daily run is written out to a new location in S3. Do NOT also store submission_date as a column inside the Parquet files! A column name cannot also be the name of a partition.
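To make the partitioned layout concrete, here is a minimal sketch of the Hive-style path convention that day-partitioned output follows. The bucket and dataset names are hypothetical, and only the standard library is used; in the Spark notebook itself you would typically let df.write.partitionBy("submission_date").parquet(...) produce this layout for you, which also keeps submission_date out of the Parquet files and in the path only.

```python
from datetime import date

def partitioned_output_path(bucket: str, dataset: str, submission_date: date) -> str:
    """Build the S3 prefix for one day's partition of a dataset.

    Hive-style partitioning puts the partition column in the PATH,
    not inside the Parquet files themselves.
    """
    return f"s3://{bucket}/{dataset}/submission_date={submission_date:%Y%m%d}"

# Each daily run writes to its own partition directory, e.g.:
print(partitioned_output_path("example-bucket", "my_dataset", date(2017, 1, 15)))
# → s3://example-bucket/my_dataset/submission_date=20170115
```

Because the partition value lives in the path, a query engine such as Presto can prune whole days without reading any files, which is why duplicating the column inside the data is both redundant and disallowed.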
- Using this template, open a bug to load the dataset into Presto with the following attributes:
  - Assigned to
  - Title: "Add Dataset to Presto"
  - Content: the location of the dataset and the desired table name