Automated Notebooks Tutorial
This tutorial provides guidance and suggested practices for making sure a notebook is ready to be used in an automation.
Notebook Parameters
- notebook widgets should be used to define which catalog and schema the code targets
- development work and testing should be done in the sandbox catalog
- automated pipelines should typically point to the curated catalog
- notebook parameters allow the automation to dynamically point the code to the correct location
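As a minimal sketch of the parameter pattern above, the helper below builds a fully qualified table name from a catalog and a schema parameter. It assumes a Databricks-style notebook in which widget values are read with `dbutils.widgets.get`. The parameter names `catalog` and `schema`, the function `resolve_target`, and the table name `orders` are hypothetical and chosen only for illustration.

```python
from typing import Callable

# Catalogs named in this tutorial; adjust to the catalogs in your workspace.
ALLOWED_CATALOGS = {"sandbox", "curated"}


def resolve_target(get_param: Callable[[str], str], table: str) -> str:
    """Build a fully qualified table name from notebook parameters.

    get_param is any callable that returns a parameter value by name.
    In a notebook this would typically be dbutils.widgets.get (assumption).
    """
    catalog = get_param("catalog").strip()
    schema = get_param("schema").strip()
    if catalog not in ALLOWED_CATALOGS:
        raise ValueError(f"unexpected catalog: {catalog!r}")
    if not schema:
        raise ValueError("schema parameter is empty")
    return f"{catalog}.{schema}.{table}"


# In a Databricks notebook (assumption), the widgets would be declared once:
#   dbutils.widgets.text("catalog", "sandbox")
#   dbutils.widgets.text("schema", "demo")
# and then read with:
#   resolve_target(dbutils.widgets.get, "orders")

# Outside a notebook, a plain dict stands in for the widget values.
params = {"catalog": "sandbox", "schema": "demo"}
print(resolve_target(params.__getitem__, "orders"))
```

Defaulting the catalog widget to sandbox keeps interactive runs away from curated data, while the automation can override the value when it triggers the notebook.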
Notebook Location
- ensure the notebook is located in the appropriate git repository
- it is important that the path of the notebook does not change
- automations always execute code from the main branch of a repository
- pull requests should be enforced when merging code into the main branch
- code reviews during the pull request ensure code quality
- keep your code in sync with the remote repository through frequent pulls and/or rebasing your working branch onto main
Scheduling and Performance
When building a notebook that will execute on a recurring basis, it is important to consider how the data will be managed:
- ensure the notebook chooses the correct strategy for querying the data
- look for watermark columns that can be used to limit the amount of data that is initially returned from the source
- prefer delta/incremental data loads so the same data is not reloaded on every run
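The watermark approach described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: an in-memory SQLite database stands in for the real source and target, and the table names `source_orders` and `target_orders`, the watermark column `updated_at`, and the function `incremental_load` are all hypothetical.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection, last_watermark: str) -> str:
    """Copy only source rows newer than last_watermark; return the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM source_orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # INSERT OR REPLACE keys on the primary key, so a re-run or an updated
    # source row overwrites the existing target row instead of duplicating it.
    conn.executemany(
        "INSERT OR REPLACE INTO target_orders (id, amount, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    # Keep the old watermark when nothing new arrived.
    return rows[-1][2] if rows else last_watermark


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    INSERT INTO source_orders VALUES (1, 10.0, '2024-01-01');
    INSERT INTO source_orders VALUES (2, 20.0, '2024-01-02');
    """
)

# First run: an empty watermark sorts before every ISO date, so all rows load.
watermark = incremental_load(conn, "")

# A new row arrives in the source between runs.
conn.execute("INSERT INTO source_orders VALUES (3, 30.0, '2024-01-03')")

# Second run: only the row newer than the stored watermark is read.
watermark = incremental_load(conn, watermark)
```

In a scheduled notebook the watermark would need to be persisted between runs, for example in a small control table, so each execution can pick up where the previous one stopped.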