RDC ETL v2 Workflow
RDC_ETL_Notebook (main notebook)
- Import libraries
- Set notebook path
- get paths and parameters
- Notebook definitions
- Starting the RDC ETL process
- Get configuration info from config_main table
- Check the ETL run status (etl_status_sql_string)
- Get this run's:
  - run_type (full, transform)
  - main_id
  - Department_name
  - TargetOMOPEnv (dev, qa, prod)
- Increment Job_id for a new job (unless reuse_job_id = true)
- If the configuration is active and is not currently running:
  - Update the config_main table to indicate that it is running now
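The run-gating steps above can be sketched in Python. This is a minimal illustration only: the `ConfigRow` class, its field names, and the `start_run` helper are hypothetical stand-ins for a config_main record, not the notebook's actual code, and the sketch assumes the Job_id is incremented only when the run is allowed to start.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConfigRow:
    """Hypothetical in-memory view of one config_main record."""
    main_id: int
    department_name: str
    run_type: str          # "full" or "transform"
    target_omop_env: str   # "dev", "qa", or "prod"
    is_active: bool
    is_running: bool
    job_id: int


def start_run(config: ConfigRow, reuse_job_id: bool = False) -> Optional[ConfigRow]:
    """Return the updated record if the run may start, otherwise None."""
    # Only an active configuration that is not already running may start.
    if not config.is_active or config.is_running:
        return None
    # Keep the previous job id when reuse_job_id is set, else move to the next one.
    next_job_id = config.job_id if reuse_job_id else config.job_id + 1
    # Mark the configuration as running; persisting this to config_main is left out.
    return replace(config, job_id=next_job_id, is_running=True)
```

Returning a new record instead of mutating the old one keeps the decision logic separate from the table update, which makes it easy to check without a database.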
Stage Process
Step 1: Populate the metadata table
- Populate information tables (populate_information_tables)
Step 2: Truncate all relevant stage_omop tables
- Truncate the stage tables (truncate_stage_tables)
Step 3: Set up variables
- Get list of stage functions
- Get the last run date
- For each enabled record, run the named notebook
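Step 3 can be sketched as a loop over the stage function list. The record keys (`notebook_name`, `enabled`), the `last_run_date` argument name, and the `run_notebook` callable are assumptions made for illustration; the real column names and notebook launcher are not given in this outline.

```python
from typing import Callable, Dict, Iterable, Mapping


def run_stage_notebooks(
    stage_functions: Iterable[Mapping[str, object]],
    run_notebook: Callable[[str, Dict[str, str]], str],
    last_run_date: str,
) -> Dict[str, str]:
    """Run the notebook named by each enabled record and collect the results."""
    results: Dict[str, str] = {}
    for record in stage_functions:
        # Skip records that are not enabled.
        if not record.get("enabled"):
            continue
        name = str(record["notebook_name"])
        # Pass the last run date so the notebook can limit itself to newer data.
        results[name] = run_notebook(name, {"last_run_date": last_run_date})
    return results
```

Passing the launcher in as a parameter keeps the loop testable outside the notebook environment. If the notebooks run on Databricks (an assumption; the outline does not say), `run_notebook` could be a thin wrapper around `dbutils.notebook.run`.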
OMOP Process
- RDC definitions
- Get delta rules (job_def_sql_string)
- For each enabled record, call omop_stagings
- Apply NK
- Apply Concept
- Error Recording
- Delta Process
- Merge Process
- Get configuration info from config_main table
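The OMOP Process above can be read as a fixed sequence of steps applied to each enabled record. The sketch below assumes that reading, which the outline does not confirm: Apply NK, Apply Concept, Error Recording, Delta Process, and Merge Process are treated as per-record sub-steps run in the listed order. The step keys, the `table_name` and `enabled` fields, and the handler callables are all hypothetical.

```python
from typing import Callable, Iterable, List, Mapping, Tuple

# Assumed step order, taken from the order of the bullets in the outline.
OMOP_STEPS = (
    "apply_nk",
    "apply_concept",
    "error_recording",
    "delta_process",
    "merge_process",
)


def run_omop_process(
    delta_rules: Iterable[Mapping[str, object]],
    step_handlers: Mapping[str, Callable[[Mapping[str, object]], None]],
) -> List[Tuple[str, str]]:
    """Apply every OMOP step, in order, to each enabled delta rule."""
    executed: List[Tuple[str, str]] = []
    for rule in delta_rules:
        # Skip rules that are not enabled.
        if not rule.get("enabled"):
            continue
        for step in OMOP_STEPS:
            # Each handler receives the full rule record for the table.
            step_handlers[step](rule)
            executed.append((str(rule["table_name"]), step))
    return executed
```

The returned list of (table, step) pairs gives a simple audit trail of what ran and in which order.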