TPI Transition Outline
General Topics
- transparency to the larger team (6/30?)
- what are Chris' day-to-day tasks
- someone should shadow all of Chris' tasks
- direct DB/DW to us
- who to shadow: Niel, Ian, and Suhas
- Chris to send all requests to PE
- PE to field and complete the request, with Chris' oversight if needed
- What does DW need from Chris
- Meet with Snehil to get her list
- What does DB need from Chris
- Ask Bri what Chris does for them
- Onboarding users
- Adding new tables/data sources
- Troubleshooting existing ETLs
- Chris to assist with moving off of NiFi
- John & Rob validation of OMOP?
- TimeTracker knowledge transfer for time spent
Things to Document
Below is a list of things we need to ensure are documented before TPI transitions away from ICS
- a day in the life of Chris
- admin / maintenance tasks
- databasin troubleshooting guide - Ian
- how to check pipelines and automations in Azure
- how to troubleshoot Clarity ETLs
- how to deploy changes to curate schemas - Ian
- sandbox vs cleansed vs curated schemas and when to use which - Ian
- In progress
- billing process (in progress)
- review and expand
- include storage billing info
- how to make changes to RDC & Datalake architecture - Dave0
- what repos are most important
- what can be retired
- what do we not know
- inventory storage accounts
- document RDC (OMOP) deployment process in databricks
- Suhas?
- BJC neural frame process
- data broker auditing details - Chris
- legacy and databasin auditing
- external/non-databasin processes
- how to onboard new data sources - Niel
- document details on delta load options - Chris (see the sketch after this list)
- working with files and tables
- Databasin best practices - Chris
- file based ingestion
- How to request Databasin support - Ian
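For the delta-load item above, here is a minimal sketch of one common option: a MERGE from freshly landed files into a Delta table. This is illustrative only; the storage path, key column, and table name are hypothetical, and the actual Databasin pipelines may load differently.

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    # On Databricks the `spark` session already exists; this is only needed standalone.
    spark = SparkSession.builder.getOrCreate()

    # Hypothetical landing location for the latest extract.
    updates = spark.read.parquet("abfss://landing@example.dfs.core.windows.net/source_x/latest/")

    # Upsert into a hypothetical curated Delta table keyed on record_id.
    target = DeltaTable.forName(spark, "curated.source_x")
    (
        target.alias("t")
        .merge(updates.alias("s"), "t.record_id = s.record_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

Append-only writes and COPY INTO are the other typical options; the write-up should note which one each source actually uses.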
Tasks to Complete
Below is a list of things we need TPI to complete before they transition away from ICS
- Verantos - hand off to PE & DW
- EPIC - assign an ICS rep that is Tier5 and ingests data
- Loop in Ian, Niel, and Snehil
- GPC - Should probably try to get that to run in under 24 hours.
- Niel and John N
- In progress, still tweaking performance
- John to try it with June data with performance updates
- ArcGIS - All the logic is done, but it needs to live in its own Catalog. Will be easy. - Ian, Niel
- Finalizing documentation - done
- Switch target schema
- Schedule to run 1/week (at least)
- Shape file ingestion, ad-hoc task once a year
- Move sparc to Workflow
- Dev hour task
- BJC neural frame incremental load code changes/documentation
- pending BJC SFTP location, Chris to request SFTP site
- databasin file ingestion, snapshot
- automation job pulls ingested data and inserts it into BJC Synapse (see the JDBC sketch at the end of this list)
- Button up this billing process, so we can start charging. - Ian, Niel
- completed storage script
- WUSM azure storage analysis (notebook name)
- add tags to script
- move to wusm-data-lake-automations
- verify it is crawling the metastore
- move all billing notebooks into monorepo/Billing
- Group, permissions, and schema clean up
- OMOP, Clarity, etc.
- IMPORTANT: We/I need to move the old databricks jobs to databasin. Most of these are "file drop" jobs. - Ian, Niel, and Jack
- $0 marketplace install, approval by Amy
- Move code for notebooks, etc. that are currently used in workflows and pipelines into a repo
- find all existing/legacy notebooks / workflows
- add some docs about each
- same as billing process, but all other "administrative" automated tasks
- remove anything that is no longer needed
- RDC migration - Suhas, Snehil, Shinji?
- PE Code review
- Provide any guidance on issues we find
- Tempus changes?
- Move to repository
- Get RAW data from SFTP https://databasin.wustl.edu/projects/uJEj8d/pipelines/70 (adls.file_tempus)
- Chris made changes in code already
- Document tempus process/code
- waiting on Snehil for direction, key not being parsed correctly by Spark
- clinical trials key, maybe case sensitive now? Need to re-run all Tempus data provided to date
- Onboard PE to Zoho for support
- Niel, Nicole, and Ian for now
- Document process to request support
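For the BJC neural frame item above ("automation job pulls ingested data and inserts it into BJC Synapse"), a rough sketch of what that write could look like as a plain JDBC append. Server, database, table, and credentials are placeholders; the real job may use the dedicated Azure Synapse connector instead, and secrets should come from a secret scope rather than literals.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # predefined on Databricks

    # Read the snapshot that the Databasin file ingestion landed (path is hypothetical).
    snapshot = spark.read.format("delta").load(
        "abfss://ingest@example.dfs.core.windows.net/bjc_neural_frame/"
    )

    # Append into the BJC Synapse pool over JDBC (all connection details are placeholders).
    (
        snapshot.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;database=bjc")
        .option("dbtable", "dbo.neural_frame_incremental")
        .option("user", "svc_loader")
        .option("password", "<from-secret-scope>")
        .mode("append")
        .save()
    )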
Training for the team
- Databricks administration / best practices
- Spark
- INSERT ... VALUES will not work; mass insert via Spark or through a staging location instead (see the sketch after this list)
- Packer for devops build agent
- add azcopy
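On the Spark note above: row-by-row INSERT ... VALUES does not scale, so bulk loads should go through a DataFrame write or a staged file. A minimal sketch with hypothetical paths and table names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # predefined on Databricks

    # Instead of looping INSERT INTO ... VALUES, read the whole extract from a
    # staging location and append it in one distributed write.
    staged = spark.read.option("header", "true").csv(
        "abfss://staging@example.dfs.core.windows.net/extract/"
    )
    staged.write.format("delta").mode("append").saveAsTable("sandbox.staged_extract")

The SQL-only equivalent on Databricks is COPY INTO <table> FROM '<staging path>' FILEFORMAT = CSV, which also avoids per-row inserts.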
DW & DB Needs
Brokerage
- help understanding automated tasks
- audit database
- table ingestion, mostly done
- databasin
Data warehouse
- databasin
- Epic Clarity pipeline
- intake/onboarding
- initial meeting/discovery
- data sources
- teams
- cluster scaling
- 7GB / 5 hours timeout