TPI Transition Outline

General Topics

  • transparency to the larger team (6/30?)
  • what are Chris' day-to-day tasks
  • someone should shadow all of Chris' tasks
  • direct DB/DW to us
  • who will shadow: Niel, Ian, and Suhas
  • Chris to send all requests to PE
    • PE to field and complete each request, with Chris' oversight if needed
  • What does DW need from Chris
    • Meet with Snehil to get her list
  • What does DB need from Chris
    • Ask Bri what Chris does for them
  • Onboarding users
  • Adding new tables/data sources
  • Troubleshooting existing ETLs
  • Chris to assist with moving off of NiFi
    • John & Rob validation of OMOP?
  • TimeTracker knowledge transfer for time spent

Things to Document

Below is a list of items that must be documented before TPI transitions away from ICS.

  • a day in the life of Chris
    • admin / maintenance tasks
  • databasin troubleshooting guide - Ian
    • how to check pipelines and automations in Azure
    • how to troubleshoot Clarity ETLs
  • how to deploy changes to curated schemas - Ian
  • sandbox vs cleansed vs curated schemas and when to use which - Ian
    • In progress
  • billing process (in progress)
    • review and expand
    • include storage billing info
  • how to make changes to RDC & Datalake architecture - Dave
    • what repos are most important
    • what can be retired
    • what do we not know
    • inventory storage accounts
  • RDC (OMOP) deployment process in Databricks
    • Suhas?
  • BJC neural frame process
  • data broker auditing details - Chris
    • legacy and databasin auditing
    • external/non-databasin processes
  • how to onboard new data sources - Niel
  • document details on delta load options - Chris
    • working with files and tables
  • Databasin best practices - Chris
    • file based ingestion
  • How to request Databasin support - Ian

Tasks to Complete

Below is a list of tasks TPI needs to complete before transitioning away from ICS.

  • Verantos - hand off to PE & DW
  • Epic - assign an ICS rep who is Tier 5 and ingests data
    • Loop in Ian, Niel, and Snehil
  • GPC - get the job to run in under 24 hours
    • Niel and John N
    • In progress, still tweaking performance
    • John to rerun it on June data with the performance updates
  • ArcGIS - all the logic is done, but it needs to live in its own catalog (should be easy) - Ian, Niel
    • Finalizing documentation - done
    • Switch target schema
    • Schedule to run at least once a week
    • Shapefile ingestion: ad hoc task, once a year
  • Move sparc to Workflow
    • Dev hour task
  • BJC neural frame incremental load code changes/documentation
    • pending BJC SFTP location, Chris to request SFTP site
    • databasin file ingestion, snapshot
    • automation job pulls ingested data and inserts it into BJC Synapse
  • Finalize the billing process so we can start charging - Ian, Niel
  • Group, permission, and schema cleanup
    • OMOP, Clarity, etc.
  • IMPORTANT: We/I need to move the old Databricks jobs to Databasin. Most of these are "file drop" jobs. - Ian, Niel, and Jack
    • $0 marketplace install, approval by Amy
    • Move code for notebooks, etc. that are currently used in workflows and pipelines into a repo
      • find all existing/legacy notebooks / workflows
      • add brief documentation for each
      • same treatment as the billing process, applied to all other "administrative" automated tasks
      • remove anything that is no longer needed
  • RDC migration - Suhas, Snehil, Shinji?
    • PE Code review
    • Provide any guidance on issues we find
  • Tempus changes?
  • Onboard PE to Zoho for support
    • Niel, Nicole, and Ian for now
    • Document process to request support

Training for the team

  • Databricks administration / best practices
  • Spark
  • INSERT ... VALUES will not work; bulk insert via Spark or through a staging location instead
  • Packer for the DevOps build agent
    • add AzCopy

DW & DB Needs

  • Brokerage
    • help understanding automated tasks
    • audit database
    • table ingestion, mostly done
    • databasin
  • Data warehouse
    • databasin
    • epic clarity pipeline
    • intake/onboarding
      • initial meeting/discovery
      • data sources
      • teams
    • cluster scaling
    • 7 GB / 5-hour timeout

Updated on August 7, 2025