Schema Cleanup Project Plan

Objective

The goal of this project is to standardize schemas, catalogs, and data access in the WUSM Data Lake. This initiative aims to improve data management, user experience, and compliance by aligning with existing workflows and policies.

Background

The WUSM Data Lake organizes data into several catalogs (sandbox, review, cleansed, and curated) to support different stages of the data lifecycle. However, inconsistencies in schema usage and access policies have led to challenges in data governance and user experience. This project will address these issues by:

  • Migrating user access to appropriate catalogs.
  • Standardizing access policies.
  • Retiring and/or removing access to cleansed schemas.
  • Ensuring alignment with workflows defined in the Data Collaboration Workflow and Data Lake Catalog Policies.

Scope

This project will focus on:

  1. Schema Migration:

    • Migrate users and data from cleansed.epic_clarity_orgfilter to curated.epic_clarity and from cleansed.omop to curated.omop.
    • Ensure tables and views in the curated schemas use standard names that conform to the source system schema and data standards.
    • Establish data source groups for both curated.epic_clarity and curated.omop, starting with an "unrestricted" group for full access to each schema. Additional groups may be created to restrict access to limited datasets as needed. See Data Lake Admin Guide for group conventions and access management.
    • Identify other schemas in cleansed that should be moved to curated based on common access requests.
  2. Access Policy Updates:

    • Ensuring only ICS service principal has write access to cleansed schemas.
    • Granting read access to curated schemas for approved users.
    • Cleaning up groups/permissions for both clarity and omop data sources.
  3. User Communication:

    • Notifying affected users about schema migrations and access changes.
    • Providing timelines and support for code migration.
  4. Audit and Reporting:

    • Generating reports on active users of cleansed schemas.
    • Auditing usage of cleansed.omop and cleansed.epic_clarity over the last two months.

Implementation Plan

Phase 1: Planning and Preparation

  • Timeline: August 2025

  • Tasks:

    • Review existing workflows and policies.
    • Identify schemas in cleansed that require migration.
    • Generate user activity reports for cleansed.omop and cleansed.epic_clarity_orgfilter.
    • Finalize access policies for curated and cleansed catalogs.

Phase 2: Schema Migration

  • Timeline: September 2025

  • Tasks:

    • Migrate cleansed.epic_clarity_orgfilter to curated.epic_clarity.
    • Migrate users from cleansed.omop to curated.omop.
    • Update Databasin pipelines to reflect schema changes.
    • Restrict access to schemas in cleansed catalog to only members of ICS.

Phase 3: Access Policy Updates

  • Timeline: October 2025

  • Tasks:

    • Update permissions for cleansed and curated schemas.
    • Ensure all users have appropriate access to curated schemas.
    • Clean up groups/permissions for clarity and omop.

Phase 4: User Communication and Support

  • Timeline: November 2025

  • Tasks:

    • Notify users about schema migrations and access changes.
    • Provide documentation and support for code migration.
    • Address user feedback and resolve issues.

Phase 5: Audit and Validation

  • Timeline: December 2025

  • Tasks:

    • Audit usage of curated schemas to ensure compliance.
    • Validate that all migrations and access updates are complete.
    • Document lessons learned and update policies as needed.

Roles and Responsibilities

  • ICS DW Team:

    • Perform schema migrations.
    • Update Databasin pipelines.
    • Manage access policies.
  • Project Teams:

    • Review and approve migrated schemas.
    • Update code to use new curated schemas.
  • Platform Engineering Team:

    • Generate user activity reports.
    • Clean up groups/permissions.
    • Provide user support and documentation.

Risks and Mitigation

  • Risk: User resistance to schema changes.

    • Mitigation: Provide clear communication, timelines, and support.
  • Risk: Disruption to existing workflows.

    • Mitigation: Perform migrations during low-usage periods and validate changes.
  • Risk: Data integrity issues during migration.

    • Mitigation: Test migrations in a staging environment before production.

Success Metrics

  • All targeted schemas are migrated to curated.
  • Access policies are updated and enforced.
  • Users successfully transition to new schemas with minimal disruption.
  • Improved user experience and compliance with data governance policies.

References


Updated on August 7, 2025