Schema Cleanup Project Plan
Objective
The goal of this project is to standardize schemas, catalogs, and data access in the WUSM Data Lake. This initiative aims to improve data management, user experience, and compliance by aligning with existing workflows and policies.
Background
The WUSM Data Lake organizes data into several catalogs (sandbox
, review
, cleansed
, and curated
) to support different stages of the data lifecycle. However, inconsistencies in schema usage and access policies have led to challenges in data governance and user experience. This project will address these issues by:
- Migrating user access to appropriate catalogs.
- Standardizing access policies.
- Retiring and/or removing access to
cleansed
schemas. - Ensuring alignment with workflows defined in the Data Collaboration Workflow and Data Lake Catalog Policies.
Scope
This project will focus on:
-
Schema Migration:
- Migrate users and data from
cleansed.epic_clarity_orgfilter
tocurated.epic_clarity
and fromcleansed.omop
tocurated.omop
. - Ensure tables and views in the
curated
schemas use standard names that conform to the source system schema and data standards. - Establish data source groups for both
curated.epic_clarity
andcurated.omop
, starting with an "unrestricted" group for full access to each schema. Additional groups may be created to restrict access to limited datasets as needed. See Data Lake Admin Guide for group conventions and access management. - Identify other schemas in
cleansed
that should be moved tocurated
based on common access requests.
- Migrate users and data from
-
Access Policy Updates:
- Ensuring only ICS service principal has write access to
cleansed
schemas. - Granting read access to
curated
schemas for approved users. - Cleaning up groups/permissions for both
clarity
andomop
data sources.
- Ensuring only ICS service principal has write access to
-
User Communication:
- Notifying affected users about schema migrations and access changes.
- Providing timelines and support for code migration.
-
Audit and Reporting:
- Generating reports on active users of
cleansed
schemas. - Auditing usage of
cleansed.omop
andcleansed.epic_clarity
over the last two months.
- Generating reports on active users of
Implementation Plan
Phase 1: Planning and Preparation
-
Timeline: August 2025
-
Tasks:
- Review existing workflows and policies.
- Identify schemas in
cleansed
that require migration. - Generate user activity reports for
cleansed.omop
andcleansed.epic_clarity_orgfilter
. - Finalize access policies for
curated
andcleansed
catalogs.
Phase 2: Schema Migration
-
Timeline: September 2025
-
Tasks:
- Migrate
cleansed.epic_clarity_orgfilter
tocurated.epic_clarity
. - Migrate users from
cleansed.omop
tocurated.omop
. - Update Databasin pipelines to reflect schema changes.
- Restrict access to schemas in
cleansed
catalog to only members of ICS.
- Migrate
Phase 3: Access Policy Updates
-
Timeline: October 2025
-
Tasks:
- Update permissions for
cleansed
andcurated
schemas. - Ensure all users have appropriate access to
curated
schemas. - Clean up groups/permissions for
clarity
andomop
.
- Update permissions for
Phase 4: User Communication and Support
-
Timeline: November 2025
-
Tasks:
- Notify users about schema migrations and access changes.
- Provide documentation and support for code migration.
- Address user feedback and resolve issues.
Phase 5: Audit and Validation
-
Timeline: December 2025
-
Tasks:
- Audit usage of
curated
schemas to ensure compliance. - Validate that all migrations and access updates are complete.
- Document lessons learned and update policies as needed.
- Audit usage of
Roles and Responsibilities
-
ICS DW Team:
- Perform schema migrations.
- Update Databasin pipelines.
- Manage access policies.
-
Project Teams:
- Review and approve migrated schemas.
- Update code to use new
curated
schemas.
-
Platform Engineering Team:
- Generate user activity reports.
- Clean up groups/permissions.
- Provide user support and documentation.
Risks and Mitigation
-
Risk: User resistance to schema changes.
- Mitigation: Provide clear communication, timelines, and support.
-
Risk: Disruption to existing workflows.
- Mitigation: Perform migrations during low-usage periods and validate changes.
-
Risk: Data integrity issues during migration.
- Mitigation: Test migrations in a staging environment before production.
Success Metrics
- All targeted schemas are migrated to
curated
. - Access policies are updated and enforced.
- Users successfully transition to new schemas with minimal disruption.
- Improved user experience and compliance with data governance policies.