Schemas and Groups Policy
This document outlines the naming conventions, provisioning, and access model for schemas and security groups in the WUSM Data Lake. It is intended for ICS team members, data stewards, and all data lake users. The goal is to ensure consistency, compliance, and effective management of data assets and access across different environments.
Naming Conventions
Schemas (and their corresponding security groups) follow a standardized naming scheme using a prefix that denotes the project type, an identifier, and a department code. Valid prefixes include:
Prefix | Description |
---|---|
data_brokers | Data broker team schemas (ICS or departmental) |
irb | Research study teams (using the IRB# as ID) |
qaqi | Quality assurance/quality improvement teams |
class | Classroom or academic project teams |
ops | Operational or other project teams |
The identifier is typically the IRB number for research studies, or a short name or code for other projects (e.g., a project name or class code). A department or unit code is appended as the final component. This yields a schema name (and group name) such as `irb_<IRB#>_<dept>` for a study or `class_<CourseCode>_<dept>` for a class project. This short name is used consistently for the team's workspace and makes the schema easily identifiable. Data broker team names do not include an identifier; they consist only of the `data_brokers` prefix and the `<dept>` suffix (e.g., `data_brokers_med`).
All Databricks groups for project teams are prefixed with `wusm_datalake_` followed by the team name, matching the schema naming convention.
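The naming rules above can be sketched as a small helper. This is only an illustration: the function names `schema_name` and `group_name` are hypothetical, and the sketch assumes the name parts are already in their final form (it does no case folding or character cleanup, since the policy does not specify any).

```python
# Prefixes listed in the naming convention table.
VALID_PREFIXES = {"data_brokers", "irb", "qaqi", "class", "ops"}

# Prefix applied to every Databricks group for a team.
GROUP_PREFIX = "wusm_datalake_"


def schema_name(prefix, dept, identifier=None):
    """Build a schema name as <prefix>_<identifier>_<dept>.

    Data broker teams have no identifier: data_brokers_<dept>.
    (Hypothetical helper, not part of any existing tooling.)
    """
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not dept:
        raise ValueError("a department or unit code is required")
    if prefix == "data_brokers":
        if identifier:
            raise ValueError("data broker schemas do not take an identifier")
        return f"{prefix}_{dept}"
    if not identifier:
        raise ValueError(f"prefix {prefix!r} requires an identifier")
    return f"{prefix}_{identifier}_{dept}"


def group_name(schema):
    """Return the Databricks group name for a team's schema name."""
    return GROUP_PREFIX + schema


print(schema_name("irb", "med", "12345"))       # irb_12345_med
print(schema_name("data_brokers", "med"))       # data_brokers_med
print(group_name(schema_name("irb", "med", "12345")))  # wusm_datalake_irb_12345_med
```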
Schema Provisioning and Access Model
The WUSM Data Lake organizes data into several catalogs, each serving a distinct role in the data lifecycle and access model. Each project or team is provisioned with schemas in the following catalogs as appropriate:
Catalog | Who Gets a Schema? | Example Schema Name | Access Model / Notes |
---|---|---|---|
sandbox | All project, broker, and class teams | irb_12345_med, data_brokers_med | R/W for team only; private development |
review | All project and broker teams | irb_12345_med, data_brokers_med | R/W for ICS/broker; project team read-only |
curated | All project teams (not brokers) | irb_12345_med | R/O for team; R/W for ICS only; approved assets only |
- Project Teams: Provided with schemas in the `sandbox`, `review`, and `curated` catalogs. The schema names follow the naming convention above. Project teams have R/W access to their `sandbox` schema and read-only access to their `review` and `curated` schemas.
- Data Broker Teams: Provided with schemas in the `sandbox` and `review` catalogs, but not in `curated`. May be granted access to additional schemas in `curated` or `cleansed` as required for their role.
- Class Teams: Provided with a schema in `sandbox` only. Access to data in the `curated` catalog is provisioned as requested by the instructor and approved by ICS.
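As a rough illustration of the provisioning model, the per-team defaults can be written as a lookup table. The team-type keys (`"project"`, `"broker"`, `"class"`), the access labels (`"rw"`, `"ro"`), and the function name are assumptions made for this sketch; additional case-by-case access (for example, brokers in `curated` or `cleansed`, or class teams in `curated`) is deliberately left out because it is granted on request.

```python
# Default schemas provisioned per team type, with the team's own access level.
# "rw" = read/write, "ro" = read-only. Keys and labels are illustrative only.
PROVISIONING = {
    "project": {"sandbox": "rw", "review": "ro", "curated": "ro"},
    "broker": {"sandbox": "rw", "review": "rw"},
    "class": {"sandbox": "rw"},
}


def provisioned_schemas(team_type, schema):
    """Map each fully qualified schema (catalog.schema) to the team's access."""
    try:
        catalogs = PROVISIONING[team_type]
    except KeyError:
        raise ValueError(f"unknown team type: {team_type!r}") from None
    return {f"{catalog}.{schema}": access for catalog, access in catalogs.items()}


for name, access in provisioned_schemas("project", "irb_12345_med").items():
    print(name, access)
```

For a project team this lists `sandbox.irb_12345_med` as read/write and the `review` and `curated` schemas as read-only, matching the table above.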
Access Control Summary
- Sandbox: Each team has R/W access to their own schema. No access for other teams or ICS unless explicitly granted.
- Review: ICS and broker teams have R/W access to review schemas. Project teams have read-only access to their review schema for validation and feedback.
- Curated: Only ICS has write access. Project teams and approved users have read-only access to their curated schema.
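One way these access levels might be expressed is as schema-level `GRANT` statements. The sketch below only builds SQL strings; it assumes the catalogs are managed with Databricks Unity Catalog, and the mapping of "read-only" and "R/W" to specific privileges is an assumption for illustration, not the documented ICS configuration. A group would also typically need `USE CATALOG` on the parent catalog, which is not shown.

```python
# Assumed mapping from this policy's access levels to Unity Catalog privileges.
PRIVILEGES = {
    "ro": ["USE SCHEMA", "SELECT"],
    "rw": ["USE SCHEMA", "SELECT", "MODIFY", "CREATE TABLE"],
}


def grant_statement(catalog, schema, group, access):
    """Build a schema-level GRANT statement for a group (string only, not executed)."""
    if access not in PRIVILEGES:
        raise ValueError(f"unknown access level: {access!r}")
    privileges = ", ".join(PRIVILEGES[access])
    # Group names are quoted with backticks so the underscore-separated
    # name is treated as a single identifier.
    return f"GRANT {privileges} ON SCHEMA {catalog}.{schema} TO `{group}`;"


team = "irb_12345_med"
group = "wusm_datalake_" + team
print(grant_statement("sandbox", team, group, "rw"))
print(grant_statement("review", team, group, "ro"))
print(grant_statement("curated", team, group, "ro"))
```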
Lifecycle and Workflow
- Development: Teams develop and test assets in their `sandbox` schema.
- Review: Assets are promoted to the `review` schema for validation and approval. ICS/broker teams review and may request changes.
- Approval: Once approved, assets are promoted to the `curated` schema (for project teams). Only approved, production-ready assets are published here.
- Access: Project teams and approved users can query curated data. Further changes require a new development and review cycle.
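The promotion path can be modeled as a simple one-way state machine. This is a minimal sketch under stated assumptions: the `Stage` enum, the `promote` function, and the `approved` flag are hypothetical names used for illustration and do not describe an existing ICS tool.

```python
from enum import Enum


class Stage(Enum):
    SANDBOX = "sandbox"
    REVIEW = "review"
    CURATED = "curated"


# Assets only move forward: sandbox -> review -> curated.
NEXT_STAGE = {Stage.SANDBOX: Stage.REVIEW, Stage.REVIEW: Stage.CURATED}


def promote(stage, approved=False):
    """Return the stage an asset moves to when promoted from `stage`.

    Promotion out of review requires approval; curated is the final stage.
    """
    if stage is Stage.CURATED:
        raise ValueError(
            "curated assets are final; changes need a new development and review cycle"
        )
    if stage is Stage.REVIEW and not approved:
        raise PermissionError("assets in review must be approved before promotion")
    return NEXT_STAGE[stage]


stage = promote(Stage.SANDBOX)           # development -> review
stage = promote(stage, approved=True)    # approved -> curated
print(stage.value)
```

Making the transition table one-directional mirrors the rule that further changes start again in `sandbox` rather than being edited in place in `curated`.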
Glossary / Acronyms
- Schema: A collection of database objects (tables, views) within a catalog.
- Catalog: A logical grouping of schemas within the data lake, each serving a specific purpose (e.g., `sandbox`, `review`, `curated`).
- ICS: Institute for Clinical and Translational Sciences.
- Data Broker: A team member or group responsible for building, preparing, and submitting data assets for review and promotion.
- Project Team: A group of individuals associated with a particular study or initiative.
- Class Team: A group of individuals associated with a classroom or academic project.