Schemas and Groups Policy

This document outlines the naming conventions, provisioning, and access model for schemas and security groups in the WUSM Data Lake. It is intended for ICS team members, data stewards, and all data lake users. The goal is to ensure consistency, compliance, and effective management of data assets and access across different environments.

Naming Conventions

Schemas (and their corresponding security groups) follow a standardized naming scheme built from three parts: a prefix that denotes the project type, an identifier, and a department code. Valid prefixes include:

Prefix       | Description
data_brokers | Data broker team schemas (ICS or departmental)
irb          | Research study teams (using the IRB# as ID)
qaqi         | Quality assurance/quality improvement teams
class        | Classroom or academic project teams
ops          | Operational or other project teams

The identifier is typically the IRB number for research studies, or a short name/code for other projects (e.g., a project name or class code). A department or unit code is appended as a suffix. This yields a schema name (and group name) such as irb_<IRB#>_<dept> for a study or class_<CourseCode>_<dept> for a class project. This short name is used consistently for the team's workspace and makes the schema easily identifiable. Data broker team names omit the identifier and consist only of the data_brokers prefix and the department suffix (e.g., data_brokers_med).

All Databricks groups for project teams are named wusm_datalake_ followed by the team name, which matches the schema naming convention (e.g., wusm_datalake_irb_12345_med for the schema irb_12345_med).
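
The naming rules above can be sketched as a small helper. This is an illustrative sketch only, not an official provisioning tool: the function names schema_name and group_name are hypothetical, and the lowercase/character checks are assumptions that this policy does not state.

```python
import re
from typing import Optional

# Prefixes taken from the table in this section.
VALID_PREFIXES = {"data_brokers", "irb", "qaqi", "class", "ops"}
GROUP_PREFIX = "wusm_datalake_"


def schema_name(prefix: str, dept: str, identifier: Optional[str] = None) -> str:
    """Build <prefix>_<identifier>_<dept>, or <prefix>_<dept> for data brokers."""
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if prefix == "data_brokers":
        # Data broker names carry no identifier.
        if identifier is not None:
            raise ValueError("data broker schemas do not take an identifier")
        parts = [prefix, dept]
    elif identifier:
        parts = [prefix, identifier, dept]
    else:
        raise ValueError(f"prefix {prefix!r} requires an identifier")
    # Assumption: names are lowercased and limited to letters, digits, underscores.
    name = "_".join(part.strip().lower() for part in parts)
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError(f"invalid characters in schema name: {name!r}")
    return name


def group_name(schema: str) -> str:
    """Return the Databricks group name that matches a schema name."""
    return GROUP_PREFIX + schema


print(schema_name("irb", "med", "12345"))          # irb_12345_med
print(schema_name("data_brokers", "med"))          # data_brokers_med
print(group_name(schema_name("irb", "med", "12345")))  # wusm_datalake_irb_12345_med
```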

Schema Provisioning and Access Model

The WUSM Data Lake organizes data into several catalogs, each serving a distinct role in the data lifecycle and access model. Each project or team is provisioned with schemas in the following catalogs as appropriate:

Catalog | Who Gets a Schema?                   | Example Schema Name             | Access Model / Notes
sandbox | All project, broker, and class teams | irb_12345_med, data_brokers_med | R/W for team only; private development
review  | All project and broker teams         | irb_12345_med, data_brokers_med | R/W for ICS/broker; project team read-only
curated | All project teams (not brokers)      | irb_12345_med                   | R/O for team; R/W for ICS only; approved assets only

  • Project Teams: Provided with schemas in the sandbox, review, and curated catalogs. The schema names follow the naming convention above. Project teams have R/W access to their sandbox schema and read-only access to their review and curated schemas.
  • Data Broker Teams: Provided with schemas in the sandbox and review catalogs, but not in curated. They may be granted access to additional schemas in the curated or cleansed catalogs as required for their role.
  • Class Teams: Provided with a schema in sandbox only. Access to data in the curated catalog is provisioned as requested by the instructor and approved by ICS.
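
As a rough illustration of the provisioning rules above, the mapping from team type to catalogs can be written as a lookup table. The names SCHEMA_CATALOGS and schemas_to_provision are hypothetical, and the catalog.schema dotted form is an assumption about how the schemas are addressed.

```python
from typing import List

# Catalogs in which each team type receives its own schema,
# as described in the table and bullets of this section.
SCHEMA_CATALOGS = {
    "project": ("sandbox", "review", "curated"),
    "broker": ("sandbox", "review"),
    "class": ("sandbox",),
}


def schemas_to_provision(team_type: str, schema: str) -> List[str]:
    """List the fully qualified schemas to create for a team."""
    if team_type not in SCHEMA_CATALOGS:
        raise ValueError(f"unknown team type: {team_type!r}")
    return [f"{catalog}.{schema}" for catalog in SCHEMA_CATALOGS[team_type]]


print(schemas_to_provision("project", "irb_12345_med"))
print(schemas_to_provision("broker", "data_brokers_med"))
```

Access that is granted on request (for example, a class team reading curated data) is deliberately left out of this table, since it does not involve creating a new schema.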

Access Control Summary

  • Sandbox: Each team has R/W access to its own schema. Other teams and ICS have no access unless it is explicitly granted.
  • Review: ICS and broker teams have R/W access to review schemas. Project teams have read-only access to their review schema for validation and feedback.
  • Curated: Only ICS has write access. Project teams and approved users have read-only access to their curated schema.
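
For a project team, the summary above could translate into grants along the following lines. This is a sketch under stated assumptions: it assumes the catalogs are managed with Databricks Unity Catalog SQL grants, and the mapping of "R/W" and "read-only" to specific privileges is illustrative rather than the policy's actual definition. The helper name project_team_grants is hypothetical.

```python
from typing import List

# Assumed mapping from access level to Unity Catalog privileges.
PRIVILEGES = {
    "rw": "USE SCHEMA, SELECT, MODIFY, CREATE TABLE",
    "ro": "USE SCHEMA, SELECT",
}

# Project team access per catalog, from the summary above.
PROJECT_TEAM_ACCESS = {"sandbox": "rw", "review": "ro", "curated": "ro"}


def project_team_grants(schema: str) -> List[str]:
    """Generate GRANT statements for a project team's group on its schemas."""
    group = f"wusm_datalake_{schema}"
    return [
        f"GRANT {PRIVILEGES[level]} ON SCHEMA {catalog}.{schema} TO `{group}`;"
        for catalog, level in PROJECT_TEAM_ACCESS.items()
    ]


for statement in project_team_grants("irb_12345_med"):
    print(statement)
```

In Unity Catalog a group also needs USE CATALOG on the parent catalog before schema-level grants take effect; that step is omitted here for brevity.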

Lifecycle and Workflow

  1. Development: Teams develop and test assets in their sandbox schema.
  2. Review: Assets are promoted to the review schema for validation and approval. ICS/broker teams review and may request changes.
  3. Approval: Once approved, assets are promoted to the curated schema (for project teams). Only approved, production-ready assets are published here.
  4. Access: Project teams and approved users can query curated data. Further changes require a new development and review cycle.
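
The workflow above can be read as a simple promotion path between catalogs. The sketch below is illustrative only: next_catalog is a hypothetical helper, and sending an asset back to sandbox when reviewers request changes is an assumption, since the policy says only that changes may be requested.

```python
def next_catalog(current: str, approved: bool = False) -> str:
    """Return the catalog an asset moves to next in the lifecycle."""
    if current == "sandbox":
        # Step 2: developed assets are promoted for review.
        return "review"
    if current == "review":
        # Step 3: approved assets are published to curated.
        # Assumption: requested changes are made back in sandbox.
        return "curated" if approved else "sandbox"
    if current == "curated":
        # Step 4: further changes start a new development cycle.
        return "sandbox"
    raise ValueError(f"unknown catalog: {current!r}")


print(next_catalog("sandbox"))                # review
print(next_catalog("review", approved=True))  # curated
print(next_catalog("review"))                 # sandbox
```

Promotion to curated applies to project teams only, so a real implementation would also need to check the team type before the final step.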

Glossary / Acronyms

  • Schema: A collection of database objects (tables, views) within a catalog.
  • Catalog: A logical grouping of schemas within the data lake, each serving a specific purpose (e.g., sandbox, review, curated).
  • ICS: Institute for Clinical and Translational Sciences.
  • Data Broker: A team member or group responsible for building, preparing, and submitting data assets for review and promotion.
  • Project Team: A group of individuals associated with a particular study or initiative.
  • Class Team: A group of individuals associated with a classroom or academic project.

Updated on August 7, 2025