Add Data Sources

Intake and Approval Process for New Shared Data Sources

Purpose of the Intake Process

This document describes the process for requesting and approving the addition of new shared data sources to the WUSM Data Lake.

How to Submit an Intake Form

All requests for new data sources must be submitted via the designated intake form.

Details Required in the Intake Form

When filling out the intake form, ensure you include the following:

  • A description of the data source and its contents.
  • The business need for adding the data source.
  • The requesting department.
  • The billing cost center.
  • The designated data steward.
  • A list of users requiring access.

Collaboration with Data Stewards

Upon approval, the data lake team will:

  • Collaborate with the data steward to integrate the new data source into the data lake.
  • Configure access controls based on the details provided in the intake form.

Finalizing the Process

Providing all of the required details and following this process ensures that new data sources are added to the WUSM Data Lake smoothly and efficiently.

Data Access Control Policies and Configuration

Steps for Configuring Access Control

  1. Create Databricks Groups:

    • Navigate to the Databricks Admin Console.
    • Under the "Groups" section, create a new group for the data source (e.g., data_source_team).
    • Assign users to the group based on their roles and responsibilities.
  2. Assign Permissions:

    • Use the Databricks SQL Editor to assign permissions to the group.
    • Example SQL commands:
      -- Allow the group to reference the schema (USAGE) and read its tables (SELECT).
      GRANT USAGE ON SCHEMA data_source_schema TO `data_source_team`;
      GRANT SELECT ON SCHEMA data_source_schema TO `data_source_team`;
  3. Configure Role-Based Access:

    • Define roles such as read-only, read-write, and admin for the data source.
    • Assign these roles to users or groups as needed (see the example grants after this list).
  4. Set Sensitivity Levels:

    • Classify data fields as PHI, Non-PHI, or Restricted.
    • Use masking or filtering techniques for sensitive data (a column-mask sketch follows this list).
  5. Audit and Review:

    • Schedule regular audits to review access permissions.
    • Use Databricks audit logs to track changes and ensure compliance (an example audit query follows this list).
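
As a sketch of step 3, the grants below map illustrative read-only, read-write, and admin groups onto Databricks privileges. The group names (data_source_readers, data_source_writers, data_source_admins) and the schema name are placeholders to be replaced with the values gathered in the intake form.

  -- Read-only role: browse the schema and query its tables.
  GRANT USAGE ON SCHEMA data_source_schema TO `data_source_readers`;
  GRANT SELECT ON SCHEMA data_source_schema TO `data_source_readers`;

  -- Read-write role: the read privileges plus the ability to modify data.
  GRANT USAGE ON SCHEMA data_source_schema TO `data_source_writers`;
  GRANT SELECT ON SCHEMA data_source_schema TO `data_source_writers`;
  GRANT MODIFY ON SCHEMA data_source_schema TO `data_source_writers`;

  -- Admin role: full control over objects in the schema.
  GRANT ALL PRIVILEGES ON SCHEMA data_source_schema TO `data_source_admins`;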
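
For step 4, one way to enforce a PHI classification is a column mask, assuming the schema is managed by Unity Catalog (which supports column masks) and that patients.patient_mrn is a hypothetical PHI column; is_account_group_member is a built-in Databricks SQL function.

  -- Masking function: reveal the value only to members of the admin group.
  CREATE OR REPLACE FUNCTION data_source_schema.mask_phi(value STRING)
    RETURNS STRING
    RETURN CASE
      WHEN is_account_group_member('data_source_admins') THEN value
      ELSE '***REDACTED***'
    END;

  -- Attach the mask to the PHI column so other users see the redacted value.
  ALTER TABLE data_source_schema.patients
    ALTER COLUMN patient_mrn SET MASK data_source_schema.mask_phi;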
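
For step 5, if Databricks system tables are enabled for the workspace, a query along these lines can surface recent permission changes during an audit; the 90-day window and the filter on permission-related actions are illustrative.

  -- Recent permission-related events from the Databricks audit system table.
  SELECT
    event_time,
    user_identity.email AS changed_by,
    service_name,
    action_name,
    request_params
  FROM system.access.audit
  WHERE action_name ILIKE '%permission%'
    AND event_time >= date_sub(current_date(), 90)
  ORDER BY event_time DESC;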

By following these steps, you can ensure secure and efficient access control for new data sources in the WUSM Data Lake.

Metadata Tagging Guidelines

Planned Enhancements for Metadata Tagging

This section will be completed in a future update to provide detailed steps for applying metadata tags to data assets.

Data Classification and Restrictions

Planned Enhancements for Data Classification

This section will be completed in a future update to provide guidelines for classifying data and restricting sensitive data.

Modification and Notification Process for Existing Data Assets

Planned Enhancements for Modification Process

This section will be completed in a future update to provide steps for updating existing data assets and the notification process involved.

Data Deletion and Archival Procedures

Planned Enhancements for Data Deletion

This section will be completed in a future update to provide procedures for deleting obsolete data sources and managing audit logs and archival.

Data Inventory Maintenance

Planned Enhancements for Inventory Maintenance

This section will be completed in a future update to provide guidelines for maintaining the data inventory and conducting regular reviews.

Sandbox Catalog Management

Planned Enhancements for Sandbox Management

This section will be completed in a future update to provide steps for managing assets in the dedicated schema and configuring access controls.

Compliance Reporting

Planned Enhancements for Compliance Reporting

This section will be completed in a future update to provide steps for generating compliance reports and managing review and action items.

Auditing Procedures

Planned Enhancements for Auditing

This section will be completed in a future update to provide detailed steps for initiating data access reviews, validating access, and addressing discrepancies.

Metadata Documentation

Planned Enhancements for Metadata Documentation

This section will be completed in a future update to provide guidelines for maintaining metadata via Databricks, tagging and commenting, and creating a knowledge catalog.

Communication and Change Log Maintenance

Planned Enhancements for Change Log Maintenance

This section will be completed in a future update to provide steps for logging data asset changes, the notification process, and maintaining a changelog.


Updated on August 7, 2025