Data Asset Introduction

Overview

Data assets are structured or unstructured datasets that hold value for an organization. In the context of Databricks and the WUSM Data Lake, data assets include schemas, tables, and views used for analytics, reporting, and decision-making.

Effective management of data assets supports compliance, discoverability, data quality, collaboration, and scalability. This document explains the key concepts, processes, and criteria for data asset management in the WUSM Data Lake.

Key Concepts

Structured vs. Unstructured Datasets

  • Structured Datasets: Data organized according to a predefined schema, with named columns and defined data types arranged in rows. Examples include relational database tables and spreadsheets.
  • Unstructured Datasets: Data without a predefined schema, such as images, videos, or free-text documents. These typically require specialized tools to process and analyze.
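The key difference is whether a record can be checked against a predefined schema. The following minimal Python sketch shows that check. The schema and its column names (`record_id`, `department`, `visit_date`) are hypothetical. In Databricks, Delta tables enforce their schema on write, so this code is illustrative only.

```python
from datetime import date

# Hypothetical predefined schema: column name -> expected Python type.
SCHEMA = {"record_id": int, "department": str, "visit_date": date}


def conforms_to_schema(record: dict, schema: dict) -> bool:
    """Return True if the record has exactly the schema's columns, each with the expected type."""
    if set(record) != set(schema):
        return False  # missing or extra columns
    return all(isinstance(record[col], expected) for col, expected in schema.items())


structured_row = {"record_id": 1, "department": "Radiology", "visit_date": date(2025, 1, 15)}
bad_row = {"record_id": "1", "department": "Radiology", "visit_date": date(2025, 1, 15)}

print(conforms_to_schema(structured_row, SCHEMA))  # True
print(conforms_to_schema(bad_row, SCHEMA))         # False: record_id is a str, not an int
```

Unstructured content, such as a free-text note, has no columns to validate in this way. It is usually described only by file-level metadata.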

Data Ingestion

Data ingestion is the process of collecting, importing, and processing data from various sources into a centralized storage system, such as a data lake, so that the data is available for analysis, reporting, and decision-making.
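The ingestion pattern can be sketched in plain Python as records from several sources landing in one central store. This is a minimal sketch that assumes two hypothetical in-memory sources (`read_clinic_system`, `read_billing_export`). A real Databricks pipeline would read from files, databases, or streams and write to tables rather than a Python list. Each record is stamped with its source and ingestion time so that its lineage can be traced later.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable


# Hypothetical source readers; real ones would query a system or read exported files.
def read_clinic_system() -> Iterable[dict]:
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


def read_billing_export() -> Iterable[dict]:
    return [{"id": 3, "value": 30}]


def ingest(sources: dict[str, Callable[[], Iterable[dict]]]) -> list[dict]:
    """Collect records from every source into one central list and add lineage fields."""
    landing_zone: list[dict] = []
    ingested_at = datetime.now(timezone.utc).isoformat()
    for source_name, reader in sources.items():
        for record in reader():
            # Copy the record and attach where and when it was ingested.
            landing_zone.append({**record, "_source": source_name, "_ingested_at": ingested_at})
    return landing_zone


lake = ingest({"clinic_system": read_clinic_system, "billing_export": read_billing_export})
print(len(lake))           # 3
print(lake[2]["_source"])  # billing_export
```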

Curated vs. Cleansed Catalogs

  • Curated Catalog: Focuses on enriching and validating datasets for specific use cases, ensuring they are ready for immediate use by stakeholders.
  • Cleansed Catalog: Emphasizes data quality by removing errors, inconsistencies, and redundancies, serving as a foundation for further processing or analysis.
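The two catalogs can be pictured as two stages, where cleansed data serves as the input to curated data (an assumption based on the description above). This Python sketch is illustrative only. The fields, the cleansing rules (drop rows without an ID, normalize department names, keep one row per ID), and the per-department summary are hypothetical examples, not the WUSM Data Lake's actual rules.

```python
from collections import defaultdict

raw = [
    {"id": 1, "dept": " radiology ", "amount": 100.0},
    {"id": 1, "dept": "Radiology", "amount": 100.0},     # same id as the first row: redundant
    {"id": None, "dept": "Cardiology", "amount": 50.0},  # no identifier: invalid
    {"id": 2, "dept": "cardiology", "amount": 75.0},
    {"id": 3, "dept": "Radiology", "amount": 25.0},
]


def cleanse(rows: list[dict]) -> list[dict]:
    """Cleansed stage: drop invalid rows, fix inconsistencies, remove redundancies."""
    seen_ids = set()
    out = []
    for row in rows:
        if row["id"] is None:
            continue  # error: row cannot be identified
        if row["id"] in seen_ids:
            continue  # redundancy: keep only the first row per id
        seen_ids.add(row["id"])
        dept = row["dept"].strip().title()  # inconsistency: normalize spacing and case
        out.append({"id": row["id"], "dept": dept, "amount": row["amount"]})
    return out


def curate_amount_by_dept(rows: list[dict]) -> dict[str, float]:
    """Curated stage: shape cleansed data for one use case (total amount per department)."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["dept"]] += row["amount"]
    return dict(totals)


cleansed = cleanse(raw)
curated = curate_amount_by_dept(cleansed)
print(len(cleansed))  # 3
print(curated)        # {'Radiology': 125.0, 'Cardiology': 75.0}
```

The cleansed output stays general-purpose, so other curated datasets can be built from it for different stakeholders.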

Management Processes

Roles and Responsibilities

Data assets in the WUSM Data Lake are managed collaboratively by the ICS team and appointed data stewards:

  • ICS Team: Oversees the technical infrastructure, ensuring compliance, security, and scalability.
  • Data Stewards: Maintain metadata, review data quality, and ensure assets meet organizational and regulatory standards.

Criteria for Promotion During the Review Period

To be promoted during the review period, data assets must meet the following criteria:

  1. Compliance: Adherence to organizational and regulatory standards.
  2. Metadata Completeness: Inclusion of all required metadata tags.
  3. Data Quality: High accuracy, consistency, and reliability.
  4. Relevance: Alignment with organizational goals and stakeholder needs.
  5. Approval: Validation by data stewards and relevant stakeholders.
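The five criteria amount to a checklist that an asset must pass in full. The Python sketch below expresses that checklist. Everything in it is hypothetical: the required tag names, the quality threshold, and the `DataAsset` fields are illustrative assumptions. In practice, compliance, relevance, and approval are decided by data stewards and stakeholders, so here they are recorded as review outcomes rather than computed.

```python
from dataclasses import dataclass, field

# Hypothetical required metadata tags; the actual set is defined by the data stewards.
REQUIRED_TAGS = {"owner", "data_classification", "description", "refresh_frequency"}
MIN_QUALITY_SCORE = 0.95  # hypothetical threshold standing in for "high accuracy and reliability"


@dataclass
class DataAsset:
    name: str
    tags: dict = field(default_factory=dict)
    quality_score: float = 0.0         # e.g. share of rows passing quality checks
    compliance_reviewed: bool = False  # outcome of a compliance review
    relevance_confirmed: bool = False  # stakeholders confirmed a business need
    approved_by: list = field(default_factory=list)  # steward/stakeholder sign-offs


def promotion_blockers(asset: DataAsset) -> list[str]:
    """Return the unmet criteria, in list order; an empty list means all five are met."""
    blockers = []
    if not asset.compliance_reviewed:
        blockers.append("compliance")
    missing = REQUIRED_TAGS.difference(asset.tags)  # tag names not present on the asset
    if missing:
        blockers.append("metadata: missing " + ", ".join(sorted(missing)))
    if asset.quality_score < MIN_QUALITY_SCORE:
        blockers.append("data quality")
    if not asset.relevance_confirmed:
        blockers.append("relevance")
    if not asset.approved_by:
        blockers.append("approval")
    return blockers


asset = DataAsset(
    name="cleansed.finance.monthly_charges",  # hypothetical catalog.schema.table name
    tags={"owner": "finance-team", "description": "Monthly charge totals"},
    quality_score=0.98,
    compliance_reviewed=True,
    relevance_confirmed=True,
    approved_by=["data_steward_a"],
)
print(promotion_blockers(asset))
# ['metadata: missing data_classification, refresh_frequency']
```

The function returns a list of blockers rather than a single pass/fail flag. This tells stewards exactly what must be fixed before the next review.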

Risks of Non-Compliance

Failing to tag data assets with the required metadata can lead to compliance violations, reduced discoverability, and inefficient data management. Depending on the applicable regulations, organizations may face penalties such as restricted access to data, additional audits, or legal consequences.

Updated on August 7, 2025