WUSM Data Lake

Welcome to the documentation section dedicated to the WUSM Data Lake, a powerful data management solution built on Azure Databricks. In this section, you will find comprehensive information and resources to help you understand and leverage the capabilities of the WUSM Data Lake. Whether you are a data engineer, data scientist, or business analyst, this documentation will guide you through the process of setting up, managing, and extracting valuable insights from your data using the WUSM Data Lake on Azure Databricks. Let's dive in and explore the endless possibilities of this cutting-edge data management platform.

-- ChatGPT

Please also see the databasin tool for data lake management.

Clarity2 information

Without training, users have limited views in the Data Lake (DL). With Clarity training, users get full access to Clarity2 within the DL, which also provides the models. Those models are proprietary, and users who are not certified cannot access them. The DL holds approximately 500 tables from Clarity, kept as a real-time copy. Clarity contains additional data, but we ingest only the most frequently requested tables. Users can request additional tables; our team then contacts the Epic Cogito team to request ingestion.
Clarity2 lives both within the DL and outside of it. We ask users to access it through the DL, and we discourage querying the instance of Clarity2 that lives outside of the DL because doing so conflicts with clinical work. Users who currently access Clarity2 outside of the DL need to rewrite their SQL in Spark SQL to pull from DL Clarity2.
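As a rough illustration of the kind of rewrite involved, the sketch below applies two mechanical changes that commonly come up when moving a query from T-SQL to Spark SQL: `SELECT TOP n` becomes a trailing `LIMIT n`, and two-argument `ISNULL(a, b)` becomes `coalesce(a, b)`. It assumes the existing queries are written in T-SQL, which this page does not confirm. The helper `tsql_to_spark_sql`, the table `EXAMPLE_TABLE`, and its columns are hypothetical names used only for illustration; the actual catalog, schema, and table names in DL Clarity2 will differ, and real queries will likely need further manual changes.

```python
import re


def tsql_to_spark_sql(query: str) -> str:
    """Apply two mechanical T-SQL -> Spark SQL rewrites (hypothetical helper, illustrative only)."""
    # Two-argument ISNULL(a, b) in T-SQL maps to coalesce(a, b);
    # Spark SQL's own isnull() takes a single argument and returns a boolean.
    query = re.sub(r"\bISNULL\s*\(", "coalesce(", query, flags=re.IGNORECASE)

    # Spark SQL has no TOP clause: move the row count to a trailing LIMIT.
    match = re.match(r"\s*SELECT\s+TOP\s+(\d+)\s+", query, flags=re.IGNORECASE)
    if match:
        rest = query[match.end():].rstrip().rstrip(";")
        query = f"SELECT {rest} LIMIT {match.group(1)}"
    return query


# EXAMPLE_TABLE and its columns are placeholders, not real Clarity2 names.
legacy_query = (
    "SELECT TOP 10 EXAMPLE_ID, ISNULL(EXAMPLE_NAME, 'unknown') AS EXAMPLE_NAME "
    "FROM EXAMPLE_TABLE;"
)
spark_query = tsql_to_spark_sql(legacy_query)
print(spark_query)
```

In a Databricks notebook, the resulting string could then be passed to `spark.sql(...)`. Regex rewrites like these only handle the simplest cases (for example, a single top-level `SELECT TOP`), so each converted query should still be reviewed by hand.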
Where does Clarity2 exist outside of the DL?
What are Karen and her group telling users to do? She has them sign the Epic ROTR, discusses compliance, and points them to the servers for Clarity2 and Caboodle.


Updated on August 7, 2025