Overview

A data lake is a secure, hybrid, multi-cloud environment for storing clinical health data. Clinicians, data scientists, researchers, and others can access and analyze this data to gain insights into health trends, disease patterns, and treatment outcomes in medical research. Because it stores raw data without requiring pre-structuring, it is a flexible and scalable platform for analyzing large datasets. As the volume of health data being generated grows, data lakes are becoming essential for advancing medical knowledge and improving healthcare outcomes.

Requirements for Submission

Submitting a request will require you to provide the following information:

  • Your name
  • Your contact email address
  • Your department

In addition to the above information, depending on the level of access requested, you may need to provide proof of HIPAA, CITI, or Epic Clinical Data Model (tier 5) training, which is required for Clarity access for Research.

Be sure to indicate whether the request is for yourself or for a colleague. If you are submitting the request on behalf of someone else, you may be asked to provide their information in a later step.

Types of Requests

Data Lake Access Request

To gain access to the WUSM Data Lake environment, you must submit an Access Request. There are different types of access requests that can be submitted based on your specific scenario. The type of access request you submit should be guided by the type of work that will be done with this access. For example, if you will be using the Data Lake for a specific research project with its own IRB, you would select the Research - Project Based access type. The following sections will provide additional guidance about each type of access and when to submit a given type of access request.

Note

Most types of access will ask you to provide a cost center. This enables us to invoice your department for usage, storage, and/or brokerage charges.

Research - Project Based

  • Used for a research project that has a project-specific IRB outlining the allowed data set and a timeline for the project. If you do not have an IRB-approved study but need feasibility counts only, our data brokerage team can assist you with this type of request.

Research - BYOB (Bring Your Own Broker) - Self-Service Model

  • If your department has its own analyst who can retrieve the data needed for your research projects, you may consider this self-service model. There is a monthly charge for this access, which covers our team's review of your IRB and an auditing system to ensure compliance.

Operational

  • Used for business reporting needs

Clinical/QA/QI

  • Used for clinical and QA/QI reporting needs

Class - Student Access (This access is limited to our Informatics students and requires approval.)

  • Used for students to gain access to a particular course hosted in the data lake

End User Access to the Data Lake

Access Request Form

Ingestion Request

  • This is for users who already have access to the Data Lake but need access to additional data and/or need data added to the Data Lake for their project

Ingestion Request Form.

Class Setup Request

  • This is for Informatics instructors and students

Class Setup Request Form.

Remove Access Request

  • This is to remove someone's access to the Data Lake

Remove Access Request Form.

Additional Help

If you are unable to access the Data Lake using the steps above, please contact the I2DB Service Desk for further assistance.

You can open a ticket by contacting i2help@wustl.edu or through the form found here.


Updated on August 7, 2025