Levels of MDClone Data Accessibility and Pricing

Synthetic Data
Washington University researchers who have obtained access to MDClone can access synthetic datasets for free. There is no limit on the number of queries you can create or the number of synthetic datasets you can download. Note that some large queries (e.g., a cohort of more than 1 million, or a dataset with hundreds of columns) may have trouble generating.

  1. Obtain access to MDClone
  2. Build a query
  3. Generate and download synthetic data

MDClone Data Lake Data
Data ingested into the MDClone data lake form a limited EHR dataset. The data are considered limited because 1) all PHI are stripped, 2) MRNs are hashed*, 3) dates are shifted+, and 4) only structured data from the medical record (i.e., no notes, images, etc.) are ingested. The data brokerage team reviews the IRB application and extracts the limited dataset for a fee of $150 per hour. Follow these steps to access a limited dataset:

  1. Complete steps 1-3 under synthetic data
  2. Obtain IRB approval
  3. Complete the OHIDS Central Request Form, which will include the IRB application number and status
  4. Have a consultation with data brokerage
  5. Receive Statement of Work (SOW) with estimated total cost
    1. Hourly Rate ($150) X Estimated Hours of Effort = Total Cost
  6. Send MDClone query to Data Brokerage
  7. Receive dataset in a WUSTL Box folder
  8. For most queries, steps 4-7 take 1-3 weeks to complete and cost about $150

EHR Unstructured Data not in MDClone Data Lake
After analyzing synthetic data, researchers may wish to obtain additional data for their cohort that are not available in the MDClone data lake. For example, if you create a query looking at total knee reconstruction, you may wish to obtain MRI imaging for patients in your cohort. It is not possible to ingest imaging data into MDClone; however, you can work with the data brokerage team to obtain these data for patients in your cohort. The data brokerage team reviews the IRB application and extracts the limited dataset for a fee of $150 per hour. Follow these steps to access additional, unstructured data for patients in your cohort:

  1. Complete steps 1-3 under synthetic data
  2. Obtain IRB approval
  3. Complete the OHIDS Central Request Form, which will include the IRB application number and status
  4. Have a consultation with data brokerage
  5. Receive Statement of Work (SOW) with estimated total cost
    1. Hourly Rate ($150) X Estimated Hours of Effort = Total Cost
  6. Send MDClone query to Data Brokerage
  7. Receive dataset in a WUSTL Box folder
  8. For most queries, completing steps 4-7 and unmasking the limited dataset takes 7-10 weeks and costs $750-$1500
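
The Statement of Work cost arithmetic can be sketched in a few lines of Python. This is only an illustration of multiplying the $150 hourly rate stated above by the estimated hours of effort. The function name and the hour values (1, 5, and 10) are hypothetical, chosen because they reproduce the "about $150" and "$750-$1500" figures quoted in this article. Your actual SOW estimate comes from the data brokerage team.

```python
# Hourly data brokerage rate quoted in this article (USD).
HOURLY_RATE_USD = 150


def estimate_total_cost(estimated_hours, hourly_rate=HOURLY_RATE_USD):
    """Return hourly rate x estimated hours of effort.

    Hypothetical helper for illustration only; not an official calculator.
    """
    if estimated_hours < 0:
        raise ValueError("estimated_hours must be non-negative")
    return hourly_rate * estimated_hours


# Assumed effort levels, picked to match the figures quoted in the text.
for hours in (1, 5, 10):
    print(f"{hours} hour(s) of effort: ${estimate_total_cost(hours)}")
```

Running the same calculation with the hours listed in your own SOW gives a rough figure to use when planning a budget.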

ICTS JIT funds can be used to cover the costs of the data pull. The 1-hour consultation and the resulting Statement of Work (SOW) will help you determine the budget to request for the JIT.

Click here to learn more about JIT funding.


Updated on August 7, 2025