Part 1: Retrieving the original dataset from MDClone
Log in to MDClone and navigate to Admin Dashboard.
Search for the project either via query name or query ID.
Note: Double-check the SOW and/or the comments in the BE or US to verify that you are accessing the correct query.
Click the hyperlink to open the project.
If the project requires real patient identifiers (MRN, date of birth, name, etc.), navigate to section 4 (Demographics) and ensure that Patient ID is present and selected. I also recommend having Source EHR or database selected, because this often helps with troubleshooting later on. Then click Next.
In step 5 (Finalize Cohort & Output), ensure that the “Synthetic mode” toggle in the upper right-hand corner is set to “Off”, and then click “Generate Original.”
Download the generated file.
Part 2: Linking to real patient identifiers
Open the downloaded CSV file.
Note that the date of birth in the “original” file is actually date-shifted, so you will need to pull the true date of birth from OMOP.
Create a column at the beginning with the row number (it helps with merging the patient identifiers back in).
(Optional) Use Notepad++ to format the values for the INSERT statement in the SQL template: Notepad++ formatting
Create the SQL: see the steps in the accompanying .sql template file for MDClone re-identification.
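The row-number and value-formatting steps above can also be scripted instead of done by hand. The following is only a minimal Python sketch and does not reproduce the .sql template: the column names (`patient_id`, `row_num`, `hash_id`), the table name `mdclone_ids`, and the helper functions are all hypothetical, and the INSERT syntax may need adjusting for your database.

```python
import csv
import io


def add_row_numbers(rows):
    """Return copies of the rows with a 1-based row_num key placed first."""
    return [{"row_num": i, **row} for i, row in enumerate(rows, start=1)]


def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_insert_values(rows, id_column):
    """Build the VALUES list, one (row_num, id) tuple per input row."""
    return ",\n".join(
        f"({row['row_num']}, {sql_literal(row[id_column])})" for row in rows
    )


# Stand-in for the downloaded MDClone file; the header names are assumptions.
sample_csv = "patient_id,age\nabc123,54\ndef456,61\n"
rows = add_row_numbers(list(csv.DictReader(io.StringIO(sample_csv))))

# Hypothetical table name; use whatever the .sql template expects.
statement = (
    "INSERT INTO mdclone_ids (row_num, hash_id) VALUES\n"
    + build_insert_values(rows, "patient_id")
    + ";"
)
print(statement)
```

To run this against the real download, replace the `io.StringIO(sample_csv)` stand-in with a file opened via `open(path, newline="")`.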
Part 3: Merging the data back into the MDClone output file
After pasting the supplemental/identifying data back into the MDClone output, perform the following checks to ensure that the data is correctly aligned:
Check that row numbers match up
e.g. =IF(A2=F2, "match", "ERROR")
Check that hash IDs match up
You can use Excel syntax similar to the above
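If the merge is done outside Excel, the same two checks can be expressed as a short script. This is a minimal Python sketch under assumptions: the keys `row_num` and `hash_id` and the function name are hypothetical, and both inputs are taken to be lists of row dictionaries in the same order.

```python
def find_misaligned_rows(mdclone_rows, supplemental_rows):
    """Return (position, reason) for every pair of rows that does not line up."""
    problems = []
    if len(mdclone_rows) != len(supplemental_rows):
        problems.append((None, "row counts differ"))
    pairs = zip(mdclone_rows, supplemental_rows)
    for position, (left, right) in enumerate(pairs, start=1):
        # Compare as strings so that "1" read from a CSV matches the integer 1.
        if str(left["row_num"]) != str(right["row_num"]):
            problems.append((position, "row number mismatch"))
        elif left["hash_id"] != right["hash_id"]:
            problems.append((position, "hash ID mismatch"))
    return problems


# Illustrative data only; the second supplemental row is deliberately wrong.
mdclone_rows = [
    {"row_num": "1", "hash_id": "abc123"},
    {"row_num": "2", "hash_id": "def456"},
]
supplemental_rows = [
    {"row_num": "1", "hash_id": "abc123"},
    {"row_num": "2", "hash_id": "zzz999"},
]
problems = find_misaligned_rows(mdclone_rows, supplemental_rows)
print(problems)
```

An empty list means every row number and hash ID lined up; any entry should be resolved before moving on to formatting.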
Part 4: Formatting the final file
Format the file so that the customer can tell what came out of MDClone from what you are supplementing. Options include: giving column names that indicate whether a field is “original” (from MDClone) or “real”; highlighting the column names of the “real” fields that were supplemented in; and/or adding a dividing column to separate the “real” (supplemental) fields from the raw MDClone output.
NOTE: IT IS NOT YOUR RESPONSIBILITY TO SHIFT DATES THROUGHOUT THE FILE BACK TO REAL DATES – JUST PROVIDE THEM WITH THE DATESHIFT AND LET THEM APPLY IT.
Make sure to retain the following fields in the final file:
Row number
MRNs
Patient name
Date of birth (real)
Dateshift
Make sure to remove the following fields from the final file to be delivered to the customer:
OMOP ID (person_id)
Hash ID (including the hashed patient_id from the MDClone data)
Check fields (the fields created to verify that row numbers and hash ids match up when merging data back in)
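The retain and remove lists above can be enforced with a small filter before the file is saved. This is a minimal Python sketch, not a prescribed tool: every column name in it (`row_num`, `mrn`, `patient_name`, `dob_real`, `dateshift`, `row_check`, `hash_check`, and so on) is a hypothetical stand-in for whatever the actual file uses.

```python
# Hypothetical column names; adjust both collections to match the real file.
REQUIRED_FIELDS = ["row_num", "mrn", "patient_name", "dob_real", "dateshift"]
FIELDS_TO_REMOVE = {"person_id", "hash_id", "patient_id", "row_check", "hash_check"}


def prepare_final_rows(rows):
    """Verify the required fields are present, then drop the internal-only fields."""
    for row in rows:
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
    return [
        {name: value for name, value in row.items() if name not in FIELDS_TO_REMOVE}
        for row in rows
    ]


# Illustrative row only; the values are made up.
merged_rows = [
    {
        "row_num": 1,
        "mrn": "000000",
        "patient_name": "Test Patient",
        "dob_real": "1970-01-01",
        "dateshift": 12,
        "person_id": 42,
        "hash_id": "abc123",
        "row_check": "match",
        "hash_check": "match",
        "age": 54,
    }
]
final_rows = prepare_final_rows(merged_rows)
print(list(final_rows[0].keys()))
```

Raising an error on a missing required field is a deliberate choice here, so that an incomplete file is caught before it goes to regulatory review.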
Part 5: Delivering the final file
Create a Box folder under HSIL for the final file
Update the DevOps BE/US/Task with a link to the Box folder, and tag the person responsible for regulatory review so they know that the file is ready for review
DO NOT SHARE THE FILE/FOLDER WITH THE CUSTOMER UNTIL REGULATORY REVIEW HAS BEEN COMPLETED
The regulatory reviewer will share the Box folder with the customer when it passes regulatory review