MDClone re-identification requests

Part 1: retrieve the original dataset from MDClone

Log in to MDClone and navigate to the Admin Dashboard.
Search for the project by either query name or query ID.
- Note: double-check the SOW and/or the comments in the BE or US to verify that you’re accessing the correct query.
Click the hyperlink to open the project.
If the project requires real patient identifiers (MRN, date of birth, name, etc.), navigate to section 4 (Demographics) and ensure that Patient ID is present and selected. I also recommend selecting Source EHR or database, because it often helps with troubleshooting later on. Then click Next.
In step 5 (Finalize Cohort & Output), ensure that the “Synthetic mode” toggle in the upper right-hand corner is set to “Off”, and then click “Generate Original.”
Download the generated file.

Open the downloaded csv file.
- Note that the date of birth present in the “original” file is actually date-shifted, so you will need to pull date of birth out of OMOP to retrieve the true date of birth.
Create a column at the beginning with the row number (it helps with merging the patient identifiers back in).
(optional) Use Notepad++ to properly format the values for the INSERT statement in the SQL template: Notepad++ formatting
Create SQL -- see steps in the accompanying .sql template file for MDClone re-identification.
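
The row-numbering and value-formatting steps above can also be scripted instead of done by hand in Excel and Notepad++. The following is only a minimal sketch: the column name `patient_id`, the sample data, and the two-column shape of the VALUES list are assumptions for illustration, so match them to the actual MDClone output and the .sql template before using anything like this.

```python
import csv
import io


def add_row_numbers(rows):
    """Prepend a 1-based row_number to each record (dicts from csv.DictReader)."""
    return [{"row_number": i, **row} for i, row in enumerate(rows, start=1)]


def to_sql_values(rows, id_column):
    """Format (row_number, id) pairs as a VALUES list for an INSERT statement."""
    lines = []
    for row in rows:
        # Double any single quotes so the value stays a valid SQL string literal.
        escaped = row[id_column].replace("'", "''")
        lines.append(f"({row['row_number']}, '{escaped}')")
    return ",\n".join(lines)


# Tiny in-memory stand-in for the downloaded file (hypothetical columns).
sample_csv = "patient_id,source_ehr\nabc123,EHR_A\ndef456,EHR_B\n"
records = add_row_numbers(list(csv.DictReader(io.StringIO(sample_csv))))
values_sql = to_sql_values(records, "patient_id")
print(values_sql)
```

To run it against a real download, replace the `io.StringIO(sample_csv)` stand-in with an opened file handle. The printed lines are intended to be pasted into the INSERT statement of the template.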

After pasting the supplemental/identifying data back into the MDClone output, perform the following checks to confirm that the data is correctly aligned:
- Check that row numbers match up
  e.g. =IF(A2=F2, "match", "ERROR")
- Check that hash IDs match up
  - You can use the same Excel syntax as above
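
The same alignment checks can be run outside Excel. This is a minimal sketch, assuming the merged file has been loaded as a list of dicts; the column names `row_number_sql` and `hash_id_sql` are hypothetical labels for the values pasted back from the SQL output.

```python
def find_misaligned_rows(rows, pairs):
    """Return (row position, left column, right column) for every mismatch."""
    problems = []
    for position, row in enumerate(rows, start=1):
        for left, right in pairs:
            # Compare as trimmed strings so 1 and "1 " are treated as equal.
            if str(row[left]).strip() != str(row[right]).strip():
                problems.append((position, left, right))
    return problems


# Hypothetical merged rows: MDClone output columns next to the SQL output columns.
merged = [
    {"row_number": "1", "hash_id": "abc123", "row_number_sql": "1", "hash_id_sql": "abc123"},
    {"row_number": "2", "hash_id": "def456", "row_number_sql": "2", "hash_id_sql": "zzz999"},
]
pairs = [("row_number", "row_number_sql"), ("hash_id", "hash_id_sql")]
mismatches = find_misaligned_rows(merged, pairs)
print(mismatches)
```

An empty list means every row lines up. Any entry identifies a row that should be investigated before the file goes further.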

Format the file so that the customer can tell what came out of MDClone from what you are supplementing. For example: give columns names that mark a field as “original” (from MDClone) or “real”; highlight the column names of the “real” fields that were supplemented in; and/or add a dividing column that separates the “real” (supplemental) fields from the raw MDClone output.
- NOTE: IT IS NOT YOUR RESPONSIBILITY TO SHIFT DATES THROUGHOUT THE FILE BACK TO REAL DATES – JUST PROVIDE THEM WITH THE DATESHIFT AND LET THEM APPLY IT.
Make sure to retain the following fields in the final file:
- Row number
- MRNs
- Patient name
- Date of birth (real)
- Dateshift
Make sure to remove the following fields from the final file to be delivered to the customer:
- OMOP ID (person_id)
- Hash ID (including the hashed patient_id from the MDClone data)
- Check fields (the fields created to verify that row numbers and hash ids match up when merging data back in)
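
The retain/remove lists above can be enforced with a small script before the file is saved. This is only a sketch under stated assumptions: every column name below is a hypothetical placeholder and should be replaced with the headers actually used in the merged file.

```python
# Hypothetical column names; adjust to the real headers in the merged file.
REQUIRED_COLUMNS = ["row_number", "mrn", "patient_name", "date_of_birth_real", "dateshift"]
COLUMNS_TO_REMOVE = {"person_id", "hash_id", "patient_id", "row_check", "hash_check"}


def prepare_final_rows(rows):
    """Drop internal-only columns and fail if a required column is missing."""
    if rows:
        missing = [name for name in REQUIRED_COLUMNS if name not in rows[0]]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
    return [
        {name: value for name, value in row.items() if name not in COLUMNS_TO_REMOVE}
        for row in rows
    ]


merged_rows = [
    {
        "row_number": "1", "mrn": "000111", "patient_name": "Test Patient",
        "date_of_birth_real": "1970-01-01", "dateshift": "-12",
        "person_id": "42", "hash_id": "abc123", "patient_id": "abc123",
        "row_check": "match", "hash_check": "match", "lab_value": "7.1",
    }
]
final_rows = prepare_final_rows(merged_rows)
print(sorted(final_rows[0]))
```

Other MDClone output columns (such as the hypothetical `lab_value`) pass through unchanged; only the listed internal fields are dropped.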

Create a Box folder under HSIL for the final file
Update the DevOps BE/US/Task with a link to the Box folder, and tag the person responsible for regulatory review so they know the file is ready.
DO NOT SHARE THE FILE/FOLDER WITH THE CUSTOMER UNTIL REGULATORY REVIEW HAS BEEN COMPLETED
The regulatory reviewer will share the Box folder with the customer when it passes regulatory review