MDClone re-identification requests

Part 1: retrieve the original dataset from MDClone

  • Log in to MDClone and navigate to the Admin Dashboard.
  • Search for the project either via query name or query ID.
    • Note: double-check the SOW and/or the comments in the BE or US to verify that you’re accessing the correct query.
  • Click the hyperlink to open the project.
  • If the project requires real patient identifiers (MRN, date of birth, name, etc.), navigate to section 4 (Demographics) and ensure that Patient ID is present and selected. I also recommend selecting Source EHR or database, because this often helps with troubleshooting later on. Then click Next.
  • In step 5 (Finalize Cohort & Output), ensure that the “Synthetic mode” toggle in the upper right-hand corner is set to “Off”, and then click “Generate Original.”
  • Download the generated file.

Part 2: linkage to real patient identifiers

  • Open the downloaded CSV file.
    • Note that the date of birth in the “original” file is date-shifted, so you will need to pull the true date of birth from OMOP.
  • Create a column at the beginning with the row number (it helps with merging the patient identifiers back in).
  • (optional) Use Notepad++ to properly format the values for the INSERT statement in the SQL template: Notepad++ formatting
  • Create the SQL: see the steps in the accompanying .sql template file for MDClone re-identification.
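
As an alternative to the Notepad++ formatting step, the values for the INSERT statement could be generated with a short script. The following is only a sketch under assumptions: the column name patient_id for the hashed ID, the two-column tuple layout, and the choice to number the first data row as 1 are illustrative and should be adjusted to match the .sql template file.

```python
import csv
import io


def build_values_rows(csv_text, hash_column="patient_id"):
    """Return SQL value tuples such as (1, 'abc123'), numbered from 1.

    hash_column is a hypothetical column name; use the one in your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row_number, record in enumerate(reader, start=1):
        # Double any single quotes so the value is a valid SQL string literal.
        hash_id = record[hash_column].strip().replace("'", "''")
        rows.append(f"({row_number}, '{hash_id}')")
    return rows


def format_values_clause(rows):
    """Join the tuples with commas, one per line, ready to paste after VALUES."""
    return ",\n".join(rows)


# Made-up sample data standing in for the downloaded MDClone file.
sample = "patient_id,birth_date\nabc123,1970-01-01\ndef456,1980-02-02\n"
values_rows = build_values_rows(sample)
print(format_values_clause(values_rows))
```

For a real file, read the text with open(path, newline="", encoding="utf-8") and pass its contents to build_values_rows.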

Part 3: Merging the data back into the MDClone output file

  • After pasting the supplemental/identifying data back into the MDClone output, perform the following checks to ensure that the data is correctly aligned:
    • Check that row numbers match up
      e.g. =IF(A2=F2, "match", "ERROR")
    • Check that hash IDs match up
      • Can use similar Excel syntax as above
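
If the merged file is large, the same two checks can be scripted rather than done with Excel formulas. This is a minimal sketch, assuming the merged rows have been loaded as dictionaries; the column names (row_number, sql_row_number, hash_id, sql_hash_id) are hypothetical placeholders for the MDClone columns and the columns pasted in from the SQL results.

```python
def find_misaligned_rows(records, pairs):
    """Return (record index, left column, right column) for every mismatch."""
    problems = []
    for index, record in enumerate(records):
        for left, right in pairs:
            # Compare as trimmed strings so "1" and " 1" count as a match.
            if str(record[left]).strip() != str(record[right]).strip():
                problems.append((index, left, right))
    return problems


# Made-up sample data: the second record has a row number mismatch.
merged = [
    {"row_number": "1", "hash_id": "abc123",
     "sql_row_number": "1", "sql_hash_id": "abc123"},
    {"row_number": "2", "hash_id": "def456",
     "sql_row_number": "3", "sql_hash_id": "def456"},
]
checks = [("row_number", "sql_row_number"), ("hash_id", "sql_hash_id")]
problems = find_misaligned_rows(merged, checks)
print(problems if problems else "all rows match")
```

An empty result means every row number and hash ID lined up; any entry in the list points to a record that should be investigated before the file is formatted.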

Part 4: Formatting the final file

  • Format the file so that the customer can tell what came out of MDClone vs. what you supplemented. For example: give columns names that mark each field as “original” (from MDClone) vs. “real”; highlight the column names of the “real” fields that were supplemented in; and/or add a dividing column to separate the “real” (supplemental) fields from the raw MDClone output.
    • NOTE: IT IS NOT YOUR RESPONSIBILITY TO SHIFT DATES THROUGHOUT THE FILE BACK TO REAL DATES – JUST PROVIDE THEM WITH THE DATESHIFT AND LET THEM APPLY IT.
  • Make sure to retain the following fields in the final file:
    • Row number
    • MRNs
    • Patient name
    • Date of birth (real)
    • Dateshift
  • Make sure to remove the following fields from the final file to be delivered to the customer:
    • OMOP ID (person_id)
    • Hash ID (including the hashed patient_id from the MDClone data)
    • Check fields (the fields created to verify that row numbers and hash ids match up when merging data back in)
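
The retain and remove lists above can also be enforced with a small script before delivery. The sketch below assumes the merged rows are available as dictionaries. Apart from person_id and patient_id, every column name in it (row_number, mrn, patient_name, date_of_birth_real, dateshift, hash_id, row_check, hash_check) is a hypothetical stand-in for whatever the real file uses.

```python
# Hypothetical column names; adjust both lists to the actual file.
REQUIRED_FIELDS = ["row_number", "mrn", "patient_name",
                   "date_of_birth_real", "dateshift"]
FIELDS_TO_REMOVE = ["person_id", "hash_id", "patient_id",
                    "row_check", "hash_check"]


def prepare_final_rows(records, required=REQUIRED_FIELDS, remove=FIELDS_TO_REMOVE):
    """Check that required fields are present, then drop fields not for delivery."""
    cleaned = []
    for index, record in enumerate(records):
        missing = [name for name in required if name not in record]
        if missing:
            raise ValueError(f"record {index} is missing required fields: {missing}")
        cleaned.append({key: value for key, value in record.items()
                        if key not in remove})
    return cleaned


# Made-up sample record; lab_value stands in for ordinary MDClone output.
merged = [{
    "row_number": "1", "mrn": "0000001", "patient_name": "Test Patient",
    "date_of_birth_real": "1970-01-01", "dateshift": "12",
    "person_id": "999", "hash_id": "abc123", "patient_id": "abc123",
    "row_check": "match", "hash_check": "match", "lab_value": "4.2",
}]
final_rows = prepare_final_rows(merged)
print(list(final_rows[0].keys()))
```

Raising an error on a missing required field is a deliberate choice here, so that an incomplete file is caught before it reaches the Box folder.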

Part 5: Delivering the final file

  • Create a Box folder under HSIL for the final file
  • Update the DevOps BE/US/Task with a link to the Box folder, and tag the person responsible for regulatory review so they know that the file is ready
  • DO NOT SHARE THE FILE/FOLDER WITH THE CUSTOMER UNTIL REGULATORY REVIEW HAS BEEN COMPLETED
  • The regulatory reviewer will share the Box folder with the customer when it passes regulatory review

Updated on August 7, 2025