Purpose

This relational database tracks the MRNs (medical record numbers) accessed by the data brokers.

The primary application of the database is auditing. It can also serve as a querying platform to obtain details on various kinds of data brokerage tasks.
Example use cases:

  • How many MRNs were accessed in the last 11 months?
  • Was MRN #xyz retrieved by data brokers for a request?
  • If yes, for which DevOps tickets/request IDs was it retrieved?
  • How many requests were completed in the past month?
  • For an IRB ID/DevOps ticket number, give all the MRNs accessed.
  • How many requests did data broker “abc” handle last month?
  • What was the query used for a DevOps ticket?
  • How long did it take to complete DevOps task #123?
  • For a request, give all access dates.
  • When did the data broker team last provide data to a PI?

Database structure

The database consists of 4 tables:

1. request

This table holds the data associated with an IRB ID. The columns of the table are described in the table below. The column “id” is an autogenerated unique identifier that also serves as the primary key.
Columns               Data type  Comments
id                    int4       Autogenerated ID
irb_id                varchar    IRB ID associated with the request
data_broker           varchar    Data broker assigned to the request
pi_first_name         varchar    First name of the PI
pi_last_name          varchar    Last name of the PI
pi_email              varchar    Email ID of the PI
requestor_first_name  varchar    First name of the requestor, if the person requesting the data is different from the PI
requestor_last_name   varchar    Last name of the requestor
requestor_email       varchar    Email ID of the requestor
request_date          varchar    Request date for the task. Can be used to identify the time taken to complete a request.
ticket_system         varchar    System where the ticket for this task was created, e.g. DevOps, Jira, REDCap
ticket_id             varchar    Ticket number or ID used in DevOps/Jira/REDCap to track this task. Can be the billable epic, the task ID, or a link to the DevOps/Jira ticket.
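A minimal DDL sketch of this table, using the column names and types described above (the use of serial for the autogenerated ID, and the absence of NOT NULL constraints, are assumptions):

```sql
-- Sketch of the "request" table; serial = int4 backed by a sequence.
CREATE TABLE request (
    id                   serial PRIMARY KEY,  -- autogenerated ID
    irb_id               varchar,
    data_broker          varchar,
    pi_first_name        varchar,
    pi_last_name         varchar,
    pi_email             varchar,
    requestor_first_name varchar,
    requestor_last_name  varchar,
    requestor_email      varchar,
    request_date         varchar,
    ticket_system        varchar,
    ticket_id            varchar
);
```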

2. request_mrn

This table holds the MRN data associated with an IRB ID. The columns of the table are described in the table below. The column “request_id” is a foreign key that references column “id” from the table “request”.

Columns          Data type  Comments
request_id       int4       Foreign key – references column id from table request
mrn              varchar    MRN associated with a request
data_source      varchar    Data source for the MRN, e.g. RDC, HDC
mrn_access_date  date       The date the MRN was accessed for the request
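A corresponding DDL sketch, assuming the foreign key described above (constraint details are assumptions):

```sql
-- Sketch of the "request_mrn" table with its FK to request.id.
CREATE TABLE request_mrn (
    request_id      int4 REFERENCES request (id),
    mrn             varchar,
    data_source     varchar,  -- e.g. RDC, HDC
    mrn_access_date date
);
```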

3. request_query

This table holds the query associated with an IRB ID/project. The columns of the table are described in the table below. The column “request_id” is a foreign key that references column “id” from the table “request”.
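A DDL sketch of this table; the “query” column and its text type are assumptions inferred from the CSV headers and the example queries later in this document:

```sql
-- Sketch of the "request_query" table; the "query" column is assumed.
CREATE TABLE request_query (
    request_id int4 REFERENCES request (id),
    query      text  -- the SQL used to fulfill the request
);
```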

4. log

This table holds the log data for database uploads. The columns of the table are described in the table below. The column “comments” holds the error message generated after a failed attempt at database entry.
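A DDL sketch of this table. Only the “comments” column is specified here; the other columns are assumptions based on the upload notes below (filename, date, and status are reported per run):

```sql
-- Sketch of the "log" table; all columns except "comments" are assumed.
CREATE TABLE log (
    file_name varchar,    -- file the entry refers to (assumption)
    run_date  timestamp,  -- when the upload ran (assumption)
    status    varchar,    -- e.g. SUCCESS / FAILED (assumption)
    comments  varchar     -- error message for failed entries
);
```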

Additional table notes

Apart from the 4 tables mentioned above, a staging table called “stage_dat” is used to temporarily stage all of the data from the CSV files before it is entered into the respective tables.

System requirements

  • Azure PostgreSQL version – 11.11 (community edition)
  • HIPAA compliance, since the database stores MRNs
  • Production-level system
  • Data storage projected over the next 5 years – ~2 GB
  • Memory requirement – 4 GB
  • Software to send notification emails in case of failed run(s) is also needed

Process Flowchart

Data Input (refer to the flowchart above)

After completing a request, the data broker will create a CSV file and one or more SQL files, as in the example below:

CSV file with the following column headers:

  • irb_id
  • pi_first_name
  • pi_last_name
  • requestor_first_name
  • requestor_last_name
  • pi_email
  • requestor_email
  • request_date
  • query
  • data_broker
  • ticket_system
  • ticket_id
  • MRN
  • data_source
  • data_access_date

    Notes

    1. The CSV file and the SQL file should follow the naming pattern “databroker_name_task id_pi_name_date.csv” and “databroker_name_task id_pi_name_date.sql”.
    2. Box storage space will be used for uploading the CSV and SQL files.
    3. The data broker admin user will periodically (twice a month) pick the files from Box and put them in the Azure File Storage space. Two folders in the Azure File Storage space are allocated for this project – Active and Archive. All active files that need to be uploaded to the DB will be housed in the Active folder, and files already uploaded will be moved to the Archive folder. Files that fail upload due to technical reasons will stay in the Active folder for retrial.
    4. A time-triggered Azure Function will pick all the files from the Active folder and put the content of each one into the database.
    5. This Azure Function internally calls a stored procedure named “uploadauditdata” for each CSV/SQL file pair. The stored procedure populates the data from the CSV files into the “request” and “request_mrn” tables, and from the SQL file into the “request_query” table.
    6. The query column of the CSV file contains the name of the SQL file, prefixed with “FILE:”. Example – “FILE:Eric_h_20201521_05272021.sql”. The program then picks up the content of the SQL file automatically.
    7. On successful upload of a CSV file, an entry will be added to the “log” table.
    8. On a failed attempt, the cron job will copy the error message from the database and put it, with other details, into the “log” table.
    9. The program cannot handle UTF-8 encoded characters in the SQL file. These characters should be removed from the input files.
    10. An email with the error message, date, and filename will also be sent in case of a failed run.
    11. In case of multiple SQL files, the entries in the CSV file should be as follows:

    SQL Listing
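The staging flow in notes 3–6 could be sketched in SQL along these lines. This is only a sketch, not the actual “uploadauditdata” code: the file path is a placeholder, and the assumption that stage_dat's columns mirror the CSV headers is not confirmed by this document:

```sql
-- Sketch: stage one CSV file, then move rows into the final tables.
COPY stage_dat FROM '/mnt/files/active/databroker_file.csv'
    WITH (FORMAT csv, HEADER true);

-- Deduplicated request-level fields move from staging to "request".
INSERT INTO request (irb_id, data_broker, pi_first_name, pi_last_name,
                     pi_email, request_date, ticket_system, ticket_id)
SELECT DISTINCT irb_id, data_broker, pi_first_name, pi_last_name,
                pi_email, request_date, ticket_system, ticket_id
FROM stage_dat;
```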

Actors

Task                                                             Responsibility
Database schema                                                  Madhurima / Warren
Query to create the tables                                       Madhurima / Warren
Stored procedure code                                            Madhurima / Warren
Cron job                                                         Warren
Creation of CSV file for data input                              Data brokers
Creation of SQL file for data input                              Data brokers
Putting the CSV and SQL files in the Box storage location        Data brokers
Putting the CSV and SQL files in the Azure file share location   Madhu (Brianna as backup)
Table creation, update, and maintenance of production DB         Warren
Querying the audit database                                      Warren / Data brokers (read-only access) / Madhurima (read-only access)
Investigating failed runs                                        Warren / Madhurima

Box folder for data upload

Architecture

Example queries for audit

1. How many MRNs were accessed in the last 11 months?

select count(distinct mrn) from request_mrn where mrn_access_date > date_trunc('day', NOW() - interval '11 month');

2. Was MRN xyz accessed by us? Which project?

select irb_id, ticket_id, data_broker from request rt, request_mrn rmt where rt.id = rmt.request_id and rmt.mrn = '35664';

3. How many requests were fulfilled in the past month?

select count(id) from request rt where id in (select distinct request_id from request_mrn rmt where mrn_access_date > date_trunc('day', NOW() - interval '1 month'));

4. For an IRB ID/DevOps user story, give all the MRNs

select mrn from request rt ,request_mrn rmt where rt.id = rmt.request_id and rt.irb_id ='201810010';

select mrn from request rt ,request_mrn rmt where rt.id = rmt.request_id and rt.ticket_id ='25069';

5. How many requests did a data broker handle last month?

select count(ticket_id) from request where data_broker = 'Eric Hogue';

6. What was the query used for DevOps ticket 20735?

select rq.query from request rt, request_query rq where rt.id = rq.request_id and rt.ticket_id = '20735';

7. For a request, give all access dates 

select distinct rmt.mrn_access_date from request rt, request_mrn rmt where rt.id = rmt.request_id and rt.irb_id = '202121234';
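The remaining use case from the list above – “When did the data broker team last provide data to a PI?” – could be answered in the same style. This is a sketch: the PI name 'Smith' is a placeholder, and treating mrn_access_date as a proxy for the delivery date is an assumption:

```sql
select max(rmt.mrn_access_date) from request rt, request_mrn rmt where rt.id = rmt.request_id and rt.pi_last_name = 'Smith';
```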

Enhancements for future releases

  1. Improve the text in the notification emails
    • For failed uploads – provide a reason for the failure
    • For some failed uploads the subject line of the email says WARNING; it should say ERROR.
    • For successful uploads – report how many records were uploaded from each file
    • For uploads where some files failed and some passed – report how many records were uploaded from the files that passed, and the reason for the failures
  2. Source of MRN
    • A person ID from RDC can be associated with different MRNs in RDC. Hence it is wise to use a separate column called Type of ID in the CSV file and in the database.
    • For the projects already in the database, we need to figure out a way to handle their Type of ID values.
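This enhancement could be sketched as follows; the column name “id_type” and the backfill value 'MRN' for existing rows are assumptions, not decided values:

```sql
-- Sketch (assumed column name "id_type", assumed backfill value 'MRN').
ALTER TABLE request_mrn ADD COLUMN id_type varchar;
-- Rows already in the database all hold MRNs, so backfill them as such.
UPDATE request_mrn SET id_type = 'MRN' WHERE id_type IS NULL;
```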
  3. Use drop-down menu options in the CSV template to reduce spelling errors by enforcing allowed values for fields like Source of MRN.
  4. Investigate Azure Active Directory, which can replace the SAS key authentication. Azure Active Directory will be synced with WUSTL Key.

Updated on August 7, 2025