Purpose
The purpose of this relational database is to track the MRNs (medical record numbers) accessed by data brokers.
The primary application of the database would be during an audit. It can also serve as a querying platform to obtain details on various kinds of data brokerage tasks.
Example use cases:
- How many MRNs were accessed in the last 11 months?
- Was MRN #xyz retrieved by data brokers for a request?
- If yes, for which DevOps tickets / request IDs was it retrieved?
- How many requests were completed in the past month?
- For an IRB ID / DevOps ticket number, give all the MRNs accessed.
- How many requests did data broker “abc” handle last month?
- What was the query used for a DevOps ticket?
- How long did it take to complete DevOps task #123?
- For a request, give all access dates.
- When did the data broker team last provide data to a PI?
Database structure
The database consists of 4 tables:
1. Request
*This table holds the data associated with an IRB ID. The columns of the table are described in the table below. The column “id” is an autogenerated unique identifier that also serves as the primary key.*
Columns | Data type | Comments |
---|---|---|
id | int4 | Autogenerated ID |
irb_id | varchar | IRB ID associated with the request |
data_broker | varchar | Data broker assigned to the request |
pi_first_name | varchar | First name of the PI |
pi_last_name | varchar | Last name of the PI |
pi_email | varchar | Email ID of the PI |
requestor_first_name | varchar | First name of the requestor, if the person requesting the data is different from the PI |
requestor_last_name | varchar | Last name of the requestor |
requestor_email | varchar | Email ID of the requestor |
request_date | varchar | Request date for the task. Can be used to identify the time taken for completion of a request. |
ticket_system | varchar | System where the ticket for this task was created. Example – DevOps, Jira, REDCap, etc. |
ticket_id | varchar | Ticket number or ID used in DevOps/Jira/REDCap to track this task. Can be the billable epic, the task ID, or the link to the DevOps/Jira ticket. |
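Based on the column list above, the request table could be created with DDL along these lines (a sketch; NOT NULL choices and the use of serial for the autogenerated ID are assumptions):

```sql
-- Sketch of the request table DDL; constraints are assumptions.
CREATE TABLE request (
    id                   serial PRIMARY KEY,   -- autogenerated ID (int4)
    irb_id               varchar NOT NULL,     -- IRB ID associated with the request
    data_broker          varchar,              -- data broker assigned to the request
    pi_first_name        varchar,
    pi_last_name         varchar,
    pi_email             varchar,
    requestor_first_name varchar,
    requestor_last_name  varchar,
    requestor_email      varchar,
    request_date         varchar,              -- stored as text per the schema above
    ticket_system        varchar,              -- DevOps, Jira, REDCap, etc.
    ticket_id            varchar
);
```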
2. request_mrn
This table holds the MRN data associated with an IRB ID. The columns of the table are described in the table below. The column “request_id” is a foreign key that references column “id” from the table “request”.
Columns | Data type | Comments |
---|---|---|
request_id | int4 | Foreign key – reference column is id from table request |
mrn | varchar | MRNs associated with a request |
data_source | varchar | Data source for the MRN. Example – RDC, HDC etc. |
mrn_access_date | Date | The date MRNs were accessed for the request. |
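The request_mrn table could be sketched as follows (NOT NULL choices are assumptions; the foreign key follows the description above):

```sql
-- Sketch of the request_mrn table DDL; constraints are assumptions.
CREATE TABLE request_mrn (
    request_id      int4 NOT NULL REFERENCES request (id),  -- FK to request.id
    mrn             varchar NOT NULL,  -- MRN associated with the request
    data_source     varchar,           -- e.g. RDC, HDC
    mrn_access_date date               -- date the MRN was accessed
);
```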
3. request_query
This table holds the query associated with an IRB ID/project. The columns of the table are described in the table below. The column “request_id” is a foreign key that references column “id” from the table “request”.
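The request_query column list is not reproduced here; a minimal sketch consistent with the foreign-key description (the “query” column name is an assumption, inferred from the CSV header list later in this document):

```sql
-- Sketch only: every column except request_id is an assumption.
CREATE TABLE request_query (
    request_id int4 NOT NULL REFERENCES request (id),  -- FK to request.id
    query      text                                    -- SQL text from the uploaded .sql file
);
```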
4. Log
This table holds the log data for database uploads. The columns of the table are described in the table below. The column “comments” holds the error message generated after a failed database-entry attempt.
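Only the “comments” column is named in this document; one possible shape for the log table (all other columns are assumptions):

```sql
-- Sketch only: all columns except "comments" are assumptions.
CREATE TABLE log (
    id          serial PRIMARY KEY,
    file_name   varchar,                  -- CSV/SQL file that was processed
    upload_date timestamp DEFAULT now(),  -- when the upload attempt ran
    status      varchar,                  -- e.g. SUCCESS / FAILED
    comments    text                      -- error message from a failed attempt
);
```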
Additional table notes
Apart from the 4 tables mentioned above, a staging table called stage_dat is used to temporarily stage all of the data from CSV files before it is entered into the respective tables.
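The staging step could be sketched with PostgreSQL's COPY, followed by a staging-to-final move of the kind the stored procedure performs. This is a sketch only: the file path is illustrative, and the procedure's real signature and logic are not specified in this document.

```sql
-- Sketch: load one CSV into the staging table (path is illustrative).
COPY stage_dat
FROM '/path/to/databroker_name_taskid_pi_name_date.csv'
WITH (FORMAT csv, HEADER true);

-- Sketch of the staging-to-final flow (assumed logic): one request row
-- per file, then its MRN rows from the same staged data.
WITH new_request AS (
    INSERT INTO request (irb_id, data_broker, pi_first_name, pi_last_name,
                         pi_email, requestor_first_name, requestor_last_name,
                         requestor_email, request_date, ticket_system, ticket_id)
    SELECT DISTINCT irb_id, data_broker, pi_first_name, pi_last_name,
                    pi_email, requestor_first_name, requestor_last_name,
                    requestor_email, request_date, ticket_system, ticket_id
    FROM stage_dat
    RETURNING id
)
INSERT INTO request_mrn (request_id, mrn, data_source, mrn_access_date)
SELECT nr.id, s.mrn, s.data_source, s.data_access_date::date
FROM stage_dat s, new_request nr;
```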
System requirements
- Azure PostgreSQL Version - 11.11 (community edition)
- HIPAA compliance, as it will store MRNs
- Production-level system
- Data storage projected over the next 5 years – ~2 GB
- Memory requirement – 4 GB
- Software to send notification emails in case of failed run(s)
Process Flowchart
Data Input (refer to the flowchart above)
After completing a request, the data broker will create a CSV file and SQL file(s) as in the example below:
CSV file with the following column headers:
- irb_id
- pi_first_name
- pi_last_name
- requestor_first_name
- requestor_last_name
- pi_email
- requestor_email
- request_date
- query
- data_broker
- ticket_system
- ticket_id
- MRN
- data_source
- data_access_date
Notes
- The CSV file and the SQL file should follow a naming pattern of “databroker_name_task id_pi_name_date.csv” and “databroker_name_task id_pi_name_date.sql”
- We will use BOX storage space for uploading the CSV and SQL files.
- The Data Broker Admin User will, twice a month, pick the files from BOX and put them in the Azure File Storage space. There will be two folders in the Azure File Storage space allocated for this project – Active and Archive. All active files that need to be uploaded to the DB will be housed in the Active folder, and files already uploaded will be moved to the Archive folder. Files that fail upload due to technical reasons will stay in the Active folder for retry.
- A time triggered Azure Function will pick all the files from the active folder and put content from each one of them into the database.
- This Azure function internally will call a stored procedure named “uploadauditdata” for each CSV file and SQL file pair. This stored procedure will populate the data from CSV files to the “request” and “request_mrn” tables and from the SQL file to the “request_query” table.
- The query column of the CSV file contains the name of the SQL file, prefixed with “FILE:”. Example – “FILE:Eric_h_20201521_05272021.sql”. The program then picks up the content of the SQL file automatically.
- On successful upload of a CSV file, an entry will be added to the “log” table.
- In case of a failed attempt, the cron job will copy the error message from the database and record it, with other details, in the “log” table.
- The program cannot handle non-ASCII (multi-byte UTF-8) characters in the SQL file. These characters should be removed from the input files.
- An email with the error message, date and filename will also be sent in case of a failed run.
- In case of multiple SQL files, the entries in the CSV file should be as follows
Actors
Task | Responsibility |
---|---|
Database Schema | Madhurima / Warren |
Query to create the tables | Madhurima / Warren |
Stored procedure code | Madhurima / Warren |
Cron Job | Warren |
Creation of CSV file for data input | Data brokers |
Creation of SQL file for data input | Data brokers |
Putting the CSV and SQL files in the BOX storage location | Data brokers |
Putting the CSV and SQL files in the Azure file share location | Madhu (Brianna as backup) |
Table creation, update, and maintenance of production DB | Warren |
Querying the Audit database | Warren / Data brokers (read-only access) / Madhurima (read-only access) |
Investigating failed runs | Warren / Madhurima |
Box folder for data upload
Architecture
Example queries for audit
1. How many MRNs were accessed in the last 11 months
select count(distinct mrn) from request_mrn where mrn_access_date > date_trunc('day', NOW() - interval '11 month');
2. Was MRN xyz accessed by us? For which project?
select irb_id, ticket_id, data_broker from request rt, request_mrn rmt where rt.id = rmt.request_id and rmt.mrn = '35664';
3. How many requests were fulfilled in the past month
select count(id) from request rt where id in (select distinct request_id from request_mrn rmt where mrn_access_date > date_trunc('day', NOW() - interval '1 month'));
4. For an IRB ID / DevOps User Story, give all the MRNs
select mrn from request rt, request_mrn rmt where rt.id = rmt.request_id and rt.irb_id = '201810010';
select mrn from request rt, request_mrn rmt where rt.id = rmt.request_id and rt.ticket_id = '25069';
5. How many requests did a data broker handle last month
select count(ticket_id) from request where data_broker = 'Eric Hogue' and request_date::date > now() - interval '1 month';
6. What was the query used for the DevOps ticket 20735
select rq.query from request rt, request_query rq where rt.id = rq.request_id and rt.ticket_id = '20735';
7. For a request, give all access dates
select distinct rmt.mrn_access_date from request rt, request_mrn rmt where rt.id = rmt.request_id and rt.irb_id = '202121234';
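One use case from the list at the top (time taken to complete a DevOps task) has no example query above; a sketch is possible if we assume request_date parses as a date and approximate completion by the latest MRN access date (an assumption, since the schema has no completion-date column):

```sql
-- Sketch: approximate turnaround for DevOps task #123 (ticket_id '123' is illustrative).
-- Assumes request_date is a parseable date string and the last MRN access
-- approximates completion; neither is guaranteed by the schema.
select max(rmt.mrn_access_date) - rt.request_date::date as days_to_complete
from request rt
join request_mrn rmt on rt.id = rmt.request_id
where rt.ticket_id = '123'
group by rt.request_date;
```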
Enhancements for future releases
- Improve the text in the notification emails
- For failed uploads - provide a reason for failure
- For some failed uploads the subject line of email says WARNING. It should say ERROR.
- For successful uploads – provide how many records were uploaded from each file
- For uploads where some files failed and some passed – provide how many records were uploaded from the files that passed, and the reason for failure
- Source of MRN
- A person ID from RDC can be associated with different MRNs in RDC. Hence it is wise to use a separate column called Type of ID in the CSV file and in the database.
- For the projects already in the database, we need to figure out a way to handle the Type of ID values.
- Use drop-down menu options in the CSV template to reduce spelling errors by enforcing values like Source of MRN
- Investigate Azure Active Directory, which could replace the SAS key authentication. Azure Active Directory will be synced with WUSTL Key.