Restoring REDCap Data
There are two primary methods for data recovery: using Databricks Table History or performing a Full Database Restore. The choice of method depends on the circumstances and time constraints of the data loss incident.
Using Databricks Table History
Databricks maintains a history of Delta Lake tables which can be useful for recovering recently deleted data. Specifically, if data was deleted within the last 30 days, it can potentially be restored from this table history.
To facilitate this recovery, a notebook has been developed that allows users to specify the project ID, record ID, and the date from which they wish to restore the data. The notebook then writes the recovered data to a temporary table in the sandbox.team_redcap schema. This data can subsequently be processed using the v14 Project Data Export notebook, resulting in a downloadable CSV file. This file can then be used to import the recovered data back into the REDCap project.
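The restore step can be illustrated with a minimal sketch of a Delta Lake time-travel query. This is not the team's notebook: the source table name, the target table name, and the column names (project_id, record) are hypothetical placeholders, and the record ID check is a deliberately conservative assumption. The sketch only builds the SQL text; in a Databricks notebook the string would be passed to spark.sql.

```python
import re
from datetime import date

# Hypothetical names for illustration only; the real ones live in the restore notebook.
SOURCE_TABLE = "redcap.redcap_data"               # assumed Delta table holding REDCap data
TARGET_TABLE = "sandbox.team_redcap.restore_tmp"  # assumed temporary table name

# Assumption: record IDs use only letters, digits, spaces, dots, dashes, underscores.
_SAFE_RECORD_ID = re.compile(r"[A-Za-z0-9_.\- ]+")


def build_restore_sql(project_id: int, record_id: str, restore_date: date,
                      source: str = SOURCE_TABLE, target: str = TARGET_TABLE) -> str:
    """Build a statement that copies one record, as of a past date, into a temp table."""
    if not isinstance(project_id, int) or project_id <= 0:
        raise ValueError("project_id must be a positive integer")
    if not _SAFE_RECORD_ID.fullmatch(record_id):
        # Reject rather than escape, so nothing unexpected lands in the SQL text.
        raise ValueError("record_id contains unsupported characters")
    as_of = restore_date.isoformat()  # e.g. 2024-05-01
    return (
        f"CREATE OR REPLACE TABLE {target} AS "
        f"SELECT * FROM {source} TIMESTAMP AS OF '{as_of}' "
        f"WHERE project_id = {project_id} AND record = '{record_id}'"
    )


restore_sql = build_restore_sql(42, "1001", date(2024, 5, 1))
print(restore_sql)
# In a Databricks notebook: spark.sql(restore_sql)
```

Rejecting unusual record IDs instead of escaping them keeps the sketch simple; a real notebook might prefer parameterized queries or DataFrame filters.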
Considerations
- Time Sensitivity: This method is effective only for data deleted within the last 30 days, because of the 30-day retention policy.
- Ease of Use: Using Databricks Table History is generally faster and less labor-intensive compared to a full database restore.
Full Database Restore
When data cannot be located within Databricks, a full database restore is the next option. This process involves a Database Administrator (DBA) restoring the entire REDCap database in Azure, which can take up to a full day. Once the database is restored, the data will need to be extracted into a usable format through custom queries written by the team.
There are two types of restores:
- Full recovery for an ad-hoc instance of REDCap
  - Sets up a new mirror instance of REDCap as of a chosen point in time
  - Used when the entire database does not need to be restored
- Full restore on production
  - Resets the entire application and all projects
  - Rolls everything back to a specific time that we choose
Considerations
- Complexity: A full database restore is a complex and time-consuming process, requiring substantial technical expertise and resources.
- Cost: Due to the intensive nature of this method, customers are generally billed for the time spent on recovering data.
- Last Resort: This method should be used only when no other options are available, given its drawbacks in terms of time and labor.
Limitations and Recommendations
While both methods offer viable solutions for data recovery, they come with limitations. Database backups are retained for only 30 days, so data lost more than 30 days ago may not be recoverable. Each data recovery scenario should be evaluated individually to determine the best approach and the feasibility of restoration.
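As a rough aid for that evaluation, the decision described on this page can be sketched as a small helper. This is only an illustration of the guidance above, not a team tool: the function name, its inputs, and the choice to treat exactly 30 days as still inside the window are assumptions.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention window described on this page


def suggest_recovery_method(deleted_on: date, today: date, found_in_databricks: bool) -> str:
    """Suggest a starting point; every incident still needs individual review."""
    if deleted_on > today:
        raise ValueError("deleted_on cannot be in the future")
    # Assumption: a deletion exactly 30 days ago still counts as inside the window.
    if today - deleted_on > timedelta(days=RETENTION_DAYS):
        return "outside retention window: recovery may not be possible"
    if found_in_databricks:
        return "Databricks Table History"
    return "Full Database Restore"


print(suggest_recovery_method(date(2024, 5, 1), date(2024, 5, 10), found_in_databricks=True))
print(suggest_recovery_method(date(2024, 5, 1), date(2024, 5, 10), found_in_databricks=False))
print(suggest_recovery_method(date(2024, 3, 1), date(2024, 5, 10), found_in_databricks=False))
```

The helper checks the retention window first because neither method is expected to help once that window has passed.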
See Also
Restore REDCap Data From History notebook
v14 Project Data Export notebook