Databasin
Databasin is a data lake management tool that ICS uses to manage the WUSM Data Lake.
The application can be accessed at: databasin.wustl.edu
Overview
Databasin is a self-service tool for managing ETL pipelines and process automations. It lets users build ETL pipelines and automations by creating "connectors" to a variety of data sources.
Connectors
- Connectors provide a way to establish a connection to various systems and platforms
- Connectors can be used in Pipelines and Automations to move and process data
Pipelines
- Pipelines provide an easy way to extract and load data from one connector to another
- Pipelines can have one or more "artifacts" that represent source and target data sets, typically tables
Artifacts
- Artifacts represent source and target data sets. They can typically be thought of as tables of data
- Each artifact will be extracted from the source connector and loaded into the target connector
- Artifacts have settings to control various aspects of the ingestion process
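The relationship between connectors, pipelines, and artifacts described above can be sketched as a small data model. This is only an illustration of the concepts, not Databasin's actual API: the class names, field names, the `plan` method, and the example connector and table names are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    """A named connection to a system or platform (hypothetical model)."""
    name: str
    system: str


@dataclass
class Artifact:
    """One source/target data set, typically a table (hypothetical model)."""
    source_table: str
    target_table: str
    # Per-artifact settings that control the ingestion process (illustrative).
    settings: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    """Extracts each artifact from the source connector and loads it into the target."""
    name: str
    source: Connector
    target: Connector
    artifacts: list = field(default_factory=list)

    def plan(self):
        """Describe each extract-and-load step without touching any real system."""
        return [
            f"{self.source.name}.{a.source_table} -> {self.target.name}.{a.target_table}"
            for a in self.artifacts
        ]


# Example with made-up connector and table names.
clinical_db = Connector(name="clinical_db", system="postgres")
data_lake = Connector(name="data_lake", system="databricks")
pipeline = Pipeline(
    name="daily_ingest",
    source=clinical_db,
    target=data_lake,
    artifacts=[
        Artifact("visits", "raw_visits", settings={"mode": "append"}),
        Artifact("patients", "raw_patients"),
    ],
)

for step in pipeline.plan():
    print(step)
```

One pipeline holds one source connector, one target connector, and any number of artifacts, which mirrors the "one or more artifacts" wording above.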
Automations
- Automations allow users to chain together multiple tasks to be executed
- These tasks use connectors to access a system and execute the task
- There are several task types available, including executing Databricks notebooks and creating and uploading CSV files to a target connector
Tasks
- Tasks represent a unit of work that will be executed by the automation
- An automation can have many tasks of varying types
- Tasks use connectors to establish a connection to the system the task will execute against
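The way an automation chains tasks, each running against its own connector, can be sketched as follows. This is a conceptual model only and does not reflect how Databasin is implemented: the classes, the `action` callable, the `run` method, the assumption that tasks run in list order, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Connector:
    """A named connection to a system (hypothetical model)."""
    name: str


@dataclass
class Task:
    """A unit of work that executes against the system behind its connector."""
    name: str
    connector: Connector
    # Stand-in for the real work (running a notebook, uploading a CSV, ...).
    action: Callable[[Connector], str]


@dataclass
class Automation:
    """Chains tasks together; this sketch assumes they run in list order."""
    name: str
    tasks: list = field(default_factory=list)

    def run(self):
        """Run each task in order and collect its result message."""
        return [task.action(task.connector) for task in self.tasks]


# Example with made-up connector and task names.
databricks = Connector("databricks")
file_share = Connector("file_share")
nightly = Automation(
    name="nightly_export",
    tasks=[
        Task("run_notebook", databricks, lambda c: f"ran notebook via {c.name}"),
        Task("upload_csv", file_share, lambda c: f"uploaded CSV via {c.name}"),
    ],
)

for message in nightly.run():
    print(message)
```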
How-Tos
See the Databasin how-tos for step-by-step guidance on common tasks.