Databasin

Databasin is a data lake management tool that ICS uses to manage the WUSM Data Lake.
The application can be accessed at: databasin.wustl.edu

Overview

Databasin is a self-service tool for managing ETL pipelines and process automations. It lets users build ETL pipelines and automations by creating "connectors" to a variety of data sources. Its main features are described below.

Connectors

Connectors establish a connection to various systems and platforms
Connectors can be used in Pipelines and Automations to move and process data

Pipelines

Pipelines provide an easy way to extract data from one connector and load it into another
Pipelines can have one or more "artifacts" that represent source and target data sets, typically tables

Artifacts

Artifacts represent source and target data sets. They can typically be thought of as tables of data
Each artifact will be extracted from the source connector and loaded into the target connector
Artifacts have settings to control various aspects of the ingestion process
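The extract-and-load flow described above can be sketched in Python to show how connectors, pipelines, and artifacts relate. This is a minimal illustration under stated assumptions, not Databasin's API: `InMemoryConnector`, `Artifact`, `Pipeline`, and their methods are hypothetical names, connectors are modeled as in-memory dictionaries of tables, each load is assumed to fully replace the target table, and the per-artifact ingestion settings are left out.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryConnector:
    """Hypothetical stand-in for a connector: holds named tables as lists of rows."""
    name: str
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def extract(self, table: str) -> list[dict]:
        # Return copies of the rows so the target cannot mutate the source data.
        return [dict(row) for row in self.tables.get(table, [])]

    def load(self, table: str, rows: list[dict]) -> None:
        # Assumed full-refresh load: replace whatever the target held for this table.
        self.tables[table] = rows


@dataclass
class Artifact:
    """Hypothetical artifact: one source table mapped to one target table."""
    source_table: str
    target_table: str


@dataclass
class Pipeline:
    """Hypothetical pipeline: moves every artifact from source to target."""
    source: InMemoryConnector
    target: InMemoryConnector
    artifacts: list[Artifact]

    def run(self) -> dict[str, int]:
        """Extract each artifact from the source and load it into the target.

        Returns the number of rows loaded per target table.
        """
        loaded = {}
        for artifact in self.artifacts:
            rows = self.source.extract(artifact.source_table)
            self.target.load(artifact.target_table, rows)
            loaded[artifact.target_table] = len(rows)
        return loaded


if __name__ == "__main__":
    source = InMemoryConnector("source_db", {"patients": [{"id": 1}, {"id": 2}]})
    target = InMemoryConnector("data_lake")
    pipeline = Pipeline(source, target, [Artifact("patients", "patients_raw")])
    print(pipeline.run())  # {'patients_raw': 2}
```

In this sketch each artifact is independent, which mirrors the description that every artifact is extracted from the source connector and loaded into the target connector.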

Automations

Automations allow users to chain together multiple tasks to be executed
These tasks use connectors to access a system and execute the task
There are several task types available, including one that executes Databricks notebooks and one that creates CSV files and uploads them to a target connector

Tasks

Tasks represent a unit of work that will be executed by the automation
An automation can have many tasks of varying types
Tasks use connectors to establish a connection to the system the task will execute against
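The relationship between automations, tasks, and connectors can be sketched in the same way. The class names below are hypothetical and not part of Databasin, and the task actions are simple placeholders rather than real Databricks or CSV operations. The sketch assumes that tasks run one after another in list order and that the automation stops at the first failing task; Databasin's actual ordering and failure handling may differ.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Connector:
    """Hypothetical connector: a name plus a log of work executed against it."""
    name: str
    log: list[str] = field(default_factory=list)


@dataclass
class Task:
    """Hypothetical task: a named unit of work that runs against one connector."""
    name: str
    connector: Connector
    action: Callable[[Connector], None]

    def execute(self) -> None:
        # The task reaches its target system only through its connector.
        self.action(self.connector)


@dataclass
class Automation:
    """Hypothetical automation: an ordered chain of tasks of any type."""
    tasks: list[Task]

    def run(self) -> list[str]:
        """Run tasks in order, stopping at the first failure (an assumed policy).

        Returns the names of the tasks that completed successfully.
        """
        completed = []
        for task in self.tasks:
            try:
                task.execute()
            except Exception as exc:
                print(f"task {task.name!r} failed: {exc}")
                break
            completed.append(task.name)
        return completed


if __name__ == "__main__":
    databricks = Connector("databricks")
    file_share = Connector("file_share")
    automation = Automation([
        Task("refresh", databricks, lambda c: c.log.append("ran notebook")),
        Task("export", file_share, lambda c: c.log.append("uploaded report.csv")),
    ])
    print(automation.run())  # ['refresh', 'export']
```

Stopping on the first failure is only one reasonable choice for a chain of dependent tasks; an automation could just as well continue past failures and report them at the end.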

How-To's

See the Databasin How-To's for step-by-step guides to common tasks.

Tutorials and Explanations

References

Databasin Changelog

Data Sources

Platforms

Teams