Databasin

Databasin is a data lake management tool that ICS uses to manage the WUSM Data Lake.
The application can be accessed at: databasin.wustl.edu

Overview

Databasin is a self-service tool for managing ETL pipelines and process automations. It provides the features below, which let users construct ETL pipelines and automations by creating "connectors" to a variety of data sources.

Connectors

  • Connectors provide a way to establish connections to various systems and platforms
  • Connectors can be used in Pipelines and Automations to move and process data

Pipelines

  • Pipelines provide an easy way to extract and load data from one connector to another
  • Pipelines can have one or more "artifacts" that represent source and target data sets, typically tables

Artifacts

  • Artifacts represent source and target data sets. They can typically be thought of as tables of data
  • Each artifact will be extracted from the source connector and loaded into the target connector
  • Artifacts have settings to control various aspects of the ingestion process
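
To make the relationship between connectors, pipelines, and artifacts concrete, here is a minimal Python sketch of the conceptual model. It is not Databasin's API, since pipelines are configured in the Databasin application itself. The `Connector`, `Artifact`, and `Pipeline` classes, the `columns` setting, and the in-memory tables are all hypothetical stand-ins assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Connector:
    # Hypothetical stand-in for a connector: a named handle to a system,
    # modeled here as an in-memory mapping of table name -> list of rows.
    name: str
    tables: dict[str, list[dict]] = field(default_factory=dict)


@dataclass
class Artifact:
    # One source/target data set pair (typically a table) plus its settings.
    source_table: str
    target_table: str
    # Hypothetical ingestion setting: columns to keep (None keeps all).
    columns: Optional[list[str]] = None


@dataclass
class Pipeline:
    # Extracts every artifact from the source connector and loads it into the target.
    source: Connector
    target: Connector
    artifacts: list[Artifact] = field(default_factory=list)

    def run(self) -> dict[str, int]:
        loaded = {}
        for artifact in self.artifacts:
            # Extract: read all rows of the source table (empty if it is missing).
            rows = self.source.tables.get(artifact.source_table, [])
            # Apply the artifact's settings; here, an optional column filter.
            if artifact.columns is not None:
                rows = [{c: row[c] for c in artifact.columns if c in row} for row in rows]
            # Load: replace the target table with copies of the extracted rows.
            self.target.tables[artifact.target_table] = [dict(row) for row in rows]
            loaded[artifact.target_table] = len(rows)
        return loaded


# Usage: copy one table, keeping only two of its columns.
source = Connector("source_db", {"visits": [{"id": 1, "site": "A", "notes": "x"}]})
lake = Connector("data_lake")
pipeline = Pipeline(source, lake, [Artifact("visits", "raw_visits", columns=["id", "site"])])
print(pipeline.run())             # {'raw_visits': 1}
print(lake.tables["raw_visits"])  # [{'id': 1, 'site': 'A'}]
```

Modeling each artifact as its own source/target pair mirrors the description above: a single pipeline can carry several tables between the same two connectors, each with its own settings.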

Automations

  • Automations allow users to chain together multiple tasks to be executed
  • These tasks use connectors to access a system and execute the task
  • There are several task types available, including a task that executes Databricks notebooks and a task that creates CSV files and uploads them to a target connector

Tasks

  • Tasks represent a unit of work that will be executed by the automation
  • An automation can have many tasks of varying types
  • Tasks use connectors to establish a connection to the system the task will execute against
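
The relationship between automations, tasks, and connectors can be sketched the same way. This is a hypothetical illustration, not Databasin's API: the `Task`, `Automation`, and `csv_upload_task` names are invented, the "upload" simply writes into an in-memory dict, and stopping the chain at the first failing task is an assumed behavior rather than documented Databasin semantics.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Connector:
    # Hypothetical connector to a file destination, modeled as path -> file contents.
    name: str
    files: dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    # One unit of work, executed against the connector it is bound to.
    name: str
    connector: Connector
    action: Callable[[Connector], None]

    def execute(self) -> None:
        self.action(self.connector)


def csv_upload_task(name: str, connector: Connector, path: str, rows: list[dict]) -> Task:
    # Hypothetical task type: render rows as CSV and "upload" them via the connector.
    # Assumes rows is non-empty; the first row's keys become the CSV header.
    def action(conn: Connector) -> None:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        conn.files[path] = buffer.getvalue()
    return Task(name, connector, action)


@dataclass
class Automation:
    # Chains tasks together; in this sketch it runs them in order and
    # stops at the first task that raises an exception.
    tasks: list[Task] = field(default_factory=list)

    def run(self) -> list[str]:
        completed = []
        for task in self.tasks:
            try:
                task.execute()
            except Exception as exc:
                print(f"Task {task.name!r} failed: {exc}")
                break
            completed.append(task.name)
        return completed


# Usage: a one-task automation that uploads a small CSV file.
target = Connector("upload_target")
automation = Automation([
    csv_upload_task("export_counts", target, "counts.csv", [{"site": "A", "n": 3}]),
])
print(automation.run())  # ['export_counts']
```

Binding each task to its own connector mirrors the description above: one automation can mix task types that each reach a different system.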

How-To's

See the Databasin How-To's for guides on completing common tasks.

Tutorials and Explanations

References


Updated on August 7, 2025