Loading Data from Unity Catalog Into a Databricks Notebook
Problem
You want to load data stored in Unity Catalog into your Databricks notebook.
Solution
Via Spark/Databricks SQL
-- load a CSV file stored in a Unity Catalog volume
SELECT * FROM csv.`/Volumes/my_catalog/my_schema/my_volume/data.csv`;
-- list the contents of a volume
LIST '/Volumes/my_catalog/my_schema/my_volume/';
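If you need to pass parser options explicitly, the read_files table-valued function offers more control. A minimal sketch, assuming the same example volume path:
-- read the CSV via read_files, specifying the format and header option
SELECT *
FROM read_files(
  '/Volumes/my_catalog/my_schema/my_volume/data.csv',
  format => 'csv',
  header => true
);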
Via PySpark and dbutils
# read a sample CSV file with PySpark
df = spark.read.format('csv').load(
    '/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',
    header=True,
    inferSchema=True
)
# summarize the DataFrame with the dbutils data utility
dbutils.data.summarize(df)
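dbutils can also browse a volume directly through its file system utility. A minimal sketch, assuming the same example volume path:
# list the files in a Unity Catalog volume
dbutils.fs.ls('/Volumes/my_catalog/my_schema/my_volume/')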
Python
import os

os.listdir('/Volumes/my_catalog/my_schema/my_volume/path/to/directory')
or via pandas
import pandas as pd

df = pd.read_csv('/Volumes/my_catalog/my_schema/my_volume/data.csv')
or by pip install-ing a Python package placed inside a Unity Catalog volume
%pip install /Volumes/my_catalog/my_schema/my_volume/my_library.whl
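After a %pip install, the Python process usually has to be restarted before the newly installed package can be imported; dbutils provides a utility for this:
# restart the Python interpreter so the new package becomes importable
dbutils.library.restartPython()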
R
# read the sample CSV with SparkR (preloaded in Databricks R notebooks)
df <- read.df("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv",
              source = "csv", header = "true", inferSchema = "true")
dbutils.data.summarize(df)
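Because volumes are also mounted as local paths, base R file functions work as well. A minimal sketch, assuming the same example volume path:
# list the files in a Unity Catalog volume with base R
list.files("/Volumes/my_catalog/my_schema/my_volume/")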
Scala
// read the sample CSV with Spark and infer the schema
val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
dbutils.data.summarize(df)
Discussion
In general, external data should be placed in Unity Catalog volumes. See the discussion in the recipe "Saving Results from a Databricks Notebook to a File" for more information about Unity Catalog.