Ollama on RIS

"cowboy coding" notes

LSF bsub command to launch an interactive GPU job

DOCKER_VOLUMES=(
    /home/$USER:/home/$USER
    /storage1/fs1/I2/Active:/storage1/fs1/I2/Active
    /scratch1/fs1/i2:/scratch1/fs1/i2
)

LSF_DOCKER_GPUS="all" \
LSF_DOCKER_PORTS='8888:8888' \
LSF_DOCKER_VOLUMES="${DOCKER_VOLUMES[*]}" \
bsub \
  -Is \
  -M 30GB \
  -gpu "num=1:gmodel=NVIDIAA100_SXM4_40GB" \
  -R 'rusage[mem=30GB,tmp=30] select[mem>32GB && tmp>30 && port8888=1 && gpuhost] span[hosts=1]' \
  -a 'docker(indraniel/ics-llm-kit:v1)' \
  -G compute-ohids \
  -q ohids-interactive \
  /bin/bash -l

Ollama installation

cd /scratch1/fs1/i2/idas/ollama

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
tar zxvf ollama-linux-amd64.tgz -C .

export PATH=/usr/local/cuda/bin:/opt/washu-i2db/ics/python-3.12.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/opt/washu-i2db/ics/python-3.12.3/lib
export OLLAMA_HOST=compute1-exec-390.ris.wustl.edu:8888
./bin/ollama serve

# in another terminal
$ export OLLAMA_HOST=compute1-exec-390.ris.wustl.edu:8888
$ /scratch1/fs1/i2/idas/ollama/bin/ollama -v
ollama version is 0.6.5
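
The server also answers Ollama's REST API on the same port, so you can sanity-check it without the CLI. A minimal Python sketch, assuming the same host/port as OLLAMA_HOST above (/api/version is part of Ollama's documented API and needs no model loaded):

import json
import urllib.request

OLLAMA_URL = 'http://compute1-exec-390.ris.wustl.edu:8888'

# GET /api/version just reports the server version
with urllib.request.urlopen(f'{OLLAMA_URL}/api/version') as resp:
    print(json.load(resp))  # e.g. {'version': '0.6.5'}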

Ollama model setup

mkdir models
cd models
curl -L -O https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2-GGUF/resolve/main/Llama-3.1-8B-Lexi-Uncensored_V2_Q8.gguf
cd ..  # back to the ollama root so ./bin/ollama resolves below

export OLLAMA_MODELS=/scratch1/fs1/i2/idas/ollama/models
vim Modelfile
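
The notes don't record what went into the Modelfile. A minimal sketch only needs a FROM line pointing at the downloaded GGUF; the PARAMETER and SYSTEM lines are optional and purely illustrative, not recovered from the original setup:

# minimal Modelfile sketch -- actual contents weren't recorded in these notes
FROM ./models/Llama-3.1-8B-Lexi-Uncensored_V2_Q8.gguf

# optional, illustrative tuning
PARAMETER temperature 0.7
SYSTEM """You are a helpful assistant."""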

./bin/ollama serve &  # the server must be up before create/run can talk to it
./bin/ollama create test-model -f Modelfile
./bin/ollama run test-model
jobs
kill %1  # stop the backgrounded server when done
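
With the server up and test-model created, Ollama's REST API can drive the model as well. A short Python sketch using only the standard library (the prompt is illustrative; /api/generate with "stream": false returns a single JSON object):

import json
import urllib.request

OLLAMA_URL = 'http://compute1-exec-390.ris.wustl.edu:8888'

payload = {
    'model': 'test-model',                   # the model created above
    'prompt': 'Say hello in one sentence.',  # illustrative prompt
    'stream': False,                         # one JSON object instead of a stream
}
req = urllib.request.Request(
    f'{OLLAMA_URL}/api/generate',
    data=json.dumps(payload).encode(),
    headers={'Content-Type': 'application/json'},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)['response'])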

Databricks connectivity

Databricks SQL connector

This connects to the "SQL Warehouses" clusters. These are faster machines, but they limit you to Spark SQL only.

Note the environment variables

export DATABRICKS_SERVER_HOSTNAME='adb-7423990253170059.19.azuredatabricks.net'
export DATABRICKS_TOKEN='<your-databricks-developer-token>'
export DATABRICKS_HTTP_PATH='<your-sql-warehouse-http-path>'  # e.g. /sql/1.0/warehouses/<warehouse-id>
SQL Warehouse Server Details
  1. Log in to your Azure Databricks workspace
  2. In the sidebar, click SQL > SQL Warehouses
  3. Choose an appropriate warehouse to connect to
  4. On the Connection Details tab, copy the connection details.

This should get you the Server Hostname and HTTP Path parameters; the access token comes from the next section.

Developer/Personal Access Token

If you don't already have an Azure Databricks personal access token, do the following:

  1. In your Azure Databricks workspace, click your Azure Databricks username in the top bar, and then select User Settings from the drop down.
  2. Click Developer.
  3. Next to Access tokens, click Manage.
  4. Click Generate new token.
  5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token’s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
  6. Click Generate.
  7. Copy the displayed token to a secure location, and then click Done.

Note

Save the copied token in a secure location, and do not share it with others. A lost token cannot be regenerated; you must repeat this procedure to create a new one. If you lose the token, or believe it has been compromised, Databricks strongly recommends immediately deleting it from your workspace by clicking the trash can (Revoke) icon next to the token on the Access tokens page.

If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator.

Usage
# See https://learn.microsoft.com/en-us/azure/databricks/dev-tools/python-sql-connector
# Short story:
#    pip install databricks-sql-connector

import os
from databricks import sql

databricks_sql_environment = {
    'server-hostname' : os.environ['DATABRICKS_SERVER_HOSTNAME'],
    'http-path' : os.environ['DATABRICKS_HTTP_PATH'],
    'access-token' : os.environ['DATABRICKS_TOKEN']
}

def main():
    with sql.connect(
        server_hostname = databricks_sql_environment['server-hostname'],
        http_path = databricks_sql_environment['http-path'],
        access_token = databricks_sql_environment['access-token']
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("select * from samples.nyctaxi.trips LIMIT 2")
            result = cursor.fetchall()

            for row in result:
                print(row)


if __name__ == "__main__":
    main()
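
If a DataFrame is more convenient than Row tuples, recent versions of databricks-sql-connector can also hand results back as Arrow tables. A hedged variant of main(), assuming pyarrow and pandas are installed alongside the connector (it reuses sql and databricks_sql_environment from the example above):

def main_pandas():
    with sql.connect(
        server_hostname = databricks_sql_environment['server-hostname'],
        http_path = databricks_sql_environment['http-path'],
        access_token = databricks_sql_environment['access-token']
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("select * from samples.nyctaxi.trips LIMIT 2")
            # fetchall_arrow() returns a pyarrow.Table in recent connector releases
            df = cursor.fetchall_arrow().to_pandas()
            print(df)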

Cluster Compute: Spark with ollama

Note the environment variables

export DATABRICKS_SERVER_HOSTNAME='adb-7423990253170059.19.azuredatabricks.net'
export DATABRICKS_TOKEN='<your-databricks-developer-token>'
export DATABRICKS_CLUSTER_ID='<your-cluster-id>'
Ascertaining Your Cluster ID
  1. In your Azure Databricks workspace, click on Compute in the left side pane.
  2. Search for and select your compute cluster of interest.
  3. To the left of the "Terminate" button near the top right corner, click the vertical ellipsis and select the "View JSON" option.
  4. Note the value of the cluster_id key in the displayed JSON data structure.

Usage

# pip install databricks-connect==15.4.8
import os
from databricks.connect import DatabricksSession
host = f"https://{os.environ['DATABRICKS_SERVER_HOSTNAME']}"
token = os.environ['DATABRICKS_TOKEN']
cluster_id = os.environ['DATABRICKS_CLUSTER_ID']
spark = DatabricksSession.builder.remote(
    host=host,
    token=token,
    cluster_id=cluster_id
).getOrCreate()
df = spark.sql("select * from samples.nyctaxi.trips limit 4")
df.show()
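
The heading mentions ollama, but the snippet above stops at Spark. One hedged way to wire them together, assuming the Ollama server from the sections above is still running and reachable from wherever this script executes; the row sample and prompt are illustrative (rows are pulled to the driver first, so the Databricks cluster itself never has to reach the Ollama host):

import json
import urllib.request

OLLAMA_URL = 'http://compute1-exec-390.ris.wustl.edu:8888'  # from the Ollama section above

# reuse the spark session created above; pull a small sample to the driver
rows = spark.sql("select * from samples.nyctaxi.trips limit 4").toPandas()

payload = {
    'model': 'test-model',  # the model created in the Ollama section
    'prompt': f"Summarize these taxi trips:\n{rows.to_string()}",
    'stream': False,
}
req = urllib.request.Request(
    f'{OLLAMA_URL}/api/generate',
    data=json.dumps(payload).encode(),
    headers={'Content-Type': 'application/json'},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)['response'])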

Updated on August 7, 2025