Databricks

In private preview

This guide explains how to create a service account (metadata access only) for Revefi on Databricks.

Step 1: Create a new Databricks access token
Revefi connects to your Databricks workspace via an access token. You can create a new service principal for Revefi and then generate an access token for that service principal. Alternatively, you can use an access token for an existing user in your workspace if that is easier. Both approaches are described below.

[Option 1]: Generate access token for a new service principal (Preferred)

  1. Create a service principal for Revefi
    Use the guide to create a service principal in your Databricks account and add it to your workspace. Save the application ID; the GRANT commands in Step 3 use it.

  2. Grant token usage to service principal in workspace
    Use the guide to grant the above service principal permissions to use access tokens.

  3. Generate an access token for service principal
    Use the guide to generate an access token for the new service principal.
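
As a rough illustration of sub-step 3, the sketch below builds (but does not send) a REST request for a service principal token. It assumes the Databricks token management endpoint `/api/2.0/token-management/on-behalf-of/tokens` and that the caller holds a workspace admin token. The function name, the example workspace URL, the 90-day lifetime and the comment string are hypothetical placeholders, not values from this guide.

```python
import json
import urllib.request


def build_obo_token_request(workspace_url, admin_token, application_id,
                            lifetime_seconds=90 * 24 * 3600,
                            comment="Revefi metadata access"):
    """Build a POST request asking Databricks for a token on behalf of a
    service principal. Nothing is sent until urlopen() is called on it."""
    # Assumed endpoint: the token management "on-behalf-of" API.
    url = workspace_url.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens"
    body = {
        "application_id": application_id,      # saved in sub-step 1
        "lifetime_seconds": lifetime_seconds,  # hypothetical default: 90 days
        "comment": comment,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + admin_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with your own workspace URL, token and application ID.
request = build_obo_token_request(
    "https://example.cloud.databricks.com", "<admin_token>", "<application_id>"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should return JSON that includes the new token value; store it securely, since it is the access token used in Step 5.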

[Option 2]: Generate personal access token for your user
1. Use the guide to generate a personal access token for an existing user.

Step 2: Enable System tables on Unity Catalog

Use the guide to enable the information_schema, access, workflow, compute, query, billing and lakeflow system schemas if they are not enabled already. This requires the metastore ID, which can be found in the Databricks workspace under Catalog -> select any catalog -> Details.
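
For reference, a minimal sketch of how the metastore ID fits into the enablement calls. It assumes the Unity Catalog REST path `/api/2.0/unity-catalog/metastores/<metastore_id>/systemschemas/<schema_name>`, with each URL sent as an authenticated PUT. The schema list is copied from this step as written; some schemas may already be enabled or may not be available in your account, in which case the API may reject them. The function name and example values are hypothetical.

```python
# Schema names copied from Step 2 of this guide.
SYSTEM_SCHEMAS = [
    "information_schema", "access", "workflow",
    "compute", "query", "billing", "lakeflow",
]


def system_schema_enable_urls(workspace_url, metastore_id, schemas=SYSTEM_SCHEMAS):
    """Return one URL per system schema. Each URL would be called with an
    HTTP PUT and an 'Authorization: Bearer <token>' header to enable it."""
    base = workspace_url.rstrip("/")
    return [
        f"{base}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/{name}"
        for name in schemas
    ]


# Placeholder workspace URL and metastore ID.
for url in system_schema_enable_urls("https://example.cloud.databricks.com", "<metastore_id>"):
    print("PUT", url)
```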

Step 3: Grant Unity Catalog data permission to the service principal
Run these commands on each catalog that Revefi should have access to.

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;  
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;  
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;

Revefi also needs access to the system catalog. Use the above commands to grant access to the system catalog as well. Note that access to the system catalog can only be granted by a metastore admin.
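
When several catalogs are involved, the three statements above can be generated rather than typed by hand. The helper below is a small sketch that only produces the SQL text, in the same form as the commands in this step; it does not run anything. The function name and the example catalog names are hypothetical, and it assumes catalog names are plain identifiers that need no quoting.

```python
import re

# Privileges exactly as they appear in the GRANT commands of Step 3.
PRIVILEGES = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]


def grant_statements(catalogs, application_id):
    """Return the GRANT statements for every catalog, in catalog order."""
    statements = []
    for catalog in catalogs:
        # Assumption: simple identifiers only (letters, digits, underscore).
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", catalog):
            raise ValueError(f"catalog name needs manual quoting: {catalog!r}")
        for privilege in PRIVILEGES:
            statements.append(
                f"GRANT {privilege} ON CATALOG {catalog} TO `{application_id}`;"
            )
    return statements


# Example: one data catalog plus the system catalog.
for statement in grant_statements(["sales", "system"], "<application_id>"):
    print(statement)
```

The statements for the system catalog still have to be run by a metastore admin.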

Step 4: Create a Databricks SQL Warehouse
Use the guide to create a new SQL Warehouse for Revefi (Serverless is preferred). Then use the 'Permissions' button to give the new service principal 'Can use' permission on this warehouse.

Step 5: Add Databricks as a connection in Revefi
Finally, add your Databricks workspace to Revefi. On the connections page, click 'Add connection' and select Databricks as the source. The HostName, Port and HTTP Path fields come from the SQL Warehouse created in Step 4. The Access token comes from Step 1.
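
Before pasting the values into the form, it can help to normalize them. The sketch below is an illustrative helper, not part of Revefi or Databricks: it strips an accidental `https://` prefix from the host name, ensures the HTTP path starts with a slash, and assumes the usual HTTPS port 443. The function name and the example warehouse path are hypothetical.

```python
from urllib.parse import urlparse


def connection_fields(server_hostname, http_path, access_token, port=443):
    """Return the four connection fields in a cleaned-up form."""
    host = server_hostname.strip()
    if "://" in host:
        # Keep only the host part if a full URL was copied.
        host = urlparse(host).netloc
    host = host.rstrip("/")
    path = "/" + http_path.strip().lstrip("/")
    if not host:
        raise ValueError("HostName is empty")
    if not access_token:
        raise ValueError("Access token is empty")
    return {
        "HostName": host,
        "Port": port,  # assumption: SQL warehouses are reached over HTTPS on 443
        "HTTP Path": path,
        "Access token": access_token,
    }


# Placeholder values; copy the real ones from the warehouse created in Step 4.
fields = connection_fields(
    "https://example.cloud.databricks.com/",
    "sql/1.0/warehouses/<warehouse_id>",
    "<access_token>",
)
for name, value in fields.items():
    print(f"{name}: {value}")
```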