Databricks
In private preview
This guide explains how to create a service account (metadata access only) for Revefi on Databricks.
Step 1: Create a new Databricks access token
Revefi connects to your Databricks workspace via an access token. You can create a new service principal for Revefi and then generate an access token for that service principal, or you can use an access token for an existing user in your workspace if that's easier. Both approaches are described below.
[Option 1]: Generate an access token for a new service principal (Preferred)
1. Create a service principal for Revefi
Use the guide to create a service principal in your Databricks account and add it to your workspace. Ensure that the service principal has the Databricks SQL access and Workspace access entitlements. Save the application ID.
2. Grant token usage to the service principal in the workspace
Use the guide to grant the above service principal permission to use access tokens.
3. Generate an access token for the service principal
Use the guide to generate an access token for the new service principal. A scripted sketch of these three steps follows this list.
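If you prefer to script Option 1, the sketch below drives the workspace-level SCIM and token-management REST APIs with curl. It is a minimal sketch, assuming DATABRICKS_HOST (your workspace URL) and an admin DATABRICKS_TOKEN are set and jq is installed; verify the payloads against your workspace before use.
#!/bin/bash
# Sketch: create the Revefi service principal, then mint a token on its behalf.

# Create the service principal with the two required entitlements.
APP_ID=$(curl -s -X POST "$DATABRICKS_HOST/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/scim+json" \
  -d '{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
    "displayName": "revefi",
    "entitlements": [
      {"value": "databricks-sql-access"},
      {"value": "workspace-access"}
    ]
  }' | jq -r '.applicationId')
echo "Service principal application id: $APP_ID"

# Mint an on-behalf-of access token for the service principal
# (requires the token usage grant from step 2 of this option).
curl -s -X POST "$DATABRICKS_HOST/api/2.0/token-management/on-behalf-of/tokens" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
    "application_id": "'"$APP_ID"'",
    "comment": "Revefi access token",
    "lifetime_seconds": 7776000
  }' | jq -r '.token_value'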
[Option 2]: Generate a personal access token for your user
1. Use the guide to generate a personal access token for an existing user.
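If you prefer the API over the UI here as well, a personal access token for the calling user can be minted with the Token API. A minimal sketch, assuming DATABRICKS_HOST and an existing DATABRICKS_TOKEN (or other curl-compatible authentication) for that user:
# Sketch: create a personal access token for the authenticated user.
curl -s -X POST "$DATABRICKS_HOST/api/2.0/token/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"comment": "Revefi access token", "lifetime_seconds": 7776000}' \
  | jq -r '.token_value'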
Step 2: Grant Unity Catalog data permission to the service principal
Run these commands on each catalog that Revefi should have access to.
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;
Revefi also needs access to the system catalog. Note that access to the system catalog can only be granted by a metastore admin:
GRANT USE_CATALOG ON CATALOG system TO `<application_id>`;
GRANT USE_SCHEMA ON CATALOG system TO `<application_id>`;
GRANT SELECT ON CATALOG system TO `<application_id>`;
Step 3: Create a Databricks SQL Warehouse
Use the guide to create a new SQL Warehouse for Revefi (Serverless is preferred). Use the 'Permissions' button and give the new service principal 'Can use' permissions on this warehouse.
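If you would rather script it, the sketch below creates a small serverless warehouse with the SQL Warehouses REST API. The name and sizing are assumptions; grant 'Can use' to the service principal afterwards via the Permissions button as described above.
# Sketch: create a serverless SQL warehouse for Revefi via the REST API.
WAREHOUSE_ID=$(curl -s -X POST "$DATABRICKS_HOST/api/2.0/sql/warehouses" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
    "name": "revefi-warehouse",
    "cluster_size": "2X-Small",
    "max_num_clusters": 1,
    "auto_stop_mins": 10,
    "enable_serverless_compute": true,
    "warehouse_type": "PRO"
  }' | jq -r '.id')
echo "Created warehouse: $WAREHOUSE_ID"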
Step 4: Add Databricks as a connection in Revefi
Finally, you can add Databricks as a connection in Revefi. On the Connections page, click 'Add connection' and select Databricks as the source.
The HostName, Port and HTTP Path fields come from the SQL Warehouse created in Step 3.
The Access token comes from Step 1.
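If you scripted the warehouse in Step 3, these fields can also be read back from the SQL Warehouses API: the odbc_params object in the response carries the hostname, port, and HTTP path. A minimal sketch, assuming DATABRICKS_HOST, DATABRICKS_TOKEN, and WAREHOUSE_ID are set:
# Sketch: print the HostName, Port, and HTTP Path for the Revefi connection form.
curl -s "$DATABRICKS_HOST/api/2.0/sql/warehouses/$WAREHOUSE_ID" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  | jq '.odbc_params | {hostname, port, path}'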

Step 5: Additional requirements for Job Optimization
Revefi can also optimize your Databricks jobs, which requires additional setup.
- Grant CAN_MANAGE permissions to the service principal on all jobs and all-purpose clusters. This can be done using Terraform or with the script below.
#!/bin/bash
# === CONFIGURATION ===
SERVICE_PRINCIPAL_ID="<service-principal-application-id>" # e.g., "12345678-aaaa-bbbb-cccc-1234567890ab"

# === Fetch all jobs ===
echo "Fetching all job IDs..."
JOB_IDS=$(databricks jobs list --output json | jq -r '.[].job_id')

# === Loop through jobs and assign CAN_MANAGE permission ===
for JOB_ID in $JOB_IDS; do
  echo "Updating permissions for job_id: $JOB_ID"
  databricks permissions update jobs "$JOB_ID" --json '{
    "access_control_list": [
      {
        "service_principal_name": "'"$SERVICE_PRINCIPAL_ID"'",
        "permission_level": "CAN_MANAGE"
      }
    ]
  }'
  echo "Permissions updated for job $JOB_ID"
done
echo "All job permissions updated successfully."

# === Fetch all-purpose clusters ===
echo "Fetching all-purpose cluster IDs..."
CLUSTERS=$(databricks clusters list --output json | jq -r '.[] | select(.cluster_source == "UI") | .cluster_id')

# === Loop through clusters and update permissions ===
for CLUSTER_ID in $CLUSTERS; do
  echo "Updating permissions for cluster_id: $CLUSTER_ID"
  databricks permissions update clusters "$CLUSTER_ID" --json '{
    "access_control_list": [
      {
        "service_principal_name": "'"$SERVICE_PRINCIPAL_ID"'",
        "permission_level": "CAN_MANAGE"
      }
    ]
  }'
  echo "Permissions updated for cluster $CLUSTER_ID"
done
echo "All-purpose cluster permissions updated successfully."
- Use the guide to configure jobs to write their cluster logs to a volume at the path /Volumes/revefi/default/logs (see the sketch after this list).
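As a sketch of that log setup: the volume itself can be created with Unity Catalog SQL (the catalog and schema names come from the path above, and CREATE CATALOG requires the corresponding privilege):
CREATE CATALOG IF NOT EXISTS revefi;
CREATE SCHEMA IF NOT EXISTS revefi.default;
CREATE VOLUME IF NOT EXISTS revefi.default.logs;
Each job cluster spec then needs a cluster_log_conf entry such as the following fragment inside its new_cluster settings:
"cluster_log_conf": {
  "volumes": {
    "destination": "/Volumes/revefi/default/logs"
  }
}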
Step 6: Additional requirements for CloudWatch Metrics
Revefi can also collect metrics from your AWS account, powering an end-to-end view of Databricks jobs and clusters. This requires additional setup in your AWS account.
- Create a new IAM Permissions Policy named revefi-cloudwatch-metrics-reader-policy as below
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
- Create a new IAM Permissions Policy named revefi-user-ec2-describe-policy as below
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EC2Actions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:DescribeSpotPriceHistory"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
- Create an AWS IAM Role with a custom trust policy
Create a new AWS IAM role revefi-reader-role with the "Custom trust policy" trust entity as below
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::220294960462:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "some-customer-generated-random-string"
        }
      }
    }
  ]
}
- Attach the two IAM policies created in the first two steps above to the IAM role revefi-reader-role (the sketch after this list scripts all of these AWS steps).
- Now that the role and policies are created, the following information will need to be shared with Revefi. There are optional fields to provide this information when setting up (or editing) a Databricks connection in the Revefi app:
  - Your AWS region (e.g., us-west-2)
  - The ARN of the role revefi-reader-role
  - The external ID added to the role revefi-reader-role
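For reference, the same AWS setup can be scripted with the AWS CLI. A minimal sketch, assuming the two policy documents and the trust policy above are saved locally as cloudwatch-policy.json, ec2-policy.json, and trust-policy.json:
# Sketch: create the Revefi policies and role, then attach the policies.
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

aws iam create-policy \
  --policy-name revefi-cloudwatch-metrics-reader-policy \
  --policy-document file://cloudwatch-policy.json
aws iam create-policy \
  --policy-name revefi-user-ec2-describe-policy \
  --policy-document file://ec2-policy.json

aws iam create-role \
  --role-name revefi-reader-role \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name revefi-reader-role \
  --policy-arn "arn:aws:iam::$ACCOUNT_ID:policy/revefi-cloudwatch-metrics-reader-policy"
aws iam attach-role-policy \
  --role-name revefi-reader-role \
  --policy-arn "arn:aws:iam::$ACCOUNT_ID:policy/revefi-user-ec2-describe-policy"

# The role ARN to share with Revefi:
aws iam get-role --role-name revefi-reader-role --query Role.Arn --output text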