Databricks

This guide explains how to onboard your Databricks account onto Revefi.

Step 1: Create a new Databricks service principal

Revefi connects to your Databricks workspace via a service principal. You need to create a new service principal for Revefi with the right permissions and then generate OAuth credentials.

Generate a new service principal for Revefi

  1. Create a new service principal for Revefi at the account level. Use the guide to create a service principal in your Databricks account and add it to your workspace. Ensure that the service principal has the Databricks SQL access and Workspace access entitlements. This step should be done in the account console. Save the application ID.

  2. Generate a secret for the service principal
    Use the guide to generate a client ID and secret for the service principal. This also needs to be done by an account admin via the account console: from the User management tab, select the service principal, then go to the Credentials & secrets tab and generate the secret.

Step 2: Grant Unity Catalog data permission to the service principal

Run these SQL commands on each catalog that Revefi should have access to. This allows Revefi to monitor tables in your workspace.

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;  
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;  
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;

Revefi also needs access to the system catalog. Note that access to the system catalog can only be granted by a metastore admin. This is required for Revefi to read metadata from system tables.

GRANT USE_CATALOG ON CATALOG system TO `<application_id>`;  
GRANT USE_SCHEMA ON CATALOG system TO `<application_id>`;  
GRANT SELECT ON CATALOG system TO `<application_id>`;
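To confirm the grants took effect, you can inspect the catalog permissions from the CLI. This is a quick sketch, assuming the newer unified Databricks CLI and jq; replace <catalog_name> and <application_id> with your values.

# List the privileges assigned to Revefi's service principal on each catalog.
databricks grants get catalog <catalog_name> | jq '.privilege_assignments[] | select(.principal == "<application_id>")'
databricks grants get catalog system | jq '.privilege_assignments[] | select(.principal == "<application_id>")'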

Step 3: Create a Databricks SQL Warehouse

Use the guide to create a new SQL Warehouse for Revefi (Serverless is preferred). Use the 'Permissions' button and give the new service principal 'Can use' permissions on this warehouse.
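If you would rather create the warehouse from the CLI, the sketch below goes through the SQL Warehouses API via the newer Databricks CLI. The warehouse name and sizing values here are examples only, and --json support on this command is an assumption that may vary by CLI version.

# Create a small serverless warehouse for Revefi (example values).
databricks warehouses create --json '{
  "name": "revefi-warehouse",
  "cluster_size": "2X-Small",
  "max_num_clusters": 1,
  "auto_stop_mins": 10,
  "enable_serverless_compute": true,
  "warehouse_type": "PRO"
}'
# Granting the service principal 'Can use' is still done via the 'Permissions'
# button on the warehouse, as described above.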

Step 4: Add Databricks as a connection in Revefi

Finally, you can onboard your Databricks account on Revefi. On the connections page, click 'Add connection' and select Databricks as the source. Then select OAuth Authentication at the top. The HostName, Port and HTTP Path fields come from the SQL Warehouse created in Step 3. The ClientID and secret come from Step 1.
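The connection details can also be read from the CLI. This is a sketch that assumes the warehouses API returns an odbc_params object (hostname, port, path); replace <warehouse_id> with the ID of the warehouse from Step 3.

# Print the hostname, port and HTTP path to paste into the Revefi form.
databricks warehouses get <warehouse_id> | jq '.odbc_params'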

If you see any connection issues, they might be caused by missing permissions. You can run the Databricks CLI commands below to verify that all the required permissions have been granted:

#!/bin/bash

# Authenticate the CLI, then run a test query against the system catalog
# through the SQL Statement Execution API.
databricks configure --token
databricks api post /api/2.0/sql/statements \
  --json '{
    "warehouse_id": "<warehouse_id>",
    "statement": "SELECT * FROM system.compute.warehouses LIMIT 1"
  }'

Step 5: Additional requirements for Job Optimization

Revefi can also optimize your Databricks jobs, which requires additional setup.

  1. Grant CAN_MANAGE/CAN_VIEW permissions to the service principal on all jobs and all-purpose clusters. This can be done using Terraform or using the script below.
#!/bin/bash

# === CONFIGURATION ===
SERVICE_PRINCIPAL_ID="<service-principal-application-id>"  # applicationId (UUID) of the Revefi service principal from Step 1, e.g., "12345678-aaaa-bbbb-cccc-1234567890ab"

# === Fetch all jobs ===
echo "Fetching all job IDs..."
JOB_IDS=$(databricks jobs list --output json | jq -r '.[].job_id')

# === Loop through jobs and assign CAN_MANAGE or CAN_VIEW permission ===
for JOB_ID in $JOB_IDS; do
  echo "Updating permissions for job_id: $JOB_ID"

  databricks permissions update jobs "$JOB_ID" --json '{
    "access_control_list": [
      {
        "service_principal_name": "'$SERVICE_PRINCIPAL_ID'",
        "permission_level": "CAN_MANAGE"
      }
    ]
  }'

  echo "Permissions updated for job $JOB_ID"
done

echo "All job permissions updated successfully."

# === Fetch all clusters ===
echo "Fetching all-purpose cluster IDs..."
CLUSTERS=$(databricks clusters list --output json | jq -r '.[] | select(.cluster_source != "JOB") | .cluster_id')

# === Loop through clusters and assign CAN_MANAGE or CAN_VIEW permission ===
for CLUSTER_ID in $CLUSTERS; do
  echo "Updating permissions for cluster_id: $CLUSTER_ID"

  databricks permissions update clusters "$CLUSTER_ID" --json '{
    "access_control_list": [
      {
        "service_principal_name": "'$SERVICE_PRINCIPAL_ID'",
        "permission_level": "CAN_MANAGE"
      }
    ]
  }'

  echo "Permissions updated for cluster $CLUSTER_ID"
done

echo "All-purpose cluster permissions updated successfully."
  2. Use the guide to configure jobs to write their cluster logs to a volume at the path /Volumes/revefi/default/logs. This requires you to first create a new catalog called revefi from the Unity Catalog explorer. Under the revefi/default path, create a new volume called logs. Under the Permissions tab for the new volume, grant READ_VOLUME access to Revefi's service principal, and grant WRITE_VOLUME access to the owners of the jobs/clusters whose logs are being enabled. A CLI sketch of this setup follows at the end of this step.

Then change the logging setting on the clusters to write logs to the /Volumes/revefi/default/logs path and save. Any change in these settings will restart the cluster immediately, so it is best to make this change while the cluster is in a terminated state.

The logs should be available at /Volumes/revefi/default/logs/<clusterId> the next time the cluster starts.
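The catalog, volume, and grants above can also be set up from the CLI. This is a minimal sketch, assuming the newer unified Databricks CLI; the volumes create argument order and the privilege spellings mirror the Unity Catalog API and may differ slightly by CLI version.

# Create the revefi catalog; a 'default' schema is created with it.
databricks catalogs create revefi

# Create a managed volume named logs under revefi.default.
databricks volumes create revefi default logs MANAGED

# Grant READ_VOLUME to Revefi's service principal and WRITE_VOLUME to an owner
# of the jobs/clusters whose logs are being enabled (placeholders below).
databricks grants update volume revefi.default.logs --json '{
  "changes": [
    { "principal": "<application_id>", "add": ["READ_VOLUME"] },
    { "principal": "<job_or_cluster_owner>", "add": ["WRITE_VOLUME"] }
  ]
}'

# Once a cluster has restarted with the new log destination, verify the logs:
databricks fs ls dbfs:/Volumes/revefi/default/logs/<clusterId>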

Step 6: Additional requirements for CloudWatch Metrics

Revefi can also collect metrics from your AWS account, powering an end-to-end view of Databricks jobs and clusters. This requires additional setup in your AWS account.

  1. Create a new IAM Permissions Policy named revefi-cloudwatch-metrics-reader-policy as below
{
  "Version": "2025-01-01",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
  2. Create a new IAM Permissions Policy named revefi-user-ec2-describe-policy as below
{
  "Version": "2025-01-01",
  "Statement": [
    {
      "Sid": "EC2Actions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:DescribeSpotPriceHistory"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
  3. Create an AWS IAM role with a custom trust policy. Create a new AWS IAM role named revefi-reader-role using the "Custom trust policy" trust entity shown below, replacing the external ID with a random string that you generate and keep (you will share it with Revefi in item 5).
{
  "Version": "2025-01-01",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::220294960462:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "some-customer-generated-random-string"
        }
      }
    }
  ]
}
  4. Attach the IAM policies created in items 1 and 2 to the IAM role created in item 3. An AWS CLI sketch of items 1-4 follows this list.
  5. Now that the role and policies are created, share the following information with Revefi. There are optional fields to provide this information when setting up (or editing) a Databricks connection in the Revefi app.
    • Your AWS region (e.g., us-west-2)
    • The ARN of role revefi-reader-role
    • The external-id added to the role revefi-reader-role
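The AWS side of items 1-4 can also be scripted. The sketch below assumes the policy documents and trust policy shown above have been saved to local JSON files with the names used here, and that <account_id> is your AWS account ID.

# Create the two permissions policies from the JSON documents above.
aws iam create-policy --policy-name revefi-cloudwatch-metrics-reader-policy \
  --policy-document file://revefi-cloudwatch-metrics-reader-policy.json
aws iam create-policy --policy-name revefi-user-ec2-describe-policy \
  --policy-document file://revefi-user-ec2-describe-policy.json

# Create the role with the custom trust policy, then attach both policies.
aws iam create-role --role-name revefi-reader-role \
  --assume-role-policy-document file://revefi-reader-role-trust-policy.json
aws iam attach-role-policy --role-name revefi-reader-role \
  --policy-arn arn:aws:iam::<account_id>:policy/revefi-cloudwatch-metrics-reader-policy
aws iam attach-role-policy --role-name revefi-reader-role \
  --policy-arn arn:aws:iam::<account_id>:policy/revefi-user-ec2-describe-policy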

Step 7: Additional requirements for AWS Infra Cost Metrics

Revefi can collect AWS infrastructure costs from your AWS account, powering end-to-end cost observability for Databricks jobs. This requires additional setup in your AWS account.

  1. Create a new IAM Permissions Policy named revefi-cost-metrics-reader-policy as below
{
  "Version": "2025-01-01",

  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ce:GetCostAndUsage"
      ],
      "Resource": "*"
    }
  ]
}
  2. Attach the IAM policy to the AWS IAM role revefi-reader-role. An AWS CLI sketch of items 1-3 follows this list.
  3. Activate tags for cost allocation in your AWS account (this step can only be done by an account billing admin)
    • Go to the AWS Billing and Cost Management console.
    • In the navigation pane, choose "Cost Allocation Tags".
    • Activate the tags JobId, ClusterId, ClusterName, and RunId.
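These items can be scripted as well. The sketch below assumes the cost policy JSON above is saved to a local file with the name used here; the tag activation call generally only succeeds once the tag keys have appeared in your billing data.

# Create and attach the cost policy.
aws iam create-policy --policy-name revefi-cost-metrics-reader-policy \
  --policy-document file://revefi-cost-metrics-reader-policy.json
aws iam attach-role-policy --role-name revefi-reader-role \
  --policy-arn arn:aws:iam::<account_id>:policy/revefi-cost-metrics-reader-policy

# Activate the cost allocation tags (billing admin only).
aws ce update-cost-allocation-tags-status --cost-allocation-tags-status \
  TagKey=JobId,Status=Active TagKey=ClusterId,Status=Active \
  TagKey=ClusterName,Status=Active TagKey=RunId,Status=Active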

Step 8: Additional requirements for DLT Pipeline Optimization

Revefi can also help optimize your Databricks DLT pipelines, which requires additional permissions.

  1. Grant CAN_VIEW permissions to the service principal on all DLT pipelines. This can be done using Terraform or using the script below.
#!/bin/bash

# === CONFIGURATION ===
SERVICE_PRINCIPAL_ID="<service-principal-application-id>"  # applicationId (UUID) of the Revefi service principal from Step 1, e.g., "12345678-aaaa-bbbb-cccc-1234567890ab"

# === Fetch all pipelines ===
echo "Fetching all pipeline IDs..."
PIPELINE_IDS=$(databricks pipelines list-pipelines --output json | jq -r '.[].pipeline_id')
for PIPELINE_ID in $PIPELINE_IDS; do
  echo "Updating permissions for pipeline_id: $PIPELINE_ID"
  databricks permissions update pipelines "$PIPELINE_ID" --json '{
    "access_control_list": [
      {
        "service_principal_name": "'$SERVICE_PRINCIPAL_ID'",
        "permission_level": "CAN_VIEW"
      }
    ]
  }'
  echo "Permissions updated for pipeline $PIPELINE_ID"
done

echo "All pipeline permissions updated successfully."