BigQuery

This guide explains how to create a read-only service account (metadata access only) for Revefi on BigQuery.

Create a Service Account for Revefi

  1. Login to your Google Cloud Console. Open the menu on the left side and go to IAM & Admin -> Service Accounts. Ensure that your desired project is selected from the drop-down menu at the top bar.

  2. In the Service Accounts page, click on the Create Service Account button on the top. There is a form to create a service account. Fill in the Service Account Name, Service Account ID, Service Account Description as specified below. Make note of the email address provided between the service account ID and the service account description; you will use this to grant this service account access to project(s).

  3. On the next step, you will have an option to Grant this service account access to projects. Skip this step for now (you will grant this access in the next section) and click Done.

  4. Once the Service Account is created, click on the link and navigate to the Keys menu on the top bar. Click on Add Key -> Create New Key. There is a menu asking to choose between json file .p12 key file. Choose json and click Create. Download the json file.

  5. You will need to enter the contents of this file on Revefi’s website after the below steps.

Enable APIs

From the Google Cloud Console, enable the below APIs - if they are not already enabled. These APIs allow Revefi to understand the BigQuery regions, catalog and data lineage of your project.

  1. Cloud Resource Manager API
  2. BigQuery API
  3. DataLineage API
  4. Google Cloud Data Catalog API

For each of the links above, you will need to choose the BigQuery Project that you wish to onboard to Revefi:

Grant Service Account Access to Project(s)

  1. Navigate to Google Cloud Shell. Create a custom role for your service account to give Revefi the appropriate access.
    To give Revefi Metadata Access Only, create the custom role below. Replace the <PROJECT_ID> placeholder with your project id.

    gcloud iam roles create RevefiMetadataOnly --project=<PROJECT_ID> --title=RevefiMetadataOnly --description="Revefi metadata-only permissions" --permissions=bigquery.datasets.get,bigquery.jobs.listAll,bigquery.jobs.create,bigquery.tables.get,bigquery.tables.list,resourcemanager.projects.get,bigquery.routines.get,bigquery.routines.list,datalineage.locations.searchLinks,datalineage.events.get,bigquery.readsessions.create,bigquery.readsessions.getData,bigquery.datasets.getIamPolicy,bigquery.tables.getIamPolicy,bigquery.reservationAssignments.list,bigquery.capacityCommitments.list,bigquery.reservations.list
    
    
  2. Bind the custom role defined above to your service account. Replace the <PROJECT_ID> (both placeholders), <SERVICE_ACCOUNT_EMAIL> placeholders respectively.

    Note: Use the service account email created previously. If you used a different role name other than RevefiMetadataOnly, please change the role in the below command.

    gcloud projects add-iam-policy-binding <PROJECT_ID> --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" --role="projects/<PROJECT_ID>/roles/RevefiMetadataOnly"
    
    
  3. Repeat the steps above for all projects you want the service account (and by extension, Revefi) to access. Revefi will automatically detect all the projects that it has access to.

Allow-listed IP Addresses

Revefi will always connect to your BigQuery project(s) from the static IP addresses 44.225.56.55, 44.227.173.42, and 52.10.141.35.

👍

Proceed to Revefi!

At this point you can proceed to the Revefi app, and create a BigQuery connection following the prompts on the app.

Appendix

Explanation of BigQuery Permissions needed

BigQuery PermissionDescriptionWhy Revefi needs this permission for Monitoring (Metadata only)
bigquery.datasets.getGet metadata about a dataset.Allows Revefi to see the INFORMATION_SCHEMA.SCHEMATA view as well as list out available Datasets in BigQuery.
bigquery.jobs.listAllList all jobs and retrieve metadata on any job submitted by any user.Allows Revefi to get Metadata on BigQuery query logs from the INFORMATION_SCHEMA.JOBS view
bigquery.jobs.createRun jobs (including queries) within the project.Allows Revefi to run SELECT queries on the Information Schema.
bigquery.tables.getGet BigQuery table metadata.Allows Revefi to get table Metadata from the INFORMATION_SCHEMA.TABLES view. Includes table creation time, number of rows, and byte size of the data.
bigquery.tables.listList tables and metadata on tables.Allows Revefi to see table Metadata and see all tables in a dataset from the INFORMATION_SCHEMA.TABLES view.
resourcemanager.projects.getRead project metadata.Allows Revefi to list available GCP Projects to iterate through in order to collect BigQuery Metadata.
Note that Revefi Service Account can only see the list of the Projects that the Service Account has been provided access to, therefore this permission is necessary.
bigquery.routines.getGet routine definitions and metadata.Allows Revefi to access INFORMATION_SCHEMA.TABLES view for table metadata information
bigquery.routines.listList routines and metadata on routines.Allows Revefi to access INFORMATION_SCHEMA.TABLES view for table metadata information
datalineage.locations.searchLinksAllows Revefi to retrieve a list of links connected to a specific table (links represent the data flow between the source and target in transformation pipelines). Links are stored in the same project as the lineage events that create them.
datalineage.events.getTo retrieve the list of links mentioned above for a specific project, this permission is needed
bigquery.readsessions.createCreate a session for streaming large results via the BigQuery Storage Read API.This permission is needed to create a session to query from the INFORMATION_SCHEMA views for large datasets.
bigquery.readsessions.getDataRead data from a read session via the Storage Read API.This permission is needed to view data from a read session to query from the INFORMATION_SCHEMA views for large datasets.
bigquery.datasets.getIamPolicyRead a dataset's IAM policy.
bigquery.tables.getIamPolicyRead a table's IAM policy.
bigquery.reservations.listList of all current capacity commitments within the administration projectAllows Revefi to access INFORMATION_SCHEMA.RESERVATIONS view
bigquery.reservationAssignments.listList of all current assignments within the administration projectAllows Revefi to access INFORMATION_SCHEMA.ASSIGNMENTS view
bigquery.capacityCommitments.listList of all current capacity commitments within the administration projectAllows Revefi to access INFORMATION_SCHEMA.CAPACITY_COMMITMENTS view

Updating an Existing Role with Additional Permissions

  1. Go to Google Cloud Shell
  2. Run the command below, modifying {{PROJECT_ID}} and {{PERMISSIONS_LIST}} as needed. {{PERMISSIONS_LIST}} should include the permission(s) desired as a comma-separated list, such as “bigquery.readsessions.create,bigquery.readsessions.getData,bigquery.datasets.getIamPolicy,bigquery.tables.getIamPolicy”
    gcloud iam roles update RevefiMetadataOnly --project={{PROJECT_ID}}  
    --add-permissions="{{PERMISSIONS_LIST}}"
    

What’s Next

Once you have run this BigQuery script, go to the Connections page in Revefi on the left nav, and click Connect on the BigQuery box to enter the credentials for the Revefi user created above! Once that's done, optionally set up to receive your alerts via Slack (default is your email).