Connecting Databricks Notebooks

You can use the Tecton SDK in a Databricks notebook to explore feature values and create training datasets. The following guide covers how to configure your all-purpose cluster for use with Tecton. If you haven't already completed your deployment of Tecton with Databricks, please see the guide for Configuring Databricks.

Supported Databricks runtimes for notebooks

Tecton currently supports using Databricks Runtime 9.1 LTS with notebooks. Ensure your all-purpose cluster is configured with DBR 9.1.
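
To confirm the runtime from inside a notebook, a check along the following lines may help. This is a minimal sketch, assuming the cluster exposes its runtime version through the DATABRICKS_RUNTIME_VERSION environment variable (for example 9.1). The helper name is_supported_runtime is hypothetical and is not part of the Tecton SDK.

```python
import os

SUPPORTED_RUNTIME = (9, 1)  # DBR 9.1 LTS, as stated in this guide


def is_supported_runtime(version: str) -> bool:
    """Return True if a version string such as '9.1' refers to DBR 9.1."""
    parts = version.strip().split(".")
    try:
        major_minor = tuple(int(p) for p in parts[:2])
    except ValueError:
        # Empty or non-numeric strings are treated as unsupported.
        return False
    return major_minor == SUPPORTED_RUNTIME


# Assumption: Databricks sets this variable on cluster nodes; elsewhere it is absent.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
print(f"runtime={runtime!r} supported={is_supported_runtime(runtime)}")
```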

Create a Tecton API key

Your cluster needs an API key to connect to Tecton. Create one with the Tecton CLI:

$ tecton api-key create
Save this key - you will not be able get it again
1234567890abcdefabcdefabcdefabcd

This key will be referred to as TECTON_API_KEY below.

Install the Tecton SDK

This step must be done once per cluster.

On the cluster configuration page:

Go to the Libraries tab
Click Install New
Select PyPI under Library Source
Set Package to your desired Tecton SDK version, such as tecton==0.3.2 or tecton==0.3.*.

Install the Tecton UDF Jar

This step must be done once per cluster.

On the cluster configuration page:

Go to the Libraries tab
Click Install New
Select DBFS/S3 under Library Source
Set File Path to s3://tecton.ai.public/pip-repository/itorgation/tecton/{tecton_version}/tecton-udfs-spark-3.jar, where tecton_version matches the SDK version you installed: for example 0.3.2, or 0.3.* to get the jar that matches the latest patch.

Note: If you are using a Tecton SDK version earlier than 0.2.6, set the File Path to s3://tecton.ai.public/pip-repository/itorgation/tecton/tecton-udfs-spark-3.jar
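
The path rule above can be sketched as a small helper that builds the File Path from an SDK version. The function name udf_jar_path is hypothetical, and treating a * component as newer than any concrete patch number is an assumption made here so that a value like 0.3.* resolves to the versioned path.

```python
import math

# Both values are copied verbatim from this guide.
JAR_ROOT = "s3://tecton.ai.public/pip-repository/itorgation/tecton"
JAR_NAME = "tecton-udfs-spark-3.jar"


def _version_key(version: str) -> tuple:
    """Turn '0.3.2' into (0, 3, 2); a '*' component sorts above any number."""
    return tuple(
        math.inf if part == "*" else int(part) for part in version.split(".")
    )


def udf_jar_path(sdk_version: str) -> str:
    """Return the File Path to enter for the given Tecton SDK version."""
    if _version_key(sdk_version) < (0, 2, 6):
        # Versions earlier than 0.2.6 use the unversioned jar location.
        return f"{JAR_ROOT}/{JAR_NAME}"
    return f"{JAR_ROOT}/{sdk_version}/{JAR_NAME}"


print(udf_jar_path("0.3.2"))
```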

Configure SDK credentials using secrets

Tecton SDK credentials are configured using Databricks secrets. These are normally pre-configured as part of the Tecton deployment, but you can create them manually if needed, for example to access Tecton from another Databricks workspace. First, ensure the Databricks CLI is installed and configured. Next, create a secret scope and set the API endpoint and API key, using the key created above in Create a Tecton API key. The scope name is tecton for the production Tecton cluster associated with a workspace, and tecton-<cluster_name> otherwise (such as a staging cluster created in the same account). If your cluster name already starts with tecton-, the scope name is just your cluster name.

databricks secrets create-scope --scope <scope_name>
databricks secrets put --scope <scope_name> \
    --key API_SERVICE --string-value https://foo.tecton.ai/api
databricks secrets put --scope <scope_name> \
    --key TECTON_API_KEY --string-value <TOKEN>
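
The scope naming rule described above can be expressed as a short function, which may be handy when scripting the setup for several clusters. The name tecton_scope_name and its is_production flag are hypothetical and only illustrate the rule.

```python
def tecton_scope_name(cluster_name: str, is_production: bool) -> str:
    """Return the Databricks secret scope name for a Tecton cluster."""
    if is_production:
        # The production cluster for a workspace always uses the plain scope.
        return "tecton"
    if cluster_name.startswith("tecton-"):
        # The cluster name already carries the prefix, so use it unchanged.
        return cluster_name
    return f"tecton-{cluster_name}"


print(tecton_scope_name("staging", is_production=False))
```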

Depending on your Databricks setup, you may need to configure ACLs for the Tecton secret scope before it is usable. See the Databricks documentation on secret access control for more information. For example:

databricks secrets put-acl --scope <scope_name> \
    --principal your@email.com --permission MANAGE
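
Once the scope and any ACLs are in place, you may want to confirm from a notebook that the required secrets are readable. The sketch below assumes the two keys shown earlier (API_SERVICE and TECTON_API_KEY) are the ones to check. The helper missing_secrets is hypothetical. It takes the secret lookup as a parameter so the logic can run outside Databricks; in a notebook you could pass a wrapper around dbutils.secrets.get.

```python
from typing import Callable, Iterable, List

REQUIRED_KEYS = ("API_SERVICE", "TECTON_API_KEY")


def missing_secrets(
    get_secret: Callable[[str, str], str],
    scope: str,
    keys: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the keys that cannot be read, or are empty, in the given scope."""
    missing = []
    for key in keys:
        try:
            if not get_secret(scope, key):
                missing.append(key)
        except Exception:
            # Any lookup failure (missing key, missing scope, no permission)
            # is reported as a missing secret.
            missing.append(key)
    return missing


# In a Databricks notebook (assumption: dbutils is available there):
#   missing_secrets(lambda s, k: dbutils.secrets.get(scope=s, key=k), "tecton")

# Stand-in store so the sketch runs anywhere.
fake_store = {("tecton", "API_SERVICE"): "https://foo.tecton.ai/api"}
print(missing_secrets(lambda s, k: fake_store[(s, k)], "tecton"))
```

An empty list means both secrets were readable under the given scope.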

Additionally, depending on the data sources you use, you may need to configure the following secrets:

<secret-scope>/REDSHIFT_USER
<secret-scope>/REDSHIFT_PASSWORD
<secret-scope>/SNOWFLAKE_USER
<secret-scope>/SNOWFLAKE_PASSWORD

Configure permissions for cross-account access

If your Databricks workspace is in a different AWS account from your Tecton data plane, you must configure AWS access so that Databricks can read all of the S3 buckets Tecton uses. These buckets are in the data plane account and are prefixed with tecton-. For full functionality, Databricks also needs access to the underlying data sources that Tecton reads.
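
As a rough illustration of the read access described above, the following sketch builds an IAM policy document covering buckets whose names start with tecton-. It is an assumption-laden example and is not taken from the Tecton deployment guide: the helper tecton_read_policy is hypothetical, the choice of s3:ListBucket and s3:GetObject as the read actions is an assumption, and your deployment may need additional actions, narrower bucket names, or matching bucket policies in the data plane account.

```python
import json


def tecton_read_policy(bucket_prefix: str = "tecton-") -> dict:
    """Build a read-only S3 policy for buckets starting with bucket_prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket_prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the buckets themselves.
                "Sid": "ListTectonBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside those buckets.
                "Sid": "ReadTectonObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(tecton_read_policy(), indent=2))
```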

Verify the connection

Create a notebook attached to a cluster that has the Tecton SDK installed (see Install the Tecton SDK above). Run the following in the notebook. If successful, you should see a list of workspaces, including the "prod" workspace.

import tecton
tecton.list_workspaces()