Connecting Databricks Notebooks
You can use the Tecton SDK in a Databricks notebook to explore feature values and create training datasets. The following guide covers how to configure your all-purpose cluster for use with Tecton. If you haven't already completed your deployment of Tecton with Databricks, please see the guide for Configuring Databricks.
Supported Databricks runtimes for notebooks
Tecton currently supports using Databricks Runtime 9.1 LTS with notebooks. Ensure your all-purpose cluster is configured with DBR 9.1.
Create a Tecton API key
Your cluster will need an API key to connect to Tecton. This can be obtained using the CLI by running:
$ tecton api-key create
Save this key - you will not be able to get it again
1234567890abcdefabcdefabcdefabcd
This key is the value used for the TECTON_API_KEY secret below.
Install the Tecton SDK
This step must be done once per cluster.
On the cluster configuration page:
- Go to the Libraries tab
- Click Install New
- Select PyPI under Library Source
- Set Package to your desired Tecton SDK version, such as tecton==0.3.2 or tecton==0.3.*.
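After the library is installed and the cluster has restarted, you can optionally confirm the installed version from a notebook. This is just a sanity check using the standard pkg_resources helper, not a Tecton-specific API:
# Optional sanity check: confirm which Tecton SDK version the cluster sees.
import pkg_resources
print(pkg_resources.get_distribution("tecton").version)  # e.g. "0.3.2"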
Install the Tecton UDF Jar
This step must be done once per cluster.
On the cluster configuration page:
- Go to the Libraries tab
- Click Install New
- Select DBFS/S3 under Library Source
- Set File Path to s3://tecton.ai.public/pip-repository/itorgation/tecton/{tecton_version}/tecton-udfs-spark-3.jar, where {tecton_version} matches the SDK version you installed, such as 0.3.2, or 0.3.* to get the jar that matches the latest patch.
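Optionally, you can confirm the jar exists at the expected path before installing it. This is a sanity check from a notebook (dbutils is available by default in Databricks Python notebooks), assuming your cluster can read public S3 buckets; 0.3.2 below is an example version:
# Sanity check: list the public Tecton bucket for your SDK version.
# "0.3.2" is an example - substitute the version you installed.
display(dbutils.fs.ls("s3://tecton.ai.public/pip-repository/itorgation/tecton/0.3.2/"))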
Configure SDK credentials using secrets
Tecton SDK credentials are configured using Databricks secrets. These should be pre-configured with the Tecton deployment, but if needed they can be created in the following format, such as if you want to access Tecton from another Databricks workspace. First, ensure the Databricks CLI is installed and configured. Next, create a secret scope and configure the endpoint and API token using the API key created above.
The scope name is:
- <deployment name>, if your deployment name begins with tecton
- tecton-<deployment name>, otherwise
<deployment name> is the first part of the URL used to access the Tecton UI: https://<deployment name>.tecton.ai
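As a concrete illustration of this naming rule only (this helper is not part of the Tecton SDK):
# Illustration of the scope naming rule above - not a Tecton API.
def scope_name(deployment_name):
    if deployment_name.startswith("tecton"):
        return deployment_name
    return "tecton-" + deployment_name

print(scope_name("acme"))        # -> tecton-acme
print(scope_name("tectonprod"))  # -> tectonprod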
databricks secrets create-scope --scope <scope_name>
databricks secrets put --scope <scope_name> \
--key API_SERVICE --string-value https://foo.tecton.ai/api
databricks secrets put --scope <scope_name> \
--key TECTON_API_KEY --string-value <TOKEN>
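To confirm that the secrets are visible to a cluster, you can read them back from a notebook. Databricks redacts secret values in notebook output, so a successful call only proves that the key exists and is readable:
# Both calls should succeed; printed values are redacted by Databricks.
dbutils.secrets.get(scope="<scope_name>", key="API_SERVICE")
dbutils.secrets.get(scope="<scope_name>", key="TECTON_API_KEY")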
Depending on your Databricks setup, you may need to configure ACLs for the Tecton secret scope before it is usable. See the Databricks documentation for more information. For example:
databricks secrets put-acl --scope <scope_name> \
--principal your@email.com --permission MANAGE
Additionally, depending on the data sources used, you may need to configure the following secrets:
<secret-scope>/REDSHIFT_USER
<secret-scope>/REDSHIFT_PASSWORD
<secret-scope>/SNOWFLAKE_USER
<secret-scope>/SNOWFLAKE_PASSWORD
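To see which keys already exist in the scope, you can list them from a notebook; dbutils.secrets.list returns key names only, never values:
# List the keys present in the scope (values are never returned).
for secret in dbutils.secrets.list("<scope_name>"):
    print(secret.key)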
Configure permissions for cross-account access
If your Databricks workspace is in a different AWS account from your Tecton data plane, you must configure AWS access so that Databricks can read all of the S3 buckets Tecton uses (these are in the data plane account and are prefixed with tecton-). Databricks also needs access to the underlying data sources Tecton reads in order to have full functionality.
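Once cross-account access is configured, a quick check from a notebook is to list one of the data plane buckets. The bucket name below is hypothetical; substitute a real tecton- prefixed bucket from your data plane account:
# Hypothetical bucket name - substitute a real tecton- prefixed bucket.
# An access-denied error here means cross-account permissions are missing.
display(dbutils.fs.ls("s3://tecton-<deployment name>/"))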
Verify the connection
Create a notebook connected to a cluster with the Tecton SDK installed (see Install the Tecton SDK above). Run the following in the notebook. If successful, you should see a list of workspaces, including the "prod" workspace.
import tecton
tecton.list_workspaces()