Connecting to a Redshift Data Source
Tecton can use Amazon Redshift as a source of batch data for feature materialization. This page explains how to set up Tecton to use Redshift as a data source.
Prerequisites
To set up Tecton with Redshift, you need the following:
- A notebook connection to Databricks or EMR.
- A Redshift Cluster Endpoint. The Redshift cluster must be configured for access over the public internet. We recommend using IP whitelisting to ensure only Tecton can access your Redshift Cluster (your Tecton deployment specialist can provide you with IP ranges).
- A Redshift username and password. We recommend that you create a new user in Redshift configured to give Tecton read-only access to Redshift.
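One way to prepare the read-only user recommended above is to generate the SQL statements and run them in your Redshift query editor as an administrator. The sketch below is an illustration under stated assumptions, not Tecton's prescribed setup: the helper name `read_only_user_statements`, the user name, and the schema are hypothetical, and it assumes the tables Tecton needs all live in a single schema.

```python
import re
from typing import List

# Hypothetical helper: names and schema are illustrative, not part of Tecton.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def read_only_user_statements(user: str, password: str, schema: str = "public") -> List[str]:
    """Build SQL statements that create a user with read-only access to one schema."""
    # Reject anything that is not a plain identifier, since the names are
    # interpolated directly into the SQL text.
    for name in (user, schema):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if "'" in password:
        raise ValueError("password must not contain a single quote")
    return [
        f"CREATE USER {user} PASSWORD '{password}';",
        # USAGE lets the user see the schema; SELECT grants read-only access
        # to the tables that exist in it at the time the statement runs.
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]
```

Calling `read_only_user_statements("tecton_reader", "<password>", "analytics")` returns three statements to review before running them. Tables created later are not covered by the `GRANT SELECT` statement, so it may need to be re-run.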
Setting Up the Connection
To enable the Spark jobs managed by Tecton to read data from Redshift, you will configure secrets in your secret manager.
For EMR users, follow the instructions to add a secret to the AWS Secrets Manager. For Databricks users, follow the instructions for creating a secret with Databricks secret management.
Note that if your deployment name already starts with tecton-, the prefix of the secret names below is simply your deployment name. The deployment name is typically the name used to access Tecton, i.e. https://
- Add a secret named tecton-<deployment-name>/REDSHIFT_USER, and set its value to the Redshift user name you configured above.
- Add a secret named tecton-<deployment-name>/REDSHIFT_PASSWORD, and set its value to the Redshift password you configured above.
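The naming rule for the two secrets, including the note about deployment names that already start with tecton-, can be sketched as a small helper. The function name `secret_name` and the deployment name `acme` are hypothetical and used only for illustration.

```python
def secret_name(deployment_name: str, key: str) -> str:
    """Return the secret name for a given deployment, e.g. tecton-acme/REDSHIFT_USER."""
    # If the deployment name already carries the tecton- prefix, use it as-is;
    # otherwise prepend the prefix.
    if deployment_name.startswith("tecton-"):
        prefix = deployment_name
    else:
        prefix = f"tecton-{deployment_name}"
    return f"{prefix}/{key}"
```

For a deployment named `acme`, this yields `tecton-acme/REDSHIFT_USER` and `tecton-acme/REDSHIFT_PASSWORD`; a deployment named `tecton-acme` yields the same two names.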
Verifying
To verify the connection, add a Redshift-backed Data Source. Do the following:
- Deploy a RedshiftConfig Data Source Config object in the Redshift Feature Repository as shown here:

    transactions_redshift_batch_ds = RedshiftConfig(
        endpoint=REDSHIFT_ENDPOINT,
        table=REDSHIFT_TABLE,
    )

- Run tecton plan.
If the configuration is valid, the Data Source is added to Tecton. A misconfiguration results in an error message.
Notebook Cluster Access
Once you've created a Redshift Data Source, you can test the connection to it in your notebook environment.
- You may need to install s3://redshift-downloads/drivers/jdbc/1.2.12.1017/RedshiftJDBC42-no-awssdk-1.2.12.1017.jar on your notebook cluster if a Redshift driver is not already present.
- In your notebook, test the connection with:

    import tecton

    ds = tecton.get_data_source(<your_data_source_name>)
    ds.dataframe()
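If the notebook test fails, it can help to first confirm that the cluster is reachable over the network from the notebook, since the cluster must accept connections from allowed IP ranges. The following is a minimal sketch using only the Python standard library. It assumes the endpoint has the form host:port/database and falls back to 5439, the default Redshift port, when no port is given. The helper names `parse_endpoint` and `can_reach` are hypothetical, and the check covers network reachability only, not credentials or the JDBC driver.

```python
import socket
from typing import Tuple

DEFAULT_REDSHIFT_PORT = 5439  # Redshift's default port


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split an endpoint of the assumed form host[:port][/database] into (host, port)."""
    host_port = endpoint.split("/", 1)[0]
    host, sep, port = host_port.partition(":")
    return host, (int(port) if sep else DEFAULT_REDSHIFT_PORT)


def can_reach(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened within the timeout."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A result of `False` from `can_reach(REDSHIFT_ENDPOINT)` suggests checking the cluster's public accessibility and IP allow-list before looking at the secrets or the driver.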