Snowflake

Tecton can use Snowflake as a source of batch data for feature materialization. This page explains how to set up Tecton to use Snowflake as a data source.

Prerequisites

To set up Tecton to use Snowflake as a data source, you need the following:

  • A notebook connection to Databricks or EMR.
  • The URL for your Snowflake account.
  • The name of the virtual warehouse Tecton will use for querying data from Snowflake.
  • A Snowflake username and password. We recommend you create a new user in Snowflake configured to give Tecton read-only access.
  • A Snowflake read-only role for Spark, granted to the user created above. See the Snowflake documentation for the required grants; one way to set this up is sketched after this list.
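
The exact grants depend on how your Snowflake account is organized, but the following is a minimal sketch of creating such a read-only user and role with the Snowflake Python connector. All identifiers here (TECTON_READER, TECTON_SVC, and the warehouse/database names, chosen to match the example data source later on this page) are illustrative assumptions, not names Tecton requires.

    import snowflake.connector

    # Connect as an administrator that can create users, create roles,
    # and grant privileges. All credential values are placeholders.
    conn = snowflake.connector.connect(
        account="<your-account>",
        user="<admin-user>",
        password="<admin-password>",
        role="ACCOUNTADMIN",
    )
    cur = conn.cursor()

    # Illustrative names; adapt the role, user, warehouse, and database
    # to your environment.
    for stmt in [
        "CREATE ROLE IF NOT EXISTS TECTON_READER",
        "GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE TECTON_READER",
        "GRANT USAGE ON DATABASE CLICK_STREAM_DB TO ROLE TECTON_READER",
        "GRANT USAGE ON ALL SCHEMAS IN DATABASE CLICK_STREAM_DB TO ROLE TECTON_READER",
        "GRANT SELECT ON ALL TABLES IN DATABASE CLICK_STREAM_DB TO ROLE TECTON_READER",
        "CREATE USER IF NOT EXISTS TECTON_SVC PASSWORD = '<strong-password>' DEFAULT_ROLE = TECTON_READER",
        "GRANT ROLE TECTON_READER TO USER TECTON_SVC",
    ]:
        cur.execute(stmt)

    cur.close()
    conn.close()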

Configuring Secrets

To enable the Spark jobs managed by Tecton to read data from Snowflake, you will configure secrets in your secret manager.

For EMR users, follow the instructions to add a secret to the AWS Secrets Manager. For Databricks users, follow the instructions for creating a secret with Databricks secret management. (A boto3 sketch of the AWS path appears after the steps below.)

The secret names below begin with the prefix tecton-<deployment-name>. If your deployment name already starts with tecton-, the prefix is simply your deployment name. The deployment name is typically the name used to access Tecton, i.e. https://<deployment-name>.tecton.ai.
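
As a concrete illustration of the naming rule, here is a small Python sketch; the secret_prefix function is ours for illustration, not part of any Tecton API.

    def secret_prefix(deployment_name: str) -> str:
        """Return the secret-name prefix for a Tecton deployment name."""
        # If the deployment name already starts with "tecton-", use it
        # as-is; otherwise prepend "tecton-".
        if deployment_name.startswith("tecton-"):
            return deployment_name
        return f"tecton-{deployment_name}"

    assert secret_prefix("mycompany") == "tecton-mycompany"
    assert secret_prefix("tecton-mycompany") == "tecton-mycompany"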

  1. Add a secret named tecton-<deployment-name>/SNOWFLAKE_USER, and set its value to the Snowflake user name you configured above.
  2. Add a secret named tecton-<deployment-name>/SNOWFLAKE_PASSWORD, and set its value to the Snowflake password you configured above.
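
For the AWS Secrets Manager path, the two secrets can also be created programmatically. The following is a minimal boto3 sketch; the deployment name mycompany and the placeholder secret values are assumptions to substitute with your own.

    import boto3

    # Placeholder deployment name -- substitute your own.
    prefix = "tecton-mycompany"

    client = boto3.client("secretsmanager")

    # Create the two secrets that Tecton's Spark jobs will read.
    client.create_secret(
        Name=f"{prefix}/SNOWFLAKE_USER",
        SecretString="<snowflake-user>",
    )
    client.create_secret(
        Name=f"{prefix}/SNOWFLAKE_PASSWORD",
        SecretString="<snowflake-password>",
    )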

Verifying

To verify the connection, add a Snowflake-backed Data Source. Do the following:

  1. Add a SnowflakeDSConfig Data Source Config object in your feature repository. Here's an example:

    from tecton import SnowflakeDSConfig, BatchDataSource

    # Declare a SnowflakeDSConfig instance that can be passed to BatchDataSource
    snowflake_ds_config = SnowflakeDSConfig(
        url="https://<your-cluster>.<your-snowflake-region>.snowflakecomputing.com/",
        database="CLICK_STREAM_DB",
        schema="CLICK_STREAM_SCHEMA",
        warehouse="COMPUTE_WH",
        table="CLICK_STREAM_FEATURES",
    )

    # Use the config in the BatchDataSource
    snowflake_ds = BatchDataSource(
        name="click_stream_snowflake_ds",
        batch_ds_config=snowflake_ds_config,
    )
    
  2. Run tecton plan.

If everything is configured correctly, tecton plan adds the Data Source to Tecton; a misconfiguration results in an error message.
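
If tecton plan fails and you want to isolate whether the problem is the credentials rather than the data source definition, one option is to read the table directly from a notebook with the Spark Snowflake connector. This is a minimal sketch under a few assumptions: the connector is installed on your cluster, the notebook is a Databricks notebook where spark and dbutils are predefined (on EMR, fetch the secrets with boto3 instead), and the option names shown are those of the spark-snowflake connector, not a Tecton API.

    # Sanity-check Snowflake connectivity using the same credentials
    # Tecton will use. Placeholders in angle brackets are yours to fill in.
    sf_options = {
        "sfURL": "<your-cluster>.<your-snowflake-region>.snowflakecomputing.com",
        "sfUser": dbutils.secrets.get(scope="tecton-<deployment-name>", key="SNOWFLAKE_USER"),
        "sfPassword": dbutils.secrets.get(scope="tecton-<deployment-name>", key="SNOWFLAKE_PASSWORD"),
        "sfDatabase": "CLICK_STREAM_DB",
        "sfSchema": "CLICK_STREAM_SCHEMA",
        "sfWarehouse": "COMPUTE_WH",
    }

    df = (
        spark.read.format("snowflake")  # on EMR, use "net.snowflake.spark.snowflake"
        .options(**sf_options)
        .option("dbtable", "CLICK_STREAM_FEATURES")
        .load()
    )
    df.show(5)  # Rows printing here means the credentials and warehouse work.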