Using Pinned EMR Releases and Databricks Runtime Releases

Note

This page applies to Tecton SDK v0.6, which is in beta.

Overview

By default, a Tecton materialization cluster, which computes your feature values for online serving and training dataframes, runs a specific EMR release or Databricks Runtime release. Periodically, Tecton upgrades the default EMR release or Databricks Runtime release on materialization clusters to apply the latest security patches and stability fixes. These upgrades may include Spark upgrades.

Rarely, existing transformation logic defined in Tecton will be incompatible with a Spark upgrade.

To prevent a Spark upgrade (which would occur as part of an EMR upgrade or Databricks Runtime upgrade), or to downgrade Spark after an incompatibility has occurred, you can configure Tecton to override the default EMR release/Databricks Runtime release for each Feature View and Feature Table.

Overriding Tecton’s default EMR release/Databricks Runtime release

In Feature View and Feature Table definitions, you can specify which EMR release/Databricks Runtime release is used by setting the parameter listed in the table below to a DatabricksClusterConfig or an EMRClusterConfig object.

Object                 Parameter to Set
@batch_feature_view    batch_config
@stream_feature_view   stream_config
FeatureTable           batch_config

If using a DatabricksClusterConfig object, set the databricks_version parameter. Note: the value must be a valid Databricks Runtime name.

If using an EMRClusterConfig object, set the emr_version parameter.
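
As a rough illustration of the settings described above, the following sketch constructs one cluster config object of each kind. The class names and the databricks_version and emr_version parameters come from this page. The version strings are placeholders chosen for illustration only, and this page does not confirm that Tecton supports them. Substitute a release that is valid for your deployment.

```python
# Minimal sketch of a feature repository fragment; it assumes the Tecton SDK
# is installed and that both classes are importable from the top-level package.
from tecton import DatabricksClusterConfig, EMRClusterConfig

# Pin a Databricks Runtime release.
# "10.4.x-scala2.12" is an illustrative runtime name, not a recommendation.
pinned_databricks = DatabricksClusterConfig(
    databricks_version="10.4.x-scala2.12",
)

# Pin an EMR release.
# "emr-6.5.0" is an illustrative release label, not a recommendation.
pinned_emr = EMRClusterConfig(
    emr_version="emr-6.5.0",
)
```

One of these objects would then be passed as the parameter listed in the table above for the Feature View or Feature Table being pinned, so that only that object keeps the pinned release while everything else continues to follow Tecton's default.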

Looking at the Spark release notes

We recommend reviewing the Spark release notes to see whether your Tecton transformations use any deprecated features, and to check whether any custom JARs you use need to be updated for compatibility. This page contains links to the release notes for each Spark version.

The links below show the Spark version that is included in each version of Databricks Runtime and EMR, respectively: