Overview of Transformations

What is a transformation?

A transformation is a function that specifies logic to run against data retrieved from external data sources.

Transformations are central to Tecton: feature pipelines, defined through Feature Views, call transformations to compute feature values.

Transformation modes

Transformations support the pandas, pyspark, python, snowflake_sql, snowpark, and spark_sql modes. See Transformation Modes for details.

Where transformations can be defined

A transformation can be defined inside or outside of a Feature View.

Defining a transformation outside of a Feature View has two main advantages over defining it inside one:

  • Reusability
    • Transformations can be reused by multiple Feature Views.
    • A Feature View can call multiple transformations.
  • Discoverability: Transformations can be searched in the Web UI.
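The reuse pattern described above can be sketched in plain Python. This is only an illustration of the idea, not Tecton's API: rename_columns, drop_null_entities, feature_view_a, and feature_view_b are hypothetical names, and the functions simply build SQL strings. See Defining a Transformation Outside of a Feature View for the actual syntax.

```python
# Plain-Python illustration of reuse; none of these names come from Tecton.

def rename_columns(source, renames):
    """Hypothetical shared transformation: rename columns of `source`."""
    select_list = ", ".join(f"{old} AS {new}" for old, new in renames.items())
    return f"SELECT entity_id, timestamp, {select_list} FROM {source}"


def drop_null_entities(source):
    """Hypothetical second transformation: drop rows without an entity id."""
    return f"SELECT * FROM {source} WHERE entity_id IS NOT NULL"


def feature_view_a(source):
    # Reuses the shared transformation on its own.
    return rename_columns(source, {"column_a": "feature_one"})


def feature_view_b(source):
    # Calls two transformations: clean first, then reuse the same rename logic.
    cleaned = drop_null_entities(source)
    return rename_columns(f"({cleaned}) AS cleaned", {"column_b": "feature_two"})
```

Both functions share rename_columns, and feature_view_b additionally chains a second transformation in front of it.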

Defining a transformation inside of a Feature View

The following example shows a Feature View that implements a transformation in the body of the Feature View function my_feature_view. The transformation runs in spark_sql mode and renames columns from the data source to feature_one and feature_two.

@batch_feature_view(
    ...
    mode="spark_sql",
    ...
)
def my_feature_view(input_data):
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """
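To see what the function body produces, the same function can be called as plain Python without the decorator. This is a minimal sketch: the source name "transactions" is a hypothetical placeholder, and how Tecton itself substitutes input_data at run time is not covered here.

```python
def my_feature_view(input_data):
    # Same body as the Feature View above, without the Tecton decorator.
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """


# "transactions" is a hypothetical source name used only for illustration.
query = my_feature_view("transactions")
print(query)
```

The function returns a SQL string; the f-string places the input's name in the FROM clause.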

Defining a transformation function outside of a Feature View

See Defining a Transformation Outside of a Feature View.

Transformation input and output

Input

The input to a transformation contains the columns in the data source.

Output

When a transformation is defined inside of a Feature View, the output of the transformation is a DataFrame that must include:

  1. The join keys of all entities included in the entities list
  2. A timestamp column. If the output has more than one timestamp column, the timestamp_key parameter must be set to specify which column holds the timestamp of the feature values.
  3. Feature value columns. All columns other than the join keys and timestamp will be considered features in a Feature View.
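The three requirements above can be expressed as a small check over the output's column names. This is a sketch in plain Python, not part of Tecton: split_output_columns and the example column names are hypothetical, and it works on a list of column names rather than an actual DataFrame.

```python
def split_output_columns(columns, join_keys, timestamp_column):
    """Check the required columns are present and return the feature columns.

    Hypothetical helper: `columns` is the list of column names in the
    transformation output.
    """
    required = list(join_keys) + [timestamp_column]
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"output is missing required columns: {missing}")
    # Every column that is not a join key or the timestamp is a feature.
    return [name for name in columns if name not in required]


output_columns = ["entity_id", "timestamp", "feature_one", "feature_two"]
feature_columns = split_output_columns(output_columns, ["entity_id"], "timestamp")
print(feature_columns)
```

With the columns from the earlier spark_sql example, the join key and timestamp are set aside and the two renamed columns are treated as features.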