Overview of Transformations
What is a transformation?
A transformation is a function that specifies logic to run against data retrieved from external data sources.
Transformations are central to Tecton: feature pipelines, via Feature Views, call transformations to compute feature values.
Transformation modes
Transformations support the pandas, pyspark, python, snowflake_sql, snowpark, and spark_sql modes. See Transformation Modes for details.
Where transformations can be defined
A transformation can be defined inside or outside of a Feature View.
Defining a transformation outside of a Feature View has two main advantages over defining it inside one:
- Reusability
  - Transformations can be reused by multiple Feature Views.
  - A Feature View can call multiple transformations.
- Discoverability: Transformations can be searched in the Web UI.
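The reusability point can be illustrated with a plain-Python analogy. This is only a sketch and does not use Tecton's API: `rename_columns`, `add_doubled`, `view_a`, and `view_b` are hypothetical names, and the "views" here are ordinary functions standing in for Feature Views.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rename_columns(rows: List[Row]) -> List[Row]:
    """Shared transformation: rename a raw column to a feature name."""
    return [
        {"entity_id": r["entity_id"], "feature_one": r["column_a"]}
        for r in rows
    ]


def add_doubled(rows: List[Row]) -> List[Row]:
    """Second transformation: derive a new column from feature_one."""
    return [{**r, "feature_one_doubled": r["feature_one"] * 2} for r in rows]


def view_a(rows: List[Row]) -> List[Row]:
    # Hypothetical stand-in for a Feature View that reuses one transformation.
    return rename_columns(rows)


def view_b(rows: List[Row]) -> List[Row]:
    # Hypothetical stand-in for a Feature View that chains two transformations,
    # reusing the same rename_columns function as view_a.
    return add_doubled(rename_columns(rows))


raw = [{"entity_id": 1, "column_a": 5}]
print(view_a(raw))
print(view_b(raw))
```

Because `rename_columns` lives outside both views, a change to the renaming logic is made once and picked up by every caller.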
Defining a transformation inside of a Feature View
The following example shows a Feature View that implements a transformation in the body of the Feature View function my_feature_view. The transformation runs in spark_sql mode and renames columns from the data source to feature_one and feature_two.
@batch_feature_view(
    ...
    mode="spark_sql",
    ...
)
def my_feature_view(input_data):
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """
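To see what the function body produces, the sketch below calls an undecorated copy of it with a placeholder table name and prints the resulting SQL string. It assumes that, at run time, some queryable name is substituted for `input_data`. Both `render_query` and `my_source_table` are hypothetical and exist only for this illustration.

```python
def render_query(input_data: str) -> str:
    """Undecorated copy of the my_feature_view body, for inspection only."""
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """


# "my_source_table" is a hypothetical placeholder, not a real Tecton object.
sql = render_query("my_source_table")
print(sql)
```

The function itself only builds a string; in spark_sql mode the query is executed elsewhere, which is why the body contains no Spark calls.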
Defining a transformation function outside of a Feature View
See Defining a Transformation Outside of a Feature View.
Transformation input and output
Input
The input to a transformation contains the columns in the data source.
Output
When a transformation is defined inside of a Feature View, the output of the transformation is a DataFrame that must include:
- The join keys of all entities included in the entities list.
- A timestamp column. If there is more than one timestamp column, a timestamp_key parameter must be set to specify which column is the correct timestamp of the feature values.
- Feature value columns. All columns other than the join keys and timestamp will be considered features in a Feature View.
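The three output requirements can be expressed as a small column check. This is a minimal sketch in plain Python, not Tecton's own validation: the `feature_columns` helper is hypothetical, and it assumes the caller already knows which output columns have a timestamp type (passed in as `timestamp_columns`).

```python
from typing import Iterable, List, Optional


def feature_columns(
    output_columns: Iterable[str],
    join_keys: Iterable[str],
    timestamp_columns: Iterable[str],
    timestamp_key: Optional[str] = None,
) -> List[str]:
    """Check the output columns and return those treated as features."""
    columns = list(output_columns)
    keys = list(join_keys)
    present = set(columns)

    # 1. Every entity join key must appear in the output.
    missing = [k for k in keys if k not in present]
    if missing:
        raise ValueError(f"missing join keys: {missing}")

    # 2. A single timestamp column, or an explicit timestamp_key if several.
    found = [c for c in timestamp_columns if c in present]
    if timestamp_key is not None:
        if timestamp_key not in present:
            raise ValueError(f"timestamp_key {timestamp_key!r} not in output")
        timestamp = timestamp_key
    elif len(found) == 1:
        timestamp = found[0]
    elif not found:
        raise ValueError("no timestamp column in output")
    else:
        raise ValueError("several timestamp columns; set timestamp_key")

    # 3. Everything other than join keys and the timestamp is a feature.
    reserved = set(keys) | {timestamp}
    features = [c for c in columns if c not in reserved]
    if not features:
        raise ValueError("no feature value columns in output")
    return features


features = feature_columns(
    ["entity_id", "timestamp", "feature_one", "feature_two"],
    join_keys=["entity_id"],
    timestamp_columns=["timestamp"],
)
print(features)  # ['feature_one', 'feature_two']
```

Applied to the spark_sql example earlier on this page, `entity_id` is the join key, `timestamp` is the timestamp column, and the two renamed columns are the features.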