Transformations
Transformations are Tecton objects that describe a set of operations on data. The operations are expressed through standard frameworks such as Spark SQL, Snowflake SQL, PySpark, and Pandas.
Transformations are required to create Feature Views. Once defined, a Transformation can be reused within multiple Feature Views, or multiple Transformations can be composed within a single Feature View. Using these Transformations with your feature store provides several benefits:
- Reusability: You can define a common Transformation — to clean up data, for example — that can be shared across all Features.
- Feature versioning: If you change a Feature Transformation, the Feature Store increments the version of that feature and ensures that you don't accidentally mix features that were computed using two different implementations.
- End-to-end lineage tracking and reproducibility: Since Tecton manages Transformations, it can tie feature definitions all the way through a training data set and a model that's used in production.
- Visibility: Enabling data scientists to examine the code and see how the feature is calculated will help them understand if it's appropriate to re-use for their model.
Transformation Types
Register a Python function as a Transformation in Tecton by annotating it with @transformation, and set the mode parameter to specify the transformation type.
Tecton supports the following transformation types.
SQL
SQL transformations are configured with mode="spark_sql" or mode="snowflake_sql", and return a SQL query.
The input to the transformation function must be a Spark dataframe or a Tecton constant. The tables in the FROM clause must be parameterized via the inputs.
Example
from tecton import transformation

@transformation(mode="spark_sql")
def user_has_good_credit_transformation(credit_scores):
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """
from tecton import transformation

@transformation(mode="snowflake_sql")
def user_has_good_credit_transformation(credit_scores):
    return f"""
        SELECT
            user_id,
            IFF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """
Note that Spark SQL and Snowflake SQL transformations cannot be used within an OnDemandFeatureView.
PySpark
Note
PySpark transformations are not supported in Tecton on Snowflake.
PySpark transformations are configured with mode="pyspark", and contain Python code that will be executed within a Spark context. They can additionally use third-party libraries as user-defined PySpark functions if your cluster allows third-party libraries (a sketch of this case appears at the end of this section).
The input to the transformation function must be a Spark dataframe or a Tecton constant. The output from the transformation function must be a Spark dataframe.
Example
from tecton import transformation

@transformation(mode="pyspark")
def user_has_good_credit_transformation(credit_scores):
    from pyspark.sql import functions as F

    df = credit_scores.withColumn(
        "user_has_good_credit",
        F.when(credit_scores["credit_score"] > 670, 1).otherwise(0),
    )
    return df.select("user_id", df["date"].alias("timestamp"), "user_has_good_credit")
PySpark transformations cannot be used within an OnDemandFeatureView.
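For the third-party-library case mentioned above, here is a minimal sketch of a PySpark transformation that wraps a third-party library in a user-defined function. The phonenumbers library and the users, phone_number, and signup_date names are hypothetical, and the sketch assumes your cluster allows third-party libraries.

from tecton import transformation

@transformation(mode="pyspark")
def phone_country_code_transformation(users):
    # Hypothetical sketch: wrap the third-party `phonenumbers` library in a
    # PySpark UDF. Imports live inside the function body (see "Library imports" below).
    import phonenumbers
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    @F.udf(returnType=StringType())
    def to_country_code(phone_number):
        # Parse the raw phone number and return its country calling code as a string.
        return str(phonenumbers.parse(phone_number, None).country_code)

    return users.select(
        "user_id",
        users["signup_date"].alias("timestamp"),
        to_country_code(users["phone_number"]).alias("phone_country_code"),
    )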
Snowpark
A Snowpark transformation is configured using mode=snowpark
. This transformation can be used with Tecton on Snowflake.
The input to the transformation function must be a snowflake.snowpark.DataFrame
or a Tecton constant. The output from the transformation function must be a snowflake.snowpark.DataFrame
.
The transformation function can call functions that are defined in Snowflake.
Example
from tecton import transformation

@transformation(mode="snowpark")
def user_has_good_credit_transformation(credit_scores):
    from snowflake.snowpark.functions import when, col

    df = credit_scores.withColumn(
        'user_has_good_credit',
        when(col('credit_score') > 670, 1).otherwise(0),
    )
    return df.select('user_id', 'user_has_good_credit', 'timestamp')
Note that Snowpark transformations cannot be used within an OnDemandFeatureView.
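To illustrate calling a function that is defined in Snowflake, here is a minimal sketch. It assumes a UDF named score_band already exists in the active Snowflake database and schema; the UDF name and columns are hypothetical.

from tecton import transformation

@transformation(mode="snowpark")
def credit_score_band_transformation(credit_scores):
    # Hypothetical sketch: invoke a UDF that is assumed to already be defined
    # in Snowflake (here named "score_band") from a Snowpark transformation.
    from snowflake.snowpark.functions import call_udf, col

    return credit_scores.select(
        col('user_id'),
        call_udf('score_band', col('credit_score')).alias('credit_score_band'),
        col('timestamp'),
    )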
Pandas
A Pandas transformation is configured with mode="pandas" and can only be used by an OnDemandFeatureView.
The input to the transformation function must be a Pandas dataframe or a Tecton constant. The output from the transformation function must be a Pandas dataframe.
Example
from tecton import transformation

@transformation(mode="pandas")
def transaction_amount_is_high_transformation(transaction_request):
    import pandas as pd

    df = pd.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
    return df
Library imports
Only the Transformation function's body is registered with Tecton. This means that imports and other references from outside the Transformation function's body will result in import errors.
To use a library, import it inside the Transformation function, not at the top level as you normally would. Avoid using aliases for imports (e.g. use import pandas instead of import pandas as pd).
Note
Custom PyPI library dependencies are not yet supported in Pandas Transformations. pandas and numpy are the only libraries currently supported inside Pandas Transformations.
### Valid
from tecton import transformation

@transformation(mode="pandas")
def my_transformation(request):
    import pandas

    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df
### Invalid - pandas is imported outside my_transformation!
from tecton import transformation
import pandas

@transformation(mode="pandas")
def my_transformation(request):
    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df
Any libraries used in function signatures must also be imported outside the function.
from tecton import transformation
import pandas  # required for type hints on my_transformation.

@transformation(mode="pandas")
def my_transformation(request: pandas.DataFrame) -> pandas.DataFrame:
    import pandas  # required for pandas.DataFrame() below.

    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df
Local module imports
Tecton supports local imports of certain types of objects. Functions or constants can be imported from local modules. Classes, class instances, and enums cannot be imported. Local module imports must also take place outside of the transformation definition.
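For context, the examples below import names from a local module named my_local_module. Its contents are not shown in this guide; a hypothetical version consistent with the names used below might look like this:

# my_local_module.py (hypothetical contents)

def my_func(condition_series):
    # Plain functions can be imported into a transformation file.
    return condition_series.astype('int64')

# Plain constants can also be imported.
my_int_const = 10000
my_string_const = 'amount'
my_dict_const = {'resultval': 'amount_is_high'}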
### Valid
from tecton import transformation
import pandas  # required for type hints on my_transformation.
from my_local_module import my_func, my_int_const, my_string_const, my_dict_const

@transformation(mode="pandas")
def my_transformation(request: pandas.DataFrame) -> pandas.DataFrame:
    import pandas  # required for pandas.DataFrame() below.

    df = pandas.DataFrame()
    df[my_dict_const['resultval']] = my_func(request[my_string_const] >= my_int_const)
    return df
### Invalid: unsupported types
from tecton import transformation
import pandas  # required for type hints on my_transformation.
from my_local_module import my_class, my_enum  # unsupported types for serialization

@transformation(mode="pandas")
def my_transformation(request: pandas.DataFrame) -> pandas.DataFrame:
    import pandas  # required for pandas.DataFrame() below.

    # classes cannot be imported
    df = my_class.create_dataframe()
    # enum objects cannot be imported
    df[my_enum.VAL] = 1
    return df
### Invalid: local module imports inside transformation
from tecton import transformation
import pandas  # required for type hints on my_transformation.

@transformation(mode="pandas")
def my_transformation(request: pandas.DataFrame) -> pandas.DataFrame:
    import pandas  # required for pandas.DataFrame() below.

    # import statements of local modules cannot be used within the transformation function body
    from my_local_module import my_func

    df = pandas.DataFrame()
    df['my_val'] = my_func(request['amount'] >= 10000)
    return df
Transformations vs. Python Functions
Transformations are simply Python functions decorated with @transformation. The primary benefit of using Transformations is discoverability and reusability: Transformations are discoverable in the Web UI and can be (re)used individually using the Tecton SDK.
Transformations can depend on standard Python functions, but these functions will only be embedded within the Transformations instead of being registered with Tecton as top-level Transformations. The general best practice is to wrap all data transformation logic in a @transformation.
Transformations have strict rules on return types:
- Must return a string (a SQL query) for mode="spark_sql" and mode="snowflake_sql"
- Must return a Pandas DataFrame for mode="pandas"
- Must return a Spark DataFrame for mode="pyspark"
Using Transformations in Feature Views
Once you've created a Transformation, the next step is to call it from a Feature View. See the Feature View Overview for more details.
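As a rough illustration, a pipeline-mode Batch Feature View can call the transformation defined earlier. The data source (credit_scores_batch), entity (user), and parameter values below are assumptions, and the exact decorator arguments depend on your Tecton SDK version; refer to the Feature View Overview for the authoritative signature.

from datetime import datetime, timedelta
from tecton import batch_feature_view

@batch_feature_view(
    sources=[credit_scores_batch],   # assumed batch data source
    entities=[user],                 # assumed entity
    mode="pipeline",                 # the feature view composes Transformations
    online=True,
    offline=True,
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2021, 1, 1),
)
def user_has_good_credit(credit_scores):
    # Compose the previously defined Transformation into the feature pipeline.
    return user_has_good_credit_transformation(credit_scores)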