Transformation Modes

What is a transformation mode?

A transformation mode specifies the format in which a transformation is written. For example, in spark_sql mode a transformation is written in SQL, while in pyspark mode it is written using the PySpark DataFrame API.

This page describes the transformation modes that are supported by transformations defined inside and outside of Feature Views.

The examples show transformations defined inside of Feature Views.

Modes for Batch Feature Views and Stream Feature Views

mode="spark_sql" and mode="snowflake_sql"

Characteristic | Description
Summary | Contains a SQL query
Supported Feature View types | Batch Feature View, Stream Feature View. mode="snowflake_sql" is not supported in Stream Feature Views.
Supported data platforms | Databricks, EMR (mode="spark_sql"); Snowflake (mode="snowflake_sql")
Input type | A string (the name of a view generated by Tecton)
Output type | A string (the SQL query)

Example

@batch_feature_view(
    ...
    mode="spark_sql",
    ...
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """

@batch_feature_view(
    ...
    mode="snowflake_sql",
    ...
)
def user_has_good_credit(credit_scores):
    return f"""
        SELECT
            user_id,
            IFF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """
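
Because a SQL-mode transformation only returns a string, it can help to see what that string looks like once the input name has been substituted. The sketch below calls the spark_sql function body directly, without the Tecton decorator. The view name `credit_scores_view` is a hypothetical placeholder; in a real Feature View, Tecton generates and supplies this name.

```python
def user_has_good_credit(credit_scores):
    # Same body as the spark_sql example, with the Tecton decorator omitted.
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """


# "credit_scores_view" is a made-up name used only for illustration.
sql = user_has_good_credit("credit_scores_view")
print(sql)
```

The printed query reads from `credit_scores_view`, which shows that the function's input is a view name rather than the data itself.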

mode="pyspark"

Characteristic | Description
Summary | Contains Python code that is executed within a Spark context
Supported Feature View types | Batch Feature View, Stream Feature View
Supported data platforms | Databricks, EMR
Input type | A Spark DataFrame or a Tecton constant
Output type | A Spark DataFrame
Notes | Third-party libraries can be included in user-defined PySpark functions if your cluster allows third-party libraries.

Example

@batch_feature_view(
    ...
    mode='pyspark',
    ...
)
def user_has_good_credit(credit_scores):
    from pyspark.sql import functions as F

    df = credit_scores.withColumn(
        "user_has_good_credit",
        F.when(credit_scores["credit_score"] > 670, 1).otherwise(0),
    )
    return df.select(
        "user_id",
        df["date"].alias("timestamp"),
        "user_has_good_credit",
    )

mode="snowpark"

Characteristic | Description
Summary | Contains Python code that is executed in Snowpark, using the Snowpark API for Python
Supported Feature View types | Batch Feature View
Supported data platforms | Snowflake
Input type | A snowflake.snowpark.DataFrame or a Tecton constant
Output type | A snowflake.snowpark.DataFrame
Notes | The transformation function can call functions that are defined in Snowflake.

Example

@batch_feature_view(
    ...
    mode='snowpark',
    ...
)
def user_has_good_credit(credit_scores):
    from snowflake.snowpark.functions import when, col

    df = credit_scores.withColumn(
        'user_has_good_credit',
        when(col('credit_score') > 670, 1).otherwise(0),
    )
    return df.select('user_id', 'user_has_good_credit', 'timestamp')

Modes for On Demand Feature Views

mode="pandas"

Characteristic | Description
Summary | Contains Python code that operates on a Pandas DataFrame
Supported Feature View types | On Demand Feature View
Supported data platforms | Databricks, EMR, Snowflake
Input type | A Pandas DataFrame or a Tecton constant
Output type | A Pandas DataFrame

Example

@on_demand_feature_view(
    ...
    mode='pandas',
    ...
)
def transaction_amount_is_high(transaction_request):
    import pandas as pd

    df = pd.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
    return df

mode="python"

Characteristic | Description
Summary | Contains Python code that operates on a dictionary
Supported Feature View types | On Demand Feature View
Supported data platforms | Databricks, EMR, Snowflake
Input type | A dictionary
Output type | A dictionary

Example

@on_demand_feature_view(
    ...
    mode='python',
    ...
)
def user_age(request, user_date_of_birth):
    from datetime import datetime

    # Parse the request timestamp and drop its timezone information.
    request_datetime = datetime.fromisoformat(request['timestamp']).replace(tzinfo=None)
    dob_datetime = datetime.fromisoformat(user_date_of_birth['USER_DATE_OF_BIRTH'])

    # The age is returned in days, not years.
    td = request_datetime - dob_datetime

    return {'user_age': td.days}
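
Since a python mode transformation takes dictionaries and returns a dictionary, its logic can be checked locally by calling the undecorated function with sample inputs. The sketch below does that for `user_age`. The input values are hypothetical, and the assumption that both fields arrive as ISO 8601 strings is inferred from the use of `datetime.fromisoformat` in the example.

```python
from datetime import datetime


def user_age(request, user_date_of_birth):
    # Same logic as the example above, with the Tecton decorator omitted.
    request_datetime = datetime.fromisoformat(request['timestamp']).replace(tzinfo=None)
    dob_datetime = datetime.fromisoformat(user_date_of_birth['USER_DATE_OF_BIRTH'])
    td = request_datetime - dob_datetime
    return {'user_age': td.days}


# Hypothetical sample inputs, one dictionary per Feature View input.
request = {'timestamp': '2023-01-01T00:00:00+00:00'}
user_date_of_birth = {'USER_DATE_OF_BIRTH': '2000-01-01'}

result = user_age(request, user_date_of_birth)
print(result)  # {'user_age': 8401}
```

Note that the value is an age in days (23 years including six leap days here), so a consumer that expects years would need to convert it.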