
Unit Testing With Tecton

Overview

Tecton runs the unit tests in your feature repo every time tecton plan or tecton apply is run. Changes can only be applied if the tests pass.

Tecton tests can also be run directly using tecton test.

Running unit tests with tecton test

When you run tecton test, Tecton uses pytest to run the tests and discovers test files matching the pattern **/tests/*.py.
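
To make the discovery pattern concrete, here is a minimal sketch of which files in a repo would match **/tests/*.py. It uses pathlib's glob as an approximation of the rule and is not Tecton's actual discovery code; the find_test_files and build_sample_repo helpers and the sample file layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def find_test_files(repo_root: Path) -> List[str]:
    """Return repo-relative paths matching **/tests/*.py (approximation of the discovery rule)."""
    matches = repo_root.glob("**/tests/*.py")
    return sorted(p.relative_to(repo_root).as_posix() for p in matches)


def build_sample_repo(repo_root: Path) -> None:
    """Create a small, made-up feature repo layout with empty files."""
    for rel in [
        "fraud/features/transaction_amount_is_high.py",        # not in a tests directory
        "fraud/features/tests/transaction_amount_is_high.py",  # matches
        "fraud/tests/user_has_good_credit.py",                 # matches
        "fraud/tests/helpers/util.py",                         # parent directory is not 'tests'
    ]:
        path = repo_root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_repo(root)
    discovered = find_test_files(root)
    print(discovered)
```

Under this reading of the pattern, only .py files whose immediate parent directory is named tests are picked up, at any depth in the repo.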

On-Demand Feature View Unit Test

Testing an On-Demand Feature View is straightforward: all we need is the On-Demand Feature View and a test file located in a tests directory.

For example, let's say we have a feature view that determines if a transaction amount is high:

### transaction_amount_is_high.py ###
from tecton import RequestDataSource, Input, on_demand_feature_view
from pyspark.sql.types import DoubleType, StructType, StructField, LongType
import pandas


request_schema = StructType()
request_schema.add(StructField('amount', DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

output_schema = StructType()
output_schema.add(StructField('transaction_amount_is_high', LongType()))


# This On-Demand Feature View flags a transaction amount as "high" if it is greater than 10,000
@on_demand_feature_view(
    inputs={'transaction_request': Input(transaction_request)},
    mode='pandas',
    output_schema=output_schema,
    family='fraud',
    owner='matt@tecton.ai',
    tags={'release': 'production', 'prevent-destroy': 'true', 'prevent-recreate': 'true'},
    description='Whether the transaction amount is considered high (over $10000)'
)
def transaction_amount_is_high(transaction_request: pandas.DataFrame):
    import pandas as pd

    df = pd.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] > 10000).astype('int64')
    return df

With the feature view above, we can define a unit test that mocks up some sample inputs and asserts that we get the expected result.

### tests/transaction_amount_is_high.py ###
from fraud.features.on_demand_feature_views.transaction_amount_is_high import transaction_amount_is_high
import pandas as pd
from pandas.testing import assert_frame_equal

# Testing the 'transaction_amount_is_high' feature which depends on request data ('amount') as input
def test_transaction_amount_is_high():
    transaction_request = pd.DataFrame({'amount': [124, 10001, 34235436234]})

    actual = transaction_amount_is_high.run(transaction_request=transaction_request)
    expected = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})

    assert_frame_equal(actual, expected)
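
One detail worth noting about this test: assert_frame_equal compares column dtypes by default, so the expected frame has to match the int64 column the feature view produces. The sketch below illustrates that with plain pandas and no Tecton dependency; the is_high series is a made-up stand-in for the comparison result inside the feature view.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in for the boolean result of the amount comparison in the feature view.
is_high = pd.Series([False, True, True])

as_bool = pd.DataFrame({'transaction_amount_is_high': is_high})
as_int = pd.DataFrame({'transaction_amount_is_high': is_high.astype('int64')})
expected = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})

# Same values and same dtype (int64): the frames compare equal.
assert_frame_equal(as_int, expected)

# Same truthiness but a bool column: the default dtype check raises AssertionError.
try:
    assert_frame_equal(as_bool, expected)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

print(dtype_mismatch_detected)
```

If a test only cares about values, passing check_dtype=False to assert_frame_equal relaxes this check, though keeping the dtype comparison helps catch schema drift.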

Spark Feature View Unit Test

Testing a PySpark or Spark SQL feature view is similar to the above example, except that we also need to provide a SparkSession.

For example, let's say we have a feature view that determines if a user has good credit:

### user_has_good_credit.py ###
from tecton import batch_feature_view, Input, BackfillConfig
from fraud.entities import user
from fraud.data_sources.credit_scores_batch import credit_scores_batch
from datetime import datetime


@batch_feature_view(
    inputs={'credit_scores': Input(credit_scores_batch)},
    entities=[user],
    mode='spark_sql',
    online=True,
    offline=True,
    feature_start_time=datetime(2021, 1, 1),
    batch_schedule='1d',
    ttl='120d',
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
)
def user_has_good_credit(credit_scores):
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            timestamp
        FROM
            {credit_scores}
        '''

Because this is a Spark SQL feature view, we need a SparkSession to test it. Tecton provides the tecton_pytest_spark_session pytest fixture, which creates a SparkSession.

Finally, we can define the unit test itself, which mocks up some sample inputs and asserts that we get the expected result.

### tests/user_has_good_credit.py ###
from fraud.features.batch_feature_views.user_has_good_credit import user_has_good_credit


# Testing the 'user_has_good_credit' feature, which depends on credit score data as input
def test_user_has_good_credit(tecton_pytest_spark_session):
    mock_data = [
        ('user_id1', "2020-10-28 05:02:11", 700),
        ('user_id2', "2020-10-28 05:02:11", 650)
    ]
    input_df = tecton_pytest_spark_session.createDataFrame(mock_data, ['user_id', 'timestamp', 'credit_score'])

    output = user_has_good_credit.run(tecton_pytest_spark_session, credit_scores=input_df)
    output = output.toPandas()

    vals = output.values.tolist()

    expected = [['user_id1', 1, '2020-10-28 05:02:11'], ['user_id2', 0, '2020-10-28 05:02:11']]

    assert vals == expected

Just like in the example above, this test will now run when we execute tecton plan.
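
The Spark test compares rows in the order they come back. Spark does not, in general, guarantee row order without an explicit sort, so a more defensive variant could sort both sides before comparing. The sketch below shows the idea with plain lists standing in for output.values.tolist(); normalize_rows is a hypothetical helper, not part of Tecton or pytest.

```python
from typing import List


def normalize_rows(rows: List[list]) -> List[list]:
    """Sort rows by their first column (user_id) so the comparison ignores row order."""
    return sorted(rows, key=lambda row: row[0])


# Stand-in for output.values.tolist(); rows deliberately arrive in a different order.
vals = [
    ['user_id2', 0, '2020-10-28 05:02:11'],
    ['user_id1', 1, '2020-10-28 05:02:11'],
]
expected = [
    ['user_id1', 1, '2020-10-28 05:02:11'],
    ['user_id2', 0, '2020-10-28 05:02:11'],
]

# A direct comparison is order-sensitive and fails here...
assert vals != expected
# ...while the normalized comparison passes.
assert normalize_rows(vals) == normalize_rows(expected)
```

Sorting on user_id works here because each user appears once; with repeated keys, sorting on the full row would be the safer choice.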

Skip Tests

Specifying the --skip-tests flag when running tecton plan or tecton apply skips execution of Tecton tests.