Unit Testing With Tecton
Overview
Tecton enables you to execute unit tests on your feature repo every time `tecton plan` or `tecton apply` is run. A user will only be able to apply their changes if the tests pass. Tecton tests can also be run directly using `tecton test`.
Running unit tests with tecton test
When running `tecton test`, Tecton uses `pytest` to run Tecton tests, discovering test files matching the pattern `**/tests/*.py`.
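To see what that discovery pattern matches, here is a small sketch using Python's `pathlib` glob semantics, which follow the same `**` convention (the directory and file names below are made up for illustration):

```python
import tempfile
from pathlib import Path

# Build a throwaway repo layout to show which files the pattern picks up.
root = Path(tempfile.mkdtemp())
(root / "fraud" / "tests").mkdir(parents=True)
(root / "fraud" / "tests" / "test_amount.py").touch()
(root / "fraud" / "features.py").touch()  # not inside a tests/ directory -> ignored
(root / "tests").mkdir()
(root / "tests" / "test_top_level.py").touch()

# '**' matches zero or more directories, so tests/ dirs at any depth qualify.
discovered = sorted(p.relative_to(root).as_posix() for p in root.glob("**/tests/*.py"))
# discovered == ['fraud/tests/test_amount.py', 'tests/test_top_level.py']
```

In short: any `.py` file directly inside a directory named `tests`, at any depth in the repo, is collected.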
On-Demand Feature View Unit Test
Testing an On-Demand Feature View is straightforward: all we need is the On-Demand Feature View and a test file located in a `tests` directory.
For example, suppose we have a feature view that determines whether a transaction amount is high:
```python
### transaction_amount_is_high.py ###

from tecton import RequestDataSource, Input, on_demand_feature_view
from pyspark.sql.types import DoubleType, StructType, StructField, LongType
import pandas

request_schema = StructType()
request_schema.add(StructField('amount', DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

output_schema = StructType()
output_schema.add(StructField('transaction_amount_is_high', LongType()))


# This On-Demand Feature View evaluates a transaction amount and declares it
# as "high" if it's higher than 10,000.
@on_demand_feature_view(
    inputs={'transaction_request': Input(transaction_request)},
    mode='pandas',
    output_schema=output_schema,
    family='fraud',
    owner='matt@tecton.ai',
    tags={'release': 'production', 'prevent-destroy': 'true', 'prevent-recreate': 'true'},
    description='Whether the transaction amount is considered high (over $10000)'
)
def transaction_amount_is_high(transaction_request: pandas.DataFrame):
    import pandas as pd

    df = pd.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
    return df
```
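Because the feature view's body is plain pandas, the transformation can be exercised on its own before wiring up a test. A minimal sketch with made-up amounts (not taken from the Tecton repo):

```python
import pandas as pd

# Hypothetical request amounts; 10,000 sits exactly on the threshold.
transaction_request = pd.DataFrame({'amount': [500.0, 10000.0, 25000.0]})

df = pd.DataFrame()
df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
# df['transaction_amount_is_high'].tolist() == [0, 1, 1]
```

Note that the comparison uses `>=`, so an amount of exactly 10,000 is flagged as high even though the description says "over $10000".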
With the above feature view, we can define a unit test that mocks up sample inputs and asserts that we get the expected result.
```python
### tests/transaction_amount_is_high.py ###

from fraud.features.on_demand_feature_views.transaction_amount_is_high import transaction_amount_is_high
import pandas as pd
from pandas.testing import assert_frame_equal


# Testing the 'transaction_amount_is_high' feature, which depends on
# request data ('amount') as input.
def test_transaction_amount_is_high():
    transaction_request = pd.DataFrame({'amount': [124, 10001, 34235436234]})

    actual = transaction_amount_is_high.run(transaction_request=transaction_request)
    expected = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})

    assert_frame_equal(actual, expected)
```
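One gotcha with `assert_frame_equal` is that by default it compares dtypes as well as values, so a test can fail even when every value matches. If the feature view's output dtype differs from your hand-built expected frame, `check_dtype=False` relaxes the comparison:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

actual = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})          # int64
expected = pd.DataFrame({'transaction_amount_is_high': [0.0, 1.0, 1.0]})  # float64

# Strict comparison fails on the dtype mismatch alone.
try:
    assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxing the dtype check compares values only; this call does not raise.
assert_frame_equal(actual, expected, check_dtype=False)
# strict_passed == False
```

This is why the feature view above casts its output with `.astype('int64')`: it keeps the dtype deterministic and the test strict.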
Spark Feature View Unit Test
Testing a PySpark or Spark SQL feature view is similar to the above example, except that we also need to provide a `SparkSession`.
For example, suppose we have a feature view that determines whether a user has good credit:
```python
### user_has_good_credit.py ###

from tecton import batch_feature_view, Input, BackfillConfig
from fraud.entities import user
from fraud.data_sources.credit_scores_batch import credit_scores_batch
from datetime import datetime


@batch_feature_view(
    inputs={'credit_scores': Input(credit_scores_batch)},
    entities=[user],
    mode='spark_sql',
    online=True,
    offline=True,
    feature_start_time=datetime(2021, 1, 1),
    batch_schedule='1d',
    ttl='120d',
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
)
def user_has_good_credit(credit_scores):
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            timestamp
        FROM
            {credit_scores}
        '''
```
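In `spark_sql` mode, the function body is ordinary f-string templating: the `credit_scores` parameter is interpolated into the SQL as the name of the view backing the input. The templating itself can be sketched standalone; the helper name and view name below are made up for illustration and are not part of the Tecton API:

```python
def render_user_has_good_credit(credit_scores: str) -> str:
    # Mirrors the feature view body: the parameter is interpolated
    # into the FROM clause as a table/view name.
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            timestamp
        FROM
            {credit_scores}
        '''

sql = render_user_has_good_credit('credit_scores_tmp_view')  # hypothetical view name
```

The returned string is plain Spark SQL referencing `credit_scores_tmp_view`, which is why the unit test below can simply hand the transformation a mocked DataFrame in place of the real data source.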
Because this is a Spark SQL feature view, we'll need a `SparkSession` to test it. Tecton provides the `tecton_pytest_spark_session` pytest fixture, which creates a `SparkSession` for use in tests.
Finally, we can define the unit test itself, which mocks up sample inputs and asserts that we get the expected result.
```python
from fraud.features.batch_feature_views.user_has_good_credit import user_has_good_credit


def test_user_has_good_credit(tecton_pytest_spark_session):
    mock_data = [
        ('user_id1', '2020-10-28 05:02:11', 700),
        ('user_id2', '2020-10-28 05:02:11', 650),
    ]
    input_df = tecton_pytest_spark_session.createDataFrame(mock_data, ['user_id', 'timestamp', 'credit_score'])

    output = user_has_good_credit.run(tecton_pytest_spark_session, credit_scores=input_df)
    actual = output.toPandas().values.tolist()

    expected = [['user_id1', 1, '2020-10-28 05:02:11'], ['user_id2', 0, '2020-10-28 05:02:11']]
    assert actual == expected
```
This test will also run automatically as part of `tecton plan`.
Skip Tests
Specifying the `--skip-tests` flag when running `tecton plan` or `tecton apply` will skip execution of Tecton tests.