Unit Testing With Tecton
Overview
Tecton enables you to execute unit tests on your feature repo every time `tecton plan` or `tecton apply` is run. A user will only be able to apply their changes if the tests pass. Tecton tests can also be run directly using `tecton test`.
Running unit tests with tecton test
When running `tecton test`, Tecton uses `pytest` to run Tecton tests, discovering test files matching the pattern `**/tests/*.py`.
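To see what that discovery pattern matches, here is a small sketch using Python's `pathlib` glob semantics, which follow the same `**` convention (the directory and file names below are made up for illustration):

```python
import tempfile
from pathlib import Path

# Build a throwaway repo layout to show which files the pattern picks up.
root = Path(tempfile.mkdtemp())
(root / "fraud" / "tests").mkdir(parents=True)
(root / "fraud" / "tests" / "test_amount.py").touch()
(root / "fraud" / "features.py").touch()  # not inside a tests/ directory -> ignored
(root / "tests").mkdir()
(root / "tests" / "test_top_level.py").touch()

# '**' matches zero or more directories, so tests/ dirs at any depth qualify.
discovered = sorted(p.relative_to(root).as_posix() for p in root.glob("**/tests/*.py"))
# discovered == ['fraud/tests/test_amount.py', 'tests/test_top_level.py']
```

In short: any `.py` file directly inside a directory named `tests`, at any depth in the repo, is collected.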
On-Demand Feature View Unit Test
Testing an On-Demand Feature View is straightforward: all we need is the On-Demand Feature View and a test file located in a `tests` directory.
For example, suppose we have a feature view that determines whether a transaction amount is high:
```python
### transaction_amount_is_high.py ###

from tecton import RequestDataSource, Input, on_demand_feature_view
from pyspark.sql.types import DoubleType, StructType, StructField, LongType
import pandas

request_schema = StructType()
request_schema.add(StructField('amount', DoubleType()))
transaction_request = RequestDataSource(request_schema=request_schema)

output_schema = StructType()
output_schema.add(StructField('transaction_amount_is_high', LongType()))


# This On-Demand Feature View evaluates a transaction amount and declares it
# as "high" if it's higher than 10,000.
@on_demand_feature_view(
    inputs={'transaction_request': Input(transaction_request)},
    mode='pandas',
    output_schema=output_schema,
    family='fraud',
    owner='matt@tecton.ai',
    tags={'release': 'production', 'prevent-destroy': 'true', 'prevent-recreate': 'true'},
    description='Whether the transaction amount is considered high (over $10000)'
)
def transaction_amount_is_high(transaction_request: pandas.DataFrame):
    import pandas as pd

    df = pd.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
    return df
```
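Because the feature view's body is plain pandas, the transformation can be exercised on its own before wiring up a test. A minimal sketch with made-up amounts (not taken from the Tecton repo):

```python
import pandas as pd

# Hypothetical request amounts; 10,000 sits exactly on the threshold.
transaction_request = pd.DataFrame({'amount': [500.0, 10000.0, 25000.0]})

df = pd.DataFrame()
df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
# df['transaction_amount_is_high'].tolist() == [0, 1, 1]
```

Note that the comparison uses `>=`, so an amount of exactly 10,000 is flagged as high even though the description says "over $10000".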
With the above feature view, we can define a unit test that mocks up sample inputs and asserts that we get the expected result.
```python
### tests/transaction_amount_is_high.py ###

from fraud.features.on_demand_feature_views.transaction_amount_is_high import transaction_amount_is_high
import pandas as pd
from pandas.testing import assert_frame_equal


# Testing the 'transaction_amount_is_high' feature, which depends on
# request data ('amount') as input.
def test_transaction_amount_is_high():
    transaction_request = pd.DataFrame({'amount': [124, 10001, 34235436234]})

    actual = transaction_amount_is_high.run(transaction_request=transaction_request)
    expected = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})

    assert_frame_equal(actual, expected)
```
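One gotcha with `assert_frame_equal` is that by default it compares dtypes as well as values, so a test can fail even when every value matches. If the feature view's output dtype differs from your hand-built expected frame, `check_dtype=False` relaxes the comparison:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

actual = pd.DataFrame({'transaction_amount_is_high': [0, 1, 1]})          # int64
expected = pd.DataFrame({'transaction_amount_is_high': [0.0, 1.0, 1.0]})  # float64

# Strict comparison fails on the dtype mismatch alone.
try:
    assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxing the dtype check compares values only; this call does not raise.
assert_frame_equal(actual, expected, check_dtype=False)
# strict_passed == False
```

This is why the feature view above casts its output with `.astype('int64')`: it keeps the dtype deterministic and the test strict.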
Spark Feature View Unit Test
Testing a PySpark or Spark SQL feature view is similar to the above example, except that we also need to provide a `SparkSession`.
For example, suppose we have a feature view that determines whether a user has good credit:
```python
### user_has_good_credit.py ###

from tecton import batch_feature_view, Input, BackfillConfig
from fraud.entities import user
from fraud.data_sources.credit_scores_batch import credit_scores_batch
from datetime import datetime


@batch_feature_view(
    inputs={'credit_scores': Input(credit_scores_batch)},
    entities=[user],
    mode='spark_sql',
    online=True,
    offline=True,
    feature_start_time=datetime(2021, 1, 1),
    batch_schedule='1d',
    ttl='120d',
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
)
def user_has_good_credit(credit_scores):
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            timestamp
        FROM
            {credit_scores}
        '''
```
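In `spark_sql` mode, the function body is ordinary f-string templating: the `credit_scores` parameter is interpolated into the SQL as the name of the view backing the input. The templating itself can be sketched standalone; the helper name and view name below are made up for illustration and are not part of the Tecton API:

```python
def render_user_has_good_credit(credit_scores: str) -> str:
    # Mirrors the feature view body: the parameter is interpolated
    # into the FROM clause as a table/view name.
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            timestamp
        FROM
            {credit_scores}
        '''

sql = render_user_has_good_credit('credit_scores_tmp_view')  # hypothetical view name
```

The returned string is plain Spark SQL referencing `credit_scores_tmp_view`, which is why the unit test below can simply hand the transformation a mocked DataFrame in place of the real data source.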
Because this is a Spark SQL feature view, we'll need a `SparkSession` to test it. Tecton provides the `tecton_pytest_spark_session` pytest fixture, which creates a `SparkSession` for use in tests.
Finally, we can define the unit test itself, which mocks up sample inputs and asserts that we get the expected result.
```python
from fraud.features.batch_feature_views.user_has_good_credit import user_has_good_credit


def test_user_has_good_credit(tecton_pytest_spark_session):
    mock_data = [
        ('user_id1', '2020-10-28 05:02:11', 700),
        ('user_id2', '2020-10-28 05:02:11', 650),
    ]
    input_df = tecton_pytest_spark_session.createDataFrame(mock_data, ['user_id', 'timestamp', 'credit_score'])

    output = user_has_good_credit.run(tecton_pytest_spark_session, credit_scores=input_df)
    actual = output.toPandas().values.tolist()

    expected = [['user_id1', 1, '2020-10-28 05:02:11'], ['user_id2', 0, '2020-10-28 05:02:11']]
    assert actual == expected
```
This test will also run automatically as part of `tecton plan`.
Skip Tests
Specifying the `--skip-tests` flag when running `tecton plan` or `tecton apply` will skip execution of Tecton tests.