
Batch Inference in Tecton

Overview

This example demonstrates how to perform batch inference in Tecton. Batch inference is very similar to generating training data: you fetch a batch of feature values from the same FeatureService, then run your model over the result.

Fetch a Batch of Data from Tecton

Assuming your model was trained with data from Tecton, you created a FeatureService to generate its training data. You will use that same FeatureService to fetch a batch of data for inference.

Similar to how you built training data, you'll need to generate a DataFrame that represents the data you wish to retrieve from Tecton. This DataFrame should be composed of rows containing:

  • The join keys associated with each of your features
  • Timestamps at which you'd like to retrieve data
  • Columns matching the RequestDataSource inputs of any OnDemandFeatureViews, if your FeatureService includes one or more of them

If you're not sure which join keys are associated with your features, the page corresponding to your FeatureService in the Web UI will list the entities associated with all of your features. Each entity maps to a join key that you will need.
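
As a minimal illustration, a spine for a FeatureService whose only entity is a user (and which has no OnDemandFeatureViews) could be built in pandas as sketched below; the column names are assumptions and must match your own join keys and timestamp key:

import pandas as pd

# Hypothetical spine: one row per (join key, timestamp) pair you want features
# for. "user_id" and "timestamp" must match your FeatureService's join keys
# and timestamp key.
spine = pd.DataFrame({
    "user_id": ["C1231006815", "C1666544295"],
    "timestamp": pd.to_datetime(["2020-12-01 01:00:02", "2020-12-01 01:00:03"]),
})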

Example: Building a Prediction Context for Fraud Detection

In this example, let's imagine we have a fraud detection model that we would like to run nightly on the last 24 hours of transactions. The features for our model describe transactions, users, and merchants. To create our prediction context, we fetch a log of the transactions in the last day, which should look like this:

transaction_id  user_id      merchant_id  timestamp
51812359        C1231006815  M1979787155  2020-12-01 01:00:02.595066019
51812360        C1666544295  M2044282225  2020-12-01 01:00:02.940659192
51812361        C1305486145  M5532624065  2020-12-01 01:00:03.336173880
51812362        C840083671   M3899427010  2020-12-01 01:00:06.033070635
51812363        C2048537720  M1230701703  2020-12-01 01:00:06.711752585
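
One way to build this prediction context is a query over your raw transaction log. The sketch below assumes a SparkSession named spark and a queryable transactions table; substitute your own source:

from datetime import datetime, timedelta

# Hypothetical source table; adjust the query to wherever your
# transaction log actually lives.
cutoff = datetime.utcnow() - timedelta(hours=24)
transaction_log = spark.sql(f"""
    SELECT transaction_id, user_id, merchant_id, timestamp
    FROM transactions
    WHERE timestamp >= '{cutoff.isoformat()}'
""")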

Retrieve Data with the Prediction Context

Now that you have a prediction context, you can use the Tecton SDK to retrieve features for inference. This will be the same code you used to generate a dataset:

import tecton

# transaction_log is a DataFrame containing the prediction context built above
ws = tecton.get_workspace('prod')
fs = ws.get_feature_service('demo_fraud_model')
batch_data = fs.get_historical_features(transaction_log, timestamp_key="timestamp")

The call to get_historical_features will return a Tecton DataFrame, where your feature values have been joined onto the prediction context. An example with a single feature joined onto the above context would look like:

transaction_id  user_id      merchant_id  timestamp                      transaction_details.amount
51812359        C1231006815  M1979787155  2020-12-01 01:00:02.595066019  35.0
51812360        C1666544295  M2044282225  2020-12-01 01:00:02.940659192  522.2
51812361        C1305486145  M5532624065  2020-12-01 01:00:03.336173880  1.2
51812362        C840083671   M3899427010  2020-12-01 01:00:06.033070635  90.2
51812363        C2048537720  M1230701703  2020-12-01 01:00:06.711752585  555.6

Perform Inference

The Tecton DataFrame above can easily be used to perform batch inference. If you use Spark for inference, simply convert your data to a Spark DataFrame:

batch_data_spark = batch_data.to_spark()
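
From there, scoring can happen directly in Spark. As one sketch, assuming you previously trained and saved a Spark ML PipelineModel (the path below is hypothetical):

from pyspark.ml import PipelineModel

# Hypothetical model location; transform() appends prediction columns
# to the feature DataFrame.
model = PipelineModel.load("s3://my-bucket/models/demo_fraud_model")
predictions = model.transform(batch_data_spark)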

If you perform inference in Python instead, convert the data to a pandas DataFrame:

batch_data_pandas = batch_data.to_pandas()
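
For example, with a scikit-learn-style model (the saved model file and feature column below are assumptions):

import joblib

# Hypothetical serialized model; any object with a predict() method
# works the same way.
model = joblib.load("fraud_model.joblib")
feature_cols = ["transaction_details.amount"]  # the features your model expects
batch_data_pandas["prediction"] = model.predict(batch_data_pandas[feature_cols])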

For other inference frameworks, you can persist the data to a file with Spark, then perform inference by loading from that file.
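
For instance, you might write the features out as Parquet and read them back in the inference environment (the output path is hypothetical):

# Persist the joined features with Spark...
batch_data.to_spark().write.parquet("s3://my-bucket/batch_features/2020-12-01")

# ...then load them where inference runs, e.g. with pandas
# (reading from S3 requires s3fs or a similar filesystem library):
import pandas as pd

features = pd.read_parquet("s3://my-bucket/batch_features/2020-12-01")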