Reading Batch Features for Inference using Spark
Overview
This example demonstrates how to perform batch inference in Tecton. Batch inference in Tecton is very similar to generating training data.
Fetch a Batch of Data from Tecton
Assuming your model was trained with data from Tecton, you created a FeatureService in order to generate training data. The same FeatureService you used to generate training data will be used to fetch a batch of data for inference.
Similar to how you built training data, you'll need to generate a DataFrame that represents the data you wish to retrieve from Tecton. This DataFrame should be composed of rows containing:
- The join keys associated with each of your features
- Timestamps at which you'd like to retrieve data
- Columns corresponding to the RequestSource of any OnDemandFeatureView features, if your FeatureService includes one or more OnDemandFeatureView.
If you're not sure which join keys are associated with your features, the page corresponding to your FeatureService in the Web UI will list the entities associated with all of your features. Each entity maps to a join key that you will need.
Example: Building a Prediction Context for Fraud Detection
In this example, let's imagine we have a fraud detection model that we would like to run nightly on the last 24 hours of transactions. The features for our model describe transactions, users, and merchants. To create our prediction context, we fetch a log of the transactions in the last day, which should look like this:
transaction_id | user_id | merchant_id | timestamp |
---|---|---|---|
51812359 | C1231006815 | M1979787155 | 2020-12-01 01:00:02.595066019 |
51812360 | C1666544295 | M2044282225 | 2020-12-01 01:00:02.940659192 |
51812361 | C1305486145 | M5532624065 | 2020-12-01 01:00:03.336173880 |
51812362 | C840083671 | M3899427010 | 2020-12-01 01:00:06.033070635 |
51812363 | C2048537720 | M1230701703 | 2020-12-01 01:00:06.711752585 |
Retrieve Data with the Prediction Context
Now that you have a prediction context, you can use the Tecton SDK to retrieve features for inference. This will be the same code you used to generate a dataset:
import tecton

# transaction_log is a DataFrame containing the prediction context built above
ws = tecton.get_workspace('prod')
fs = ws.get_feature_service('demo_fraud_model')
batch_data = fs.get_historical_features(transaction_log, timestamp_key="timestamp")
The call to get_historical_features will return a Tecton DataFrame, where your feature values have been joined onto the prediction context. An example with a single feature joined onto the above context would look like:
transaction_id | user_id | merchant_id | timestamp | transaction_details.amount |
---|---|---|---|---|
51812359 | C1231006815 | M1979787155 | 2020-12-01 01:00:02.595066019 | 35.0 |
51812360 | C1666544295 | M2044282225 | 2020-12-01 01:00:02.940659192 | 522.2 |
51812361 | C1305486145 | M5532624065 | 2020-12-01 01:00:03.336173880 | 1.2 |
51812362 | C840083671 | M3899427010 | 2020-12-01 01:00:06.033070635 | 90.2 |
51812363 | C2048537720 | M1230701703 | 2020-12-01 01:00:06.711752585 | 555.6 |
Perform Inference
To perform batch inference with the Tecton DataFrame above, convert it to a pandas DataFrame:
batch_data_pandas = batch_data.to_pandas()
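Once the data is in pandas, scoring might look like the sketch below. It is only an illustration: AmountThresholdModel is a hypothetical stand-in for your trained model (any object with a predict method would fit), score_batch and FEATURE_COLUMNS are names invented here, and the DataFrame is rebuilt by hand from the example table above so the snippet runs without a Tecton connection.

```python
import pandas as pd

# Feature columns the model expects; assumed to match the training data.
FEATURE_COLUMNS = ["transaction_details.amount"]


class AmountThresholdModel:
    """Hypothetical stand-in for a trained model: flags large amounts."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, features: pd.DataFrame) -> list:
        amounts = features["transaction_details.amount"]
        return (amounts > self.threshold).astype(int).tolist()


def score_batch(batch, model, feature_columns, id_column="transaction_id"):
    """Run the model on the feature columns and pair each prediction with its ID."""
    predictions = model.predict(batch[feature_columns])
    return pd.DataFrame(
        {id_column: batch[id_column].to_numpy(), "prediction": predictions}
    )


# Stand-in for the result of batch_data.to_pandas(), using the example rows.
batch_data_pandas = pd.DataFrame(
    {
        "transaction_id": [51812359, 51812360, 51812361, 51812362, 51812363],
        "transaction_details.amount": [35.0, 522.2, 1.2, 90.2, 555.6],
    }
)

scores = score_batch(
    batch_data_pandas, AmountThresholdModel(threshold=500.0), FEATURE_COLUMNS
)
```

Selecting the feature columns explicitly keeps the join keys and timestamp out of the model input while still letting each prediction be traced back to its transaction.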
For other inference frameworks, you can persist your data to a file using Spark, then perform inference by loading from this file.