Fetching multiple Feature Vectors with an Online Serving Index
Tecton allows you to fetch a set of feature vectors by specifying only a subset of a Feature View's join keys (entity IDs). This functionality is commonly used when multiple candidates need to be scored, such as in a recommendation system.
In this example, we'll show how to retrieve feature vectors for all ads a user has seen in the past week. We'll walk through:
- Configuring the feature view with an online serving index
- Retrieving features online
- Creating training data
Configuring your feature views
First, when defining the feature view, specify the online_serving_index parameter, listing only the join keys you will provide at retrieval time; the key you leave out becomes the wildcard.
In this case, we will specify the user at feature retrieval time, and get back one row for each ad that user has feature values for.
from tecton import stream_window_aggregate_feature_view, Input, FeatureAggregation
from core.entities import user
from ads.entities import ad
from ads.data_sources.ad_impressions_stream import ad_impressions_stream
from datetime import datetime
@stream_window_aggregate_feature_view(
    inputs={"ad_impressions": Input(ad_impressions_stream)},
    entities=[user, ad],
    online_serving_index=["user_uuid"],  # Only user_uuid will be provided at retrieval time
    mode="spark_sql",
    aggregation_slide_period="1h",
    aggregations=[
        FeatureAggregation(
            column="impression",
            function="count",
            time_windows=["1h", "12h", "24h", "72h", "168h"],
        )
    ],
    online=True,   # Enable online materialization so the features can be served
    offline=True,  # Enable offline materialization so training data can be generated
    batch_schedule="1d",
    feature_start_time=datetime(2021, 1, 1),
    family="ads",
    tags={"release": "production"},
    owner="matt@tecton.ai",
    description="The count of impressions between a given user and a given ad",
)
def user_ad_impression_counts(ad_impressions):
    return f"""
        select
            user_uuid,
            ad_id,
            1 as impression,
            timestamp
        from
            {ad_impressions}
        """
Now that we've specified our serving indices for the Feature View, let's create our Feature Service to enable online retrieval.
from tecton import FeatureService, FeaturesConfig
from feature_repo.shared.features.user_ad_impression_counts import user_ad_impression_counts
ctr_prediction_service = FeatureService(
    name="ctr_prediction_service",
    description="A Feature Service used for supporting a CTR prediction model.",
    online_serving_enabled=True,
    features=[
        user_ad_impression_counts,
    ],
    family="ad_serving",
    tags={"release": "production"},
    owner="derek@tecton.ai",
)
Fetching wildcard features online
Once those changes have been applied, we can use the Tecton Python library to retrieve a dataframe of all the feature rows that match our user by omitting the ad_id join key.
import tecton
my_fs = tecton.get_feature_service("ctr_prediction_service")
keys = {
    "user_uuid": "sample-user-uuid",
}
response = my_fs.query_features(keys).to_pandas()
print(response.head())
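The returned dataframe has one row per ad the user has feature values for, so it can be used directly to rank candidates. Below is a minimal sketch of that step using a stand-in dataframe and a hypothetical score_ad function (neither the column names nor score_ad come from the Tecton API):

```python
import pandas as pd

# Stand-in for the dataframe returned by my_fs.query_features(keys).to_pandas():
# one row per candidate ad for the requested user. Column names are illustrative.
response = pd.DataFrame({
    "ad_id": ["ad-1", "ad-2"],
    "impression_count_1h": [3, 0],
    "impression_count_24h": [10, 2],
})

def score_ad(row):
    # Hypothetical model: down-rank ads the user has already seen often.
    return 1.0 / (1.0 + row["impression_count_24h"])

response["score"] = response.apply(score_ad, axis=1)
ranked = response.sort_values("score", ascending=False)
print(ranked[["ad_id", "score"]])
```

In a real system score_ad would be replaced by a call to your CTR model, but the shape of the loop is the same: one scoring pass over the wildcard rows.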
Alternatively, we can use the HTTP API. See the section above for more detail on how to configure the API key.
$ export TECTON_API_KEY='<your_tecton_key>'
$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
  -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
  '{
    "params": {
      "workspace_name": "prod",
      "feature_service_name": "ctr_prediction_service",
      "join_key_map": {
        "user_uuid": "sample-user-uuid"
      }
    }
  }'
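The same request can be built from Python. The sketch below only constructs the payload and headers to mirror the curl invocation; the cluster URL stays a placeholder and no request is actually sent:

```python
import json
import os

# Same request body as the curl example. Note that only the online serving
# index key (user_uuid) is supplied; ad_id is omitted as the wildcard.
payload = {
    "params": {
        "workspace_name": "prod",
        "feature_service_name": "ctr_prediction_service",
        "join_key_map": {
            "user_uuid": "sample-user-uuid",
        },
    }
}
body = json.dumps(payload)

# The API key is read from the environment, matching the curl example.
headers = {"Authorization": f"Tecton-key {os.environ.get('TECTON_API_KEY', '')}"}

# POSTing body with these headers to
# https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features
# reproduces the curl call above.
print(body)
```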
Creating training sets with wildcard features
Similarly, we can construct our training dataset by providing an events dataframe (the prediction context) that contains the join key we specified as our serving index, along with a timestamp.
import tecton
events = spark.read.parquet("dbfs:/event_data.pq").select("user_uuid", "timestamp")
my_fs = tecton.get_feature_service("ctr_prediction_service")
training_set = my_fs.get_historical_features(events, timestamp_key="timestamp")
print(training_set.to_pandas().head())
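For reference, the events spine only needs the serving-index key and a timestamp per prediction event; the ad_id rows are filled in by the wildcard retrieval. A minimal pandas sketch of such a spine (the values are made up for illustration):

```python
import pandas as pd

# One row per prediction event: which user, and when the prediction was made.
# Only the online serving index key (user_uuid) and the timestamp are needed.
events = pd.DataFrame({
    "user_uuid": ["user-a", "user-a", "user-b"],
    "timestamp": pd.to_datetime([
        "2021-06-01 10:00:00",
        "2021-06-02 11:30:00",
        "2021-06-01 09:15:00",
    ]),
})
print(events.dtypes)
```

In the example above the spine is read from Parquet with Spark, but any dataframe with these two columns has the same role.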