On-Demand Feature View
An OnDemandFeatureView is used for simple transformations that are executed in real time, at feature request time. It allows you to:
- Calculate features based on information only available at request time, such as the amount of the current transaction; and
- Calculate combinations of other feature values, such as the amount of the current transaction compared to the 7-day average transaction amount.
An OnDemandFeatureView stands in contrast to all other feature views (StreamFeatureView, BatchFeatureView), which precompute feature values and store them in the offline and/or online feature store.
Use an OnDemandFeatureView if:
- your use case requires fresh, real-time features that process data only available at the time of your real-time prediction
- the latency introduced by your on-demand transformation is acceptable for your use case (example: if your on-demand transformation executes a time.sleep(1) statement, the transformation cannot complete in less than 1 second)
- precomputing your feature values would be a waste of storage or compute resources, because you're not expecting to actually use all pre-computed feature values in production, or because precomputing all possible feature combinations would be intractable
Common Examples
- Turning a user's GPS coordinates into a geohash
- Parsing a user's search string
- Checking if a user's incoming transaction is larger than the user's average transaction amount over the last 30 days
- Picking the maximum transaction of the past 10 transactions of a user (if combined with a last-n StreamingWindowAggregateFeatureView)
- Computing the cosine similarity between a pre-computed user embedding and a query embedding
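As an illustration of the last item, here is a minimal sketch of a cosine-similarity computation in plain Python. The function names, the output key, and the choice to return 0.0 for a zero vector are assumptions made for this example; they are not part of the Tecton API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as "no similarity"
        # instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)


def user_query_similarity(user_embedding, query_embedding) -> dict:
    # Hypothetical transformation body: the user embedding would be a
    # pre-computed feature, the query embedding would arrive with the request.
    return {"user_query_similarity": cosine_similarity(user_embedding, query_embedding)}
```

The user embedding is the expensive part and can be precomputed; only the cheap dot product and norms run at request time.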
An OnDemandFeatureView transformation is expressed as Python code.
Examples
For more examples see Examples here.
Feature with no dependencies
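A minimal sketch of a transformation that depends only on request-time data, written in the pandas style described on this page. The function name, the amount column, and the 10,000 threshold are illustrative assumptions. The Tecton feature view declaration (inputs, mode, output schema) is deliberately omitted so that the function runs with pandas alone; see the API reference for the actual declaration.

```python
import pandas as pd


def transaction_amount_is_high(transaction_request: pd.DataFrame) -> pd.DataFrame:
    # Input: request-time data only, with an `amount` column (assumed name).
    # Output: one integer column flagging amounts at or above an
    # illustrative threshold of 10,000.
    is_high = transaction_request["amount"] >= 10000
    return pd.DataFrame({"transaction_amount_is_high": is_high.astype("int64")})


# Two example requests: one small transaction and one large one.
request = pd.DataFrame({"amount": [250.0, 12500.0]})
result = transaction_amount_is_high(request)
```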
Feature with pre-computed dependencies
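A minimal sketch of a transformation that combines request-time data with the output of a pre-computed feature view, again as a plain pandas function. The input names and the amount_mean_7d column are hypothetical; in a feature repository the second input would be the output of another feature view, and the Tecton declaration is omitted here.

```python
import pandas as pd


def transaction_amount_is_higher_than_average(
    transaction_request: pd.DataFrame,
    user_transaction_amount_metrics: pd.DataFrame,
) -> pd.DataFrame:
    # `transaction_request` carries the request-time `amount`.
    # `user_transaction_amount_metrics` carries a pre-computed
    # `amount_mean_7d` column (assumed name), row-aligned with the request.
    is_higher = (
        transaction_request["amount"]
        > user_transaction_amount_metrics["amount_mean_7d"]
    )
    return pd.DataFrame(
        {"transaction_amount_is_higher_than_average": is_higher.astype("int64")}
    )


request = pd.DataFrame({"amount": [120.0, 40.0]})
metrics = pd.DataFrame({"amount_mean_7d": [80.0, 55.0]})
result = transaction_amount_is_higher_than_average(request, metrics)
```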
Parameters
See the API reference for the full list of parameters.
RequestDataSource
In your feature repository, the RequestDataSource defines the schema your OnDemandFeatureView will expect for request time data.
To configure a RequestDataSource, first create a Spark StructType that defines the type of each input parameter.
Output Schema
An OnDemandFeatureView requires a defined output schema, similar to the RequestDataSource. Tecton uses the schema to display the Feature View's expected output in the web UI.
Note: Outputs from an OnDemandFeatureView must be non-null, even if the output schema declares nullable=True.
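One way to satisfy the non-null requirement is to fall back to a default value whenever an input is missing. The sketch below uses plain Python for brevity; the function name, field names, and the default of 0.0 are assumptions chosen for illustration. In a pandas-based transformation the same idea could be expressed with a fillna call on the output column.

```python
from typing import Optional


def amount_vs_average(amount: float, average_amount_7d: Optional[float]) -> dict:
    # A user with no history may have no pre-computed average. Fall back to
    # an assumed default of 0.0 instead of returning None.
    if average_amount_7d is None or average_amount_7d == 0:
        ratio = 0.0
    else:
        ratio = amount / average_amount_7d
    return {"amount_to_average_ratio": ratio}
```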
On-Demand Transformation
Transformations for an OnDemandFeatureView work the same as for other Feature Views, except they must be written in Python with mode=pandas (or with mode=python, described under Python Mode below).
Usage Example
See how to use an On Demand Feature View in a notebook here.
How it works
While other features are pre-computed and saved in the online store, the OnDemandFeatureView transformation is executed in the Tecton service when you request a feature vector online. Inputs to the pipeline can be a RequestDataSource included in the request, or the output of other features. An OnDemandFeatureView cannot access data from your batch or stream data sources.
Because the OnDemandFeatureView is run at request time, you can only use Python-native or pandas-based transformations. To guarantee online/offline consistency, Tecton will automatically package your transformation as a Spark UDF when you generate historical feature values offline.
Python Mode
Using mode=python requires the latest 0.3 beta release. See the CLI setup guide for instructions on how to install a new Tecton version.
On-Demand Feature Views deliver faster request-time latency when used with mode=python transformations.
The primary difference between mode=python and mode=pandas is that transformations with mode=python have simple Python dictionary inputs and outputs, in place of Pandas dataframes. This new option avoids the overhead associated with dataframes.
Python Mode Example
This example uses Python mode, but is equivalent to the Pandas mode feature view shown above.
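A minimal sketch of what a dictionary-based transformation might look like, assuming a request field named amount and an illustrative threshold of 10,000. The Tecton feature view declaration is omitted, so this shows only the transformation body.

```python
def transaction_amount_is_high(transaction_request: dict) -> dict:
    # Input and output are plain dictionaries describing a single row,
    # so no dataframe is constructed on the request path.
    is_high = transaction_request["amount"] >= 10000
    return {"transaction_amount_is_high": int(is_high)}
```

Called with {"amount": 12500.0}, the function returns {"transaction_amount_is_high": 1}.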
Unit Testing with Python mode
When using mode=python, the run method accepts a dictionary representing the inputs for a single row. This input diverges slightly from mode=pandas, which can accept multiple rows at a time.
This example shows how to iterate through multiple test cases at a time.
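A minimal sketch of such a test, assuming a dictionary-based transformation named transaction_amount_is_high with an illustrative threshold of 10,000. To keep the sketch runnable without Tecton, it calls the plain function directly; in a feature repository the call would go through the feature view's run method, whose exact signature is not shown here.

```python
def transaction_amount_is_high(transaction_request: dict) -> dict:
    # Hypothetical transformation under test (single-row dict in, dict out).
    return {"transaction_amount_is_high": int(transaction_request["amount"] >= 10000)}


# Each case pairs a single-row input dictionary with the expected output.
test_cases = [
    ({"amount": 0.0}, {"transaction_amount_is_high": 0}),
    ({"amount": 9999.99}, {"transaction_amount_is_high": 0}),
    ({"amount": 10000.0}, {"transaction_amount_is_high": 1}),
    ({"amount": 250000.0}, {"transaction_amount_is_high": 1}),
]


def test_transaction_amount_is_high():
    # Python mode handles one row per call, so loop over the cases.
    for request, expected in test_cases:
        actual = transaction_amount_is_high(request)
        assert actual == expected, f"input {request}: expected {expected}, got {actual}"
```

Keeping the cases in a list makes it cheap to add boundary values, such as the amount exactly at the threshold.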