On-Demand Feature View

An OnDemandFeatureView is used for simple transformations that are executed in real time, at feature request time. It allows you to:

Calculate features based on information that is only available at request time, such as the amount of the current transaction; and
Combine request-time information with other feature values, such as comparing the amount of the current transaction to the 7-day average transaction amount.

An OnDemandFeatureView stands in contrast to the other Feature Views (StreamFeatureView, BatchFeatureView), which precompute feature values and store them in the offline and/or online feature store.

Use an OnDemandFeatureView if:

your use case requires fresh features computed from data that is only available at the time of your real-time prediction
the latency added by your on-demand transformation is acceptable for your use case (for example, if the transformation sleeps for one second, serving the feature cannot take less than one second)
precomputing your feature values would waste storage or compute resources, either because you do not expect to use all pre-computed feature values in production or because precomputing all possible feature combinations would be intractable

Common Examples

Turning a user's GPS coordinates into a geohash
Parsing a user's search string
Checking if a user's incoming transaction is larger than the user's average transaction amount in the last 30 days
Picking the largest of a user's past 10 transactions (if combined with a last-n StreamingWindowAggregateFeatureView)
Computing the cosine similarity between a pre-computed user embedding and a query embedding
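The cosine similarity example can be sketched in plain Python. This is only an illustration of the transformation logic: the function names, the dictionary keys (`user_embedding`, `query_embedding`), and the choice to return 0.0 for a zero vector are assumptions made for this sketch, and the Tecton feature view declaration is omitted.

```python
import math


def cosine_similarity(user_embedding, query_embedding):
    """Return the cosine similarity of two equal-length vectors."""
    if len(user_embedding) != len(query_embedding):
        raise ValueError("embeddings must have the same length")
    dot = sum(u * q for u, q in zip(user_embedding, query_embedding))
    user_norm = math.sqrt(sum(u * u for u in user_embedding))
    query_norm = math.sqrt(sum(q * q for q in query_embedding))
    if user_norm == 0.0 or query_norm == 0.0:
        # Cosine similarity is undefined for a zero vector. Returning 0.0
        # is an assumption of this sketch, chosen to keep the output non-null.
        return 0.0
    return dot / (user_norm * query_norm)


def user_query_similarity(request, user_features):
    """Combine request-time data with a pre-computed feature value.

    `request` holds the query embedding sent with the request;
    `user_features` holds the pre-computed user embedding.
    Both key names are hypothetical.
    """
    similarity = cosine_similarity(
        user_features["user_embedding"], request["query_embedding"]
    )
    return {"cosine_similarity": similarity}
```

The query embedding only exists at request time, while the user embedding can be precomputed, which is why this calculation fits an on-demand transformation.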

An OnDemandFeatureView transformation is expressed as Python code.

Examples

For more examples, see Examples.

Feature with no dependencies
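A minimal sketch of a transformation that depends only on request-time data. The function name, the `amount` key, and the 10,000 threshold are hypothetical, and the Tecton decorator that would register this function as an OnDemandFeatureView is deliberately left out; see the API reference for the actual declaration.

```python
def transaction_amount_is_high(transaction_request):
    """Flag a transaction using only data sent with the request."""
    # The threshold is an arbitrary value chosen for this sketch.
    amount = transaction_request["amount"]
    return {"transaction_amount_is_high": int(amount >= 10000)}
```

Nothing here is looked up from a feature store, so the feature can be computed for any request without precomputed state.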

Feature with pre-computed dependencies
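A minimal sketch of a transformation that combines request-time data with a pre-computed feature value. The names `user_transaction_metrics` and `amount_mean_7d` are hypothetical stand-ins for the output of another Feature View, and the Tecton declaration is again omitted.

```python
def transaction_amount_is_higher_than_average(
    transaction_request, user_transaction_metrics
):
    """Compare the current transaction amount to a pre-computed average.

    `transaction_request` carries the request-time amount;
    `user_transaction_metrics` carries the pre-computed 7-day mean.
    """
    amount = transaction_request["amount"]
    average = user_transaction_metrics["amount_mean_7d"]
    return {"transaction_amount_is_higher_than_average": int(amount > average)}
```

The expensive part (the 7-day average) is precomputed elsewhere, and only the cheap comparison runs at request time.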

Parameters

See the API reference for the full list of parameters.

RequestDataSource

In your feature repository, the RequestDataSource defines the schema your OnDemandFeatureView will expect for request time data.

To configure a RequestDataSource, you'll need to first create a Spark StructType that defines the type for each input parameter.
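To illustrate the idea of declaring a type for each input parameter, the sketch below swaps the Spark StructType for a plain dictionary so that it runs without Spark. It is not the Tecton or Spark API: `TRANSACTION_REQUEST_SCHEMA` and `validate_request` are hypothetical names, and the check itself only shows what "the schema the feature view expects" means in practice.

```python
# Hypothetical stand-in for a request schema: field name -> expected Python type.
TRANSACTION_REQUEST_SCHEMA = {"amount": float, "merchant": str}


def validate_request(request, schema):
    """Raise if the request lacks a field or a field has the wrong type."""
    missing = [name for name in schema if name not in request]
    if missing:
        raise KeyError(f"missing request fields: {missing}")
    for name, expected_type in schema.items():
        value = request[name]
        if not isinstance(value, expected_type):
            raise TypeError(
                f"field {name!r} expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return request
```

In a real feature repository the same information (field names and types) would be expressed as StructField entries inside the StructType.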

Output Schema

An OnDemandFeatureView requires a defined output schema, similar to the RequestDataSource. Tecton uses the schema to display the FeatureView's expected output in the web UI.

Note: Outputs from an OnDemandFeatureView must be non-null, even if the output schema declares nullable=True.
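One way to satisfy the non-null requirement is to return an explicit default whenever an input is missing. The sketch below assumes a hypothetical pre-computed value `amount_mean_7d` that may be absent for new users; the sentinel -1.0 is an arbitrary choice for illustration, not a Tecton convention.

```python
def amount_to_average_ratio(transaction_request, user_transaction_metrics):
    """Compare the current amount to a pre-computed average, never returning None."""
    amount = transaction_request["amount"]
    average = user_transaction_metrics.get("amount_mean_7d")
    if average is None or average == 0:
        # No usable history: emit a sentinel instead of None
        # (and avoid dividing by zero).
        return {"amount_to_average_ratio": -1.0}
    return {"amount_to_average_ratio": amount / average}
```

Whatever default you pick, downstream models need to be trained with the same value so that it carries a consistent meaning.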

On-Demand Transformation

Transformations for an OnDemandFeatureView work the same way as those of other Feature Views, except that they must be written in Python with mode=pandas (or, with the 0.3 beta release, mode=python).

Usage Example

See how to use an On Demand Feature View in a notebook here.

How it works

While other features are pre-computed and saved in the online store, the OnDemandFeatureView transformation is executed in the Tecton service when you request a feature vector online. Inputs to the pipeline can be request data described by a RequestDataSource, the output of other features, or both. The transformation cannot read directly from your batch or stream data sources.

Because the OnDemandFeatureView is run at request time, you can only use Python-native or pandas-based transformations. To guarantee online/offline consistency, Tecton automatically packages your transformation as a Spark UDF when you generate historical feature values offline.
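The consistency guarantee comes from running one and the same function in both paths. The sketch below mimics that idea in plain Python, with a list comprehension standing in for the Spark UDF; the function names and the `search_query` key are hypothetical, and this is not how Tecton implements the offline path internally.

```python
def search_query_features(request):
    """Parse a user's search string into simple features."""
    query = request["search_query"]
    return {
        "query_word_count": len(query.split()),
        "query_has_digits": int(any(ch.isdigit() for ch in query)),
    }


def compute_online(request):
    """Online path: one request, one feature dictionary."""
    return search_query_features(request)


def compute_offline(historical_requests):
    """Offline path: the same function applied to every historical row."""
    return [search_query_features(row) for row in historical_requests]
```

Because both paths call `search_query_features`, a row replayed offline yields exactly the values that were served online for the same input.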

Python Mode

Using mode=python requires the latest 0.3 beta release. See the CLI setup guide for instructions on installing a new Tecton version.

On Demand Feature Views deliver lower request-time latency when used with mode=python transformations. The primary difference between mode=python and mode=pandas is that transformations with mode=python take and return simple Python dictionaries instead of pandas DataFrames. This new option avoids the overhead associated with DataFrames.

Python Mode Example

This example uses Python mode, but is equivalent to the Pandas mode feature view shown above.
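As a minimal sketch of the dictionary-in, dictionary-out shape that mode=python transformations use (the Tecton decorator and its arguments are omitted, and `last_10_amounts` and `largest_recent_amount` are hypothetical names):

```python
def largest_recent_transaction(user_recent_transactions):
    """Return the largest of the user's most recent transaction amounts.

    The input is a plain dictionary for a single row, and so is the output.
    """
    amounts = user_recent_transactions.get("last_10_amounts") or []
    # An empty history yields 0.0 rather than None so the output is never null.
    return {"largest_recent_amount": max(amounts, default=0.0)}
```

No DataFrame is constructed for the single row being served, which is the overhead that mode=python is meant to avoid.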

Unit Testing with Python mode

When using mode=python, the run method accepts a dictionary representing the inputs for a single row. This differs from mode=pandas, which can accept multiple rows at a time.

This example shows how to iterate through multiple test cases at a time.
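A sketch of that pattern, assuming a hypothetical transformation and calling the plain Python function directly as a stand-in for the feature view's run method (whose exact signature is not shown here):

```python
def transaction_amount_is_high(transaction_request):
    """Hypothetical transformation under test: one input dict, one output dict."""
    return {
        "transaction_amount_is_high": int(transaction_request["amount"] >= 10000)
    }


# Each case pairs a single-row input with the expected single-row output.
TEST_CASES = [
    ({"amount": 50}, {"transaction_amount_is_high": 0}),
    ({"amount": 10000}, {"transaction_amount_is_high": 1}),
    ({"amount": 25000}, {"transaction_amount_is_high": 1}),
]


def check_all(transformation, cases):
    """Run each case through the transformation and collect mismatches."""
    failures = []
    for inputs, expected in cases:
        actual = transformation(inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    return failures


def test_transaction_amount_is_high():
    assert check_all(transaction_amount_is_high, TEST_CASES) == []
```

Because each call handles exactly one row, the loop takes the place of passing a multi-row DataFrame as you would with mode=pandas.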