Batch Window Aggregate Feature View
A BatchWindowAggregateFeatureView is used for batch time-window aggregation features, such as a 1-hour rolling count of per-user transactions. It processes raw data from any BatchDataSource that contains a historical log of events.
Use a BatchWindowAggregateFeatureView if:
- you have your raw events available in a Batch Data Source
- you need tumbling, hopping or rolling time window aggregations of type count, sum, mean, max, min, or last-n
- your use case can tolerate a feature freshness of > 1 hour
Common Examples:
- 1 hour rolling click count of a user
- Last 10 transactions of a user
- Max transaction amount of a user
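To make these examples concrete, here is a minimal pure-Python sketch of what each aggregation computes over an event log. It does not use Tecton; the sample events and the helper names (events_in_window, rolling_count, window_max, last_n) are hypothetical and only illustrate the semantics, assuming a window that covers the interval from as_of minus the window length up to, but not including, as_of.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, event timestamp, transaction amount).
EVENTS = [
    ("u1", datetime(2021, 1, 1, 9, 10), 20.0),
    ("u1", datetime(2021, 1, 1, 9, 40), 75.0),
    ("u1", datetime(2021, 1, 1, 10, 5), 30.0),
    ("u2", datetime(2021, 1, 1, 10, 0), 5.0),
]


def events_in_window(user_id, as_of, window):
    """Return the user's events with as_of - window <= timestamp < as_of, oldest first."""
    start = as_of - window
    rows = [e for e in EVENTS if e[0] == user_id and start <= e[1] < as_of]
    return sorted(rows, key=lambda e: e[1])


def rolling_count(user_id, as_of, window):
    """Number of events of the user inside the window."""
    return len(events_in_window(user_id, as_of, window))


def window_max(user_id, as_of, window):
    """Largest amount inside the window, or None if the window is empty."""
    amounts = [amount for _, _, amount in events_in_window(user_id, as_of, window)]
    return max(amounts) if amounts else None


def last_n(user_id, as_of, window, n):
    """The n most recent amounts inside the window, oldest first."""
    amounts = [amount for _, _, amount in events_in_window(user_id, as_of, window)]
    return amounts[-n:]


as_of = datetime(2021, 1, 1, 10, 30)
print(rolling_count("u1", as_of, timedelta(hours=1)))  # 2
print(window_max("u1", as_of, timedelta(days=1)))      # 75.0
print(last_n("u1", as_of, timedelta(days=1), 2))       # [75.0, 30.0]
```

Each helper rescans the whole log for every request, which is the cost that pre-aggregation is meant to avoid.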
BatchWindowAggregateFeatureView is a specialized implementation for time-window aggregations that is more efficient than what a normal BatchFeatureView could accomplish. Tecton achieves higher efficiency and feature freshness because it stores partial feature values in tiles that are rolled up at feature request time (for more details, see How it works below).
Example
Row-level Transformation
Parameters
See the API reference for the full list of parameters.
Transformation
In the body of your Python function, you define row-level transformations whose output is then aggregated according to the FeatureAggregation parameter.
Your transformation must output a column for each entity and a timestamp column. Each additional column must be aggregated by at least one FeatureAggregation. The final number of features depends on the number of time windows you configure.
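As a rough illustration of how the feature count follows from the configuration, the sketch below expands a list of aggregation definitions into one feature per combination of column, function, and time window. The AggregationSpec class and the generated feature names are hypothetical stand-ins; they are not Tecton's FeatureAggregation API or its actual naming scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AggregationSpec:
    """Hypothetical stand-in for one aggregation definition."""
    column: str
    function: str
    time_windows: List[str]


def expand_features(specs):
    """List one feature name per (column, function, time window) combination."""
    return [
        f"{spec.column}_{spec.function}_{window}"
        for spec in specs
        for window in spec.time_windows
    ]


specs = [
    AggregationSpec("transaction", "count", ["1h", "12h", "24h"]),
    AggregationSpec("amount", "max", ["24h", "72h"]),
]
features = expand_features(specs)
print(len(features))  # 5
print(features)
```

Here two aggregated columns with three and two time windows yield five features in total.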
Usage Example
See how to use a Batch Window Aggregate Feature View in a notebook here.
How it works
BatchWindowAggregateFeatureViews run as Spark jobs. They update at a fixed frequency (the slide period) and aggregate over an often longer period of time (the time window). After each slide period has elapsed, Tecton updates the value in the online store.
Behind the scenes, Tecton stores partial aggregations in the form of tiles. The tile size is defined by the aggregation_slide_period parameter. At feature request time, Tecton's online and offline feature serving capabilities automatically roll up the persisted tiles (as well as persisted event projections in the case of continuous streaming features). This has several key benefits:
- Significantly reduced storage requirements if you define several time windows
- Reduced precompute resource requirements, given that Tecton needs to only compute incremental tiles and not the entire time window
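The tile mechanism can be sketched in a few lines of plain Python. This is an illustration of the general idea under simplifying assumptions (one-hour tiles, count and sum only, an in-memory dict instead of a feature store, and a window end aligned to a tile boundary), not Tecton's implementation; all names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLIDE = timedelta(hours=1)  # assumed tile size (the slide period)
EPOCH = datetime(1970, 1, 1)


def tile_start(ts):
    """Floor a timestamp to the start of the tile that contains it."""
    return EPOCH + ((ts - EPOCH) // SLIDE) * SLIDE


def build_tiles(events):
    """Pre-compute one partial [count, sum] aggregate per user and tile."""
    tiles = defaultdict(lambda: [0, 0.0])
    for user_id, ts, amount in events:
        tile = tiles[(user_id, tile_start(ts))]
        tile[0] += 1
        tile[1] += amount
    return tiles


def roll_up(tiles, user_id, window_end, window):
    """Combine the tiles covering [window_end - window, window_end) at request time.

    Assumes window_end lies on a tile boundary.
    """
    count, total = 0, 0.0
    start = tile_start(window_end - window)
    while start < window_end:
        if (user_id, start) in tiles:
            tile_count, tile_sum = tiles[(user_id, start)]
            count += tile_count
            total += tile_sum
        start += SLIDE
    return count, total


events = [
    ("u1", datetime(2021, 1, 1, 9, 15), 10.0),
    ("u1", datetime(2021, 1, 1, 10, 20), 20.0),
    ("u1", datetime(2021, 1, 1, 10, 50), 5.0),
    ("u1", datetime(2021, 1, 1, 11, 30), 40.0),
]
tiles = build_tiles(events)
now = datetime(2021, 1, 1, 12, 0)
print(roll_up(tiles, "u1", now, timedelta(hours=1)))  # (1, 40.0)
print(roll_up(tiles, "u1", now, timedelta(hours=3)))  # (4, 75.0)
```

Both windows are served from the same three stored tiles, which reflects the storage benefit listed above, and a new slide period only requires computing one additional tile rather than recomputing each full window.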