Tecton 0.5
Tecton 0.5 was released in October 2022. The following release notes describe the new features, improvements, and upgrade steps for this release.
New features
Materialization jobs can be manually triggered
With the Materialization API, you can manually trigger materialization via an API call. The Materialization API can be used in the Tecton SDK and in Airflow, through the Tecton Airflow provider.
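For example, a backfill for a single feature view might be triggered from the SDK as sketched below. The workspace and feature view names are illustrative, and the method name and parameters are assumptions based on the Materialization API described above; confirm them against the SDK reference for your Tecton version.

```python
from datetime import datetime

import tecton

# Illustrative workspace and feature view names.
ws = tecton.get_workspace("prod")
fv = ws.get_feature_view("user_transaction_counts")

# Manually trigger a materialization job for a specific time range.
# Method name and parameters are assumed, not verified against the SDK reference.
job = fv.trigger_materialization_job(
    start_time=datetime(2022, 10, 1),
    end_time=datetime(2022, 10, 2),
    online=True,
    offline=True,
)
```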
Feature Output Streams
Feature View Output Streams enable your application to subscribe to the outputs of streaming feature pipelines. Your application accesses these outputs via a stream sink. Feature View Output Streams are designed to be used for asynchronous predictions, where model inference is triggered by newly arriving feature data.
The Tecton SDK can be used in any Python environment to retrieve features
Using the Tecton SDK with AWS Athena removes the requirement that you use a Databricks notebook or an AWS EMR notebook to retrieve features from Tecton’s offline store.
When using the Tecton SDK with AWS Athena, you can retrieve features from Tecton’s offline store in any Python environment that has access to AWS (e.g., your local laptop, a Jupyter notebook, Kubeflow pipelines, etc.).
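As a sketch, retrieving training data from a plain Python environment might look like the following. It assumes Athena access has been configured as described in the documentation; the workspace, feature service, and spine columns are illustrative.

```python
import pandas as pd
import tecton

# Illustrative workspace and feature service names.
ws = tecton.get_workspace("prod")
fs = ws.get_feature_service("fraud_detection_feature_service")

# A spine of join keys and timestamps to retrieve features for.
spine = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2"],
        "timestamp": pd.to_datetime(["2022-10-01", "2022-10-02"]),
    }
)

# With Athena configured as the offline query engine, this call does not
# require a Databricks or EMR notebook.
training_df = fs.get_historical_features(spine).to_pandas()
```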
Data Source Functions, for increased flexibility in working with Data Sources
When defining a BatchSource or StreamSource object, you set the batch_config or stream_config parameter, respectively. The value of these configs can be the name of an object (such as HiveConfig or KafkaConfig) or a Data Source Function.
Compared to using an object, a Data Source Function gives you more flexibility in connecting to an underlying data source and specifying logic for transforming the data retrieved from the underlying data source. However, using an object is recommended if you do not require the additional flexibility offered by a Data Source Function.
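As a rough sketch, a Data Source Function for a BatchSource might look like the following. The spark_batch_config decorator is an assumption based on the Data Source Functions documentation, and the S3 path and column names are illustrative.

```python
from tecton import BatchSource, spark_batch_config

# A Data Source Function returns a Spark DataFrame and can apply arbitrary
# logic to the raw data before Tecton consumes it. The decorator name is an
# assumption; the path and columns are illustrative.
@spark_batch_config()
def transactions_data_source_function(spark):
    df = spark.read.parquet("s3://example-bucket/transactions")
    return df.withColumnRenamed("txn_ts", "timestamp")

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=transactions_data_source_function,
)
```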
Rematerialization can be suppressed, to reduce infrastructure costs
After refactoring a Python function or migrating an upstream Data Source, you can run tecton plan or tecton apply with the --suppress-recreates flag to suppress rematerialization. When rematerialization is suppressed, feature values are not recalculated.
You should only use the --suppress-recreates flag when you are confident that changes to a Tecton repo will not affect feature values.
For information on using the flag, see Suppressing Rematerialization.
Struct Type Features in On-Demand Feature Views
You can include a Struct data type in the output schema of an On-Demand Feature View (ODFV). A Struct can contain multiple fields with mixed data types.
A Struct can be nested within other complex types. For example, you can have a Struct within a Struct, or an array of Structs.
Using a Struct in the output schema of an ODFV allows you to easily parse the ODFV's output when it contains multiple feature values.
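A sketch of an ODFV with a Struct in its output schema is shown below. The request source, field names, and bucketing logic are illustrative, and parameter names should be checked against the 0.5 SDK reference.

```python
from tecton import RequestSource, on_demand_feature_view
from tecton.types import Field, Float64, String, Struct

# Illustrative request source carrying the transaction amount.
transaction_request = RequestSource(schema=[Field("amount", Float64)])

# The output schema contains a single Struct with fields of mixed types.
output_schema = [
    Field(
        "transaction_summary",
        Struct(
            [
                Field("amount_bucket", String),
                Field("amount_normalized", Float64),
            ]
        ),
    )
]

@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=output_schema,
)
def transaction_summary(transaction_request):
    amount = transaction_request["amount"]
    # Return a dict matching the Struct in the output schema.
    return {
        "transaction_summary": {
            "amount_bucket": "high" if amount > 100 else "low",
            "amount_normalized": amount / 1000.0,
        }
    }
```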
Improvements and bug fixes
to_dict support on SDK methods returning tabular Displayable objects
All SDK methods returning a table now return a Displayable object with a to_dict() method. The following methods have been updated:
- materialization_status()
- summary()
- deletion_status()
- get_feature_freshness() (see Note below)
Note
get_feature_freshness no longer supports the to_dict parameter. Calls to the method can be updated by changing tecton.get_feature_freshness(to_dict=True) to tecton.get_feature_freshness().to_dict().
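For example (workspace and feature view names are illustrative):

```python
import tecton

fv = tecton.get_workspace("prod").get_feature_view("user_transaction_counts")

# materialization_status() now returns a Displayable; to_dict() converts the
# tabular output into plain Python data structures.
status = fv.materialization_status()
status_dict = status.to_dict()
```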
Alert email must now be set if monitor_freshness=True
For monitoring of feature views, the alert_email parameter must also be set if monitor_freshness=True. This is to ensure that alerting emails are sent for the desired feature views. See Alerts for more information.
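For example, a Feature View with freshness monitoring enabled must also specify an alert email. In the sketch below, the data source, entity, schedule, and query are illustrative, and transactions_batch and user are assumed to be defined elsewhere in the feature repo.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

# `transactions_batch` (a BatchSource) and `user` (an Entity) are assumed to
# exist elsewhere in the repo.
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2022, 1, 1),
    ttl=timedelta(days=30),
    monitor_freshness=True,
    alert_email="ml-alerts@example.com",  # required when monitor_freshness=True
)
def user_transaction_amounts(transactions_batch):
    return f"""
        SELECT user_id, amount, timestamp
        FROM {transactions_batch}
    """
```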
get_historical_features() performance improvements on Spark
get_historical_features() has been updated with a more performant point-in-time join. This join results in faster feature value retrieval when both of the following are true:
- The call to get_historical_features() contains a spine.
- get_historical_features() returns feature values from non-aggregate Feature Views, custom aggregate Feature Views, or Feature Services that contain those two types of Feature Views.
Batch Feature View skew reduction
To reduce online/offline skew, get_historical_features() now uses the _effective_timestamp (calculated internally) to retrieve feature values. The _effective_timestamp is the earliest time the feature will be available in the online store for inference. The _effective_timestamp column is automatically added to all feature records returned by calls to get_historical_features() which do not include a spine.
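For example, a call without a spine over a time range returns the _effective_timestamp column alongside the feature values. The workspace and feature view names below are illustrative.

```python
from datetime import datetime

import tecton

fv = tecton.get_workspace("prod").get_feature_view("user_transaction_counts")

# No spine: feature records for the time range are returned, including the
# internally calculated _effective_timestamp column.
df = fv.get_historical_features(
    start_time=datetime(2022, 9, 1),
    end_time=datetime(2022, 10, 1),
).to_pandas()

print("_effective_timestamp" in df.columns)  # True
```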
Improved support for nulls in On-Demand Feature Views
On-Demand Feature Views now have improved support for nulls. On-Demand Feature Views that use Pandas still have some special handling for nulls; see the documentation.
Upgrading to 0.5
0.5 will no longer support compat definitions. Follow the instructions below to upgrade to 0.5 based on your current version. You will NOT need to re-materialize data to upgrade your objects.
Note
In 0.5, you must set an alert email for Feature Views with monitoring enabled; otherwise, an error may block your apply. When upgrading from 0.3 or 0.4 in compatibility mode, configure the alert email while upgrading your Feature Views. No other semantic changes can be made during the upgrade.
When upgrading to 0.5, you will see updates to your Feature View's batch_trigger like the following, as a result of the new Materialization API. These changes have no effect and will only occur the first time you run tecton apply with Tecton 0.5:
~ Update BatchDataSource
name: transactions_batch
description: Batch Data Source for transactions stream
batch_trigger: BATCH_TRIGGER_TYPE_UNKNOWN -> BATCH_TRIGGER_TYPE_SCHEDULED
From 0.4 non-compat:
- You can move to the 0.5 CLI without making any changes!
From 0.4 in compatibility mode (tecton.compat):
- You can move to the 0.5 CLI directly if you upgrade all of your definitions to 0.4 definitions using this upgrade guide in one tecton apply.
- To upgrade definitions incrementally, i.e. in multiple tecton apply steps:
  1. Upgrade objects to 0.4 definitions using the 0.4 CLI with this guide.
  2. Once all of your objects are in 0.4 definitions, you can move to the 0.5 CLI.
From 0.3:
- You can move to the 0.5 CLI directly if you upgrade all of your definitions to 0.4 definitions using this upgrade guide in one tecton apply.
- To upgrade incrementally, i.e. in multiple tecton apply steps:
  1. You must first upgrade to the 0.4 CLI with objects in compatibility mode. Follow these instructions.
  2. Upgrade your objects from 0.4 compat to 0.4 definitions using these instructions.
  3. Once all of your objects are 0.4 definitions, you can move to the 0.5 CLI.
Patch Updates
0.5.4
- Add data_source_names property to feature view objects.
0.5.3
Skipped patch version 0.5.3.
0.5.2
- Fix a bug that caused transformation.run() to throw a TypeError for PySpark transformations.
0.5.1
- Fix bug that broke unit tests on feature views that used the last_distinct aggregation.
- Fix bug that prevented adding multiple transformations to a pipeline mode On-Demand Feature View using Snowflake compute.
- Fix bug that prevented get_historical_features calls with spine from returning null feature values.