Using the Materialization API to Manually Trigger a Feature View
Background
By default, feature materialization runs on a schedule managed by Tecton. For example, you can configure your Feature View materialization to run every hour or every day.
Manual materialization does not run on an automatic schedule. Instead, materialization is triggered via an API call. These triggers help synchronize Feature View materialization with upstream data dependencies.
Manual materialization can be triggered through the following methods:
- The Tecton SDK
- The Tecton Airflow provider
Note
This page explains how to use the Tecton SDK to manually trigger materialization.
To use the Tecton Airflow provider instead, see the readme file in the provider repo.
Configuring a Feature View for manual triggering
A BatchFeatureView and a StreamFeatureView can be configured for manual triggering. To do so, set batch_trigger=BatchTriggerType.MANUAL. When this is set, Tecton will not automatically create any batch materialization jobs for the Feature View.
For a StreamFeatureView, the batch_trigger setting affects only the scheduling of batch materialization jobs. Streaming materialization jobs are still scheduled and managed by Tecton.
Here’s an example of a BatchFeatureView configured for manual triggering:
```python
from tecton import batch_feature_view, FilteredSource, Aggregation, BatchTriggerType
from fraud.entities import user
from fraud.data_sources.transactions import transactions_batch
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(transactions_batch)],
    entities=[user],
    mode='spark_sql',
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column='transaction', function='count', time_window=timedelta(days=1)),
        Aggregation(column='transaction', function='count', time_window=timedelta(days=30)),
        Aggregation(column='transaction', function='count', time_window=timedelta(days=90)),
    ],
    online=False,
    offline=True,
    feature_start_time=datetime(2022, 5, 1),
    tags={'release': 'production'},
    owner='matt@tecton.ai',
    description='User transaction totals over a series of time windows, updated daily.',
    batch_trigger=BatchTriggerType.MANUAL,  # Use manual triggers
)
def user_transaction_counts(transactions):
    return f'''
        SELECT
            user_id,
            1 as transaction,
            timestamp
        FROM
            {transactions}
        '''
```
If a Data Source input to the Feature View has data_delay set, that delay is still factored into constructing training datasets, but it does not affect when a job can be triggered with the materialization API.
Tecton SDK methods for enabling triggering, monitoring, and canceling materialization
In the Tecton SDK, the Feature View interactive classes have methods that enable triggering, monitoring, and canceling materialization jobs. See the BatchFeatureView and StreamFeatureView interactive SDK reference for method details.
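As a rough illustration of the cancellation method, the sketch below requests cancellation for every job in a list that has not yet succeeded, for example when an orchestration run is aborted. It assumes `fv` is a Feature View object fetched from a workspace (e.g. `tecton.get_workspace("prod").get_feature_view("user_transaction_counts")`, where the workspace name is hypothetical) and that `get_materialization_job()` and `cancel_materialization_job()` accept the job ID as their first argument. The helper name `cancel_unfinished` is hypothetical; check the SDK reference for exact signatures and for how cancelling an already-failed job behaves.

```python
def cancel_unfinished(fv, job_ids):
    """Request cancellation of every job in `job_ids` whose state is not SUCCESS.

    `fv` is assumed to be a Tecton Feature View object exposing
    get_materialization_job() and cancel_materialization_job(); job IDs are
    passed positionally. Returns the job IDs for which cancellation was requested.
    """
    cancelled = []
    for job_id in job_ids:
        job = fv.get_materialization_job(job_id)
        if job.state == "SUCCESS":
            continue  # Nothing to cancel: the job already finished successfully.
        fv.cancel_materialization_job(job_id)
        cancelled.append(job_id)
    return cancelled
```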
Triggering a new materialization job
The trigger_materialization_job() method initiates a job that materializes feature values for the specified time range. It returns a job identifier that we’ll reference in later steps.
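A minimal sketch of triggering one job for a single day, assuming `trigger_materialization_job()` accepts `start_time`, `end_time`, `online`, and `offline` keyword arguments and that UTC day boundaries suit your data. The helper names `day_bounds` and `trigger_day` are hypothetical. The `online=False, offline=True` flags mirror the example Feature View above.

```python
from datetime import datetime, timedelta, timezone


def day_bounds(day: datetime):
    """Return the half-open UTC window [midnight, next midnight) for day's calendar date."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


def trigger_day(fv, day: datetime) -> str:
    """Trigger an offline-only materialization job for one day and return its job ID."""
    start, end = day_bounds(day)
    return fv.trigger_materialization_job(
        start_time=start, end_time=end, online=False, offline=True
    )
```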
To backfill a newly created feature, you can use this method as a one-off to materialize data from the feature start time up to the current time. Note that you may want to break up particularly large backfills into multiple jobs.
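One way to split a large backfill, sketched under the same keyword-argument assumptions: `split_range` cuts `[start, end)` into consecutive windows and `backfill` triggers one job per window. Both names and the 30-day default chunk size are illustrative choices, not Tecton defaults. For an aggregation Feature View like the one above, it seems prudent to keep window boundaries aligned to its `aggregation_interval` (one day here).

```python
from datetime import datetime, timedelta


def split_range(start: datetime, end: datetime, chunk: timedelta) -> list:
    """Split the half-open range [start, end) into consecutive windows of at most `chunk`."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive timedelta")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + chunk, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def backfill(fv, start: datetime, end: datetime, chunk: timedelta = timedelta(days=30)) -> list:
    """Trigger one offline materialization job per window and return the job IDs in order."""
    job_ids = []
    for window_start, window_end in split_range(start, end, chunk):
        job_ids.append(
            fv.trigger_materialization_job(
                start_time=window_start, end_time=window_end, online=False, offline=True
            )
        )
    return job_ids
```

For example, `backfill(fv, datetime(2022, 5, 1), datetime(2023, 1, 1))` would trigger nine jobs: eight 30-day windows followed by a 5-day remainder.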
During regular operations, you will likely want to set up an automated process that materializes the most recent time period once the upstream data for that period is available.
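For the recurring case, a scheduler (cron, Airflow, or similar) could call a function like the hypothetical `materialize_latest_day` below once a day. It computes the most recently completed UTC day and triggers a job only if `upstream_ready`, a callable you supply (for example, one that checks for a partition marker), reports that upstream data for that window has landed. Both names are assumptions for illustration, as is the keyword-argument form of `trigger_materialization_job()`.

```python
from datetime import datetime, timedelta, timezone


def materialize_latest_day(fv, now: datetime, upstream_ready):
    """Trigger the most recently completed UTC day if its upstream data is ready.

    `upstream_ready(start, end)` is a hypothetical, user-supplied check.
    Returns the job ID, or None if the upstream data is not ready yet.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    end = datetime(now_utc.year, now_utc.month, now_utc.day, tzinfo=timezone.utc)
    start = end - timedelta(days=1)
    if not upstream_ready(start, end):
        return None  # Let the scheduler retry later.
    return fv.trigger_materialization_job(
        start_time=start, end_time=end, online=False, offline=True
    )
```

Returning `None` instead of raising keeps the "not ready yet" case cheap for a scheduler that simply retries on the next tick.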
Waiting for job completion
After triggering a new job, you may want to monitor the job status to start a downstream process once complete.
To block your process until the job completes, use the wait_for_materialization_job() method. Materialization jobs can take anywhere from minutes to hours, depending on the amount of data processed.
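To run a downstream step only after the job finishes, the blocking call can be wrapped as sketched below. The sketch assumes `wait_for_materialization_job()` takes the job ID plus an optional `timeout` and returns a `MaterializationJobData`; the helper name `run_and_wait`, the three-hour timeout, and the `on_success` callback are illustrative. Depending on the SDK version, a failed or timed-out job may raise an exception instead of returning, so the explicit state check is a belt-and-braces guard.

```python
from datetime import timedelta


def run_and_wait(fv, start, end, timeout=timedelta(hours=3), on_success=None):
    """Trigger an offline job for [start, end), block until it finishes,
    then call on_success(job_id) if the job succeeded. Returns the job ID."""
    job_id = fv.trigger_materialization_job(
        start_time=start, end_time=end, online=False, offline=True
    )
    job = fv.wait_for_materialization_job(job_id, timeout=timeout)
    if job.state != "SUCCESS":
        raise RuntimeError(f"materialization job {job_id} ended in state {job.state}")
    if on_success is not None:
        on_success(job_id)  # e.g. start a downstream training pipeline
    return job_id
```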
Alternatively, you can poll for completion status using the get_materialization_job() method. It returns a MaterializationJobData object with details about the job status. The job has completed successfully if MaterializationJobData.state == "SUCCESS".
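If you prefer not to block inside a single call, a polling loop around `get_materialization_job()` is one option. Only the `"SUCCESS"` state comes from this page; the default failure-state names below are assumptions, so check the `MaterializationJobData` reference and adjust them. The helper name `poll_until_success` is hypothetical, and the `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from datetime import timedelta

# Assumed terminal failure state names; only "SUCCESS" is documented on this page.
ASSUMED_FAILURE_STATES = frozenset({"ERROR", "MANUALLY_CANCELLED"})


def poll_until_success(fv, job_id, interval_s=60.0, max_wait=timedelta(hours=3),
                       failure_states=ASSUMED_FAILURE_STATES,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_materialization_job() until the job succeeds, reaches a failure
    state, or max_wait elapses. Returns the final job data on success."""
    deadline = clock() + max_wait.total_seconds()
    while True:
        job = fv.get_materialization_job(job_id)
        if job.state == "SUCCESS":
            return job
        if job.state in failure_states:
            raise RuntimeError(f"job {job_id} ended in state {job.state}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still in state {job.state} after {max_wait}")
        sleep(interval_s)
```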
Re-running previously materialized periods
By default, the trigger_materialization_job() method returns an error if the specified time period overlaps with the time period of a previously successful materialization job.
If you use the overwrite=True option, Tecton will allow the new job to run and overwrite previously materialized data. This operation is generally safe if:
- Your previous job completed and did not output any feature data.
- Your Feature View is only materialized offline (the Feature View is configured with offline=True and online=False).
Note
When using the overwrite=True option, it’s possible to produce incorrect results in the online store if you have previously materialized data. Please consult Tecton Support before proceeding.
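Before re-running a period, you might check locally whether it overlaps a window you already materialized successfully, and only then pass `overwrite=True`. In this sketch, `succeeded_windows` is a list you maintain yourself (a hypothetical bookkeeping choice), windows are treated as half-open `[start, end)` intervals (an assumption about how job ranges compare), and `rerun` refuses to overwrite when the job would also write to the online store, reflecting the caution in the note above. The names `overlaps`, `needs_overwrite`, and `rerun` are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """True if half-open windows [a_start, a_end) and [b_start, b_end) share any time."""
    return a_start < b_end and b_start < a_end


def needs_overwrite(start, end, succeeded_windows) -> bool:
    """True if [start, end) overlaps any previously successful window."""
    return any(overlaps(start, end, s, e) for s, e in succeeded_windows)


def rerun(fv, start, end, succeeded_windows, online=False, offline=True):
    """Trigger a job, setting overwrite=True only when the range overlaps earlier successes."""
    overwrite = needs_overwrite(start, end, succeeded_windows)
    if overwrite and online:
        # Conservative choice: overwriting online data can produce incorrect results.
        raise ValueError("refusing to overwrite online data; consult Tecton Support first")
    return fv.trigger_materialization_job(
        start_time=start, end_time=end, online=online, offline=offline, overwrite=overwrite
    )
```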