Feature Tables
A Feature Table lets you ingest features into Tecton that you have already transformed outside of Tecton (for example, in your data lake or data warehouse). Unlike with Feature Views, you are responsible for transforming raw data into feature values and for ingesting those values into Tecton through its API.
Use a FeatureTable if:
- you already have feature data pipelines running outside of Tecton and you want to make those feature values available for consistent offline and online consumption
- you need to run a feature transformation that's not supported by Tecton's FeatureViews. A FeatureTable provides you with a flexible escape hatch to bring arbitrary features into Tecton
Common Examples:
- You manage a pipeline outside of Tecton that generates user embeddings and you want to make those available for online and/or offline serving
- You're just getting started with Tecton and already run Airflow pipelines that produce batch features. Now you want to bring them to Tecton for online and/or offline serving
Within a single FeatureService, you can include a FeatureTable alongside a FeatureView. This capability provides an easy way for you to use Tecton to develop new features, while continuing to leverage your existing feature pipelines.
Ingest Data into the Feature Table
Once the FeatureTable has been added to your feature repository, you can use the Tecton Python SDK to push feature data into Tecton.
To do so, pass a Spark or Pandas dataframe to the FeatureTable.ingest() method within your Spark environment. This dataframe must contain all the columns that were declared in the schema.
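Because the dataframe must contain every column declared in the schema, it can help to check for missing columns locally before ingesting. The following is a minimal sketch using only pandas. The EXPECTED_COLUMNS list and the missing_columns helper are hypothetical names introduced for illustration and are not part of the Tecton SDK; the column names are assumed to mirror the schema you declared.

```python
from typing import List

import pandas as pd

# Hypothetical: column names assumed to mirror the Feature Table's declared schema.
EXPECTED_COLUMNS = [
    "user_id",
    "timestamp",
    "user_login_count_7d",
    "user_login_count_30d",
]


def missing_columns(df: pd.DataFrame, expected: List[str]) -> List[str]:
    """Return the expected columns that are absent from df, in schema order."""
    return [col for col in expected if col not in df.columns]


# This dataframe deliberately omits user_login_count_30d.
df = pd.DataFrame([{
    "user_id": "user_1",
    "timestamp": pd.Timestamp("2023-01-01"),
    "user_login_count_7d": 15,
}])

missing = missing_columns(df, EXPECTED_COLUMNS)
if missing:
    print(f"Not ready to ingest; missing columns: {missing}")
```

A check like this only compares column names. It does not verify column types, which Tecton may also validate against the schema.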
Use your Databricks or EMR notebook to ingest a simple dataframe to the FeatureTable defined above.
import pandas
import tecton
from datetime import datetime

# Build a one-row dataframe whose columns match the Feature Table's schema
df = pandas.DataFrame([{
    "user_id": "user_1",
    "timestamp": pandas.Timestamp(datetime.now()),
    "user_login_count_7d": 15,
    "user_login_count_30d": 35,
}])

# Look up the registered Feature Table and push the dataframe to Tecton
ft = tecton.get_feature_table("user_login_counts")
ft.ingest(df)
After calling FeatureTable.ingest(), you can track the status of the materialization job in the Web UI or with FeatureTable.materialization_status().
Usage Example
See a full example of how to use a Feature Table in this notebook.
How it works
To ingest the dataframe, the Tecton SDK will first write the dataframe to an S3 bucket in the Tecton dataplane. Then Tecton will initiate materialization jobs to write that data into the Online and Offline stores.
If you submit duplicate features for the same join_keys and timestamps, the last write will win.
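To make the outcome of duplicate submissions explicit, you can deduplicate a dataframe yourself before ingesting it. The sketch below uses pandas only and is a local illustration, not Tecton's internal logic. It assumes that user_id is the join key and that row order within the dataframe stands in for write order; both are assumptions made for this example.

```python
import pandas as pd

# Two rows share the same join key and timestamp; the second is the later write.
rows = pd.DataFrame([
    {"user_id": "user_1", "timestamp": pd.Timestamp("2023-01-01"), "user_login_count_7d": 15},
    {"user_id": "user_1", "timestamp": pd.Timestamp("2023-01-01"), "user_login_count_7d": 18},
    {"user_id": "user_2", "timestamp": pd.Timestamp("2023-01-01"), "user_login_count_7d": 4},
])

# Keep only the last row for each (join key, timestamp) pair,
# mirroring the last-write-wins behavior described above.
deduped = (
    rows.drop_duplicates(subset=["user_id", "timestamp"], keep="last")
    .reset_index(drop=True)
)

print(deduped)
```

Deduplicating up front means the values you ingest are the values you expect to be served, rather than relying on write ordering.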