
Creating a Feature

Overview

In this example, we'll create a simple feature using a Batch Feature View.

We'll walk through the typical steps needed to add a new feature to your feature store, including:

  1. Defining a Feature View and Transformation in your local repository
  2. Applying the Feature View to your Feature Store
  3. Inspecting sample Feature View output
  4. Enabling materialization for the Feature View
  5. Adding the Feature View to a Feature Service

This guide assumes you've already defined the Entity and Data Source you'll use for your feature.

Defining a Feature View and Transformation

Today we're going to define a simple feature based on a user's credit score. This definition will live inside a Feature View, which packages together everything Tecton needs to know to productionize your feature.

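First, create a new file in your feature repository and add the following code. We typically group our Feature View definitions under a features folder; the Feature Service import at the end of this guide expects this file to be saved as fraud/features/user_has_good_credit.py.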

from tecton import batch_feature_view, Input, BackfillConfig
from fraud.entities import user
from fraud.data_sources.credit_scores_batch import credit_scores_batch
from datetime import datetime


@batch_feature_view(
    inputs={'credit_scores': Input(credit_scores_batch)},
    entities=[user],
    mode='spark_sql',
    online=False,
    offline=False,
    feature_start_time=datetime(2020, 10, 10),
    batch_schedule='1d',
    ttl='30days',
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
    family='fraud',
    description='Whether the user has a good credit score (over 670).'
)
def user_has_good_credit(credit_scores):
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        '''
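To sanity-check the SQL logic outside Spark, here is a plain-Python sketch of the same row-level transformation. It assumes the source rows have user_id, credit_score, and date columns, as the query implies; the sample rows and the helper name user_has_good_credit_rows are hypothetical.

```python
from datetime import date

# Hypothetical raw rows shaped like the assumed credit_scores_batch source.
raw_rows = [
    {"user_id": "u1", "credit_score": 720, "date": date(2020, 10, 10)},
    {"user_id": "u2", "credit_score": 670, "date": date(2020, 10, 10)},
    {"user_id": "u3", "credit_score": 540, "date": date(2020, 10, 11)},
]

GOOD_CREDIT_THRESHOLD = 670  # mirrors `credit_score > 670` in the SQL


def user_has_good_credit_rows(rows):
    """Hypothetical plain-Python equivalent of the SELECT in user_has_good_credit."""
    return [
        {
            "user_id": row["user_id"],
            # IF(credit_score > 670, 1, 0): strictly greater, so 670 itself maps to 0
            "user_has_good_credit": 1 if row["credit_score"] > GOOD_CREDIT_THRESHOLD else 0,
            # `date as timestamp`: the column is renamed, not transformed
            "timestamp": row["date"],
        }
        for row in rows
    ]


for out_row in user_has_good_credit_rows(raw_rows):
    print(out_row)
```

Note the boundary: a score of exactly 670 does not count as good credit, which matches the "over 670" wording in the description.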

Note that, for now, we've disabled materialization by setting online=False and offline=False. We'll set both to True later, once we've confirmed the feature works as intended.

Applying the Feature View

Up to this point, you have only written a feature definition in your local repository. To use it in Tecton, you must register it using the Tecton CLI.

To register the feature, run the Tecton CLI command tecton apply:

% tecton apply --skip
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 35 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side validation of feature declarations
 ↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  + Create Transformation
    name:            user_has_good_credit
    description:     Whether the user has a good credit score (over 670).

  + Create BatchFeatureView
    name:            user_has_good_credit
    description:     Whether the user has a good credit score (over 670).

 ↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
Are you sure you want to apply this plan? [y/N]> y
🎉 all done!
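One way to read the plan above: it lists what will change in the workspace relative to what is already registered. The sketch below is a rough mental model of that comparison, not Tecton's implementation. It assumes objects are identified by (kind, name) and compared by a hypothetical fingerprint string, and it treats the user Entity as already registered, which is why only the Transformation and the BatchFeatureView would be created.

```python
# Objects are keyed by (kind, name) because a Transformation and a
# BatchFeatureView can share a name, as user_has_good_credit does here.
def plan_changes(local, registered):
    """Hypothetical diff of local declarations against registered objects.

    Both arguments map (kind, name) to a fingerprint; returns sorted
    (action, (kind, name)) pairs.
    """
    changes = []
    for key, fingerprint in local.items():
        if key not in registered:
            changes.append(("create", key))
        elif registered[key] != fingerprint:
            changes.append(("update", key))
    for key in registered:
        if key not in local:
            changes.append(("delete", key))
    return sorted(changes)


local_declarations = {
    ("Entity", "user"): "v1",
    ("Transformation", "user_has_good_credit"): "v1",
    ("BatchFeatureView", "user_has_good_credit"): "v1",
}
already_registered = {("Entity", "user"): "v1"}

for action, (kind, name) in plan_changes(local_declarations, already_registered):
    print(action, kind, name)
```

Under this model, re-running the same apply with no local changes produces an empty plan.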

Now that you've applied the Feature View, it's part of your Feature Store. You can see your new feature in the Tecton Web UI by selecting "Features" on the left side of the page. More information, such as materialization status, will appear there once materialization is enabled later in this guide.

Viewing example feature values

Now that we've applied our feature, let's verify that it computes the expected values by viewing sample output in a Databricks or EMR notebook.

First, retrieve the FeatureView object from your workspace.

import tecton
workspace = tecton.get_workspace("my_workspace")
feature_view = workspace.get_feature_view("user_has_good_credit")
feature_view.summary()

We will use the FeatureView.get_historical_features() method to view some sample data.

# Set from_source=True because we haven't materialized to the offline store yet
features = feature_view.get_historical_features(from_source=True).to_spark().limit(10)
display(features)

See the get_historical_features() method signature for more details.
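Conceptually, historical retrieval is a point-in-time lookup: for each requested (user_id, timestamp), return the latest feature value at or before that timestamp, and nothing if that value is older than the Feature View's ttl (30 days here). The sketch below models that idea in plain Python. The sample rows, the spine, the helper name point_in_time_lookup, and the inclusive boundaries are assumptions; see the method signature for how a spine is actually passed to Tecton.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=30)  # mirrors ttl='30days' in the decorator

# Hypothetical feature rows: (user_id, feature timestamp, user_has_good_credit)
feature_rows = [
    ("u1", datetime(2020, 10, 10), 0),
    ("u1", datetime(2020, 11, 1), 1),
    ("u2", datetime(2020, 10, 10), 1),
]


def point_in_time_lookup(spine, rows, ttl=TTL):
    """For each (user_id, request_time), return the latest value with
    timestamp <= request_time that is at most ttl old, else None."""
    results = []
    for user_id, request_time in spine:
        candidates = [
            (ts, value)
            for uid, ts, value in rows
            if uid == user_id and ts <= request_time and request_time - ts <= ttl
        ]
        latest = max(candidates, key=lambda c: c[0]) if candidates else None
        results.append(latest[1] if latest else None)
    return results


spine = [
    ("u1", datetime(2020, 10, 20)),  # only the 2020-10-10 value exists yet
    ("u1", datetime(2020, 11, 5)),   # the newer 2020-11-01 value wins
    ("u2", datetime(2020, 12, 1)),   # 2020-10-10 value is older than the ttl
    ("u3", datetime(2020, 12, 1)),   # no rows for this user
]
print(point_in_time_lookup(spine, feature_rows))
```

Only looking backward from each request time is what keeps training data free of values that would not have been known at that moment.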

Enabling materialization

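We've verified that our Feature View calculates features correctly, so let's start materializing data to the offline and online stores. Materialization speeds up batch queries, because precomputed values are read from the offline store instead of being recomputed from the source, and it makes the feature values available for online retrieval.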

Note that a feature can only be materialized once it has been applied to your production (prod) workspace.
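As a rough mental model of the two stores (an illustrative assumption, not a description of Tecton's storage internals): the offline store keeps the full history of materialized rows for training queries, while the online store keeps only the latest value per entity key so a single lookup is fast. The rows and the helper get_online_feature below are hypothetical, and the sketch ignores ttl.

```python
from datetime import datetime

# Hypothetical materialized rows: (user_id, timestamp, user_has_good_credit)
rows = [
    ("u1", datetime(2020, 10, 10), 0),
    ("u1", datetime(2020, 11, 1), 1),
    ("u2", datetime(2020, 10, 10), 1),
]

# Offline store (sketch): the full history, used for training queries.
offline_store = list(rows)

# Online store (sketch): only the newest value per user_id, for key lookups.
online_store = {}
for user_id, ts, value in rows:
    current = online_store.get(user_id)
    if current is None or ts > current[0]:
        online_store[user_id] = (ts, value)


def get_online_feature(user_id):
    """Hypothetical online read: latest stored value or None."""
    entry = online_store.get(user_id)
    return None if entry is None else entry[1]


print(get_online_feature("u1"), get_online_feature("u3"))
```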

Going back to the Feature View definition, set online and offline to True, then run tecton apply again:

@batch_feature_view(
    inputs={'credit_scores': Input(credit_scores_batch)},
    entities=[user],
    mode='spark_sql',
    online=True,
    offline=True,
    feature_start_time=datetime(2020, 10, 10),
    batch_schedule='1d',
    ttl='30days',
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
    family='fraud',
    description='Whether the user has a good credit score (over 670).'
)
def user_has_good_credit(credit_scores):
    return f'''
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        '''

Once this change has been applied, Tecton will automatically begin to backfill feature data from the datetime configured in the feature_start_time parameter.
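With batch_schedule='1d' and feature_start_time=datetime(2020, 10, 10), the backfill has to cover one day-long window for every day since the start time. The backfill_config value "multiple_batch_schedule_intervals_per_job" indicates, as its name says, that several such windows are handled per job rather than one job per window. The sketch below enumerates the windows and groups them; the helper names, the choice to include only complete windows, and the intervals_per_job value are assumptions, not how Tecton actually sizes backfill jobs.

```python
from datetime import datetime, timedelta


def batch_intervals(feature_start_time, now, schedule=timedelta(days=1)):
    """Hypothetical list of complete [start, end) windows from
    feature_start_time up to now, one per batch_schedule interval."""
    intervals = []
    start = feature_start_time
    while start + schedule <= now:
        intervals.append((start, start + schedule))
        start += schedule
    return intervals


def group_into_jobs(intervals, intervals_per_job):
    """Hypothetical grouping of consecutive windows into backfill jobs."""
    return [
        intervals[i:i + intervals_per_job]
        for i in range(0, len(intervals), intervals_per_job)
    ]


intervals = batch_intervals(datetime(2020, 10, 10), datetime(2020, 10, 20))
jobs = group_into_jobs(intervals, intervals_per_job=4)
print(len(intervals), [len(job) for job in jobs])
```

Grouping windows means a long backfill launches far fewer jobs than one per day, while each window still has well-defined start and end times.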

Now that we've enabled materialization, we can use the Web UI to monitor pipeline health. Click into a Feature View to see more detailed information, such as the source code of its Transformation and monitoring information about its materialization. Tecton exposes the following information about the health of feature pipelines:

  • The status of feature computation jobs
  • The date ranges of raw data that have been processed by Tecton, and any errors that were encountered during a date range
  • For serving endpoints, performance metrics such as serving latency, percentage of calls resulting in errors, and requests per second
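To make the serving metrics concrete, the sketch below computes requests per second, the percentage of calls resulting in errors, and median latency from a hypothetical list of request records over a fixed window. The ServedRequest record, its fields, and the choice of median are assumptions for illustration; Tecton computes and displays these metrics for you.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ServedRequest:
    """One hypothetical serving request record."""
    latency_ms: float
    ok: bool


def serving_metrics(requests, window_seconds):
    """Summarize the requests observed during a window of window_seconds."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    latencies = [r.latency_ms for r in requests]
    return {
        "requests_per_second": total / window_seconds,
        "error_percentage": 100.0 * errors / total if total else 0.0,
        "median_latency_ms": statistics.median(latencies) if latencies else None,
    }


sample = [
    ServedRequest(10, True),
    ServedRequest(20, True),
    ServedRequest(30, False),
    ServedRequest(40, True),
]
print(serving_metrics(sample, window_seconds=2))
```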

You can view the materialization status in the Web UI, or with the FeatureView.materialization_status() method in your notebook.

Adding the Feature View to a Feature Service

The last step to make our feature available for production is to include it in a Feature Service.

Create a new file in your feature repo and add the following code.

from tecton import FeatureService
from fraud.features.user_has_good_credit import user_has_good_credit

fraud_detection_feature_service = FeatureService(
    name='fraud_detection_feature_service',
    features=[user_has_good_credit]
)

And that's it! We can now use this feature service to create training data or fetch real-time features.
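Conceptually, a Feature Service lets you request the features of all its Feature Views with one set of join keys and get back a single feature vector. The sketch below models that lookup in plain Python. The second Feature View user_transaction_count, its column, the stored values, and the feature_view.column naming of output keys are all hypothetical.

```python
# Hypothetical column lists per Feature View.
FEATURE_VIEW_COLUMNS = {
    "user_has_good_credit": ["user_has_good_credit"],
    "user_transaction_count": ["transaction_count_7d"],  # made up for illustration
}

# Hypothetical online values, keyed by Feature View and then user_id.
ONLINE_VALUES = {
    "user_has_good_credit": {
        "u1": {"user_has_good_credit": 1},
        "u2": {"user_has_good_credit": 0},
    },
    "user_transaction_count": {"u1": {"transaction_count_7d": 12}},
}


def get_feature_vector(feature_views, user_id):
    """Look up every Feature View with the same join key and merge the
    results into one flat vector; missing values become None."""
    vector = {}
    for fv in feature_views:
        row = ONLINE_VALUES.get(fv, {}).get(user_id, {})
        for column in FEATURE_VIEW_COLUMNS[fv]:
            vector[f"{fv}.{column}"] = row.get(column)
    return vector


service = ["user_has_good_credit", "user_transaction_count"]
print(get_feature_vector(service, "u2"))
```

Keeping a fixed column list per Feature View means every vector has the same keys, even for users with no stored value, which is what a model expects at inference time.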