Serverless Feature Retrieval From any Python environment
Summary
This guide explains how you can use Tecton together with AWS Athena to retrieve features from Tecton’s offline store in any Python environment that has access to AWS (e.g. your local laptop, a Jupyter notebook, Kubeflow pipelines etc.).
Background
When using the Tecton SDK with EMR or Databricks, the Tecton SDK leverages the existing Spark context to retrieve features from the offline store. As a result, the SDK methods for offline feature retrieval function properly only in interactive Databricks notebooks or interactive AWS EMR notebooks.
As an alternative, you can use the Tecton SDK with Athena to access features from the offline store, removing the requirement of being in a Databricks or EMR Notebook.
How the SDK works with Athena
The following diagram shows how Athena fits into Tecton’s Architecture:
When you retrieve features from the offline store, the SDK executes the following steps:
- Tecton registers tables in AWS Glue for Feature Views and Feature Tables that the user attempts to read data from. Those tables point at the Feature View’s Offline Store location on S3.
- Tecton’s SDK builds and executes Athena SQL queries to fetch historical feature data from the S3-based Tecton Offline Store.
- Athena writes the query result to S3.
- Tecton’s SDK reads the result from S3 and returns a pandas DataFrame.
Note
Additionally, if you retrieve features using a pandas DataFrame spine, Tecton will upload the spine to S3 and register an Athena table for it.
Supported Tecton SDK operations
Athena is used for the following SDK operations:
FeatureView.get_historical_features()
FeatureTable.get_historical_features()
FeatureService.get_historical_features()
Otherwise, the Tecton SDK uses a Spark context for other operations that read data, such as DataSource.get_dataframe()
.
Installation
To install in your notebook, run:
pip install 'tecton[athena]'
To enable the Athena connector, run:
import tecton
tecton.conf.set("ALPHA_ATHENA_COMPUTE_ENABLED", "true")
Athena session configuration
The following optional configuration can be set on the Athena Session:
import tecton_athena
session = tecton_athena.athena_session.get_session()
config = session.config
config.boto3_session = ... # Boto3 session
config.workgroup = ... # Athena workgroup.
config.encryption = ... # Valid values: [None, 'SSE_S3', 'SSE_KMS']. Notice: 'CSE_KMS' is not supported.
config.kms_key = ... # For SSE-KMS, this is the KMS key ARN or ID.
config.database = ... # Name of the Database in the Catalog
config.s3_path = ... # S3 Location where spines will be uploaded and Athena output will be stored
config.spine_temp_table_name = ... # Specifies the temp table name in the catalog used for Tecton spines
Notes regarding some of these configuration options:
s3_path
: (Defaults to a newly created bucket in the user’s region). S3 Path that Athena writes its results to, and that Tecton uploads spines to. Must start with s3://
.
database
: (Defaults to default
). Name of the AWS Glue catalog Tecton registers Athena tables in (both for spines and for Feature Views).
spine_temp_table_name
: (Defaults to the empty string). Prefix for the Glue table the SDK registers when a spine, as required by some get_historical_features
calls, is uploaded to S3 and registered in S3. This is useful if multiple users use the SDK at the same time.
Required permissions
The environment in which the SDK is used must have the following permissions to AWS services:
- S3:
- Read and write access to the S3 path specified in
ATHENA_S3_PATH
- Read access to the S3 bucket Tecton uses for the Offline Store
- Read and write access to the S3 path specified in
- Glue Catalog:
- Read and write access to the catalog specified in
ATHENA_DATABASE
, includingglue:CreateTable
glue:DeleteTable
- Read and write access to the catalog specified in
- Athena
- Full read access, including
athena:StartQueryExecution
athena:GetQueryExecution
- Full read access, including
Authentication
The authentication methods that are available for authenticating your environment with AWS are described here.
On-Demand Feature Views
If you use a Feature Service that depends on On-Demand Feature Views, they will be executed locally directly in the SDK’s client environment.
Current Limitations
- You cannot use the Athena connector to compute features with the
get_historical_feature()
method’sfrom_source
parameter set toTrue
. Features can only be read from the offline store. - The Athena connector is not compatible with Tecton on Snowflake.