Batch Feature Usage¶
In [ ]:
import tecton
import pandas
from datetime import datetime
import dateutil.parser
Load a Batch Feature¶
In [ ]:
fv = tecton.get_feature_view("user_distinct_merchant_transaction_count_30d")
fv.summary()
| Field | Value |
|---|---|
| Name | user_distinct_merchant_transaction_count_30d |
| Workspace | prod |
| Description | How many transactions the user has made to distinct merchants in the last 30 days. |
| Created At | 2021-06-24 19:21:49 UTC |
| Owner | matt@tecton.ai |
| Last Modified By | matt@tecton.ai |
| Family | fraud |
| Source Filename | .direnv/python-3.7.4/lib/python3.7/site-packages/tecton/cli/common.py |
| Tags | {'release': 'production'} |
| Type | BatchFeatureView |
| URL | https://app.tecton.ai/app/repo/prod/features/user_distinct_merchant_transaction_count_30d |
| Entities | fraud_user |
| Features | distinct_merchant_transaction_count_30d |
| Feature Services | fraud_detection_feature_service |
| Transformation | user_distinct_merchant_transaction_count_30d |
| Timestamp Key | timestamp |
| Online Materialization | Enabled |
| Offline Materialization | Enabled |
| Feature Start Time | 2021-04-01 00:00:00 UTC |
| Online Join Keys | user_id |
| Offline Join Keys | user_id |
| Serving TTL | 1 days |
| Schedule Interval | 1 days |
| Online Serving Freshness | 20h 29s |
| Materialization Status | [2021-03-02 00:00:00 UTC, 2021-06-24 00:00:00 UTC] -> Ok |
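The summary shows this Feature View registered in the prod workspace. Depending on your SDK version, the lookup can also be scoped to that workspace explicitly; a minimal sketch, assuming tecton.get_workspace is available:

# Sketch only: assumes tecton.get_workspace is available in your SDK version.
# "prod" matches the Workspace field reported by fv.summary() above.
ws = tecton.get_workspace("prod")
prod_fv = ws.get_feature_view("user_distinct_merchant_transaction_count_30d")
prod_fv.summary()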
        
        
    
Run Feature View Transformation Pipeline¶
See the API reference for the specific parameters available for each type of Feature View.
If feature_end_time is not set, it defaults to datetime.now().
If feature_start_time is not set, it defaults to feature_end_time minus one materialization schedule interval. A short sketch of these defaults follows the cell below.
In [ ]:
result_dataframe = fv.run(feature_start_time=datetime(2021, 6, 21), feature_end_time=datetime(2021, 6, 22))
display(result_dataframe.to_spark().limit(5))
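As a quick illustration of the defaults described above, the same transformation can be run without explicit bounds; a minimal sketch (the exact window depends on when the call is made):

# Sketch of the defaults: with no bounds supplied, feature_end_time falls back to
# datetime.now() and feature_start_time to one schedule interval (1 day here) earlier.
default_result = fv.run()
display(default_result.to_spark().limit(5))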
    
Run with Mock Inputs¶
In [ ]:
# The mock data schema must match the schema of this Feature View's Data Source.
transactions_batch_data = [{
  "amount": 1.0,
  "nameorig": "user_1",
  "namedest": "user_2",
  "isfraud": 0,
  "isflaggedfraud": 0,
  "timestamp": datetime(2021, 6, 21, 18, 0, 0, 0),
  "type_cash_in": 0,
  "type_cash_out": 0,
  "type_payment": 1,
  "type_transfer": 0,
  "type_debit": 0,
  "__index_level_0__": 0, "partition_0": "", "partition_1": "", "partition_2": "", "partition_3": ""
}]
result_dataframe = fv.run(
  feature_start_time=datetime(2021, 6, 21),
  feature_end_time=datetime(2021, 6, 22),
  transactions_batch=spark.createDataFrame(transactions_batch_data))  # `transactions_batch` is the name of this Feature View's input.
display(result_dataframe.to_spark().limit(10))
Run for Batch Window Aggregate Feature View¶
BatchWindowAggregateFeatureView::run is quite similar to BatchFeatureView::run; the only difference is that it also supports an aggregate_tiles flag.
If aggregate_tiles is True (the default), result rows whose timestamps fall within the same aggregate tile are rolled up into a single output row.
If aggregate_tiles is False, the feature rows are not aggregated and are returned individually.
For more details on aggregate_tiles, see the Tecton documentation.
In [ ]:
agg_fv = tecton.get_feature_view('user_transaction_counts')
result_dataframe = agg_fv.run(
  feature_start_time=datetime(2021, 6, 21),
  feature_end_time=datetime(2021, 6, 22),
  aggregate_tiles=True)
display(result_dataframe.to_spark())
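For contrast, here is the same call with aggregate_tiles=False, which returns the individual feature rows instead of one row per aggregate tile:

# Same window as above, but skip tile aggregation so each feature row is listed
# individually rather than rolled up into one output row per aggregate tile.
raw_dataframe = agg_fv.run(
  feature_start_time=datetime(2021, 6, 21),
  feature_end_time=datetime(2021, 6, 22),
  aggregate_tiles=False)
display(raw_dataframe.to_spark())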
    
Get a Range of Feature Values from Offline Feature Store¶
In [ ]:
fv.get_historical_features(start_time=datetime(2021, 6, 21), end_time=datetime(2021, 6, 22)).to_pandas().head()
|   | user_id | distinct_merchant_transaction_count_30d | timestamp |
|---|---|---|---|
| 0 | C1267200339 | 1 | 2021-06-21 23:59:59 |
| 1 | C80644064 | 2 | 2021-06-21 23:59:59 |
| 2 | C1126579915 | 4 | 2021-06-21 23:59:59 |
| 3 | C810566550 | 3 | 2021-06-21 23:59:59 |
| 4 | C1160723146 | 1 | 2021-06-21 23:59:59 |
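The same range can also be kept as a Spark DataFrame instead of pandas; a small sketch, assuming the to_spark() conversion used on run() results above also applies to the DataFrame returned here:

# Sketch: read the same offline range but convert to Spark instead of pandas
# (assumes the returned Tecton DataFrame supports to_spark(), as used earlier).
range_df = fv.get_historical_features(start_time=datetime(2021, 6, 21), end_time=datetime(2021, 6, 22))
display(range_df.to_spark().limit(5))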
Read the Latest Features from Online Feature Store¶
In [ ]:
fv.get_online_features(join_keys={"user_id": "C1126579915"}).to_dict()

Out[19]: {'distinct_merchant_transaction_count_30d': 4}
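The summary above lists this feature in fraud_detection_feature_service. The same online lookup can be made at the service level; a sketch, assuming tecton.get_feature_service is available in your SDK version:

# Sketch only: assumes tecton.get_feature_service exists in your SDK version.
# The service name comes from the Feature Services row of the summary table.
fs = tecton.get_feature_service("fraud_detection_feature_service")
fs.get_online_features(join_keys={"user_id": "C1126579915"}).to_dict()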
Read Historical Features from Offline Feature Store with Time-Travel¶
In [ ]:
spine_df = pandas.DataFrame({
  'user_id': ['C1126579915', 'C810566550'],
  'timestamp': [dateutil.parser.parse("2021-06-22 03:20:00"), dateutil.parser.parse("2021-06-14 18:00:00")]
})
display(spine_df)
| user_id | timestamp |
|---|---|
| C1126579915 | 2021-06-22T03:20:00.000+0000 | 
| C810566550 | 2021-06-14T18:00:00.000+0000 | 
In [ ]:
features_df = fv.get_historical_features(spine=spine_df).to_pandas()
display(features_df)
| user_id | timestamp | user_distinct_merchant_transaction_count_30d.distinct_merchant_transaction_count_30d |
|---|---|---|
| C1126579915 | 2021-06-22T03:20:00.000+0000 | 4 | 
| C810566550 | 2021-06-14T18:00:00.000+0000 | 2 |
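If the spine timestamps fall outside the materialized range of the offline store, some SDK versions can compute time-travel features directly from the raw data source; a sketch, assuming get_historical_features accepts a from_source flag in your version:

# Sketch only: the from_source flag is an assumption about your SDK version.
# When supported, it computes the features from the raw Data Source rather
# than reading pre-materialized values from the offline store.
features_from_source_df = fv.get_historical_features(spine=spine_df, from_source=True).to_pandas()
display(features_from_source_df)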