Pre- and Post-Production Best Practices
We recommend following the best practices described below to help ensure that your Tecton deployment runs well in production.
Scale Tecton serving capacity
Most customers want to use Tecton's real-time serving capability via its REST API. By default, your feature servers are scaled for a development load of approximately 50 queries per second (qps). Prior to production, we will work with you to scale your feature servers to handle your expected load, with headroom for spikes in traffic. If you anticipate major spikes in traffic, please reach out to us ahead of time so we can account for them. We do not currently support auto-scaling or customer-controlled scaling of feature servers.
We recommend customers scale up traffic gradually, and notify us when they enable Tecton in production, so we can be especially attuned to any anomalies that may occur.
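When preparing a capacity request, it can help to write down the numbers you plan to share with us. The sketch below is illustrative only: the function names, the 50% headroom, and the four-step ramp are assumptions chosen for the example, not Tecton recommendations or part of any Tecton API.

```python
from typing import List


def required_capacity_qps(expected_peak_qps: int, headroom_percent: int) -> int:
    """Return the expected peak plus a percentage of headroom, rounded up."""
    if expected_peak_qps < 0 or headroom_percent < 0:
        raise ValueError("peak and headroom must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (expected_peak_qps * (100 + headroom_percent) + 99) // 100


def ramp_schedule(target_qps: int, steps: int) -> List[int]:
    """Return evenly spaced traffic levels that end at target_qps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [target_qps * i // steps for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Hypothetical numbers: 400 qps expected peak, 50% headroom, 4 ramp steps.
    print(required_capacity_qps(400, 50))
    print(ramp_schedule(400, 4))
```

The capacity figure is what you would ask us to provision for; the schedule is one way to plan the gradual increase in traffic on your side.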
Use dashboards to monitor your cluster
Our engineers monitor your clusters and are alerted in case of errors in operation.
In addition, several dashboards are available to you in your cluster via the web UI:
- Under the "Services" tab, "SLO Monitoring" and "Online Store Monitoring" allow you to view whether we're meeting our service level objectives for Tecton feature serving.
- The "Job" tab shows all feature materialization jobs. You can filter for jobs that are retrying or failing, which may indicate an issue with spot instance availability in your AWS region or issues with your transformation logic.
  - If you click the feature view associated with a particular job, it will take you to the Materialization tab of that feature view and show you batch and/or stream job clusters and error logs, if they have failed or are being retried. For EMR customers, make sure you are logged into your Tecton AWS account before clicking the link.
- Under the "Features" tab, there is a "Monitoring Summary" tab that gives summary statistics about feature view materialization. As noted above, each feature view has its own materialization tab with information about individual materialization jobs.
Add monitoring alerts
In addition to the monitoring tools above, we can proactively email you about specific materialization errors through a MonitoringConfig option in your feature view decorators. We highly recommend adding this option to each feature view in production so that you are alerted to materialization errors. Please consult this page for more information. If you have a paging service like PagerDuty, we suggest you use a PagerDuty-connected email address for these alerts.
When needed, open tickets in the Jira Support Portal
You can flag issues on your side using the Jira Support Portal that we have set up for you, and we will respond within our contractual SLAs. Please note that our SLAs apply only to Jira tickets, because of the alerting policies we have set up there. P0 issues will page our on-call engineer 24 hours a day, 7 days a week. Please reserve P0 priority for issues that affect you in production and have no workaround. We can also provide you with an email address that is connected to our on-call paging service in case one of your team members needs to urgently contact Tecton support and does not have access to Jira.
Provide Tecton Support with your email address
We recommend you provide Tecton Support with an email address so that Tecton can manually page you for critical issues outside of working hours.
Updates to Production
Updating feature views and services
If you wish to update an existing feature or feature service that is used in production, we suggest you create a new feature/feature service instead of modifying the existing one.
If you modify an existing feature, it will be unavailable while Tecton re-materializes it, and it may contain errors that our apply-time validator cannot flag before materialization.
Use variants when updating feature views and services
We suggest you use our "variant" naming convention when modifying production feature views and feature services. Under the hood, Tecton creates new feature views and feature services, but they show up in the web UI as variants for better organization.
- For feature views: Tecton by default uses the name of the transformation function as the name of the feature view. You can override this to use a variant naming convention with the following: name_override="my_feature_view:v2".
- For feature services: In the name field, add a colon and a tag at the end of the name. For example, if you have a feature service with name=my_feature_service, then you might create a modified version of this feature service called my_feature_service:v2.
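If you generate variant names programmatically, a small helper can keep the "name:tag" convention consistent. The sketch below is plain Python and does not call the Tecton SDK; the helper names with_variant and split_variant are hypothetical, and rejecting extra colons is an assumption made for this example rather than a documented Tecton rule.

```python
from typing import Optional, Tuple


def with_variant(base_name: str, variant: str) -> str:
    """Build a variant-style name such as 'my_feature_view:v2'."""
    if not base_name or not variant:
        raise ValueError("base_name and variant must be non-empty")
    # Assumption for this sketch: exactly one colon separates name and tag.
    if ":" in base_name or ":" in variant:
        raise ValueError("base_name and variant must not contain ':'")
    return f"{base_name}:{variant}"


def split_variant(name: str) -> Tuple[str, Optional[str]]:
    """Split 'base:tag' into (base, tag); tag is None when there is no colon."""
    base, separator, variant = name.partition(":")
    return base, (variant if separator else None)


if __name__ == "__main__":
    print(with_variant("my_feature_service", "v2"))
    print(split_variant("my_feature_view:v2"))
```

The resulting string is what you would pass as name_override for a feature view or as the name of a feature service.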
Updating models
If you are updating a production model, you should have feature services associated with the current and new versions of the model, perhaps using the variant naming scheme described above. We suggest slowly diverting production traffic from the current version to the new version. This will allow you to revert any changes if you do not see the feature or model results you are expecting.
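One way to divert traffic gradually is to route a fixed percentage of requests to the new feature service based on a stable hash of a request key. The sketch below is an assumption about how your own serving code might do this; it is not a Tecton feature, and the function name, the request key, and the service names are illustrative.

```python
import hashlib


def choose_feature_service(
    request_key: str,
    new_traffic_percent: int,
    current_service: str = "my_feature_service",
    new_service: str = "my_feature_service:v2",
) -> str:
    """Pick a feature service name for this request.

    The same request_key always maps to the same bucket, so a given user
    stays on one version until the percentage is changed.
    """
    if not 0 <= new_traffic_percent <= 100:
        raise ValueError("new_traffic_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return new_service if bucket < new_traffic_percent else current_service


if __name__ == "__main__":
    # Hypothetical rollout step: send roughly 10% of users to the new version.
    for user_id in ("user-1", "user-2", "user-3"):
        print(user_id, choose_feature_service(user_id, 10))
```

Raising the percentage only moves more keys to the new version, and setting it back to 0 reverts all traffic to the current version.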
Integrate with CI/CD
One benefit of Tecton's code-first configuration and declarative syntax is that you can leverage CI/CD systems to apply your feature repository changes to production.
We recommend that you store your production code in a Git repository. If that repository is connected to a CI/CD tool, such as GitHub Actions, then you can configure the CI/CD tool to plan or apply your feature repository upon changes. Please see this documentation page for more information and examples.
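The plan-or-apply decision can be sketched as a small function that a CI job calls. This is a minimal illustration under stated assumptions: the event names mirror GitHub Actions' pull_request and push events, "main" is assumed to be the production branch, and the function name is hypothetical. Authentication, workspace selection, and any non-interactive flags the Tecton CLI needs are deliberately left out; refer to the linked documentation for those.

```python
from typing import List


def tecton_command_for_event(
    event: str, branch: str, production_branch: str = "main"
) -> List[str]:
    """Return the CLI command a CI job should run, or [] to do nothing."""
    if event == "pull_request":
        # Preview the changes without applying them.
        return ["tecton", "plan"]
    if event == "push" and branch == production_branch:
        # Changes were merged into the production branch: apply them.
        return ["tecton", "apply"]
    return []


if __name__ == "__main__":
    print(tecton_command_for_event("pull_request", "feature/new-view"))
    print(tecton_command_for_event("push", "main"))
```

A CI step could pass a non-empty result to subprocess.run(command, check=True) from the root of the feature repository.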