Configuring EMR
The following steps enable Tecton to access your data plane account to manage AWS and Spark resources.
These instructions assume you are using EMR as the Spark provider. If you are using Databricks, please see the Databricks deployment instructions.
Terraform Templates for AWS Account Configuration
If your organization uses Terraform to manage AWS resources, we recommend using this sample Terraform setup repository instead of entering these values manually. The instructions below may still be a useful reference when adapting the template to your needs, especially the networking section. Once you've applied the configuration to your account, skip to the Request your Tecton Installation step.
Before you get started:
- Decide on a name for your deployment (e.g. mycompany-production), which will become the URL for your Tecton UI (mycompany-production.tecton.ai). Note: This name must be less than 22 characters.
- Determine which AWS region you'd like Tecton deployed into (e.g. us-west-2).
- Tecton requires the following tag on all security groups and subnets that it should have access to. Optionally, you can also add this tag to roles, policies, and S3 buckets to track what is accessible to Tecton.
  key: tecton-accessible:DEPLOYMENT_NAME value: true
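The naming rules above can be checked before any AWS resources are created. The following is a minimal sketch in Python; the function name `deployment_settings` and the layout of the returned dictionary are illustrative assumptions, not part of any Tecton tooling.

```python
MAX_NAME_LENGTH = 22  # the deployment name must be shorter than this


def deployment_settings(name: str) -> dict:
    """Validate a deployment name and derive the values used in this guide."""
    if not name:
        raise ValueError("deployment name must not be empty")
    if len(name) >= MAX_NAME_LENGTH:
        raise ValueError(
            f"deployment name must be less than {MAX_NAME_LENGTH} characters, "
            f"got {len(name)}"
        )
    return {
        "ui_url": f"{name}.tecton.ai",
        # Tag required on the security groups and subnets Tecton may access.
        "tag": {"key": f"tecton-accessible:{name}", "value": "true"},
    }


settings = deployment_settings("mycompany-production")
print(settings["ui_url"])
print(settings["tag"])
```

Running the check early avoids creating buckets, roles, and policies under a name that later turns out to be too long.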
Create a Tecton S3 Bucket
Tecton will use a single S3 bucket to store all of your offline materialized feature data.
To configure the S3 bucket:
- Create an S3 bucket called tecton-[DEPLOYMENT_NAME] (e.g. tecton-mycompany-production).
- Ensure the bucket's region is the same as the region in which you'd like to deploy Tecton (e.g. us-west-2).
- Enable default encryption using the Amazon S3 key (SSE-S3).
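The three bucket settings above can be written down as data before creating anything. The sketch below is an illustration only: `bucket_plan` is a hypothetical helper, and the `encryption` value is assumed to mirror the shape of S3's default-encryption configuration, where SSE-S3 is identified by the `AES256` algorithm. Verify the shape against the AWS documentation before passing it to any API.

```python
import json


def bucket_plan(deployment_name: str, region: str) -> dict:
    """Describe the S3 bucket this guide asks for (nothing is created here)."""
    return {
        "bucket": f"tecton-{deployment_name}",
        "region": region,
        # Assumed shape of S3's default-encryption settings for SSE-S3.
        "encryption": {
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
            ]
        },
    }


print(json.dumps(bucket_plan("mycompany-production", "us-west-2"), indent=2))
```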
Configure IAM roles
In this section we'll configure the roles and policies required for Tecton to manage S3, DynamoDB, and Spark resources. After completing this section, you should have:
- A Spark role (tecton-{DEPLOYMENT_NAME}-spark-role) with the following policies:
  - tecton-{DEPLOYMENT_NAME}-spark-policy
  - tecton-spark-scoped-secrets-policy
  - AmazonSSMManagedInstanceCore policy
- An EMR Manager role (tecton-{DEPLOYMENT_NAME}-emr-manager-role) with the following policies:
  - tecton-{DEPLOYMENT_NAME}-spark-policy
  - tecton-emr-manager-policy
- A cross-account role (tecton-{DEPLOYMENT_NAME}-cross-account-role) with the following policies:
  - tecton-cross-account-spark-policy
  - tecton-{DEPLOYMENT_NAME}-cross-account-policy
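As a quick reference, the target state listed above can be expressed as a mapping from role name to attached policies. This is a sketch that assumes you follow the suggested names exactly; `expected_roles` is a hypothetical helper.

```python
def expected_roles(deployment_name: str) -> dict:
    """Map each role in this guide to the policies it should end up with."""
    spark_policy = f"tecton-{deployment_name}-spark-policy"
    return {
        f"tecton-{deployment_name}-spark-role": [
            spark_policy,
            "tecton-spark-scoped-secrets-policy",
            "AmazonSSMManagedInstanceCore",
        ],
        f"tecton-{deployment_name}-emr-manager-role": [
            spark_policy,
            "tecton-emr-manager-policy",
        ],
        f"tecton-{deployment_name}-cross-account-role": [
            "tecton-cross-account-spark-policy",
            f"tecton-{deployment_name}-cross-account-policy",
        ],
    }


for role, policies in expected_roles("mycompany-production").items():
    print(role, "->", ", ".join(policies))
```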
Configure the EMR Manager and Spark Roles
- In the AWS Console of the account you want to deploy Tecton into, go to the IAM service.
- Click the Policies tab in the sidebar.
- Create the Tecton Spark policy
  - Click Create Policy.
  - Paste in the following JSON policy, replacing ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT_ID} with the account ID of your Tecton Data Plane account, and ${DEPLOYMENT_NAME} with your Tecton deployment name:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DynamoDB",
          "Effect": "Allow",
          "Action": [
            "dynamodb:BatchWriteItem",
            "dynamodb:ConditionCheckItem",
            "dynamodb:DescribeTable",
            "dynamodb:PutItem",
            "dynamodb:Query"
          ],
          "Resource": [
            "arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/tecton-${DEPLOYMENT_NAME}*"
          ]
        },
        {
          "Sid": "DynamoDBGlobal",
          "Effect": "Allow",
          "Action": ["dynamodb:ListTables"],
          "Resource": "*"
        },
        {
          "Sid": "S3Bucket",
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": [
            "arn:aws:s3:::tecton-${DEPLOYMENT_NAME}",
            "arn:aws:s3:::tecton.ai.databricks-init-scripts",
            "arn:aws:s3:::tecton.ai.public*",
            "arn:aws:s3:::tecton-materialization-release"
          ]
        },
        {
          "Sid": "S3Object",
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          "Resource": ["arn:aws:s3:::tecton-${DEPLOYMENT_NAME}/*"]
        },
        {
          "Sid": "TectonPublicS3",
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": [
            "arn:aws:s3:::tecton.ai.databricks-init-scripts/*",
            "arn:aws:s3:::tecton.ai.public*",
            "arn:aws:s3:::tecton-materialization-release/*"
          ]
        }
      ]
    }
  - Click Next: Tags
  - Click Next: Review
  - Give the policy an easy to remember name starting with tecton-, like tecton-{DEPLOYMENT_NAME}-spark-policy
  - Click Create Policy
- Create the Tecton EMR Manager policy
  - Click Create Policy.
  - Paste in the following JSON policy, replacing ${SPARK_ROLE} with the name you plan to use for the Spark role (such as tecton-{DEPLOYMENT_NAME}-spark-role), and ${DEPLOYMENT_NAME} with your Tecton deployment name:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CreateInTaggedNetwork1",
          "Effect": "Allow",
          "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:RunInstances",
            "ec2:CreateFleet",
            "ec2:CreateLaunchTemplate",
            "ec2:CreateLaunchTemplateVersion"
          ],
          "Resource": ["arn:aws:ec2:*:*:security-group/*"],
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "CreateInTaggedNetwork2",
          "Effect": "Allow",
          "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:RunInstances",
            "ec2:CreateFleet",
            "ec2:CreateLaunchTemplate",
            "ec2:CreateLaunchTemplateVersion"
          ],
          "Resource": ["arn:aws:ec2:*:*:subnet/*"],
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "CreateWithEMRTaggedLaunchTemplate",
          "Effect": "Allow",
          "Action": [
            "ec2:CreateFleet",
            "ec2:RunInstances",
            "ec2:CreateLaunchTemplateVersion"
          ],
          "Resource": "arn:aws:ec2:*:*:launch-template/*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "CreateEMRTaggedLaunchTemplate",
          "Effect": "Allow",
          "Action": "ec2:CreateLaunchTemplate",
          "Resource": "arn:aws:ec2:*:*:launch-template/*",
          "Condition": {
            "StringEquals": {
              "aws:RequestTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "CreateEMRTaggedInstancesAndVolumes",
          "Effect": "Allow",
          "Action": ["ec2:RunInstances", "ec2:CreateFleet"],
          "Resource": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:ec2:*:*:volume/*",
            "arn:aws:ec2:*:*:fleet/*"
          ],
          "Condition": {
            "StringEquals": {
              "aws:RequestTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "ResourcesToLaunchEC2",
          "Effect": "Allow",
          "Action": [
            "ec2:RunInstances",
            "ec2:CreateFleet",
            "ec2:CreateLaunchTemplate",
            "ec2:CreateLaunchTemplateVersion"
          ],
          "Resource": [
            "arn:aws:ec2:*:*:network-interface/*",
            "arn:aws:ec2:*::image/ami-*"
          ]
        },
        {
          "Sid": "ManageEMRTaggedResources",
          "Effect": "Allow",
          "Action": [
            "ec2:CreateLaunchTemplateVersion",
            "ec2:DeleteLaunchTemplate",
            "ec2:DeleteNetworkInterface",
            "ec2:ModifyInstanceAttribute",
            "ec2:TerminateInstances"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "ManageTagsOnEMRTaggedResources",
          "Effect": "Allow",
          "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
          "Resource": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:ec2:*:*:volume/*",
            "arn:aws:ec2:*:*:network-interface/*",
            "arn:aws:ec2:*:*:launch-template/*"
          ],
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "CreateNetworkInterfaceNeededForPrivateSubnet",
          "Effect": "Allow",
          "Action": ["ec2:CreateNetworkInterface"],
          "Resource": ["arn:aws:ec2:*:*:network-interface/*"],
          "Condition": {
            "StringEquals": {
              "aws:RequestTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "TagOnCreateTaggedEMRResources",
          "Effect": "Allow",
          "Action": ["ec2:CreateTags"],
          "Resource": [
            "arn:aws:ec2:*:*:network-interface/*",
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:ec2:*:*:volume/*",
            "arn:aws:ec2:*:*:launch-template/*"
          ],
          "Condition": {
            "StringEquals": {
              "ec2:CreateAction": [
                "RunInstances",
                "CreateFleet",
                "CreateLaunchTemplate",
                "CreateNetworkInterface"
              ]
            }
          }
        },
        {
          "Sid": "ListActionsForEC2Resources",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeCapacityReservations",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeInstances",
            "ec2:DescribeLaunchTemplates",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribePlacementGroups",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeVolumes",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcs"
          ],
          "Resource": "*"
        },
        {
          "Sid": "ManageSecurityGroups",
          "Effect": "Allow",
          "Action": [
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupIngress"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "PassRoleForEC2",
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": "arn:aws:iam::*:role/${SPARK_ROLE}",
          "Condition": {
            "StringLike": {
              "iam:PassedToService": "ec2.amazonaws.com*"
            }
          }
        }
      ]
    }
  - Click Next: Tags
  - Click Next: Review
  - Give the policy an easy to remember name starting with tecton-, like tecton-emr-manager-policy
  - Click Create Policy
- Create the Tecton Spark Scoped Secrets policy
  - Click Create Policy.
  - Paste in the following JSON policy, replacing ${ACCOUNT_ID} with the account ID of your AWS account, and ${DEPLOYMENT_NAME} with your Tecton deployment name:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AccessTectonScopedSecrets",
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:*:${ACCOUNT_ID}:secret:tecton-${DEPLOYMENT_NAME}/*"
        }
      ]
    }
  - Click Next: Tags
  - Click Next: Review
  - Give the policy an easy to remember name starting with tecton-, like tecton-spark-scoped-secrets-policy
  - Click Create Policy
- Click the Roles tab in the sidebar.
- Create the Spark role
  - Click Create role.
  - Select EC2 under Common Use Cases
  - Click the Next: Permissions button
  - Attach the Tecton Spark policy by searching for the policy you created earlier, such as tecton-{DEPLOYMENT_NAME}-spark-policy, and click the check box next to that policy to attach it to the new role.
  - Attach the Tecton Spark Scoped Secrets policy by searching for the policy you created earlier, such as tecton-spark-scoped-secrets-policy, and click the check box next to that policy to attach it to the new role.
  - Attach the AmazonSSMManagedInstanceCore policy by searching for it, and click the check box next to the policy to attach it to the new role.
  - Click the Next: Tags button.
  - Click the Next: Review button.
  - In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-spark-role.
  - Click Create role. You will see a list of roles displayed.
  - Ensure that the role has an Instance Profile associated with it, and that the Instance Profile has the same name as the role. If you created this role through the console, the Instance Profile should have been created automatically.
  - Ensure that the role has "AWS Service: ec2" in its "Trusted Entities". If you created this role through the console, this should have been added automatically.
- Create the EMR Manager role
  - Click Create role.
  - Select EMR under Use Cases
  - At the bottom of the page, select the default EMR role.
  - Click the Next: Permissions button
  - Search for the Tecton Spark policy you created earlier, such as tecton-{DEPLOYMENT_NAME}-spark-policy, and click the check box next to that policy to attach it to the new role.
  - Search for the Tecton EMR Manager policy you created earlier, such as tecton-emr-manager-policy, and click the check box next to that policy to attach it to the new role.
  - Click the Next: Tags button.
  - Click the Next: Review button.
  - In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-emr-manager-role.
  - Click Create role. You will see a list of roles displayed.
  - Ensure that the role has "AWS Service: elasticmapreduce" in its "Trusted Entities". If you created this role through the console, this should have been added automatically.
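The policies in this section use ${NAME} placeholders, which happens to be the syntax Python's `string.Template` understands. The sketch below shows one way to fill in the placeholders and confirm the result is still valid JSON before pasting it into the console. It uses only two statements from the Spark policy as a shortened example; `render_policy` is a hypothetical helper and the account ID is a placeholder value.

```python
import json
from string import Template

# Two statements taken from the Spark policy, used here as a short example.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3Object",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::tecton-${DEPLOYMENT_NAME}/*"]
    },
    {
      "Sid": "DynamoDBGlobal",
      "Effect": "Allow",
      "Action": ["dynamodb:ListTables"],
      "Resource": "*"
    }
  ]
}
"""


def render_policy(template_text: str, values: dict) -> dict:
    """Fill in ${NAME} placeholders and parse the result as JSON.

    Template.substitute raises KeyError when a placeholder has no value,
    and json.loads raises ValueError when the rendered text is not JSON.
    """
    rendered = Template(template_text).substitute(values)
    return json.loads(rendered)


policy = render_policy(
    POLICY_TEMPLATE,
    {
        "REGION": "us-west-2",
        "ACCOUNT_ID": "123456789012",  # placeholder account ID
        "DEPLOYMENT_NAME": "mycompany-production",
    },
)
print(json.dumps(policy, indent=2))
```

Using `substitute` rather than `safe_substitute` is deliberate: a forgotten placeholder fails loudly instead of ending up in the saved policy.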
Configure the cross-account role for the Tecton Control Plane
- In the AWS Console of the account you want to deploy Tecton into, go to the IAM service.
- Click the Policies tab in the sidebar.
- Create the cross-account Spark policy
  - Click Create Policy.
  - Paste in the following JSON policy, replacing ${SPARK_ROLE} with the same role name you used previously (such as tecton-{DEPLOYMENT_NAME}-spark-role), ${EMR_MANAGER_ROLE} with the name of your EMR Manager role (such as tecton-{DEPLOYMENT_NAME}-emr-manager-role), ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT_ID} with the account ID of your Tecton Data Plane account, and ${DEPLOYMENT_NAME} with your Tecton deployment name:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CreateEmrServiceLinkedRole",
          "Effect": "Allow",
          "Action": ["iam:CreateServiceLinkedRole"],
          "Resource": [
            "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com/AWSServiceRoleForEMRCleanup*",
            "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com/AWSServiceRoleForEC2Spot*"
          ],
          "Condition": {
            "StringLike": {
              "iam:AWSServiceName": ["elasticmapreduce.amazonaws.com"]
            }
          }
        },
        {
          "Sid": "EmrPutRolePolicyForServiceLinkedRole",
          "Effect": "Allow",
          "Action": ["iam:AttachRolePolicy", "iam:PutRolePolicy"],
          "Resource": [
            "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com/AWSServiceRoleForEMRCleanup*",
            "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com/AWSServiceRoleForEC2Spot*"
          ]
        },
        {
          "Sid": "Ec2Global",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeInstanceTypeOfferings",
            "ec2:DescribeSubnets",
            "ec2:DescribeInstances",
            "ec2:DescribeSecurityGroups"
          ],
          "Resource": "*"
        },
        {
          "Sid": "EmrGlobal",
          "Effect": "Allow",
          "Action": [
            "elasticmapreduce:ListClusters",
            "elasticmapreduce:ListInstances"
          ],
          "Resource": "*"
        },
        {
          "Sid": "EmrResourceTag",
          "Effect": "Allow",
          "Action": [
            "ec2:Describe*",
            "elasticmapreduce:DescribeCluster",
            "elasticmapreduce:ListSteps",
            "elasticmapreduce:ListBootstrapActions",
            "elasticmapreduce:TerminateJobFlows",
            "elasticmapreduce:ListInstanceGroups",
            "ssm:StartSession"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "EmrRequestTag",
          "Effect": "Allow",
          "Action": "elasticmapreduce:RunJobFlow",
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:RequestTag/tecton-accessible:${DEPLOYMENT_NAME}": "true"
            }
          }
        },
        {
          "Sid": "PassRoleForEMR",
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": "arn:aws:iam::${ACCOUNT_ID}:role/${EMR_MANAGER_ROLE}",
          "Condition": {
            "StringEquals": {
              "iam:PassedToService": ["elasticmapreduce.amazonaws.com"]
            }
          }
        },
        {
          "Sid": "PassRoleForEC2",
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": "arn:aws:iam::${ACCOUNT_ID}:role/${SPARK_ROLE}",
          "Condition": {
            "StringEquals": {
              "iam:PassedToService": ["ec2.amazonaws.com"]
            }
          }
        },
        {
          "Sid": "SSMForControlPlaneToConnectToTectonEMRCluster",
          "Effect": "Allow",
          "Action": "ssm:StartSession",
          "Resource": "arn:aws:ssm:${REGION}::document/AWS-StartPortForwardingSession"
        }
      ]
    }
  - Click Next: Tags
  - Click Next: Review
  - Give the policy an easy to remember name, like tecton-cross-account-spark-policy
  - Click Create Policy
- Create the cross-account policy
  - Click Create Policy.
  - Paste in the following JSON policy, replacing ${REGION} with the AWS region you selected for your deployment, ${ACCOUNT_ID} with the account ID of your AWS account, ${DEPLOYMENT_NAME} with your Tecton deployment name, and ${SPARK_ROLE} with the name of your Spark role, such as tecton-{DEPLOYMENT_NAME}-spark-role:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DynamoDB",
          "Effect": "Allow",
          "Action": [
            "dynamodb:BatchGetItem",
            "dynamodb:BatchWriteItem",
            "dynamodb:ConditionCheckItem",
            "dynamodb:CreateTable",
            "dynamodb:DeleteItem",
            "dynamodb:DeleteTable",
            "dynamodb:DescribeTable",
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:Query",
            "dynamodb:TagResource",
            "dynamodb:UpdateTable"
          ],
          "Resource": [
            "arn:aws:dynamodb:${REGION}:${ACCOUNT_ID}:table/tecton-${DEPLOYMENT_NAME}*"
          ]
        },
        {
          "Sid": "DynamoDBGlobal",
          "Effect": "Allow",
          "Action": ["dynamodb:ListTables", "dynamodb:DescribeLimits"],
          "Resource": "*"
        },
        {
          "Sid": "S3Bucket",
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": ["arn:aws:s3:::tecton-${DEPLOYMENT_NAME}"]
        },
        {
          "Sid": "S3Object",
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:DeleteObject", "s3:PutObject"],
          "Resource": ["arn:aws:s3:::tecton-${DEPLOYMENT_NAME}/*"]
        },
        {
          "Sid": "VerifyPermissions",
          "Effect": "Allow",
          "Action": [
            "iam:GetPolicy",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:GetPolicyVersion",
            "iam:ListPolicyVersions",
            "iam:ListAttachedRolePolicies",
            "iam:ListInstanceProfilesForRole"
          ],
          "Resource": [
            "arn:aws:iam::${ACCOUNT_ID}:role/${SPARK_ROLE}",
            "arn:aws:iam::${ACCOUNT_ID}:policy/tecton-*",
            "arn:aws:iam::${ACCOUNT_ID}:role/tecton-*"
          ]
        }
      ]
    }
  - Click Next: Tags
  - Click Next: Review
  - Give the policy an easy to remember name starting with tecton-, like tecton-{DEPLOYMENT_NAME}-cross-account-policy
  - Click Create Policy
- Create the cross-account role
  - Click the Roles tab in the sidebar.
  - Click Create role.
  - Under Select type of trusted entity, click the Another AWS account tile.
  - Specify the Tecton Account ID. Please contact your account executive to obtain the correct account ID.
  - Enable the option "Require external ID."
  - Enter a random External ID of your choice (for example, a UUID works well). Make sure to note down the external ID that you choose. You'll need to provide it to Tecton to complete the installation.
  - Click the Next: Permissions button
  - Search for the policy you created (e.g. tecton-{DEPLOYMENT_NAME}-cross-account-policy), and click the check box next to that policy to attach it to the new role.
  - Search for the cross-account Spark policy you created (e.g. tecton-cross-account-spark-policy), and click the check box next to that policy to attach it to the new role.
  - Click the Next: Tags button.
  - Click the Next: Review button.
  - In the Role name field, enter a role name starting with tecton-, such as tecton-{DEPLOYMENT_NAME}-cross-account-role.
  - Click Create role. You will see a list of roles displayed.
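For the cross-account role, the external ID and the resulting trust relationship can be sketched as follows. This assumes the usual shape of an IAM trust policy for a role created with "Another AWS account" and "Require external ID" selected; `cross_account_trust_policy` is a hypothetical helper, and 111111111111 is a placeholder, not Tecton's account ID.

```python
import json
import uuid


def cross_account_trust_policy(tecton_account_id: str, external_id: str) -> dict:
    """Build a trust policy that lets another account assume the role,
    but only when it presents the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{tecton_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# A random UUID works well as an external ID; record it for the install request.
external_id = str(uuid.uuid4())
trust = cross_account_trust_policy("111111111111", external_id)
print(json.dumps(trust, indent=2))
```

Comparing this shape with the role's Trust relationships tab after creation is a quick way to confirm the external ID was saved as intended.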
Configure networking
Tecton will need a VPC and subnets to use when creating EMR clusters. These can be existing resources, or you can create them for Tecton. Either way, make sure to tag the resources with the tecton-accessible:DEPLOYMENT_NAME tag.
Configure the VPC and subnet
- Add the following tag to the VPC:
  key: tecton-accessible:DEPLOYMENT_NAME value: true
- You'll need a private subnet in each of the availability zones you intend for Tecton to use (at least 2 AZs).
- Ensure the route table for each subnet has a 0.0.0.0/0 route that provides outbound internet access. You can accomplish this using NAT gateways.
- Add the following tag to each subnet:
  key: tecton-accessible:DEPLOYMENT_NAME value: true
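The subnet requirements above can be turned into a small checklist function. The sketch below works on plain dictionaries whose keys (`id`, `availability_zone`, `tags`, `has_default_route`) are invented for this example; you would need to populate them yourself from the console or an AWS SDK.

```python
def check_subnets(subnets: list, deployment_name: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    tag_key = f"tecton-accessible:{deployment_name}"
    problems = []

    zones = {subnet["availability_zone"] for subnet in subnets}
    if len(zones) < 2:
        problems.append("subnets must span at least 2 availability zones")

    for subnet in subnets:
        if subnet.get("tags", {}).get(tag_key) != "true":
            problems.append(f"{subnet['id']}: missing tag {tag_key}=true")
        if not subnet.get("has_default_route", False):
            problems.append(f"{subnet['id']}: no 0.0.0.0/0 route")
    return problems


example = [
    {
        "id": "subnet-a",
        "availability_zone": "us-west-2a",
        "tags": {"tecton-accessible:mycompany-production": "true"},
        "has_default_route": True,
    },
    {
        "id": "subnet-b",
        "availability_zone": "us-west-2b",
        "tags": {},
        "has_default_route": True,
    },
]
print(check_subnets(example, "mycompany-production"))
```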
Configure security groups
You'll need to set up two security groups that allow the EMR clusters that Tecton creates to:
- Communicate internally
- Connect to other AWS resources
- Externally pull configuration
- Install Python packages
- Push metrics for monitoring and alerts
To do so, complete the following steps:
- Navigate to the "Security Groups" service in the AWS console
- Click "Create security group"
- Name the first security group tecton-emr-security-group, and give it a description (e.g. "A security group that EMR clusters created by Tecton will use to communicate internally")
- Ensure the VPC you selected in the previous step is selected here.
- Add the following tags to the security group:
  key: tecton-accessible:DEPLOYMENT_NAME value: true
  key: tecton-security-group-emr-usage value: manager,core&task
- Click "Create Security Group"
- Name the second security group tecton-service-emr-security-group, and give it a description (e.g. "A security group that EMR clusters created by Tecton will use to communicate with EMR services")
- Ensure the VPC you selected in the previous step is selected here.
- Add the following tags to the security group:
  key: tecton-accessible:DEPLOYMENT_NAME value: true
  key: tecton-security-group-emr-usage value: service-access
- Click "Create Security Group"
- Add the following inbound rules to tecton-emr-security-group
  - Allow "All TCP" from tecton-emr-security-group
  - Allow "Custom TCP" on port 8443 from tecton-service-emr-security-group
- Add the following outbound rules to tecton-emr-security-group
  - Allow "All Traffic" to destination 0.0.0.0/0
- Add the following inbound rules to tecton-service-emr-security-group
  - Allow "Custom TCP" on port 9443 from tecton-emr-security-group
- Add the following outbound rules to tecton-service-emr-security-group
  - Allow "Custom TCP" on port 8443 to tecton-emr-security-group
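To see how the inbound and outbound rules above pair up, the sketch below encodes them as data and checks which new TCP connections they permit. It is a simplified model written for this guide: the tuple layout and the `is_allowed` helper are assumptions, and it ignores details such as the stateful handling of return traffic.

```python
EMR_SG = "tecton-emr-security-group"
SERVICE_SG = "tecton-service-emr-security-group"
ANYWHERE = "0.0.0.0/0"

# (group, direction, port, peer); port "all" means every port.
RULES = [
    (EMR_SG, "inbound", "all", EMR_SG),
    (EMR_SG, "inbound", 8443, SERVICE_SG),
    (EMR_SG, "outbound", "all", ANYWHERE),
    (SERVICE_SG, "inbound", 9443, EMR_SG),
    (SERVICE_SG, "outbound", 8443, EMR_SG),
]


def _permits(group, direction, port, peer) -> bool:
    """True if any rule on `group` covers this direction, port, and peer."""
    return any(
        rule_group == group
        and rule_direction == direction
        and rule_port in ("all", port)
        and rule_peer in (ANYWHERE, peer)
        for rule_group, rule_direction, rule_port, rule_peer in RULES
    )


def is_allowed(source: str, destination: str, port: int) -> bool:
    """A new connection needs an outbound rule on the source group
    and a matching inbound rule on the destination group."""
    return _permits(source, "outbound", port, destination) and _permits(
        destination, "inbound", port, source
    )


print(is_allowed(SERVICE_SG, EMR_SG, 8443))
print(is_allowed(EMR_SG, SERVICE_SG, 9443))
print(is_allowed(SERVICE_SG, EMR_SG, 9443))
```

In this model, the service group can reach the cluster only on port 8443, and the cluster can reach the service group only on port 9443, which matches the intent of the four rule sets.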
Request your Tecton Installation
Once you've completed the above setup, you're ready to request your installation! Send the following information to the Tecton team:
- Your deployment name (e.g. mycompany-production)
- The region in which you'd like Tecton deployed (e.g. us-west-2)
- The ARN and External ID of the Tecton cross-account role (tecton-{DEPLOYMENT_NAME}-cross-account-role)
- The ARN of the Spark role (tecton-{DEPLOYMENT_NAME}-spark-role) and the matching Instance Profile
- The ARN of the EMR Manager role (tecton-{DEPLOYMENT_NAME}-emr-manager-role)
After you send this information to Tecton, the team will deploy Tecton into your account.
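The details to send can be collected in one place. The sketch below assumes you used the suggested role names without an IAM path and that the Instance Profile shares the Spark role's name, as described earlier; `installation_request` and its field names are hypothetical, and the account ID and external ID shown are placeholders.

```python
import json


def installation_request(
    deployment_name: str, region: str, account_id: str, external_id: str
) -> dict:
    """Assemble the values listed above, using standard IAM ARN formats."""

    def role_arn(role_name: str) -> str:
        return f"arn:aws:iam::{account_id}:role/{role_name}"

    spark_role = f"tecton-{deployment_name}-spark-role"
    return {
        "deployment_name": deployment_name,
        "region": region,
        "cross_account_role_arn": role_arn(
            f"tecton-{deployment_name}-cross-account-role"
        ),
        "external_id": external_id,
        "spark_role_arn": role_arn(spark_role),
        "spark_instance_profile_arn": (
            f"arn:aws:iam::{account_id}:instance-profile/{spark_role}"
        ),
        "emr_manager_role_arn": role_arn(
            f"tecton-{deployment_name}-emr-manager-role"
        ),
    }


request = installation_request(
    "mycompany-production",
    "us-west-2",
    "123456789012",  # placeholder account ID
    "example-external-id",  # placeholder; use the ID you chose for the role
)
print(json.dumps(request, indent=2))
```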