
aws-batch-alpha: lambda not authorized to perform batch:SubmitJob after upgrading from 2.69.0 to 2.78.0  #25574

Closed
@suzhoum

Description

Describe the bug

I'm trying to upgrade from 2.69.0 to the latest 2.78.0 and hit an error when calling batch:SubmitJob from a Lambda function. The error message is:

arn:aws:sts::xxx:assumed-role/ag-bench-test-batch-stack-agbenchtestbatchjobfunct-1AQCSFR51GLG7/ag-bench-test-batch-job-function is not authorized to perform: batch:SubmitJob on resource: arn:aws:batch:us-west-2:xxx:job-definition/jobdefinitionED9E5E04-dd5ddb78a49496b
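For anyone triaging a similar error, a rough way to narrow it down is to take the resource ARN from the message and test it against the statements in the Lambda role's policy. The sketch below is a simplified matcher written for illustration only: `statement_allows` is a hypothetical helper, it handles just `Allow` statements with plain-string `Action`/`Resource` values (no `Condition`, `NotAction`, explicit denies, or CloudFormation intrinsics such as `Ref`), and it is not the full IAM evaluation logic.

```python
import re


def _wildcard_to_regex(pattern, ignore_case=False):
    """Translate IAM-style wildcards: '*' is any run of characters, '?' is one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE if ignore_case else 0)


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def statement_allows(statement, action, resource):
    """Return True if an Allow statement matches both the action and the resource ARN."""
    if statement.get("Effect") != "Allow":
        return False
    # Action names are compared case-insensitively; resource ARNs are not.
    action_ok = any(
        _wildcard_to_regex(a, ignore_case=True).fullmatch(action)
        for a in _as_list(statement.get("Action", []))
    )
    resource_ok = any(
        _wildcard_to_regex(r).fullmatch(resource)
        for r in _as_list(statement.get("Resource", []))
    )
    return action_ok and resource_ok


# ARN copied from the error message above.
denied_arn = (
    "arn:aws:batch:us-west-2:xxx:job-definition/"
    "jobdefinitionED9E5E04-dd5ddb78a49496b"
)

# Hypothetical statements, not taken from the actual stack.
stale_statement = {
    "Effect": "Allow",
    "Action": "batch:SubmitJob",
    "Resource": "arn:aws:batch:us-west-2:xxx:job-definition/some-older-name",
}
wildcard_statement = {
    "Effect": "Allow",
    "Action": ["batch:SubmitJob"],
    "Resource": "arn:aws:batch:us-west-2:xxx:job-definition/*",
}

print(statement_allows(stale_statement, "batch:SubmitJob", denied_arn))
print(statement_allows(wildcard_statement, "batch:SubmitJob", denied_arn))
```

If no statement matches the exact ARN in the error, the policy is probably still pointing at a job definition name or revision that the upgraded stack no longer produces.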

Expected Behavior

The Lambda function should be able to perform batch:SubmitJob after the upgrade to v2.79.0.

Current Behavior

I tried to update my code so that it generates the same CloudFormation template as in 2.69.0, but some major differences remain.

I'm posting the code snippet that we have changed in this project in order to upgrade:

in v2.69.0:

import aws_cdk as core
from aws_cdk import aws_batch_alpha as batch
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs, aws_iam as iam

container = batch.JobDefinitionContainer(
            image=docker_container_image,
            gpu_count=container_gpu,
            vcpus=container_vcpu,
            memory_limit_mib=container_memory,
            linux_params=ecs.LinuxParameters(self, f"{prefix}-linux_params", shared_memory_size=container_memory),
        )

job_definition = batch.JobDefinition(
            self,
            "job-definition",
            container=container,
            retry_attempts=3,
            timeout=core.Duration.minutes(1500),
        )

batch_instance_role = iam.Role(
            self,
            f"{prefix}-instance-role",
            assumed_by=iam.CompositePrincipal(
                iam.ServicePrincipal("ec2.amazonaws.com"),
                iam.ServicePrincipal("ecs.amazonaws.com"),
                iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
            ),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AmazonEC2ContainerServiceforEC2Role"),
            ],
        )

batch_instance_profile = iam.CfnInstanceProfile(
            self, 
            f"{prefix}-instance-profile", 
            roles=[batch_instance_role.role_name]
        )

compute_environment = batch.ComputeEnvironment(
            self,
            f"{prefix}-compute-environment",
            compute_resources=batch.ComputeResources(
                allocation_strategy=batch.AllocationStrategy.BEST_FIT_PROGRESSIVE,
                vpc=vpc,
                vpc_subnets=ec2.SubnetSelection(subnets=vpc.private_subnets),
                maxv_cpus=compute_env_maxv_cpus,
                instance_role=batch_instance_profile.profile_arn,
                instance_types=instances,
                security_groups=[sg],
                type=batch.ComputeResourceType.ON_DEMAND,
                launch_template=batch.LaunchTemplateSpecification(
                    launch_template_name=batch_launch_template_name  # LaunchTemplate.launch_template_name returns None
                ),
            ),
        )

job_queue = batch.JobQueue(
            self,
            f"{prefix}-job-queue",
            priority=1,
            compute_environments=[batch.JobQueueComputeEnvironment(compute_environment=compute_environment, order=1)],
        )

in v2.79.0:

import os

import aws_cdk as core
from aws_cdk import aws_batch_alpha as batch
from aws_cdk import aws_ec2 as ec2, aws_iam as iam

container = batch.EcsEc2ContainerDefinition(
                self, 
                f"{prefix}-container-definition",
                image=docker_container_image,
                memory=core.Size.mebibytes(container_memory),
                cpu=container_vcpu,
                gpu=container_gpu,
                environment={
                    "AWS_ACCOUNT": os.environ["CDK_DEPLOY_ACCOUNT"],
                    "AWS_REGION": os.environ["CDK_DEPLOY_REGION"],
                },
                execution_role=None,
                linux_parameters=batch.LinuxParameters(self, f"{prefix}-linux-params", shared_memory_size=core.Size.mebibytes(container_memory))
            )

job_definition = batch.EcsJobDefinition(
            self, 
            f"{prefix}-job-definition",
            container=container,
            retry_attempts=3,
            timeout=core.Duration.minutes(1500)
        )

batch_service_role = iam.Role(
            self,
            f"{prefix}-service-role",
            assumed_by=iam.CompositePrincipal(
                iam.ServicePrincipal("batch.amazonaws.com"),
            ),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSBatchServiceRole"),
            ],
        )

compute_environment = batch.ManagedEc2EcsComputeEnvironment(self, f"{prefix}-compute-environment",
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnets=vpc.private_subnets),
            allocation_strategy=batch.AllocationStrategy.BEST_FIT_PROGRESSIVE,
            maxv_cpus=compute_env_maxv_cpus,
            instance_role=batch_instance_profile,
            instance_types=instances,
            security_groups=[sg],
            launch_template=launch_template,
            service_role=batch_service_role,
            use_optimal_instance_classes=False,
            update_to_latest_image_version=False,
            replace_compute_environment=True,
        )

The key difference I see in the CFN generated from the snippets above is that, in v2.79.0, a containerdefinitionExecutionRole and a containerdefinitionExecutionRoleDefaultPolicy are created:

"agbenchtestcontainerdefinitionExecutionRole0A25AAB3": {
   "Type": "AWS::IAM::Role",
   "Properties": {
    "AssumeRolePolicyDocument": {
     "Statement": [
      {
       "Action": "sts:AssumeRole",
       "Effect": "Allow",
       "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
       }
      }
     ],
     "Version": "2012-10-17"
    },
    "Tags": [
     {
      "Key": "ag-bench-test",
      "Value": "benchmark"
     }
    ]
   },
   "Metadata": {
    "aws:cdk:path": "ag-bench-test-batch-stack/ag-bench-test-container-definition/ExecutionRole/Resource"
   }
  },
  "agbenchtestcontainerdefinitionExecutionRoleDefaultPolicy2B49DF06": {
   "Type": "AWS::IAM::Policy",
   "Properties": {
    "PolicyDocument": {
     "Statement": [
      {
       "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
       ],
       "Effect": "Allow",
       "Resource": {
        "Fn::Join": [
         "",
         [
          "arn:",
          {
           "Ref": "AWS::Partition"
          },
          ":ecr:us-west-2:097403188315:repository/cdk-hnb659fds-container-assets-097403188315-us-west-2"
         ]
        ]
       }
      },
      {
       "Action": "ecr:GetAuthorizationToken",
       "Effect": "Allow",
       "Resource": "*"
      }
     ],
     "Version": "2012-10-17"
    },
    "PolicyName": "agbenchtestcontainerdefinitionExecutionRoleDefaultPolicy2B49DF06",
    "Roles": [
     {
      "Ref": "agbenchtestcontainerdefinitionExecutionRole0A25AAB3"
     }
    ]
   },
   "Metadata": {
    "aws:cdk:path": "ag-bench-test-batch-stack/ag-bench-test-container-definition/ExecutionRole/DefaultPolicy/Resource"
   }
  },

"AWS::Batch::ComputeEnvironment" also has two additional properties in v2.79.0:

"ComputeResources": {
    "UpdateToLatestImageVersion": false
},
"ReplaceComputeEnvironment": true,

The Lambda function's CFN remained unchanged.
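To make the template comparison above reproducible, the two synthesized templates can be diffed programmatically rather than by eye. This is a minimal sketch using only the standard library; `diff_templates` is a hypothetical helper, and it compares only the top-level `Resources` section. Because logical IDs depend on construct IDs, a construct that was renamed during the upgrade shows up as one removed and one added resource rather than as a change.

```python
import json


def diff_templates(old, new):
    """Summarize resource-level differences between two CloudFormation templates."""
    old_res = old.get("Resources", {})
    new_res = new.get("Resources", {})

    # Logical IDs present on only one side, mapped to their resource type.
    added = {lid: new_res[lid].get("Type") for lid in new_res.keys() - old_res.keys()}
    removed = {lid: old_res[lid].get("Type") for lid in old_res.keys() - new_res.keys()}

    # For logical IDs on both sides, list the top-level properties that differ.
    changed = {}
    for lid in old_res.keys() & new_res.keys():
        old_props = old_res[lid].get("Properties", {})
        new_props = new_res[lid].get("Properties", {})
        keys = sorted(
            k for k in old_props.keys() | new_props.keys()
            if old_props.get(k) != new_props.get(k)
        )
        if keys:
            changed[lid] = keys

    return {"added": added, "removed": removed, "changed": changed}


# Tiny illustrative templates; the logical IDs here are made up.
old_template = {
    "Resources": {
        "ComputeEnv": {
            "Type": "AWS::Batch::ComputeEnvironment",
            "Properties": {"Type": "MANAGED"},
        }
    }
}
new_template = {
    "Resources": {
        "ComputeEnv": {
            "Type": "AWS::Batch::ComputeEnvironment",
            "Properties": {"Type": "MANAGED", "ReplaceComputeEnvironment": True},
        },
        "ExecutionRole": {"Type": "AWS::IAM::Role", "Properties": {}},
    }
}

result = diff_templates(old_template, new_template)
print(json.dumps(result, indent=2, sort_keys=True))
```

In practice the two inputs would be loaded with `json.load` from the template files that `cdk synth` writes for each CDK version; the exact file paths depend on the project setup.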

Reproduction Steps

See above

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.79.9

Framework Version

No response

Node.js Version

v18.13.0

OS

ubuntu

Language

Python

Language Version

No response

Other information

No response
