(aws-batch): (Compute environments cannot be created with launch templates specifying network interface) #21577

Closed
tcutts opened this issue Aug 12, 2022 · 2 comments · Fixed by #21579
Labels
@aws-cdk/aws-batch Related to AWS Batch bug This issue is a bug. effort/small Small work item – less than a day of effort in-progress This issue is being actively worked on. p2

tcutts commented Aug 12, 2022

Describe the bug

Many HPC applications require low latency, and so it's desirable to use launch templates to configure EC2 instances with Elastic Fabric Adapters. This currently fails at deployment time.

Expected Behavior

It should be possible to configure a compute environment with no security groups of its own, using the network interfaces defined in the launch template instead.

Current Behavior

The L2 construct always creates a SecurityGroupIds property in the compute environment, so the deployment fails with:

Failed resources:
batch-stack | 09:35:08 | CREATE_FAILED        | AWS::Batch::ComputeEnvironment        | EFABatch (EFABatchXXXXXXX) Resource handler returned message: "Error executing request, Exception : Either compute environment Security Groups or Network Interfaces in Launch template are exclusively allowed, RequestId: nnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn (Service: Batch, Status Code: 400, Request ID: nnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn)" (RequestToken: nnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn, HandlerErrorCode: InvalidRequest)
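
The rule behind this error can be sketched as a small check. This is only a simplified model of what the message implies, not the Batch service's actual code; the interface and function names are hypothetical, and it models only the conflict case quoted above.

```typescript
// Simplified model of the rule quoted in the error message.
// NOT the Batch service's implementation; all names are hypothetical.
interface ComputeEnvironmentRequest {
  // Security groups set at the compute environment level, if any.
  securityGroupIds?: string[];
  // True when the referenced launch template defines network interfaces.
  launchTemplateHasNetworkInterfaces: boolean;
}

// Returns an error message when both sources are present, otherwise undefined.
function validateNetworking(req: ComputeEnvironmentRequest): string | undefined {
  const hasSecurityGroups =
    req.securityGroupIds !== undefined && req.securityGroupIds.length > 0;
  if (hasSecurityGroups && req.launchTemplateHasNetworkInterfaces) {
    return 'Either compute environment Security Groups or Network Interfaces in Launch template are exclusively allowed';
  }
  return undefined;
}
```

Because the L2 construct always supplies security groups, any launch template with network interfaces ends up in the conflicting branch.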

Reproduction Steps

The following integration test fails, demonstrating the problem:

import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';
import * as integ from '@aws-cdk/integ-tests';
import * as batch from '../lib/';

export const app = new cdk.App();

const stack = new cdk.Stack(app, 'batch-stack');

const vpc = new ec2.Vpc(stack, 'vpc');

// While this test specifies EFA, the same behavior occurs with
// interfaceType: 'interface' as well
const launchTemplateEFA = new ec2.CfnLaunchTemplate(stack, 'ec2-launch-template-efa', {
  launchTemplateData: {
    networkInterfaces: [{
      deviceIndex: 0,
      subnetId: vpc.privateSubnets[0].subnetId,
      interfaceType: 'efa',
    }],
  },
});

new batch.ComputeEnvironment(stack, 'EFABatch', {
  managed: true,
  computeResources: {
    type: batch.ComputeResourceType.ON_DEMAND,
    instanceTypes: [new ec2.InstanceType('c5n')],
    vpc,
    launchTemplate: {
      launchTemplateName: launchTemplateEFA.launchTemplateName as string,
    },
  },
});

new integ.IntegTest(app, 'BatchWithEFATest', {
  testCases: [stack],
});

app.synth();

Possible Solution

The connected pull request proposes a solution: it lets the user explicitly exclude security groups from their ComputeEnvironment, so that they can set the security groups in their launch template instead. It's hard to add a validation check for this, because the user might not have defined the launch template within the stack at all, so its contents cannot be checked before deployment.

Additional Information/Context

Interestingly, Terraform's AWS provider has exactly the same problem, also recently reported: hashicorp/terraform-provider-aws#25801

CDK CLI Version

2.37.1

Framework Version

No response

Node.js Version

14.20

OS

macOS 12.5

Language

TypeScript

Language Version

No response

Other information

No response

@tcutts tcutts added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 12, 2022
@github-actions github-actions bot added the @aws-cdk/aws-batch Related to AWS Batch label Aug 12, 2022
@peterwoodworth peterwoodworth added p2 effort/small Small work item – less than a day of effort in-progress This issue is being actively worked on. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 12, 2022
peterwoodworth commented

Thanks for the thorough description and already submitting a PR for this @tcutts! Someone from the team should be able to review this soon 🙂

@mergify mergify bot closed this as completed in #21579 Aug 23, 2022
mergify bot pushed a commit that referenced this issue Aug 23, 2022
…o that they can be specified in Launch Template (#21579)

HPC Batch applications frequently require Elastic Fabric Adapters for low-latency networking. Currently, the `ComputeEnvironment` construct always defines a set of `SecurityGroupIds` in the CloudFormation it generates, which prevents the stack from deploying if the launch template contains network interface definitions: Batch does not allow security groups at the `ComputeEnvironment` level when network interfaces are defined in the `CfnLaunchTemplate`.

Since we do not currently have support for network interfaces, this PR adds a new boolean property in `launchTemplate` called `useNetworkInterfaceSecurityGroups`. When it is enabled, the construct assumes that security groups are provided by the launch template and leaves them out of the `ComputeEnvironment`.
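
As a rough illustration of the opt-out, the decision could look something like the sketch below. Only `useNetworkInterfaceSecurityGroups` is a name taken from this PR description; the interfaces, the function, and the default-group fallback are hypothetical simplifications and are not the construct's real code.

```typescript
// Hypothetical, trimmed-down shapes. Only useNetworkInterfaceSecurityGroups
// is a name from the PR description; everything else is illustrative.
interface LaunchTemplateSpec {
  launchTemplateName?: string;
  useNetworkInterfaceSecurityGroups?: boolean;
}

interface ComputeResourcesSketch {
  securityGroupIds?: string[];
  launchTemplate?: LaunchTemplateSpec;
}

// Decides what to emit as SecurityGroupIds for the compute environment.
// Returning undefined means the property is omitted from the template.
function renderSecurityGroupIds(
  resources: ComputeResourcesSketch,
  defaultSecurityGroupId: string,
): string[] | undefined {
  // Opt-out: the launch template's network interfaces carry the security groups.
  if (resources.launchTemplate?.useNetworkInterfaceSecurityGroups) {
    return undefined;
  }
  // Explicit security groups win over the default.
  if (resources.securityGroupIds && resources.securityGroupIds.length > 0) {
    return resources.securityGroupIds;
  }
  // Assumed fallback: a single default security group.
  return [defaultSecurityGroupId];
}
```

With the flag left unset, the sketch behaves as before and always returns at least one security group, which is the case that conflicts with network interfaces in the launch template.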

A long-term solution may be to:
- Add support for network interfaces in the L2 ec2.LaunchTemplate construct.
- Update the batch.ComputeEnvironment construct to take an ILaunchTemplate instead of the name/id.
- Check the ILaunchTemplate to determine whether the ComputeEnvironment needs to create any security groups.
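
The last step above might reduce to a check like the following sketch. It assumes a launch template type that exposes its network interfaces, which the list above describes as not yet supported, so every name here is hypothetical.

```typescript
// Hypothetical shapes: a launch template abstraction that exposes its
// network interfaces. Not part of the current ec2.ILaunchTemplate.
interface NetworkInterfaceSketch {
  deviceIndex: number;
  interfaceType?: string;
}

interface InspectableLaunchTemplate {
  launchTemplateName: string;
  networkInterfaces: NetworkInterfaceSketch[];
}

// The compute environment only needs its own security groups when the
// launch template (if any) defines no network interfaces.
function needsComputeEnvironmentSecurityGroups(
  template?: InspectableLaunchTemplate,
): boolean {
  return template === undefined || template.networkInterfaces.length === 0;
}
```

A check of this kind would only cover launch templates defined in the same app; templates referenced by name or id from outside the stack would presumably still need an explicit flag.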

closes #21577 

----

### All Submissions:

* [yes] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [no] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [yes] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [yes] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
josephedward pushed a commit to josephedward/aws-cdk that referenced this issue Aug 30, 2022
…o that they can be specified in Launch Template (aws#21579)