Practical VPC Design
301 level guidance from an AWS Solutions Architect
Few areas of cloud infrastructure are more important to get right from the start than the IP address layout of one's Virtual Private Cloud (VPC). VPC design has far-reaching implications for scaling, fault-tolerance and security. It also directly affects the flexibility of your infrastructure: paint yourself into a corner, and you'll spend ungodly amounts of time migrating instances across subnets to free up address space.
Fortunately, it's easier to lay out a VPC the right way than the wrong way. You just have to keep a few principles in mind.
Subnets
Proper subnet layout is the key to a well-functioning VPC. Subnets determine routing, Availability Zone (AZ) distribution, and Network Access Control Lists (NACLs).
The most common mistake I've observed around VPC subnetting is the treatment of a VPC like a data center network. VPCs are not data centers. They are not switches. They are not routers. (Although they perform the jobs of all three.) A VPC is a software-defined network (SDN) optimized for moving massive numbers of packets into, out of and across AWS regions. Your packet is picked up at the front door and dropped off at its destination. It's as simple as that.
Because of that simplicity, a number of data center and networking-gear issues are eliminated at the outset.
A bit of history: when I first started building data centers in the '90s, we had 10 Mb/s Ethernet switches. Ethernet uses Address Resolution Protocol (ARP) broadcasts to determine who's where in the switch fabric. Because of that, network segments are chatty in direct proportion to the number of hosts on the broadcast domain. So anything beyond a couple hundred hosts would start to degrade performance. That, combined with the counter-intuitive nature of IPv4 subnet math, led to the practical effect of everyone using 24-bit subnets for different network segments. Three-octet prefixes seemed to sit right in the sweet spot of all the constraints.
That thinking is no longer valid in a cloud environment. VPCs support neither broadcast nor multicast. What looks like ARP to the OS is actually the elegant function of the SDN. With that in mind, there is absolutely no reason to hack a VPC into 24-bit subnets. In fact, you have an important reason not to: waste. When you have a "middle-tier" subnet with 254 addresses available (or 128 or 64 or 32 or 16) and you only have 4 middle-tier hosts, the rest of those addresses are unavailable for the remainder of your workloads.
If instead you have a mixed-use subnet with 4,094 addresses, you can squeeze every last IP for autoscaling groups and more. Thus it behooves you to make your subnets as large as possible. Doing so gives you the freedom to dynamically allocate from an enormous pool of addresses.
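To make the waste argument concrete, here is a minimal sketch that compares per-tier 24-bit subnets with one pooled 20-bit subnet. The tier names, CIDR blocks and host counts are hypothetical, chosen only for illustration. The sketch also subtracts the five addresses AWS reserves in every subnet, so its figures come out slightly lower than the conventional host counts quoted above.

```python
import ipaddress

AWS_RESERVED = 5  # AWS reserves the first four addresses and the last one in each subnet


def usable(cidr: str) -> int:
    """Addresses that instances can actually use in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED


# Hypothetical demand per tier, including an autoscaling burst on the web tier.
demand = {"web": 300, "middle": 4, "db": 6}

# Layout 1: one 24-bit subnet per tier (hypothetical blocks).
per_tier = {"web": "10.0.0.0/24", "middle": "10.0.1.0/24", "db": "10.0.2.0/24"}
for tier, cidr in per_tier.items():
    free = usable(cidr) - demand[tier]
    print(f"{tier:>6} in {cidr}: {free} addresses left")

# Layout 2: one mixed-use 20-bit subnet shared by every tier.
pooled_free = usable("10.0.0.0/20") - sum(demand.values())
print(f"pooled in 10.0.0.0/20: {pooled_free} addresses left")
```

Under these assumptions the web tier overflows its own subnet while almost 500 addresses sit idle in the other two, whereas the pooled subnet absorbs the same burst with thousands of addresses to spare.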
Generally speaking, there are three primary reasons to create a new subnet:
- You need different hosts to route in different ways (for example, internal-only vs. public-facing hosts)
- You are distributing your workload across multiple AZs to achieve fault-tolerance. Always, ALWAYS do this.
- You have a security requirement that mandates NACLs on a specific address space (for example, the one in which the database with your customers' personally identifiable information resides)
Let's look at each of these factors in turn.
Routing
All hosts within a VPC can route to all other hosts within a VPC. Period. The only real question is what packets can route into and out of the VPC.
In fact, you could easily have a VPC that doesn't allow packets to enter or leave at all. Just create a VPC without an Internet Gateway or Virtual Private Gateway. You've effectively black-holed it.
A VPC that can't serve any network traffic would be of dubious value, so let's just assume that you have an app that you're making available to the Internet. You add an Internet Gateway and assign some Elastic IP addresses to your hosts. Does this mean they're publicly accessible? No, it does not. You need to create a route table whose default route is the Internet Gateway. You then need to associate that table with one or more subnets. After that, all hosts within those subnets will use that route table. Anything destined for an IP block outside the VPC will go through the Internet Gateway, thus giving your hosts the ability to respond to external traffic.
That said, almost no app wants all its hosts to be publicly accessible. In fact, good security dictates the principle of least privilege. So any host that doesn't absolutely need to be reachable directly from the outside world shouldn't be able to send traffic directly out the front door. These hosts will need a different route table from the ones above.
Subnets can have only one route table (though route tables can be applied to more than one subnet). If you want one set of hosts to route differently from another, you need to create a new subnet and apply a new route table to it.
Fault-Tolerance
AWS provides geographic distribution out of the box in the form of Availability Zones (AZs). Every region has at least two.
Subnets cannot span multiple AZs. So to achieve fault tolerance, you need to divide your address space among the AZs evenly and create subnets in each. The more AZs, the better: if you have three AZs available, split your address space into four parts and keep the fourth segment as spare capacity.
In case it's not obvious, the reason you need to divide your address space up evenly is so the layout of each AZ is the same as the others. When you create resources like autoscaling groups, you want them to be evenly distributed. If you create disjointed address blocks, you're creating a maintenance nightmare for yourself and you will regret it later.
Security
The first layer of defense in a VPC is the tight control you have over what packets can enter and leave.
Above the routing layer are two levels of complementary controls: Security Groups and NACLs. Security Groups are dynamic, stateful and capable of spanning the entire VPC. NACLs are stateless (meaning you need to define inbound and outbound ports), static and subnet-specific.
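The stateless versus stateful distinction can be illustrated with a toy model. This is only a sketch: the function, the class and the rule format are hypothetical and do not correspond to any AWS API. It shows why a NACL needs an explicit outbound rule for reply traffic while a Security Group does not.

```python
# Toy model only: these names are hypothetical and are not an AWS API.

def nacl_allows(rules, direction, port):
    """Stateless check: each packet is matched against rules for its own direction."""
    return any(d == direction and low <= port <= high for d, low, high in rules)


class SecurityGroup:
    """Stateful check: replies to an admitted connection are allowed automatically."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.tracked = set()  # client ports of connections already admitted

    def inbound(self, dst_port, client_port):
        allowed = dst_port in self.inbound_ports
        if allowed:
            self.tracked.add(client_port)
        return allowed

    def outbound_reply(self, client_port):
        return client_port in self.tracked


# A client on ephemeral port 50000 connects to a web host on port 443.
nacl = [("in", 443, 443)]
print(nacl_allows(nacl, "in", 443))     # True: the request is admitted
print(nacl_allows(nacl, "out", 50000))  # False: no outbound rule matches the reply

nacl.append(("out", 1024, 65535))       # explicit outbound rule for ephemeral ports
print(nacl_allows(nacl, "out", 50000))  # True

sg = SecurityGroup(inbound_ports=[443])
print(sg.inbound(443, 50000))           # True
print(sg.outbound_reply(50000))         # True: no outbound rule was needed
```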
Generally, you only need both if you want to distribute change control authority over multiple groups of admins. For instance, you might want your sysadmin team to control the Security Groups and your networking team to control the NACLs. That way, no one party can single-handedly defeat your network restrictions.
In practice, NACLs should be used sparingly and, once created, left alone. Given that they're subnet-specific and defined by IP address, the complexity of trying to manage traffic at this layer increases geometrically with each additional rule.
Security Groups are where the majority of work gets done. Unless you have a specific use-case like the ones described earlier, you'll be better served by keeping your security as simple and straightforward as possible. That's what Security Groups do best.
An Example
The above was meant as a set of abstract guidelines. I'd like to provide a concrete example to show how all this works together in practice.
The simplest way to lay out a VPC is to follow these steps:
- Evenly divide your address space across as many AZs as possible.
- Determine the different kinds of routing you'll need and the relative number of hosts for each kind.
- Create identically-sized subnets in each AZ for each routing need. Give them the same route table.
- Leave yourself unallocated space in case you missed something. (Trust me on this one.)
So for our example, let's create a standard n-tier app with web hosts that are addressable externally. We'll use 10.0.0.0/16 as our address space.
The easiest way to lay out a VPC's address space is to forget about IP ranges and think in terms of subnet masks.
For example, take the 10.0.0.0/16 address space above. Let's assume you want to run across all three AZs available to you in us-west-2 so your Mongo cluster can achieve a reliable quorum. Doing this by address ranges would be obnoxious. Instead, you can simply say "I need four blocks: one for each of the three AZs and one spare." Since subnet masks are binary, every bit you add to the mask divides your space in two. So if you need four blocks, you need two more bits. Your one 16-bit block becomes four 18-bit blocks.
10.0.0.0/16:
    10.0.0.0/18 - AZ A
    10.0.64.0/18 - AZ B
    10.0.128.0/18 - AZ C
    10.0.192.0/18 - Spare
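The "add bits until you have enough blocks" rule can be sketched with Python's standard ipaddress module. The helper name split_evenly is hypothetical; it only automates the arithmetic described above.

```python
import ipaddress


def split_evenly(cidr: str, blocks_needed: int):
    """Split a network into the smallest power-of-two number of equal blocks
    that provides at least blocks_needed blocks."""
    extra_bits = (blocks_needed - 1).bit_length()  # 4 blocks -> 2 extra bits
    return list(ipaddress.ip_network(cidr).subnets(prefixlen_diff=extra_bits))


labels = ["AZ A", "AZ B", "AZ C", "Spare"]
for label, block in zip(labels, split_evenly("10.0.0.0/16", 4)):
    print(f"{block} - {label}")
```

Asking for three blocks yields the same four 18-bit blocks, because two extra bits are the minimum that covers three. The fourth block simply becomes the spare.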
Now within each AZ, you determine you want a public subnet, a private subnet and some spare capacity. Your publicly-accessible hosts will be far fewer in number than your internal-only ones, so you decide to give the public subnets half the space of the private ones. To create the separate address spaces, you just keep adding bits. To wit:
10.0.0.0/18 - AZ A
    10.0.0.0/19 - Private
    10.0.32.0/19
        10.0.32.0/20 - Public
        10.0.48.0/20 - Spare
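The same subdivision can be reproduced one bit at a time. This is a small sketch using the standard ipaddress module, with variable names picked for illustration:

```python
import ipaddress

az_a = ipaddress.ip_network("10.0.0.0/18")

# One more bit: two 19-bit halves. The first half becomes Private.
private, remainder = az_a.subnets(prefixlen_diff=1)

# One more bit on the second half: two 20-bit quarters, Public and Spare.
public, spare = remainder.subnets(prefixlen_diff=1)

for name, net in [("Private", private), ("Public", public), ("Spare", spare)]:
    print(f"{net} - {name} ({net.num_addresses} addresses)")
```

The Public block ends up with half the address space of the Private block, as intended.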
Later on, if you want to add a "Protected" subnet with NACLs, you just subdivide your Spare space:
10.0.0.0/18 - AZ A
    10.0.0.0/19 - Private
    10.0.32.0/19
        10.0.32.0/20 - Public
        10.0.48.0/20
            10.0.48.0/21 - Protected
            10.0.56.0/21 - Spare
Just make sure whatever you do in one AZ, you duplicate in all the others:
10.0.0.0/16:
    10.0.0.0/18 - AZ A
        10.0.0.0/19 - Private
        10.0.32.0/19
            10.0.32.0/20 - Public
            10.0.48.0/20
                10.0.48.0/21 - Protected
                10.0.56.0/21 - Spare
    10.0.64.0/18 - AZ B
        10.0.64.0/19 - Private
        10.0.96.0/19
            10.0.96.0/20 - Public
            10.0.112.0/20
                10.0.112.0/21 - Protected
                10.0.120.0/21 - Spare
    10.0.128.0/18 - AZ C
        10.0.128.0/19 - Private
        10.0.160.0/19
            10.0.160.0/20 - Public
            10.0.176.0/20
                10.0.176.0/21 - Protected
                10.0.184.0/21 - Spare
    10.0.192.0/18 - Spare
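Because every AZ follows the same recipe, the whole layout can be generated rather than typed. The sketch below assumes the same split order as the example (Private, then Public, then Protected and Spare). The function name az_layout is hypothetical. It also checks that nothing overlaps and that the blocks account for the entire 16-bit space.

```python
import ipaddress


def az_layout(az_block):
    """Apply the same split to one AZ block: half, quarter, then two eighths."""
    private, rest = az_block.subnets(prefixlen_diff=1)
    public, rest = rest.subnets(prefixlen_diff=1)
    protected, spare = rest.subnets(prefixlen_diff=1)
    return {"Private": private, "Public": public, "Protected": protected, "Spare": spare}


vpc = ipaddress.ip_network("10.0.0.0/16")
blocks = list(vpc.subnets(prefixlen_diff=2))  # four 18-bit blocks
layout = {az: az_layout(block) for az, block in zip(["AZ A", "AZ B", "AZ C"], blocks)}
vpc_spare = blocks[3]

for az, subnets in layout.items():
    for role, net in subnets.items():
        print(f"{az}: {net} - {role}")
print(f"{vpc_spare} - Spare")

# Sanity checks: no two blocks overlap, and together they fill the VPC exactly.
all_nets = [net for subnets in layout.values() for net in subnets.values()] + [vpc_spare]
assert not any(a.overlaps(b) for i, a in enumerate(all_nets) for b in all_nets[i + 1:])
assert sum(net.num_addresses for net in all_nets) == vpc.num_addresses
```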
Your routing tables would look like this:
"Public"
    10.0.0.0/16 - Local
    0.0.0.0/0 - Internet Gateway
"Internal-only" (i.e., Protected and Private)
    10.0.0.0/16 - Local
Create those two route tables and then apply them to the correct subnets in each AZ. You're done.
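To see how those two tables behave, here is a small model of route selection. It assumes the most specific matching route wins (longest prefix match), which is how VPC route tables choose between overlapping destinations. The dictionary, the function name next_hop and the sample addresses are hypothetical.

```python
import ipaddress

# The two route tables from the example, as (destination CIDR, target) pairs.
ROUTE_TABLES = {
    "public": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "internet-gateway")],
    "internal-only": [("10.0.0.0/16", "local")],
}


def next_hop(table: str, destination: str):
    """Return the target of the most specific matching route, or None if no route matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table]
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet has nowhere to go
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("public", "10.0.70.12"))           # local
print(next_hop("public", "203.0.113.10"))         # internet-gateway
print(next_hop("internal-only", "203.0.113.10"))  # None
```

Hosts in either kind of subnet reach each other through the local route. Only the public table has anywhere to send traffic bound for addresses outside the VPC.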
And in case anyone on your team gets worried about running out of space, show them this table:
16-bit: 65534 addresses
18-bit: 16382 addresses
19-bit: 8190 addresses
20-bit: 4094 addresses
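The figures in that table are the conventional host counts (2 to the power of the remaining bits, minus 2). As a hedged aside: AWS reserves five addresses in every subnet rather than two, so the number you can actually assign in a VPC subnet is three lower. The helper below, whose name is hypothetical, prints both.

```python
AWS_RESERVED = 5  # first four addresses plus the last one in each AWS subnet


def address_counts(prefix_bits: int):
    """Return (conventional host count, addresses usable in an AWS subnet)."""
    total = 2 ** (32 - prefix_bits)
    return total - 2, total - AWS_RESERVED


for bits in (16, 18, 19, 20, 21):
    conventional, aws_usable = address_counts(bits)
    print(f"{bits}-bit: {conventional:>6} conventional, {aws_usable:>6} usable in AWS")
```

The difference is negligible at these sizes, which is part of the case for large subnets: the fixed per-subnet overhead matters more the smaller the subnets get.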
Obviously, you're not going to need 4,000 IP addresses for your web servers. That's not the point. The point is that this VPC has only those routing requirements. There's no reason to create new subnets in this VPC that don't need to route differently within the same AZ.
Conclusion
Done properly, this method of planning goes a long way to ensuring you won't get boxed in by an early decision. Everything that you'll get into from here (Security Groups, Auto Scaling, Elastic Load Balancing, Amazon Relational Database Service, AWS Direct Connect, and more) will fit neatly into this model.
Nathan McCourtney, Senior Consultant, AWS Professional Services.