Direct Connect handed my arse to me in the test, I can tell you for sure. I thought I understood it just enough for the test – but boy was I wrong!
In this post (Part 2 of the series) we talk about Direct Connect and VPNs. First up, the VPN notes:
- IPsec by default. GRE and DMVPN tunnels are possible via EC2 instances running VPN software (from the Marketplace or elsewhere).
- A VPN can only reach resources inside a VPC – the VPC is the only thing the VPN hooks into.
- ONE VPN PER VPC: a VPC can have only one Virtual Private Gateway (VGW) attached, and a VGW can be hooked up to only one VPC.
- The BGP ASN of a VGW can never be changed. You need to delete and recreate the VGW.
- When the VPN is created, two VGW endpoints are created (they show up as one in the AWS console). These operate in active/active mode, in different availability zones but in the same region.
- The VGW uses AES-256, SHA-2 and a number of DH groups. It's essential that the other side of the VPN also supports these protocols for the connection to establish successfully. Some older protocols also work.
- The VGW runs BGP over TCP port 179 and supports AS path manipulation and MED (Multi-Exit Discriminator – a BGP attribute used to influence inbound path preference).
- Route propagation allows the VPC's routing table to pick up routes the VGW has learned, but this setting must be enabled. (Apparently an important test topic.)
- 100 routes max per VPC routing table. Hard limit – use route summarization to stay under it.
- CloudHub – a method of hooking up a VPC's VGW with multiple branch sites (or customers). The only requirements are that the ASNs are distinct on the customer gateway side and that the advertised routes are unique. The VGW will then re-advertise each site's BGP routes to every other site. Hub-and-spoke model.
- Billing – two charges: one per hour for having the VPN connection available, the other for data transferred out from the VGW to the endpoint.
- Critical to remember: the VPN tunnel only comes up when the customer gateway initiates the connection – the VGW (the AWS side of the tunnel) never initiates it. The connection is on demand, and there must be a continuous handshake at 10-second intervals; after 3 unacknowledged hellos, the tunnel goes down (this is expected behaviour).
- VPN tunnel health is monitored using CloudWatch metrics, not logs: 0 is down, 1 is up.
- The VGW is not the only device that can terminate the VPN connection on the AWS side. An EC2 instance running VPN software can also terminate it, if org policies require.
- Reasons to use an EC2 instance:
- advanced threat protection.
- transitive routing.
- connecting networks with overlapping CIDR blocks.
- 4 possible problems with an EC2 instance running VPN software:
- the instance becomes unhealthy (in which case EC2 auto-recovery will recover it).
- the OS has an issue.
- the VPN software has an issue.
- there’s an error in the config (in which case the initial connection may not come up at all).
- Best to have 2x EC2 instances in active/standby mode for HA.
- Have to be active/standby because there can only be one next hop.
- The only way around this is a continuously running script that monitors the health of the instances. It detects a problem and fails over by changing the routing table entry to point to the previously-standby instance.
- The size of the chosen EC2 instance dictates the bandwidth available to the VPN connection. Increasing the instance size is possible, but with downtime. If the instance has a standby partner, the secondary can be upgraded first, failed over to, then the primary upgraded and failed back – all without downtime.
- Horizontal scaling is probably better – it allows attaching one EC2 instance running VPN software per subnet.
- A non-AWS device can be used to terminate the VPN on the customer's side; however, AWS requires it to be an L3 device, not L2.
- BGP only (or static), no other dynamic routing protocol is allowed.
- With BGP, there’s the concept of AS path prepending. This allows preferred routes to be set for certain destinations (a prepended, artificially longer AS path makes a route less preferred). Can be used for automatic failover of routing between sites.
- Important to ensure the firewall on the customer's side has the ports open for IPsec traffic to flow (UDP 500 for IKE, ESP – IP protocol 50 – and UDP 4500 for NAT traversal).
Direct Connect
- The thing to use if a high-throughput link of up to 10 Gbps is required from the premises to the VPC.
- The connection isn’t encrypted, whereas the VPN is (with IPsec).
- If encryption is still required, a VPN connection can be set up over Direct Connect. A VIF is required over Direct Connect to reach the AWS side; replacing a VPN that currently runs over an internet connection is one such use case.
- 1 Gbps or 10 Gbps (I reckon if 3 Gbps is required, then 3 x 1 Gbps connections will be needed. Happy to be corrected).
- Inbound traffic is free. Outbound is charged per GB by AWS.
- Must support 802.1Q VLANs + BGP + BGP MD5 authentication + single-mode fiber.
- Two ways to set up the Direct Connect connection: through a member of the AWS Partner Network (APN), or with a hosted connection (the partner will then help with the connection to the on-premises devices).
- Allows the use of LAGs for bandwidth aggregation. All links must be 1 Gbps or 10 Gbps, with a maximum of 4 connections, terminating on the same AWS device in the same Direct Connect location.
- Gotta be careful with the minimum number of links setting: if the number of working links falls below this minimum, the whole LAG is taken down.
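The minimum-links rule can be sketched as a tiny model (hypothetical function and parameter names, not the AWS API):

```python
def lag_is_up(total_links: int, failed_links: int, minimum_links: int) -> bool:
    """A LAG stays operational only while the number of working links
    is at least the configured minimum. Illustrative model only."""
    working = total_links - failed_links
    return working >= minimum_links

# A 4-link LAG with minimum_links=3 tolerates one failure...
assert lag_is_up(4, 1, 3)
# ...but a second failure takes the whole LAG down.
assert not lag_is_up(4, 2, 3)
```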
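AS path prepending (mentioned in the BGP bullet above) works because BGP prefers the route with the shortest AS path. A toy best-path selection, ignoring every other BGP tie-breaker:

```python
def best_path(paths):
    """Among candidate routes for the same prefix, prefer the one
    with the shortest AS path (toy model of one BGP tie-breaker)."""
    return min(paths, key=len)

primary = [65001, 65010]                 # normal path
backup  = [65002, 65002, 65002, 65010]   # prepended to deprioritise it

# Traffic takes the primary path until it is withdrawn.
assert best_path([primary, backup]) == primary
```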
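The 100-route limit mentioned earlier is where route summarization helps; Python's `ipaddress` module can show the idea:

```python
import ipaddress

# Four contiguous /24s collapse into a single /22, keeping the
# advertised route count well under the 100-route limit.
routes = [ipaddress.ip_network(f"10.0.{i}.0/24") for i in range(4)]
summarised = list(ipaddress.collapse_addresses(routes))
assert summarised == [ipaddress.ip_network("10.0.0.0/22")]
```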
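The active/standby failover script described earlier can be sketched as a single watchdog iteration. The health probe and instance IDs are made up; in real life the probe would be a ping or API check, and the route change an EC2 API call:

```python
def failover_step(route_table, primary_id, standby_id, is_healthy):
    """One watchdog iteration: if the active VPN instance is unhealthy,
    repoint the default route at the other instance. Returns the
    instance that is active after the check."""
    active = route_table["0.0.0.0/0"]
    if not is_healthy(active):
        route_table["0.0.0.0/0"] = (
            standby_id if active == primary_id else primary_id
        )
    return route_table["0.0.0.0/0"]

rt = {"0.0.0.0/0": "i-primary"}  # hypothetical route table and IDs
# Primary fails its health check, so the route flips to the standby.
assert failover_step(rt, "i-primary", "i-standby", lambda i: False) == "i-standby"
# Standby is healthy, so nothing changes on the next iteration.
assert failover_step(rt, "i-primary", "i-standby", lambda i: True) == "i-standby"
```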
Private and public VIFs
- Private: for talking to instances within the VPC via their private IPs.
- Public: for working with public AWS services (those reachable via public IP addresses, e.g. S3, SQS, DynamoDB).
- Cannot have the same VLAN for different VIFs (this makes sense).
- Sort of surprising – Direct Connect's max frame size is 1522 bytes (standard Ethernet plus the 802.1Q tag), i.e. no jumbo frames. I'm guessing this is to do with the increased complexity of running and supporting jumbo frames end-to-end.
- Care must be taken with the routes advertised – more specific routes (propagated or otherwise) take precedence over, say, a default route to the internet via the defined internet gateway for any subnet in the VPC.
- A max of 100 advertised routes is possible. Beyond that, summarize or advertise a default route (not necessarily 0.0.0.0/0).
- Direct Connect Gateways allow a Direct Connect connection to be hooked up to multiple VPCs located in any region (except China), using the VGWs that already exist in those VPCs.
- Gotta note that while LAGs provide bandwidth aggregation, they terminate on the same AWS device, so if that device goes down or is under maintenance, the connection is down. Best to have a second Direct Connect connection on a distinct AWS Direct Connect device, or have a VPN available as backup (pre-provisioned, or created on demand if the extra downtime can be tolerated).
- Direct Connect isn’t encrypted – TLS is the only way forward for encryption at L4, or run a VPN over Direct Connect for encryption at L3.
- Direct Connect gateways also allow a connection to be extended to other regions; a public VIF is used to do so.
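The "more specific routes win" point above is just longest-prefix matching, which can be illustrated with a small sketch (hypothetical route-table model and target names, not the EC2 API):

```python
import ipaddress

def select_route(route_table, dest):
    """VPC-style route selection: the most specific (longest-prefix)
    matching route wins, regardless of target type."""
    matches = [(net, target) for net, target in route_table.items()
               if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

routes = {
    ipaddress.ip_network("0.0.0.0/0"):     "igw-internet",       # default route
    ipaddress.ip_network("172.16.0.0/16"): "vgw-directconnect",  # propagated
}

# Traffic to on-prem follows the propagated route, not the IGW default.
assert select_route(routes, ipaddress.ip_address("172.16.5.9")) == "vgw-directconnect"
assert select_route(routes, ipaddress.ip_address("8.8.8.8")) == "igw-internet"
```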
Combining VPN and Direct Connect
- VPN as backup: Direct Connect routes are always prioritized for any given prefix.
- The VGW will aggregate the routes.
- The only way to make the VPN primary is to advertise a more specific route over it; Direct Connect then becomes the backup for that prefix.
- Bidirectional Forwarding Detection (BFD) is a must when configuring multiple Direct Connect connections, or a Direct Connect with a VPN backup. It allows down links to be detected so traffic starts going down the other/backup path for a smooth switch-over (I assume there's some downtime when this switch happens).
Important to remember: path selection
- Local routes > longest prefix match > static routes > dynamic routes (Direct Connect BGP first) > shorter AS path > VPN static routes.
- Direct Connect is the preferred path when a VPN connection is also configured. (Need to find out what would be preferred if there are multiple Direct Connect connections plus a VPN.)
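The precedence list above can be modelled as a simple ranking (purely illustrative; it assumes longest-prefix matching has already been applied and ignores AS-path length tie-breaks):

```python
# Lower number = more preferred. Categories taken from the list above.
PRECEDENCE = {
    "local": 0,
    "static": 1,
    "direct_connect_bgp": 2,
    "vpn_static": 3,
}

def preferred(candidates):
    """Pick the most preferred route source among equally specific
    candidates for the same destination."""
    return min(candidates, key=PRECEDENCE.get)

# Direct Connect beats the VPN when both offer the same prefix.
assert preferred(["vpn_static", "direct_connect_bgp"]) == "direct_connect_bgp"
```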
That's it for Direct Connect and VPN – CloudFront is up next in Part 3.