Troubleshooting Network Connectivity On AWS

AWS Data Traffic Management

As a cloud computing platform, Amazon Web Services (AWS) lets regional, public, and private networks connect, monitor, secure, and authenticate traffic without stretching resources thin. Its networking model spans transit gateways, private links, and design patterns, which in turn are built from subnets, IP ranges, instances, ports, routing paths, and security groups. Users can place their AWS resources in an Amazon Virtual Private Cloud (VPC) and then peer with other VPCs, in the same AWS Region or another, to expand and communicate across them.

Internal networks are essential for tapping into the resources available on the AWS Cloud. AWS Transit Gateway, AWS Managed VPN, and AWS Direct Connect are a few VPC connectivity options that strengthen internal networks and their connection with other AWS Regions and the AWS Cloud.

Given how many AWS Regions and connections are involved, some network connectivity issues are bound to recur and disrupt the performance of your Amazon Virtual Private Cloud. This article addresses the common network connectivity issues on AWS.

Network Connectivity Issues With AWS Direct Connect

AWS Direct Connect establishes dedicated private connections between your internal network and AWS, including your Amazon Virtual Private Clouds. With AWS Direct Connect, you can expect more predictable connection performance, lower bandwidth costs, and support for BGP peering. Users often deploy multiple AWS Direct Connect connections to maintain high availability between internal networks and the AWS Cloud.

AWS Direct Connect connectivity issues fall into four categories:

  1. Physical
  2. Data Link
  3. Network
  4. Routing

1. Troubleshooting AWS Direct Connect Physical Issue

This issue deals with the physical aspect of a network connection on AWS and should be resolved as follows:

  • Communicate With Your Network Provider: Ask your network provider about the status of your cross connect and request a completion notice in writing. Check that the ports listed in the completion notice match those in your Letter of Authorization and Connecting Facility Assignment (LOA-CFA).
  • Cross-Check Router Connections: Ensure all routers involved in the network are powered on.
  • Investigate Optical Signal Readings: Check that the routers are receiving an acceptable optical signal through an appropriate transceiver. If not, try rolling (swapping) the Tx/Rx fiber strands. The Tx/Rx optical readings on the AWS side can be viewed in Amazon CloudWatch. For connections operating over 1 Gbps, such as a 10 Gbps port, turn off auto-negotiation and set the port speed and full-duplex mode manually. Additionally, you can ask your provider for the Tx/Rx optical readings for your cross connect.
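
The optical readings mentioned above can also be pulled from CloudWatch with a script. The following is a minimal sketch, not a definitive implementation: it assumes the `boto3` library is installed with AWS credentials configured, that your connection publishes the `ConnectionLightLevelTx` and `ConnectionLightLevelRx` metrics in the `AWS/DX` namespace (availability can depend on port speed), and it uses a hypothetical connection ID, `dxcon-EXAMPLE`.

```python
from datetime import datetime, timedelta, timezone


def light_level_request(connection_id, metric_name, minutes=60, now=None):
    """Build get_metric_statistics arguments for one optical metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DX",
        # Expected values: ConnectionLightLevelTx or ConnectionLightLevelRx
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ConnectionId", "Value": connection_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # one datapoint per 5 minutes
        "Statistics": ["Average"],
    }


def fetch_light_levels(connection_id):
    """Return average Tx/Rx readings in time order. Requires AWS access."""
    import boto3  # imported lazily so the request builder works without it

    cloudwatch = boto3.client("cloudwatch")
    readings = {}
    for metric in ("ConnectionLightLevelTx", "ConnectionLightLevelRx"):
        request = light_level_request(connection_id, metric)
        response = cloudwatch.get_metric_statistics(**request)
        points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
        readings[metric] = [p["Average"] for p in points]
    return readings


# Hypothetical usage (uncomment with a real connection ID and credentials):
# print(fetch_light_levels("dxcon-EXAMPLE"))
```

Compare the returned readings with the values your provider reports for the same cross connect; a large gap between the two sides points to a fiber or transceiver problem.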

Note: Amazon CloudWatch helps monitor network metrics such as packet loss and latency, gathered with tests such as ping, MTR, tracepath, and tcptraceroute, when enabled by AWSSupport-SetupIPMonitoringFromVPC. The latter logs the network metrics, allowing users to identify problems between AWS resources and VPCs over internet gateways and network address translation (NAT) gateways.

More details are available in the Amazon CloudWatch documentation.

Once the listed requirements are met and Tx/Rx optical reading issues are resolved, you should be able to establish a dedicated network connection.

2. Troubleshooting AWS Direct Connect Data Link Issue

When you cannot ping the Amazon peer IP over your virtual interface even though the physical connection is working, the problem lies in the virtual interface. To get your virtual interface up and running, check the following:

  • Configure IP Address: Ensure that your peer IP address is configured correctly, in the right VLAN, and on the VLAN subinterface rather than the physical interface.
  • Enable VLAN Trunking: All devices between the endpoints must have VLAN trunking enabled for your VLAN tag. For an Address Resolution Protocol (ARP) entry to be established with AWS, your network must send VLAN-tagged traffic to AWS.
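
A quick sanity check for the peer IP configuration is to confirm that your address and the Amazon peer address sit in the same subnet. The sketch below uses Python's standard `ipaddress` module; the addresses are hypothetical examples, and the function name `same_peering_subnet` is introduced here for illustration only.

```python
import ipaddress


def same_peering_subnet(customer_address, amazon_address):
    """Return True if both peer addresses are distinct hosts in one subnet."""
    customer = ipaddress.ip_interface(customer_address)
    amazon = ipaddress.ip_interface(amazon_address)
    return customer.network == amazon.network and customer.ip != amazon.ip


# Hypothetical peer addresses in CIDR notation.
print(same_peering_subnet("169.254.255.2/30", "169.254.255.1/30"))  # True
print(same_peering_subnet("169.254.255.2/30", "169.254.255.5/30"))  # False
```

A `False` result suggests a typo in the address or prefix length on one side, which would prevent ARP from resolving even when the VLAN is tagged correctly.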

If nothing works, try clearing the ARP table cache on your side or your provider’s side. At this point, you should be able to establish a Border Gateway Protocol (BGP) session, but if you can’t, continue reading.

3. Troubleshooting AWS Direct Connect Network Issue

If your virtual interface is down but you can still ping the Amazon peer IP, the problem might reside in the BGP session. Do the following to resolve the issue:

  • Configure ASN on Both Sides: Check that your local BGP Autonomous System Number (ASN) and Amazon’s ASN are configured correctly on your router. Then, confirm that the peer IPs are configured correctly on both sides of the BGP session.
  • Configure MD5 Authentication Key: Ensure the MD5 authentication key on your router matches the key in the router configuration file you downloaded for the virtual interface.
  • Maintain the Prefix Limit: The prefix limit is 1,000 for a public virtual interface and 100 for a private virtual interface. If you advertise more prefixes than the limit allows, summarize your routes so that the number of advertised prefixes falls below it.
  • Enable BGP Required TCP Ports: BGP uses TCP port 179, along with high-numbered ephemeral TCP ports, to establish its TCP connection. Ensure that no firewall or ACL rule along the path is blocking BGP traffic.
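
To illustrate the prefix limit, the sketch below counts advertised prefixes against the limits quoted above and merges contiguous blocks with Python's standard `ipaddress.collapse_addresses`. The prefix list is a hypothetical example, and the helper names are introduced here for illustration; actual route summarization is configured on your router.

```python
import ipaddress

# Limits quoted in the text above.
PREFIX_LIMITS = {"private": 100, "public": 1000}


def summarize_prefixes(prefixes):
    """Merge adjacent or overlapping CIDR blocks into the fewest networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


def within_prefix_limit(prefixes, interface_type):
    """Return True if the number of advertised prefixes is within the limit."""
    return len(prefixes) <= PREFIX_LIMITS[interface_type]


# Hypothetical example: 128 contiguous /24 networks.
advertised = [f"10.0.{i}.0/24" for i in range(128)]
print(within_prefix_limit(advertised, "private"))  # False

summary = summarize_prefixes(advertised)
print(summary)                                     # ['10.0.0.0/17']
print(within_prefix_limit(summary, "private"))     # True
```

`collapse_addresses` only merges blocks that combine exactly, so the summary never covers address space that was not in the original list.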

Note: You can find misconfigurations in security groups, network access control lists (NACLs), and route tables by running an automation document in the AWS Region where your resources are located. You will need specific AWS Identity and Access Management (IAM) permissions to run the document. Permissions are usually visible in Amazon EC2 instances. Read more about running automation and IAM permissions in the AWS Systems Manager documentation.

After a successful BGP session, your network should be able to exchange routes with the AWS Cloud.

4. Troubleshooting AWS Direct Connect Routing Issue

You’ve done everything from mending the data link to establishing a BGP session, yet traffic is still not flowing between your network and the AWS Cloud. In this case, the cause is likely routing, even though the Amazon peer IPs respond.

  • Advertising Network Prefixes: Make sure you advertise a route for your network prefix over the BGP session of each virtual interface. A private virtual interface can carry private or public prefixes, while a public virtual interface requires publicly routable prefixes. See “Troubleshooting AWS Direct Connect Network Issue” for the limit value.
  • Correction in ACL Rules & Security Groups: Each VPC has its own security groups and network ACL rules. Ensure both allow inbound and outbound traffic for your network prefix.
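
The check on ACL rules and security groups can be sketched as a CIDR containment test. The example below is a simplified illustration using Python's standard `ipaddress` module: it only asks whether an allowed CIDR covers your network prefix and ignores ports, protocols, deny rules, and NACL rule ordering. The rule sets and prefix are hypothetical.

```python
import ipaddress


def prefix_allowed(network_prefix, allowed_cidrs):
    """Return True if at least one allowed CIDR fully covers the prefix."""
    target = ipaddress.ip_network(network_prefix)
    for cidr in allowed_cidrs:
        allowed = ipaddress.ip_network(cidr)
        # subnet_of raises TypeError across IP versions, so compare them first.
        if allowed.version == target.version and target.subnet_of(allowed):
            return True
    return False


# Hypothetical allowed CIDRs copied from a security group or network ACL.
rules = {"inbound": ["10.0.0.0/8"], "outbound": ["192.168.0.0/16"]}
on_prem = "192.168.10.0/24"

for direction, cidrs in rules.items():
    print(direction, prefix_allowed(on_prem, cidrs))
# inbound False
# outbound True
```

Here the inbound rule does not cover the prefix, so return traffic would leave the VPC but requests from the internal network would be dropped.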

Correctly configured network ACLs and security groups are essential for a strong and uninterrupted connection between your Amazon VPC and the AWS Cloud.

Takeaways:

Network connectivity issues on AWS have diverse origins: they can be caused by anything from a wrong configuration to advertising too many network prefixes. It pays to know your way around Amazon CloudWatch, as many network connectivity issues are logged there, and sifting through its logs and metrics makes troubleshooting network connectivity on AWS a lot easier.
