When things aren’t working as expected, the best thing you can do is take a packet capture (pcap) and look at what’s actually going on.
This is much faster than guessing at the source of the problem, because it exposes any false assumptions you have about the network.
Packet captures may seem intimidating at first, but once you dive in, you’ll find how useful they can be.
Command Line Tools
The graphical interface of Wireshark is great for following packet flows and sequence numbers, and for graphing conversations or ACK timing.
But sometimes it is helpful to screen a capture before you spend time looking at it manually. This is where command-line tools such as tshark and tcpdump come in.
Taking a pcap on the command line is fairly straightforward and has been covered in detail elsewhere, so for the sake of this article, we will assume you already have your pcap(s) to analyze.
Listing the Conversations
If you aren’t sure what is in a pcap, listing the TCP conversations can be a good place to start. With tshark, we can generate a list of all TCP conversations, including details about the IP addresses, ports, and amount of data transferred.
tshark -nn -r myCapture.pcap -q -z conv,tcp > myTCPConversationList.txt
This file can be helpful for reference when narrowing the scope of the problem.
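If you want a quick sense of which hosts are the busiest before picking one to focus on, a small awk pass over that file can tally how many conversations each address appears in. The sketch below is a hypothetical helper (`conv_host_counts` is not part of tshark), and it assumes the conversation rows show the two endpoints as `address:port` on either side of `<->`. That layout may differ between tshark versions, and IPv6 endpoints are not handled.

```shell
# Hypothetical helper: count how many TCP conversations each address appears
# in, reading the text produced by `tshark -q -z conv,tcp`.
# Assumptions: conversation rows look like "A:port <-> B:port ...", with the
# endpoints in fields 1 and 3; IPv4 only (IPv6 addresses are not handled).
conv_host_counts() {
    awk '
        $2 == "<->" {
            a = $1
            b = $3
            sub(/:[0-9]+$/, "", a)   # drop the ":port" suffix
            sub(/:[0-9]+$/, "", b)
            count[a]++
            count[b]++
        }
        END {
            for (h in count) print count[h], h
        }
    ' "$@" | sort -k1,1nr -k2,2   # most conversations first, ties by address
}
```

For example:
conv_host_counts myTCPConversationList.txt | head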
Narrowing the Scope
If your capture has thousands of sessions between hundreds of hosts, you are going to have a hard time keeping track of what’s happening in all of those conversations.
So before you begin, identify a known problematic client or server (or, if none is known, pick one from your conversation list). Then create a new pcap file by filtering the original so that it contains only traffic going to or from that single host, which makes debugging much more manageable.
A simple tcpdump command will do the trick:
tcpdump -r myCapture.pcap -w myNewFilteredCapture.pcap host 192.168.0.5
Now you will have a new capture file that contains only the traffic going to or from 192.168.0.5.
If the host has a lot of traffic, this may still be too much. We can modify the previous command to extract a single TCP conversation instead of all traffic on that host. Take a look at the conversation list you generated earlier and choose the conversation you want to extract. Make note of its IP addresses and port numbers, and modify the command as follows:
tcpdump -r myCapture.pcap -w myNewFilteredCapture.pcap host 192.168.0.5 and host 192.168.0.100 and port 80 and port 4291
Notice we have now specified two hosts and two port numbers. A packet is included in the filtered pcap only if all four of these conditions match.
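The filter above does not tie each port to a particular host, so in principle it could also match a second conversation that uses the same two ports in the opposite arrangement (for example 192.168.0.5:80 with 192.168.0.100:4291). If you want to be stricter, a hypothetical shell helper like the one below can build a direction-aware filter covering both halves of one conversation. It uses only standard BPF primitives (`tcp`, `src host`, `dst host`, `src port`, `dst port`); the function name and argument order are assumptions for illustration.

```shell
# Hypothetical helper: print a BPF filter that matches exactly one TCP
# conversation, in both directions, given its two endpoints.
# Usage: conv_bpf HOST_A PORT_A HOST_B PORT_B
conv_bpf() {
    if [ "$#" -ne 4 ]; then
        echo "usage: conv_bpf HOST_A PORT_A HOST_B PORT_B" >&2
        return 1
    fi
    # First the A -> B half of the conversation, then the B -> A half.
    printf 'tcp and ((src host %s and src port %s and dst host %s and dst port %s) or (src host %s and src port %s and dst host %s and dst port %s))\n' \
        "$1" "$2" "$3" "$4" "$3" "$4" "$1" "$2"
}
```

tcpdump accepts the whole expression as a single quoted argument, so it could be used like this:
tcpdump -r myCapture.pcap -w myNewFilteredCapture.pcap "$(conv_bpf 192.168.0.5 4291 192.168.0.100 80)"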
Loss Stats
Next, you may want to look for problems within a session. A high volume of retransmissions or lost segments, for example, can indicate a problem on the network path.
Here is a tshark command you can run on a pcap to generate loss statistics in one-second intervals (the 1 after io,stat sets the interval length):
tshark -r myCapture.pcap -q -z io,stat,1,"COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission","COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack","COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment","COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission" > LossStats.txt
And here is what the output might look like (only the first few lines shown):
===================================================================================
| IO Statistics                                                                   |
|                                                                                 |
| Duration: 252.695666 secs                                                       |
| Interval: 1 secs                                                                |
|                                                                                 |
| Col 1: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission           |
|     2: COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack             |
|     3: COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment               |
|     4: COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission |
|---------------------------------------------------------------------------------|
|          |1      |2      |3      |4      |                                      |
| Interval | COUNT | COUNT | COUNT | COUNT |                                      |
|------------------------------------------|                                      |
|  0 <>  1 |     0 |   102 |     0 |     0 |                                      |
|  1 <>  2 |     0 |     0 |     0 |     0 |                                      |
|  2 <>  3 |     0 |     0 |     0 |     0 |                                      |
|  3 <>  4 |     0 |     0 |     0 |     0 |                                      |
|  4 <>  5 |     0 |     0 |     0 |     0 |                                      |
|  5 <>  6 |     0 |  1009 |     0 |     0 |                                      |
|  6 <>  7 |     1 |   446 |     0 |     1 |                                      |
|  7 <>  8 |     1 |    95 |     0 |     1 |                                      |
|  8 <>  9 |     0 |     0 |     0 |     0 |                                      |
|  9 <> 10 |     0 |     0 |     0 |     0 |                                      |
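As you can see, something is clearly wrong: the capture shows 1009 duplicate ACKs in the interval between 5 and 6 seconds, followed by retransmissions and fast retransmissions over the next two seconds. A burst like that means this setup requires further investigation.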
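If you have many captures to screen, you may not want to read every interval by eye. A hypothetical awk-based helper along these lines prints only the intervals whose duplicate-ACK count (column 2 in the command above) exceeds a threshold you choose. It assumes the data rows are laid out as in the sample output, with the interval in the first cell and the COUNT columns in the order they were requested. The threshold is a judgment call for your network, not a standard value.

```shell
# Hypothetical helper: report io,stat intervals whose duplicate-ACK count
# exceeds a threshold.
# Assumptions: rows look like "|  5 <>  6 |     0 |  1009 |     0 |     0 |",
# and duplicate ACKs are the 2nd COUNT column, as in the command above.
# Usage: flag_dupack_intervals THRESHOLD [FILE...]
flag_dupack_intervals() {
    if [ "$#" -lt 1 ]; then
        echo "usage: flag_dupack_intervals THRESHOLD [FILE...]" >&2
        return 1
    fi
    threshold=$1
    shift
    awk -F'|' -v t="$threshold" '
        $2 ~ /<>/ {                      # only the per-interval data rows
            dup = $4 + 0                 # field 4 = COUNT column 2 ($1 is empty)
            if (dup > t + 0) {
                interval = $2
                gsub(/ +/, " ", interval)        # collapse padding
                gsub(/^ | $/, "", interval)      # trim the ends
                print interval ": " dup " duplicate ACKs"
            }
        }
    ' "$@"
}
```

For example, to list every interval with more than 500 duplicate ACKs:
flag_dupack_intervals 500 LossStats.txt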
Next steps would be:
– Open the pcap in Wireshark
– Trace the conversation to identify the source of the duplicate ACKs
– Try to identify any patterns
– Brainstorm any scenarios that could trigger duplicate ACKs, and narrow them down to the ones that could apply to the current network
– If you recently made a change to the network, do a sanity check by undoing the change and taking another capture to see if the problem disappears
– Continue investigating until the problem has been pinpointed
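For tracing where the duplicate ACKs come from, tshark can also list just the packets it flagged, using `-Y` with the display filter `tcp.analysis.duplicate_ack` and printing selected fields with `-T fields -e ...`. The hypothetical helper below then counts those packets per direction, so the flows sending the most duplicate ACKs rise to the top. It assumes IPv4 traffic (it reads `ip.src`/`ip.dst`) and tab-separated input, which is tshark's default field separator. Keep in mind that duplicate ACKs are sent by the receiving side of a transfer, so the source address in each line is the host reporting missing or out-of-order data.

```shell
# Hypothetical helper: count duplicate ACKs per direction of each TCP flow.
# Input: tab-separated "src_ip src_port dst_ip dst_port" lines on stdin or in
# FILE..., e.g. from `tshark -T fields -e ip.src -e tcp.srcport ...`.
# Assumption: IPv4 only (lines with an empty ip.src, such as IPv6, are skipped).
count_dupack_flows() {
    awk -F'\t' '
        NF >= 4 && $1 != "" {
            flow = $1 ":" $2 " -> " $3 ":" $4
            n[flow]++
        }
        END {
            for (f in n) print n[f], f
        }
    ' "$@" | sort -k1,1nr -k2     # busiest flows first
}
```

For example:
tshark -r myCapture.pcap -Y tcp.analysis.duplicate_ack -T fields -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport | count_dupack_flows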
More Reading
At this point, if you are having trouble understanding what the individual commands mean, take some time to read the man pages. You can read about tshark and all of its flags, or read about tcpdump. Just remember that flags occasionally vary between implementations and platforms, so always double-check a command on your preferred OS before building any scripts or other automation around it.