Analyzing Packet Captures with Python (2024)

Why would you use Python to read a pcap?

For most situations involving analysis of packet captures, Wireshark is the tool of choice. And for good reason too - Wireshark provides an excellent GUI that not only displays the contents of individual packets, but also analysis and statistics tools that allow you to, for example, track individual TCP conversations within a pcap, and pull up related metrics.

There are situations, however, where the ability to process a pcap programmatically becomes extremely useful. Consider:

  • given a pcap that contains hundreds of thousands of packets, find the first connection to a particular server/service where the TCP SYN-ACK took more than 300ms to appear after the initial SYN

  • in a pcap that captures thousands of TCP connections between a client and several servers, find the connections that were prematurely terminated because of a RST sent by the client; at that point in time, determine how many other connections were in progress between that client and other servers

  • you are given two pcaps, one gathered on a SPAN port on an access switch, and another on an application server a few L3 hops away. At some point the application server sporadically becomes slow (retransmits on both sides, TCP windows shrinking etc.). Prove that it is (or is not) because of the network.

  • repeat the above exercises several times a week (or several times a day) with different sets of packet captures

In all these cases, it is immensely helpful to write a custom program to parse the pcaps and yield the data points you are looking for.

It is important to realize that we are not precluding the use of Wireshark; for example, after your program locates the proverbial needle(s) in the haystack, you can use that information (say a packet number or a timestamp) in Wireshark to look at a specific point inside the pcap and gain more insight.

So, this is the topic of this blog post: how to go about programmatically processing packet capture (pcap) files.

What programming language?

I will be using Python (3). Why Python? Apart from the well-known benefits of Python (open-source, relatively gentle learning curve, ubiquity, abundance of modules and so forth), it is also the case that Network Engineers are gaining expertise in this language and are using it in other areas of their work (device management and monitoring, workflow applications etc.).

What modules?

I will be using scapy, plus a few other modules that are not specific to packet processing or networking (argparse, pickle, pandas).

Note that there are other alternative Python modules that can be used to read and parse pcap files, like pyshark and pycapfile. Pyshark in particular is interesting because it simply leverages the underlying tshark installed on the system to do its work, so if you are in a situation where you need to leverage tshark’s powerful protocol decoding ability, pyshark is the way to go. In this blog however I am restricting myself to regular Ethernet/IPv4/TCP packets, and I can just use scapy.

The code

A few notes before we start

The code below was written and executed on Linux (Linux Mint 18.3 64-bit), but the code is OS-agnostic; it should work as well in other environments, with little or no modification.

In this post I use an example pcap file captured on my computer.

Step 1: Program skeleton

Build a skeleton for the program. This will also serve to check if your Python installation is OK.

Use the argparse module to get the pcap file name from the command line. If your argparse knowledge needs a little brushing up, you can look at my argparse recipe book, or at any other of the dozens of tutorials on the web.

import argparseimport osimport sysdef process_pcap(file_name): print('Opening {}...'.format(file_name))if __name__ == '__main__': parser = argparse.ArgumentParser(description='PCAP reader') parser.add_argument('--pcap', metavar='<pcap file name>', help='pcap file to parse', required=True) args = parser.parse_args() file_name = args.pcap if not os.path.isfile(file_name): print('"{}" does not exist'.format(file_name), file=sys.stderr) sys.exit(-1) process_pcap(file_name) sys.exit(0)

Run:

vnetman@vnetman-mint:> python3 pcap-s.py --pcap example-01.pcapOpening example-01.pcap...vnetman@vnetman-mint:> 

Step 2: Basic pcap handling

Open the pcap and count how many packets it contains.

from scapy.utils import RawPcapReaderdef process_pcap(file_name): print('Opening {}...'.format(file_name)) count = 0 for (pkt_data, pkt_metadata,) in RawPcapReader(file_name): count += 1 print('{} contains {} packets'.format(file_name, count))

Run:

vnetman@vnetman-mint:> python3 pcap-s.py --pcap example-01.pcapOpening example-01.pcap...example-01.pcap contains 22639 packetsvnetman@vnetman-mint:> 

The RawPcapReader class is provided by the scapy module. This class is iterable, and in each iteration it yields the data (i.e. packet contents) and metadata (i.e. timestamp, packet number etc.) for every packet in the capture.

At this point you may want to open the pcap in Wireshark and verify if the packet count our program reports is consistent with that reported by Wireshark.

Step 3: Filter non IPv4/TCP packets

Use scapy methods to filter out uninteresting packets. For starters, let us consider all IPv4/TCP packets as interesting.

from scapy.utils import RawPcapReaderfrom scapy.layers.l2 import Etherfrom scapy.layers.inet import IP, TCPdef process_pcap(file_name): print('Opening {}...'.format(file_name)) count = 0 interesting_packet_count = 0 for (pkt_data, pkt_metadata,) in RawPcapReader(file_name): count += 1 ether_pkt = Ether(pkt_data) if 'type' not in ether_pkt.fields: # LLC frames will have 'len' instead of 'type'. # We disregard those continue if ether_pkt.type != 0x0800: # disregard non-IPv4 packets continue ip_pkt = ether_pkt[IP] if ip_pkt.proto != 6: # Ignore non-TCP packet continue interesting_packet_count += 1 print('{} contains {} packets ({} interesting)'. format(file_name, count, interesting_packet_count))

Run:

vnetman@vnetman-mint:> python3 pcap-s.py --pcap example-01.pcapOpening example-01.pcap...example-01.pcap contains 22639 packets (22639 interesting)vnetman@vnetman-mint:> 

Note the use of scapy’s Ether class in the code above, and note how we use ether_pkt.fields and ether_pkt.type to extract information from the ethernet header of the packet. Also note the use of ether_pkt[IP] to obtain the IPv4 header.

It so happens that the example pcap we used was captured by tshark with a capture filter that selected all IPv4/TCP packets, which is why all 22639 packets are reported as interesting. We’ll fix that in the next iteration of the code.

Step 4: Identify interesting connection packets

The packet capture contains, among several connections, one HTTP connection between client 192.168.1.137:57080 and server 152.19.134.43:80. For the rest of this discussion let’s only consider this connection as interesting and filter out other packets.

Note that the code below hardcodes these addresses; you may instead consider gathering this information from the command-line with argparse.

def process_pcap(file_name): print('Opening {}...'.format(file_name)) client = '192.168.1.137:57080' server = '152.19.134.43:80' (client_ip, client_port) = client.split(':') (server_ip, server_port) = server.split(':') count = 0 interesting_packet_count = 0 for (pkt_data, pkt_metadata,) in RawPcapReader(file_name): count += 1 ether_pkt = Ether(pkt_data) if 'type' not in ether_pkt.fields: # LLC frames will have 'len' instead of 'type'. # We disregard those continue if ether_pkt.type != 0x0800: # disregard non-IPv4 packets continue ip_pkt = ether_pkt[IP] if ip_pkt.proto != 6: # Ignore non-TCP packet continue if (ip_pkt.src != server_ip) and (ip_pkt.src != client_ip): # Uninteresting source IP address continue if (ip_pkt.dst != server_ip) and (ip_pkt.dst != client_ip): # Uninteresting destination IP address continue tcp_pkt = ip_pkt[TCP] if (tcp_pkt.sport != int(server_port)) and \ (tcp_pkt.sport != int(client_port)): # Uninteresting source TCP port continue if (tcp_pkt.dport != int(server_port)) and \ (tcp_pkt.dport != int(client_port)): # Uninteresting destination TCP port continue interesting_packet_count += 1 print('{} contains {} packets ({} interesting)'. format(file_name, count, interesting_packet_count))

Run:

vnetman@vnetman-mint:> python3 pcap-s.py --pcap example-01.pcapOpening example-01.pcap...example-01.pcap contains 22639 packets (14975 interesting)vnetman@vnetman-mint:> 

Step 5: Packet metadata

In this code iteration, we’ll access the packet’s metadata; in particular the timestamps and ordinal numbers (i.e. packet number within the packet capture) of the first and the last packets of the connection that we’re interested in.

def process_pcap(file_name): print('Opening {}...'.format(file_name)) client = '192.168.1.137:57080' server = '152.19.134.43:80' (client_ip, client_port) = client.split(':') (server_ip, server_port) = server.split(':') count = 0 interesting_packet_count = 0 for (pkt_data, pkt_metadata,) in RawPcapReader(file_name): count += 1 ether_pkt = Ether(pkt_data) if 'type' not in ether_pkt.fields: # LLC frames will have 'len' instead of 'type'. # We disregard those continue if ether_pkt.type != 0x0800: # disregard non-IPv4 packets continue ip_pkt = ether_pkt[IP] if ip_pkt.proto != 6: # Ignore non-TCP packet continue if (ip_pkt.src != server_ip) and (ip_pkt.src != client_ip): # Uninteresting source IP address continue if (ip_pkt.dst != server_ip) and (ip_pkt.dst != client_ip): # Uninteresting destination IP address continue tcp_pkt = ip_pkt[TCP] if (tcp_pkt.sport != int(server_port)) and \ (tcp_pkt.sport != int(client_port)): # Uninteresting source TCP port continue if (tcp_pkt.dport != int(server_port)) and \ (tcp_pkt.dport != int(client_port)): # Uninteresting destination TCP port continue interesting_packet_count += 1 if interesting_packet_count == 1: first_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow first_pkt_timestamp_resolution = pkt_metadata.tsresol first_pkt_ordinal = count last_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow last_pkt_timestamp_resolution = pkt_metadata.tsresol last_pkt_ordinal = count #--- print('{} contains {} packets ({} interesting)'. format(file_name, count, interesting_packet_count)) print('First packet in connection: Packet #{} {}'. format(first_pkt_ordinal, printable_timestamp(first_pkt_timestamp, first_pkt_timestamp_resolution))) print(' Last packet in connection: Packet #{} {}'. format(last_pkt_ordinal, printable_timestamp(last_pkt_timestamp, last_pkt_timestamp_resolution)))#---

The printable_timestamp function is defined like this:

import timedef printable_timestamp(ts, resol): ts_sec = ts // resol ts_subsec = ts % resol ts_sec_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts_sec)) return '{}.{}'.format(ts_sec_str, ts_subsec)

Run:

vnetman@vnetman-mint:> python3 pcap-s.py --pcap example-01.pcapOpening example-01.pcap...example-01.pcap contains 22639 packets (14975 interesting)First packet in connection: Packet #2585 2018-09-26 21:21:02.883718124 Last packet in connection: Packet #22582 2018-09-26 21:22:04.324012912vnetman@vnetman-mint:> 

A few notes on the timestamp:

The pkt_metadata returned by this call:

for (pkt_data, pkt_metadata,) in RawPcapReader(file_name):

contains a 64-bit timestamp that is documented here. Essentially it is split into two 32-bit fields (tshigh and tslow), and represents the Unix time at which the packet was captured.

The tsresol field in metadata stores the resolution as either 1000000 (microsecond resolution) or 1000000000 (nanosecond resolution), based on the capability of the hardware/software that created the pcap. The field within the file as documented here is a 1-byte field, but the scapy RawPcapReader code processes this 1-byte field and provides the tsresol metadata as either 1000000 or 1000000000.

Step 6: Relative timestamps, relative sequence numbers, TCP flags

The output of our program at the end of this step looks like this:

-->[ 2585] 0.000000s flag=S seq=0 ack=0 len=0 <--[ 2586] 0.307193s flag=SA seq=0 ack=1 len=0 -->[ 2587] 0.307242s flag=A seq=1 ack=1 len=0 -->[ 2588] 0.307359s flag=PA seq=1 ack=1 len=174 <--[ 2589] 0.620760s flag=A seq=1 ack=175 len=0 <--[ 2590] 0.620798s flag=A seq=1 ack=175 len=2880 -->[ 2591] 0.620823s flag=A seq=175 ack=2881 len=0 <--[ 2592] 0.620843s flag=A seq=2881 ack=175 len=1440 ......-->[22576] 61.145739s flag=A seq=175 ack=52313761 len=0 <--[22577] 61.145751s flag=A seq=52313761 ack=175 len=1440 <--[22578] 61.147645s flag=PA seq=52315201 ack=175 len=13483 -->[22579] 61.147676s flag=A seq=175 ack=52328684 len=0 -->[22580] 61.148632s flag=FA seq=175 ack=52328684 len=0 <--[22581] 61.440260s flag=FA seq=52328684 ack=176 len=0 -->[22582] 61.440295s flag=A seq=176 ack=52328685 len=0 example-01.pcap contains 22639 packets (14975 interesting)First packet in connection: Packet #2585 2018-09-26 21:21:02.883718124 Last packet in connection: Packet #22582 2018-09-26 21:22:04.324012912vnetman@vnetman-mint:> 
  • Lines beginning with --> are packets sent from client to the server, and lines with <-- are packets from the server to the client
  • The numbers in square brackets e.g. [ 2588] are the packet ordinals in the capture file. This is handy if you want to examine a particular packet in detail in Wireshark.
  • The timestamps e.g. 0.307359s are relative to the timestamp of the first packet of the connection
  • The TCP flags, relative sequence number, relative acknowledgement number and TCP payload lengths are printed next
  • Note, for example, the 3-way connection establishment handshake at packet numbers 2585, 2586 and 2587. Also note that the SYN-ACK came in about 307ms after the original SYN, and the ACK that followed was recorded less than 1ms after that; this is explained by the fact that the capture was taken on the client host, and the server was across the public internet.

Code:

class PktDirection(Enum): not_defined = 0 client_to_server = 1 server_to_client = 2def process_pcap(file_name): print('Opening {}...'.format(file_name)) client = '192.168.1.137:57080' server = '152.19.134.43:80' (client_ip, client_port) = client.split(':') (server_ip, server_port) = server.split(':') count = 0 interesting_packet_count = 0 server_sequence_offset = None client_sequence_offset = None for (pkt_data, pkt_metadata,) in RawPcapReader(file_name): count += 1 ether_pkt = Ether(pkt_data) if 'type' not in ether_pkt.fields: # LLC frames will have 'len' instead of 'type'. # We disregard those continue if ether_pkt.type != 0x0800: # disregard non-IPv4 packets continue ip_pkt = ether_pkt[IP] if ip_pkt.proto != 6: # Ignore non-TCP packet continue tcp_pkt = ip_pkt[TCP] direction = PktDirection.not_defined if ip_pkt.src == client_ip: if tcp_pkt.sport != int(client_port): continue if ip_pkt.dst != server_ip: continue if tcp_pkt.dport != int(server_port): continue direction = PktDirection.client_to_server elif ip_pkt.src == server_ip: if tcp_pkt.sport != int(server_port): continue if ip_pkt.dst != client_ip: continue if tcp_pkt.dport != int(client_port): continue direction = PktDirection.server_to_client else: continue interesting_packet_count += 1 if interesting_packet_count == 1: first_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow first_pkt_timestamp_resolution = pkt_metadata.tsresol first_pkt_ordinal = count last_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow last_pkt_timestamp_resolution = pkt_metadata.tsresol last_pkt_ordinal = count this_pkt_relative_timestamp = last_pkt_timestamp - first_pkt_timestamp if direction == PktDirection.client_to_server: if client_sequence_offset is None: client_sequence_offset = tcp_pkt.seq relative_offset_seq = tcp_pkt.seq - client_sequence_offset else: assert direction == PktDirection.server_to_client if server_sequence_offset is None: server_sequence_offset = tcp_pkt.seq relative_offset_seq = tcp_pkt.seq - server_sequence_offset # If this TCP packet has the Ack bit set, then it must carry an ack # number. if 'A' not in str(tcp_pkt.flags): relative_offset_ack = 0 else: if direction == PktDirection.client_to_server: relative_offset_ack = tcp_pkt.ack - server_sequence_offset else: relative_offset_ack = tcp_pkt.ack - client_sequence_offset # Determine the TCP payload length. IP fragmentation will mess up this # logic, so first check that this is an unfragmented packet if (ip_pkt.flags == 'MF') or (ip_pkt.frag != 0): print('No support for fragmented IP packets') break tcp_payload_len = ip_pkt.len - (ip_pkt.ihl * 4) - (tcp_pkt.dataofs * 4) # Print fmt = '[{ordnl:>5}]{ts:>10.6f}s flag={flag:<3s} seq={seq:<9d} \ ack={ack:<9d} len={len:<6d}' if direction == PktDirection.client_to_server: fmt = '{arrow}' + fmt arr = '-->' else: fmt = '{arrow:>69}' + fmt arr = '<--' print(fmt.format(arrow = arr, ordnl = last_pkt_ordinal, ts = this_pkt_relative_timestamp / pkt_metadata.tsresol, flag = str(tcp_pkt.flags), seq = relative_offset_seq, ack = relative_offset_ack, len = tcp_payload_len)) #--- print('{} contains {} packets ({} interesting)'. format(file_name, count, interesting_packet_count)) print('First packet in connection: Packet #{} {}'. format(first_pkt_ordinal, printable_timestamp(first_pkt_timestamp, first_pkt_timestamp_resolution))) print(' Last packet in connection: Packet #{} {}'. format(last_pkt_ordinal, printable_timestamp(last_pkt_timestamp, last_pkt_timestamp_resolution)))#---

Step 7: Pickling

If you’ve been executing the program in the previous steps, you will have noticed one thing: it is excruciatingly slow. This is because of scapy - for every one of the thousands of packets read from the capture file, our code builds scapy objects which, as it turns out, is an expensive and slow process.

This is a serious issue because you are, after all, developing code: each time you run the program and examine its output, you will want to write more code to tweak something, or to gain some different insight. Each time you make a small change to the code and run it, you will have to deal with its sluggishness which can get frustrating and impede progress.

The most obvious way to deal with this problem is to not use scapy at all, and instead find an alternate faster method to look at the capture packet data and metadata.

In this post, though, I will use a different approach:

  1. use scapy (as in the above examples) to extract interesting packet data and metadata from the capture file
  2. store the extracted data in a separate “custom” file on disk
  3. subsequently, use the extracted data from the “custom” file for analysis, display, gaining insight etc.

The Python 3 pickle module provides a generic mechanism to save (“pickle”) a bunch of Python data structures to a file on disk, and to read the file and restore (“unpickle”) the saved data structures. This is what we will be using.

The program now has two ‘modes’ of operation:

  • pcap-s.py pickle --pcap example-01.pcap --out example-01.dat - this runs steps 1 and 2
  • pcap-s.py analyze --in example-01.dat - this runs step 3

Why is this better? Because you only have to run the pickle step (steps 1 and 2) once. The analyze step (step 3) - the part which you have to run repetitively after tweaking the code each time - is very fast because it does not use scapy any more.

The code below implements

  • a pickle_pcap function to read the given .pcap file and pickle the interesting data into a file
  • an analyze_pickle function to read the pickled data and print the same information as we did in Step 6; except, of course, that the data is now coming from the pickle file.

The argparse code to parse the command line is not shown below; please look at my argparse recipe book if you need help with using the argparse module.

The pickle step runs like this:

vnetman@vnetman-mint:> python3 ./pcap-s.py pickle --pcap example-01.pcap --out example-01.pickleProcessing example-01.pcap...example-01.pcap contains 22639 packets (14975 interesting)First packet in connection: Packet #2585 2018-09-26 21:21:02.883718124 Last packet in connection: Packet #22582 2018-09-26 21:22:04.324012912Writing pickle file example-01.pickle...done.vnetman@vnetman-mint:> ls -go --sort=timetotal 3844-rw-rw-r-- 1 801200 Oct 8 11:36 example-01.picklelrwxrwxrwx 1 36 Oct 8 11:34 example-01.pcap -> ../github/pcap-files/example-01.pcap-rwxrwxr-x 1 9199 Oct 8 08:54 pcap-s.py-rw-rw-r-- 1 684660 Sep 25 15:47 http_espn.pcapng-rwxrwxr-x 1 3581 Sep 13 18:40 pcap.py-rw------- 1 2425516 Sep 10 09:02 01.pcapvnetman@vnetman-mint:> 

The run time for this step is the same as for the previous steps. This is because we have continued to use scapy to build packets from the pcap file and read their fields.

Note the ~800KB file that was created by the program. This example-01.pickle file is then used in the analyze step:

vnetman@vnetman-mint:> python3 ./pcap-s.py analyze --in example-01.pickle##################################################################TCP session between client 192.168.1.137:57080 and server 152.19.134.43:80##################################################################-->[ 2585] 0.000000s S seq=0 ack=0 len=0 win=3737600 <--[ 2586] 0.307193s SA seq=0 ack=1 len=0 win=1853440 -->[ 2587] 0.307242s A seq=1 ack=1 len=0 win=29312 -->[ 2588] 0.307359s PA seq=1 ack=1 len=174 win=29312 <--[ 2589] 0.620760s A seq=1 ack=175 len=0 win=15616 <--[ 2590] 0.620798s A seq=1 ack=175 len=2880 win=15616 -->[ 2591] 0.620823s A seq=175 ack=2881 len=0 win=35072 <--[ 2592] 0.620843s A seq=2881 ack=175 len=1440 win=15616 -->[ 2593] 0.620849s A seq=175 ack=4321 len=0 win=37888 <--[ 2594] 0.620870s A seq=4321 ack=175 len=5760 win=15616 .........-->[22574] 61.145550s A seq=175 ack=52302241 len=0 win=1573504 <--[22575] 61.145725s A seq=52302241 ack=175 len=11520 win=15616 -->[22576] 61.145739s A seq=175 ack=52313761 len=0 win=1573504 <--[22577] 61.145751s A seq=52313761 ack=175 len=1440 win=15616 <--[22578] 61.147645s PA seq=52315201 ack=175 len=13483 win=15616 -->[22579] 61.147676s A seq=175 ack=52328684 len=0 win=1573504 -->[22580] 61.148632s FA seq=175 ack=52328684 len=0 win=1573504 <--[22581] 61.440260s FA seq=52328684 ack=176 len=0 win=15616 -->[22582] 61.440295s A seq=176 ack=52328685 len=0 win=1573504 vnetman@vnetman-mint:> 

This display is the same as in the previous step, but it appears very fast. This is because we are no longer using scapy; instead we are reading packet fields from the pickle file.

Note also that in this step we are printing the TCP window sizes as well. We first read the scale factor from the “Window Scale Factor” TCP options in the initial SYN and SYN-ACK packets, and then, for every packet, we compute the window size and store it in the pickle file.

The code for the pickle step:

def pickle_pcap(pcap_file_in, pickle_file_out): print('Processing {}...'.format(pcap_file_in)) client = '192.168.1.137:57080' server = '152.19.134.43:80' (client_ip, client_port) = client.split(':') (server_ip, server_port) = server.split(':') count = 0 interesting_packet_count = 0 server_sequence_offset = None client_sequence_offset = None # List of interesting packets, will finally be pickled. # Each element of the list is a dictionary that contains fields of interest # from the packet. packets_for_analysis = [] client_recv_window_scale = 0 server_recv_window_scale = 0 for (pkt_data, pkt_metadata,) in RawPcapReader(pcap_file_in): count += 1 ether_pkt = Ether(pkt_data) if 'type' not in ether_pkt.fields: # LLC frames will have 'len' instead of 'type'. # We disregard those continue if ether_pkt.type != 0x0800: # disregard non-IPv4 packets continue ip_pkt = ether_pkt[IP] if ip_pkt.proto != 6: # Ignore non-TCP packet continue tcp_pkt = ip_pkt[TCP] direction = PktDirection.not_defined if ip_pkt.src == client_ip: if tcp_pkt.sport != int(client_port): continue if ip_pkt.dst != server_ip: continue if tcp_pkt.dport != int(server_port): continue direction = PktDirection.client_to_server elif ip_pkt.src == server_ip: if tcp_pkt.sport != int(server_port): continue if ip_pkt.dst != client_ip: continue if tcp_pkt.dport != int(client_port): continue direction = PktDirection.server_to_client else: continue interesting_packet_count += 1 if interesting_packet_count == 1: first_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow first_pkt_timestamp_resolution = pkt_metadata.tsresol first_pkt_ordinal = count last_pkt_timestamp = (pkt_metadata.tshigh << 32) | pkt_metadata.tslow last_pkt_timestamp_resolution = pkt_metadata.tsresol last_pkt_ordinal = count this_pkt_relative_timestamp = last_pkt_timestamp - first_pkt_timestamp if direction == PktDirection.client_to_server: if client_sequence_offset is None: client_sequence_offset = tcp_pkt.seq relative_offset_seq = tcp_pkt.seq - client_sequence_offset else: assert direction == PktDirection.server_to_client if server_sequence_offset is None: server_sequence_offset = tcp_pkt.seq relative_offset_seq = tcp_pkt.seq - server_sequence_offset # If this TCP packet has the Ack bit set, then it must carry an ack # number. if 'A' not in str(tcp_pkt.flags): relative_offset_ack = 0 else: if direction == PktDirection.client_to_server: relative_offset_ack = tcp_pkt.ack - server_sequence_offset else: relative_offset_ack = tcp_pkt.ack - client_sequence_offset # Determine the TCP payload length. IP fragmentation will mess up this # logic, so first check that this is an unfragmented packet if (ip_pkt.flags == 'MF') or (ip_pkt.frag != 0): print('No support for fragmented IP packets') return False tcp_payload_len = ip_pkt.len - (ip_pkt.ihl * 4) - (tcp_pkt.dataofs * 4) # Look for the 'Window Scale' TCP option if this is a SYN or SYN-ACK # packet. if 'S' in str(tcp_pkt.flags): for (opt_name, opt_value,) in tcp_pkt.options: if opt_name == 'WScale': if direction == PktDirection.client_to_server: client_recv_window_scale = opt_value else: server_recv_window_scale = opt_value break # Create a dictionary and populate it with data that we'll need in the # analysis phase. pkt_data = {} pkt_data['direction'] = direction pkt_data['ordinal'] = last_pkt_ordinal pkt_data['relative_timestamp'] = this_pkt_relative_timestamp / \ pkt_metadata.tsresol pkt_data['tcp_flags'] = str(tcp_pkt.flags) pkt_data['seqno'] = relative_offset_seq pkt_data['ackno'] = relative_offset_ack pkt_data['tcp_payload_len'] = tcp_payload_len if direction == PktDirection.client_to_server: pkt_data['window'] = tcp_pkt.window << client_recv_window_scale else: pkt_data['window'] = tcp_pkt.window << server_recv_window_scale packets_for_analysis.append(pkt_data) #--- print('{} contains {} packets ({} interesting)'. format(pcap_file_in, count, interesting_packet_count)) print('First packet in connection: Packet #{} {}'. format(first_pkt_ordinal, printable_timestamp(first_pkt_timestamp, first_pkt_timestamp_resolution))) print(' Last packet in connection: Packet #{} {}'. format(last_pkt_ordinal, printable_timestamp(last_pkt_timestamp, last_pkt_timestamp_resolution))) print('Writing pickle file {}...'.format(pickle_file_out), end='') with open(pickle_file_out, 'wb') as pickle_fd: pickle.dump(client, pickle_fd) pickle.dump(server, pickle_fd) pickle.dump(packets_for_analysis, pickle_fd) print('done.') #---

In other words, for every packet we read from the pcap, we build a Python dictionary that contains the values of the packet attributes we are interested in (‘direction’, ‘ordinal’, ‘relative_timestamp’, ‘tcp_flags’, ‘seqno’, ‘ackno’, ‘tcp_payload_len’ and ‘window’). Each dictionary is then appended to a list (packets_for_analysis). Once all packets are processed, the list is “pickled” and stored in the file.

The code for the analyze step:

def analyze_pickle(pickle_file_in): packets_for_analysis = [] with open(pickle_file_in, 'rb') as pickle_fd: client_ip_addr_port = pickle.load(pickle_fd) server_ip_addr_port = pickle.load(pickle_fd) packets_for_analysis = pickle.load(pickle_fd) # Print a header print('##################################################################') print('TCP session between client {} and server {}'. format(client_ip_addr_port, server_ip_addr_port)) print('##################################################################') # Print format string fmt = ('[{ordnl:>5}]{ts:>10.6f}s {flag:<3s} seq={seq:<8d} ' 'ack={ack:<8d} len={len:<6d} win={win:<9d}') for pkt_data in packets_for_analysis: direction = pkt_data['direction'] if direction == PktDirection.client_to_server: print('{}'.format('-->'), end='') else: print('{:>60}'.format('<--'), end='') print(fmt.format(ordnl = pkt_data['ordinal'], ts = pkt_data['relative_timestamp'], flag = pkt_data['tcp_flags'], seq = pkt_data['seqno'], ack = pkt_data['ackno'], len = pkt_data['tcp_payload_len'], win = pkt_data['window']))

Step 8: Plotting the client window size

The goal in this iteration of the code is to generate a graphical plot of the TCP Receive window on the Client. The end result is a graph that looks like this:

Analyzing Packet Captures with Python (1)

The code for generating the plot shown above, using pandas and matplotlib, is almost ridiculously easy (I am only showing the analyze_pickle function):

import pandas as pdimport matplotlibimport matplotlib.pyplot as pltdef analyze_pickle(pickle_file_in): packets_for_analysis = [] with open(pickle_file_in, 'rb') as pickle_fd: client_ip_addr_port = pickle.load(pickle_fd) server_ip_addr_port = pickle.load(pickle_fd) packets_for_analysis = pickle.load(pickle_fd) # Plot the receive window size on the client side. client_pkts = [] for pkt_data in packets_for_analysis: if pkt_data['direction'] == PktDirection.server_to_client: continue # Don't include the SYN packet if 'S' in pkt_data['tcp_flags']: continue client_pkts.append({'Time': pkt_data['relative_timestamp'], 'Client window size': pkt_data['window']}) df = pd.DataFrame(data=client_pkts) df.plot(x='Time', y='Client window size', color='r') plt.show() plt.close()

You will notice from the graph that the window size shows a sudden dip to some value between 400000 and 500000 shortly after timestamp 21.1. If you find this suspicious, you can again write more code to help you narrow down the exact packet number in the capture:

def analyze_pickle(pickle_file_in): packets_for_analysis = [] with open(pickle_file_in, 'rb') as pickle_fd: client_ip_addr_port = pickle.load(pickle_fd) server_ip_addr_port = pickle.load(pickle_fd) packets_for_analysis = pickle.load(pickle_fd) for pkt_data in packets_for_analysis: if pkt_data['direction'] == PktDirection.server_to_client: continue # Don't include the SYN packet if 'S' in pkt_data['tcp_flags']: continue if pkt_data['relative_timestamp'] < 21.1: continue if pkt_data['window'] < 500000: print('Packet ordinal {} has a suspicious TCP window size ({})'. format(pkt_data['ordinal'], pkt_data['window']))

Run:

vnetman@vnetman-mint:> python3 ./pcap-s.py analyze --in example-01.pickle Packet ordinal 9539 has a suspicious TCP window size (444672)vnetman@vnetman-mint:> 

Armed with this data, you can now open the capture file in Wireshark and take a closer look at what happened shortly before packet #9539.

Here’s a fancier plot, where I am plotting two parameters - the client window size, and the ack number sent by the client (i.e. the number of bytes received thus far from the server):

Analyzing Packet Captures with Python (2)

And the analyze_pickle code for the above looks like this:

def analyze_pickle(pickle_file_in): packets_for_analysis = [] with open(pickle_file_in, 'rb') as pickle_fd: client_ip_addr_port = pickle.load(pickle_fd) server_ip_addr_port = pickle.load(pickle_fd) packets_for_analysis = pickle.load(pickle_fd) # Plot the receive window size and ack number on the client side. client_pkts = [] for pkt_data in packets_for_analysis: if pkt_data['direction'] == PktDirection.server_to_client: continue # Don't include the SYN packet if 'S' in pkt_data['tcp_flags']: continue client_pkts.append({'Time': pkt_data['relative_timestamp'], 'Client window size': pkt_data['window'], 'Client Ack No' : pkt_data['ackno']}) df = pd.DataFrame(data=client_pkts) fig, ax1 = plt.subplots() ax2 = ax1.twinx() df.plot(x='Time', y='Client window size', color='r', ax=ax1) df.plot(x='Time', y='Client Ack No', color='b', ax=ax2) ax1.tick_params('y', colors='r') ax2.tick_params('y', colors='b') plt.show() plt.close()

Summary

With Python code, you can iterate over the packets in a pcap, extract relevant data, and process that data in ways that make sense to you. You can use code to go over the pcap and locate a specific sequence of packets (i.e. locate the needle in the haystack) for later analysis in a GUI tool like Wireshark. Or you can create customized graphical plots that can help you visualize the packet information. Further, since this is all code, you can do this repeatedly with multiple pcaps.

Analyzing Packet Captures with Python (2024)
Top Articles
Latest Posts
Article information

Author: Ouida Strosin DO

Last Updated:

Views: 5847

Rating: 4.6 / 5 (76 voted)

Reviews: 83% of readers found this page helpful

Author information

Name: Ouida Strosin DO

Birthday: 1995-04-27

Address: Suite 927 930 Kilback Radial, Candidaville, TN 87795

Phone: +8561498978366

Job: Legacy Manufacturing Specialist

Hobby: Singing, Mountain biking, Water sports, Water sports, Taxidermy, Polo, Pet

Introduction: My name is Ouida Strosin DO, I am a precious, combative, spotless, modern, spotless, beautiful, precious person who loves writing and wants to share my knowledge and understanding with you.