Analyzing raw TCP streams

An obvious way to capture HTTP requests and responses is to dump them with a network sniffer. This only works for cleartext connections (without TLS encryption), but on the other hand, you don’t need to change your clients or servers.

HTTPolice can parse HTTP/1.x streams from the ground up. HTTP/2 is not yet supported.

You may be familiar with tcpdump, but it won’t work here. tcpdump records individual packets, while HTTPolice needs the raw TCP streams: just the data sent or received. Two Unix tools can dump TCP streams: tcpick and tcpflow. Unfortunately, both sometimes produce incorrect files, so this method may not be 100% reliable.

tcpick

I have had more success with tcpick. Here’s how it can be used:

$ mkdir dump

$ cd dump/

$ sudo tcpick -wR 'port 80'
Starting tcpick 0.2.1 at 2016-04-13 05:11 MSK
Timeout for connections is 600
tcpick: listening on wlp4s0
setting filter: "port 80"

tcpick starts capturing all connections to or from TCP port 80. For example, you can launch a Web browser and go to an ‘http:’ site. Once you are done, exit the browser, then stop tcpick with Ctrl+C. (It is important that the connections are closed before tcpick shuts down. Otherwise, the dumped streams may be incomplete.)

Now you have one or more pairs of files in this directory:

$ ls
tcpick_172.16.0.102_185.72.247.137_http.clnt.dat
tcpick_172.16.0.102_185.72.247.137_http.serv.dat
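The two files of each pair hold the two directions of one connection. The .clnt.dat file contains what the client sent (the requests), and the .serv.dat file contains what the server sent (the responses). As a rough illustration, the hypothetical helper below groups such names into pairs. It is a sketch that assumes the naming pattern shown in the listing above, not HTTPolice’s own code:

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the listing above:
#   tcpick_<client-ip>_<server-ip>_<service>.clnt.dat  (client -> server)
#   tcpick_<client-ip>_<server-ip>_<service>.serv.dat  (server -> client)
TCPICK_NAME = re.compile(r"^(tcpick_.+)\.(clnt|serv)\.dat$")


def pair_tcpick_files(names):
    """Group tcpick output file names into (clnt, serv) pairs.

    Returns a list of (clnt_name, serv_name) tuples sorted by connection
    prefix. Connections missing either half are skipped.
    """
    halves = {}
    for name in names:
        match = TCPICK_NAME.match(name)
        if match is None:
            continue  # not a tcpick stream file
        prefix, side = match.groups()
        halves.setdefault(prefix, {})[side] = name
    return [
        (sides["clnt"], sides["serv"])
        for prefix, sides in sorted(halves.items())
        if "clnt" in sides and "serv" in sides
    ]


if __name__ == "__main__":
    # List the pairs found in the current directory.
    for clnt, serv in pair_tcpick_files(p.name for p in Path(".").iterdir()):
        print(clnt, serv)
```

A pair found this way could also be passed, request file first, to the streams input format described under “Other sniffers”.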

Then tell HTTPolice to use the tcpick input format, pointing it at this directory:

$ httpolice -i tcpick .

tcpflow

The procedure is very similar to tcpick:

$ mkdir dump

$ cd dump/

$ sudo tcpflow -T'%t-%#-%A-%B' port 80
tcpflow: listening on wlp4s0
^Ctcpflow: terminating

$ ls
1460513796-0-172.016.000.102-185.072.247.137  alerts.txt
1460513796-0-185.072.247.137-172.016.000.102  report.xml

$ httpolice -i tcpflow .

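The cryptic -T option is necessary to get the right filenames. The template names each file after the connection’s Unix timestamp (%t), a connection counter (%#), and the source (%A) and destination (%B) addresses, which is the naming that the tcpflow input format expects.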
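To see why the template matters, consider how the two directions of a connection can be matched up from the file names alone. The sketch below is a hypothetical helper, not HTTPolice’s implementation. It assumes IPv4 addresses, as in the listing above, and assumes that both directions of a connection share the same timestamp and counter:

```python
import re
from collections import namedtuple

# Fields produced by -T'%t-%#-%A-%B':
#   %t = Unix timestamp, %# = connection counter,
#   %A = source address, %B = destination address (IPv4 assumed).
Flow = namedtuple("Flow", "timestamp counter src dst name")
TCPFLOW_NAME = re.compile(r"^(\d+)-(\d+)-([0-9.]+)-([0-9.]+)$")


def parse_flow_name(name):
    """Split a tcpflow file name into its fields, or return None."""
    match = TCPFLOW_NAME.match(name)
    if match is None:
        return None  # e.g. alerts.txt, report.xml
    timestamp, counter, src, dst = match.groups()
    return Flow(int(timestamp), int(counter), src, dst, name)


def pair_flows(names):
    """Match every flow with the flow going the opposite way.

    Assumes both directions share timestamp and counter. The order
    within each returned pair is lexicographic by source address,
    not client/server.
    """
    flows = [f for f in map(parse_flow_name, names) if f is not None]
    by_key = {(f.timestamp, f.counter, f.src, f.dst): f for f in flows}
    pairs = []
    for flow in flows:
        reverse = by_key.get((flow.timestamp, flow.counter, flow.dst, flow.src))
        if reverse is not None and flow.src < flow.dst:  # emit each pair once
            pairs.append((flow.name, reverse.name))
    return pairs
```

Note that the names carry no port numbers, and the sketch does not decide which file of a pair is the client side.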

Other sniffers

If you use some other tool to capture the TCP streams, use the streams input format to pass pairs of files, each request stream followed by its matching response stream:

$ httpolice -i streams requests1.dat responses1.dat requests2.dat ...

Or req-stream if you only have request streams:

$ httpolice -i req-stream requests1.dat requests2.dat ...

Or resp-stream if you only have response streams (not recommended):

$ httpolice -i resp-stream responses1.dat responses2.dat ...

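Note that resp-stream may not work at all if any of the requests were HEAD. A response to HEAD has no body even when it carries Content-Length, so responses to HEAD are parsed differently, and without the request stream HTTPolice cannot tell which responses those are.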
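The underlying rule comes from the message-body length rules of RFC 7230, section 3.3.3. The hypothetical helper below is a simplified sketch of that decision, not HTTPolice’s parser. It shows that the request method, which is missing from a response-only stream, changes how many bytes belong to the response:

```python
def response_body_length(request_method, status, headers):
    """Return how many body bytes follow a response's header section.

    Simplified from RFC 7230, section 3.3.3: Transfer-Encoding, CONNECT
    tunnels and bodies delimited by connection close are not handled.
    `headers` is assumed to be a dict keyed by canonical header names.
    """
    if request_method == "HEAD":
        # Content-Length, if present, describes the body a GET would get;
        # no body bytes are actually sent.
        return 0
    if 100 <= status < 200 or status in (204, 304):
        return 0  # these status codes never have a body
    length = headers.get("Content-Length")
    if length is None:
        return None  # outside the scope of this sketch
    return int(length)


headers = {"Content-Length": "12"}
print(response_body_length("GET", 200, headers))   # 12
print(response_body_length("HEAD", 200, headers))  # 0
```

Given only the response stream, a parser has to guess the request method. Guessing GET for a HEAD exchange makes it consume the first 12 bytes of the following response as a body.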

Combined format

Sometimes you want to compose an HTTP exchange by hand, to test something. To make this easier, there’s a special input format that combines the request and response streams into one file:

The lines at the beginning are ignored.
You can use them for comments.

======== BEGIN INBOUND STREAM ========
GET / HTTP/1.1
Host: example.com
User-Agent: demo

======== BEGIN OUTBOUND STREAM ========
HTTP/1.1 200 OK
Date: Thu, 31 Dec 2015 18:26:56 GMT
Content-Type: text/plain
Connection: close

Hello world!

It must be saved with CRLF (Windows) line endings.
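One way to get the line endings right is to write the file from a script instead of a text editor. A minimal Python sketch, where EXCHANGE, to_crlf_bytes and write_combined are illustrative names rather than part of HTTPolice:

```python
from pathlib import Path

# The example exchange, typed with plain "\n" line endings for readability.
EXCHANGE = """\
The lines at the beginning are ignored.
You can use them for comments.

======== BEGIN INBOUND STREAM ========
GET / HTTP/1.1
Host: example.com
User-Agent: demo

======== BEGIN OUTBOUND STREAM ========
HTTP/1.1 200 OK
Date: Thu, 31 Dec 2015 18:26:56 GMT
Content-Type: text/plain
Connection: close

Hello world!
"""


def to_crlf_bytes(text):
    """Encode text with every line ending normalized to CRLF."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n").encode("utf-8")


def write_combined(path, text):
    """Save a combined-format exchange with CRLF line endings."""
    Path(path).write_bytes(to_crlf_bytes(text))
```

Calling write_combined("exchange1.txt", EXCHANGE) produces a file that can be checked with the combined input format.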

Also, for this format, the filename suffix (extension) is important. If it is .https, the request URI is assumed to have an https: scheme. If it is .noscheme, the scheme is unknown. Otherwise, the http: scheme is assumed.
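The rule can be summarized as a small function. The sketch below is a hypothetical illustration of the documented behavior, not HTTPolice’s code. It also shows how the assumed scheme combines with the Host header and the request target into a full URI, such as https://example.com/ for the example above saved as a .https file:

```python
from pathlib import PurePath


def assumed_scheme(filename):
    """Mirror the documented suffix rule for the combined format."""
    suffix = PurePath(filename).suffix
    if suffix == ".https":
        return "https"
    if suffix == ".noscheme":
        return None  # scheme unknown
    return "http"  # any other suffix


def effective_uri(filename, host, target):
    """Build scheme://host/target, or None when the scheme is unknown."""
    scheme = assumed_scheme(filename)
    if scheme is None:
        return None  # no absolute URI can be formed without a scheme
    return f"{scheme}://{host}{target}"
```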

Now, tell HTTPolice to use the combined format:

$ httpolice -i combined exchange1.txt

More examples can be found in HTTPolice’s test suite.