My Little TCP Tunnel

Do you ever wonder what really happens when you use some driver or client library to access a networked service? Say you send queries to a database through its client driver, or call a client library from some PaaS provider such as AWS, and get back a cryptic error message (or nothing at all) with no idea what actually went over the wire. Or you wonder how much data the client is really generating (bandwidth is usually neither unlimited nor free..).

Anyway, I do. Most of the time when I try to look at what happens at the protocol level to debug an issue, I end up on answers telling me to use Fiddler, Wireshark, or some other overly complex solution. Fiddler requires a .NET install (Linux/OSX/Mono and all the problems that brings, no thanks) and is overkill for my simple needs, Wireshark has trouble capturing localhost traffic, and I always have problems finding the right capture filters for my stuff.

I did once find a nice TCP tunnel program on SourceForge, but it was abandoned long ago and its source code is gone. As seems to happen with most small but useful open source projects, people grow up and figure out there is more to life than writing free software for others while still needing to pay their own bills (which is hard with “free software”). So after looking around too long and having to debug these issues too many times, I finally wrote my own. It can be found at http://www.github.com/mukatee/java-tcp-tunnel.

How does it work? Instead of sending my network requests directly to the service, I start up the TCP tunnel and configure it to forward everything received on a given port to another TCP address. Any data received in response is of course written back to the party that initiated the connection. So it is practically a man-in-the-middle attack on myself. Along the way, the tunnel writes the data in each request and response to the console or to a file, which is exactly the debugging information we want.
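The core idea fits in a few lines of Java. This is not the actual project code, just a minimal sketch of the forwarding loop (class and method names here are mine, and the host/port values are arbitrary examples):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class MiniTunnel {
  public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(5566)) {
      while (true) {
        //wait for a client (e.g. curl) to connect to the tunnel port
        Socket client = server.accept();
        //open a matching connection to the real service
        Socket target = new Socket("www.github.com", 80);
        //copy bytes both ways, logging everything that passes through
        pump("up", client.getInputStream(), target.getOutputStream());
        pump("down", target.getInputStream(), client.getOutputStream());
      }
    }
  }

  //copy everything from in to out on a background thread, echoing it to the console
  private static void pump(String tag, InputStream in, OutputStream out) {
    new Thread(() -> {
      byte[] buffer = new byte[8192];
      try {
        int n;
        while ((n = in.read(buffer)) != -1) {
          System.out.println(tag + ":");
          System.out.write(buffer, 0, n);
          System.out.flush();
          out.write(buffer, 0, n);
          out.flush();
        }
      } catch (Exception e) {
        //one side closed the connection, nothing more to forward
      }
    }).start();
  }
}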

So to give a few examples (some also listed on the project page at https://github.com/mukatee/java-tcp-tunnel):

Start the tunnel, listening for connections on port 5566 and forwarding anything received on that port to port 80 at www.github.com:

java -jar tcptunnel-1.0.0.jar 5566 www.github.com 80

To request the contents of http://www.github.com (port 80) using curl and get the full HTTP request-response pair logged, we can do:

curl localhost:5566 --header 'Host: www.github.com'

We are now sending the HTTP request to port 5566 on localhost, where the TCP tunnel captures it, logs it, and forwards it to port 80 on www.github.com. The tunnel then waits for a response from the remote end (here the GitHub server). When the response arrives, the tunnel logs it and forwards the data as-is back to the curl client that sent the request to port 5566. From the client (curl) viewpoint nothing changed, so the client does not need to be modified in any way. The one exception is the explicit Host header in the curl command above: without it, curl would set “Host: localhost:5566”, and the remote server would not know which site we wanted.

The output from running the above commands is:

TCP Forwarding 0:0:0:0:0:0:0:1:62020 <--> 192.30.252.128:80
up:
GET / HTTP/1.1
Host: www.github.com
User-Agent: curl/7.43.0
Accept: */*

down:
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://www.github.com/
Connection: close

TCP Forwarding 0:0:0:0:0:0:0:1:62020 <--> 192.30.252.128:80 stopped.

This is the printout from the TCP tunnel. I started it with the console logger, so this is what gets logged (see the GitHub project page for more on configuration).

The “up” part refers to the data sent upstream (in this case towards www.github.com). This shows the actual headers and the full HTTP request that curl really generated.

The “down” part refers to the response data sent back by the server at www.github.com. In this case it is a redirect to the HTTPS version of the site. And no, I have not implemented support for SSL, certificates, etc., because I have not needed it so far in debugging the REST and similar services I use in development. If I ever need it, I will. If you implement it, I am happy to get contributions.

Now for something different: example number 2. This time I want to send some commands (over HTTP) to an ElasticSearch instance in my local development environment.

I have an index (the ElasticSearch term for a database) named “my_index1” on the ElasticSearch instance running at 192.168.1.100. If I run a standard HTTP DELETE request on it using curl, I get only a brief acknowledgement:

curl -XDELETE 'http://192.168.1.100:9200/my_index1/'

gives back

{"acknowledged":true}

What really happened?

I start the TCP tunnel as before:

java -jar tcptunnel-1.0.0.jar 5566 192.168.1.100 9200

Then I send the same command over the tunnel:

curl -XDELETE 'http://localhost:5566/my_index1/'

The tunnel output is similar to before:

TCP Forwarding 0:0:0:0:0:0:0:1:62646 <--> 192.168.1.100:9200
up:
DELETE /my_index1/ HTTP/1.1
Host: localhost:5566
User-Agent: curl/7.43.0
Accept: */*

down:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 21

{"acknowledged":true}
TCP Forwarding 0:0:0:0:0:0:0:1:62646 <--> 192.168.1.100:9200 stopped.

In the console where I ran curl, I get the same response as without the tunnel:

{"acknowledged":true}

As we can see, curl just prints the response body, while with the tunnel we see all the data that was passed. Not super-exciting, as this is pretty basic HTTP. But at least we no longer need to wonder whether that was really what happened, since we see it directly in the tunnel log. For other types of communication this gets much more interesting.

For example, with the ElasticSearch Java library we can run commands and queries through the ElasticSearch Java API. In that case it is much less clear what actually gets sent over the protocol. I find this especially annoying to debug, since the whole ElasticSearch API is well documented in terms of the HTTP requests that are sent, but rather poorly documented in terms of the language-specific APIs. Capturing this traffic works the same as before (and for any programming language, as the tunnel is just TCP). Some examples:

This time I start the tunnel programmatically:

public static void main(String[] args) {
  //listen on port 5566, forward everything to the ElasticSearch transport port at 192.168.1.100:9300
  Params params = new Params(5566, "192.168.1.100", 9300);
  //log the captured data to the console as strings
  params.enableStringConsoleLogger();
  Main main = new Main(params);
  main.start();
}

I then use the ElasticSearch Java API to perform commands/queries on it. First, to index a few documents:

    Client client = TransportClient.builder().build()
                     .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 5566));
    Map<String, Object> map = new HashMap<>();
    map.put("message", "hello");
    client.prepareIndex("my_index1", "msg_doc", "1").setSource(map).get();

    map.put("message", "world");
    client.prepareIndex("my_index1", "msg_doc", "2").setSource(map).get();

Then to perform a boolean query:

    Client client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 5566));
    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    SearchResponse scrollResp = client.prepareSearch()
            .setIndices("my_index1")
            .setTypes("msg_doc")
            .setSearchType(SearchType.DFS_QUERY_AND_FETCH)
            .setScroll(new TimeValue(60000))
            .setQuery(qb)
            .setSize(10).execute().actionGet();

Note that above, I configured the ElasticSearch server address as “localhost:5566”, which is not where ElasticSearch actually runs. It is where the tunnel is listening. The tunnel captures the connections made to it and forwards the data to the actual ElasticSearch at 192.168.1.100:9300.

Unfortunately, this only prints “gibberish” to the console when the tunnel logs it. Well, it is not entirely gibberish, as readable data can be seen in between the binary characters. But it is not the HTTP request data I was looking for. This is because ElasticSearch is written in Java, and the official Java clients share its codebase and use the more optimized internal binary protocol. For ES clients on any other platform this approach works directly, as they use the HTTP protocol. We can achieve the same on the Java side by using an “unofficial” client such as JEST from https://github.com/searchbox-io/Jest.

To try this:

First, change the tunnel to connect to port 9200, which is where ElasticSearch listens for HTTP requests:

  public static void main(String[] args) {
    Params params = new Params(5566, "192.168.1.100", 9200);
    params.enableStringConsoleLogger();
    Main main = new Main(params);
    main.start();
  }

Then run some commands using JEST over the tunnel:

  public static void main(String[] args) throws Exception {
    // Construct a new Jest client according to configuration via factory
    JestClientFactory factory = new JestClientFactory();
    factory.setHttpClientConfig(new HttpClientConfig
            .Builder("http://localhost:5566")
            .multiThreaded(true)
            .build());
    JestClient client = factory.getObject();
    client.execute(new CreateIndex.Builder("my_index1").build());
  }

The log from the tunnel is:

TCP Forwarding 127.0.0.1:64562 <--> 192.168.1.100:9200
up:
PUT /my_index1 HTTP/1.1
Content-Length: 2
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{}
down:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 21

{"acknowledged":true}
TCP Forwarding 127.0.0.1:64562 <--> 192.168.1.100:9200 stopped.

As we can see, this now goes through the HTTP API. To index a few docs:

    Map<String, String> source = new HashMap<>();
    source.put("msg", "hello");
    Index index = new Index.Builder(source).index("my_index1").type("msg_doc").build();
    client.execute(index);

    source.put("msg", "world");
    index = new Index.Builder(source).index("my_index1").type("msg_doc").build();
    client.execute(index);

And the log again:

TCP Forwarding 127.0.0.1:64701 <--> 192.168.1.100:9200
up:
POST /my_index1/msg_doc HTTP/1.1
Content-Length: 15
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{"msg":"hello"}

down:
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 145

{"_index":"my_index1","_type":"msg_doc","_id":"AVHgELHqpIrLtHSr6-gg","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

up:
POST /my_index1/msg_doc HTTP/1.1
Content-Length: 15
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{"msg":"world"}

down:
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 145

{"_index":"my_index1","_type":"msg_doc","_id":"AVHgELIMpIrLtHSr6-gh","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

TCP Forwarding 127.0.0.1:64701 <--> 192.168.1.100:9200 stopped.

Then to run some queries. First, in a hurry, I wrote a “broken” query: here I put the doc type into the index list as well (there is no index named “msg_doc”, hence the error):

    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(qb);

    Search search = new Search.Builder(searchSourceBuilder.toString())
            // multiple index or types can be added.
            .addIndex("my_index1")
            .addIndex("msg_doc")
            .build();
    SearchResult result = client.execute(search);

We can see the error in the HTTP response below. Interestingly, running the code above passes with no error or exception at all; presumably we are expected to check the result object properties to find the error. Fine. But if we want to see what ElasticSearch actually returned, now we can. The log from the tunnel:

TCP Forwarding 127.0.0.1:64916 <--> 192.168.1.100:9200
up:
POST /my_index1%2Cmsg_doc/_search HTTP/1.1
Content-Length: 38
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{
  "query" : {
    "bool" : { }
  }
}

down:
HTTP/1.1 404 Not Found
es.resource.type: index_or_alias
es.resource.id: msg_doc
es.index: msg_doc
Content-Type: application/json; charset=UTF-8
Content-Length: 311

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"msg_doc","index":"msg_doc"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"msg_doc","index":"msg_doc"},"status":404}

TCP Forwarding 127.0.0.1:64916 <--> 192.168.1.100:9200 stopped.

Here, the error is nicely visible. Too many libraries do a poor job of reporting the actual details, so I have found this very helpful.
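If we do want to detect this in code rather than eyeballing the tunnel log, JEST exposes the outcome on the result object; something along these lines should do it (continuing the earlier snippet):

    SearchResult result = client.execute(search);
    if (!result.isSucceeded()) {
      //the error reported by ElasticSearch, same content as in the tunnel log above
      System.out.println("query failed: " + result.getErrorMessage());
    }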

Fix the error and re-run the query:

    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(qb);

    Search search = new Search.Builder(searchSourceBuilder.toString())
            // multiple index or types can be added.
            .addIndex("my_index1")
            .build();
    SearchResult result = client.execute(search);

Now we get the actual results:

TCP Forwarding 127.0.0.1:64839 <--> 192.168.1.100:9200
up:
POST /my_index1/_search HTTP/1.1
Content-Length: 38
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{
  "query" : {
    "bool" : { }
  }
}

down:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 339

{"took":25,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"my_index1","_type":"msg_doc","_id":"AVHgELIMpIrLtHSr6-gh","_score":1.0,"_source":{"msg":"world"}},{"_index":"my_index1","_type":"msg_doc","_id":"AVHgELHqpIrLtHSr6-gg","_score":1.0,"_source":{"msg":"hello"}}]}}

TCP Forwarding 127.0.0.1:64839 <--> 192.168.1.100:9200 stopped.

Notice that so far the upstream “query” has been pretty much empty. Not very informative, since I wanted to use this to see how queries built with the Java API map to the better documented HTTP requests. Note that I am using the actual query builder objects from the ElasticSearch API here with JEST, so the information should map directly to the “official” ElasticSearch API as well. So, let’s give the query some content to get more information:

    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    qb.must(QueryBuilders.termQuery("msg_doc”, "hello"));
    qb.should(QueryBuilders.termQuery("msg_doc”, "world"));
    qb.mustNot(QueryBuilders.termQuery("msg_doc”, "bob"));
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(qb);

    Search search = new Search.Builder(searchSourceBuilder.toString())
            // multiple index or types can be added.
            .addIndex("my_index1")
            .build();
    SearchResult result = client.execute(search);

And now we can look at the TCP tunnel log to see how this part of the Java API actually maps to the better documented HTTP API:

TCP Forwarding 127.0.0.1:65189 <--> 192.168.1.100:9200
up:
POST /my_index1/_search HTTP/1.1
Content-Length: 300
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{
  "query" : {
    "bool" : {
      "must" : {
        "term" : {
          "msg_doc" : "hello"
        }
      },
      "must_not" : {
        "term" : {
          "msg_doc" : "bob"
        }
      },
      "should" : {
        "term" : {
          "msg_doc" : "world"
        }
      }
    }
  }
}

down:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 122

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

TCP Forwarding 127.0.0.1:65189 <--> 192.168.1.100:9200 stopped.

Nice. But wait, something went wrong there. What? Well, I put the doc type into the term field name, so nothing can match. Which we can also see directly from the log above.

Fix this and try again:

    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    qb.must(QueryBuilders.termQuery("msg", "hello"));
    qb.should(QueryBuilders.termQuery("msg", "world"));
    qb.mustNot(QueryBuilders.termQuery("msg", "bob"));
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(qb);

    Search search = new Search.Builder(searchSourceBuilder.toString())
      .addIndex("my_index1")
      .build();
    SearchResult result = client.execute(search);

And the tunnel log:

TCP Forwarding 127.0.0.1:65120 <--> 192.168.1.100:9200
up:
POST /my_index1/_search HTTP/1.1
Content-Length: 288
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_60)
Accept-Encoding: gzip,deflate

{
  "query" : {
    "bool" : {
      "must" : {
        "term" : {
          "msg" : "hello"
        }
      },
      "must_not" : {
        "term" : {
          "msg" : "bob"
        }
      },
      "should" : {
        "term" : {
          "msg" : "world"
        }
      }
    }
  }
}

down:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 243

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.04500804,"hits":[{"_index":"my_index1","_type":"msg_doc","_id":"AVHgELHqpIrLtHSr6-gg","_score":0.04500804,"_source":{"msg":"hello"}}]}}

TCP Forwarding 127.0.0.1:65120 <--> 192.168.1.100:9200 stopped.

Now we got the results we were looking for. Whoopee.

As a final example, the tunnel can also be used for Java unit/integration testing. There is an example on the project website (https://github.com/mukatee/java-tcp-tunnel). To illustrate it with the ElasticSearch example above:

  @Test
  public void sendRequestMITM() throws Exception {
    //assume we have the same ElasticSearch instance as before set up for testing
    //configure the tunnel to accept connections on port 5566 and forward them to 192.168.1.100:9200
    Params params = new Params(5566, "192.168.1.100", 9200);
    //we want to use the captured data in testing, so enable logging the tunnel data in memory with a buffer size of 8092 bytes
    params.enableInMemoryLogging(8092);
    //this gives us programmatic access to the data passed from the client connected to port 5566 -> 192.168.1.100:9200 (test client to ElasticSearch)
    InMemoryLogger upLogger = params.getUpMemoryLogger();
    //this gives us access to the data passed from 192.168.1.100:9200 -> the client connected to port 5566 (ElasticSearch to test client)
    InMemoryLogger downLogger = params.getDownMemoryLogger();
    //this is how we actually start the tunnel
    Main main = new Main(params);
    main.start();

    //perform the previously defined query over the tunnel
    JestClientFactory factory = new JestClientFactory();
    factory.setHttpClientConfig(new HttpClientConfig
            .Builder("http://localhost:5566")
            .multiThreaded(true)
            .build());
    JestClient client = factory.getObject();

    BoolQueryBuilder qb = QueryBuilders.boolQuery();
    qb.must(QueryBuilders.termQuery("msg", "hello"));
    qb.should(QueryBuilders.termQuery("msg", "world"));
    qb.mustNot(QueryBuilders.termQuery("msg", "bob"));
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(qb);

    Search search = new Search.Builder(searchSourceBuilder.toString())
            // multiple indices or types can be added.
            .addIndex("my_index1")
            .build();
    SearchResult result = client.execute(search);

    //we could also assert any attributes of "result" here, but I focus on the tunnel
    //the data may take a moment to pass through the tunnel, so wait briefly before asserting (a proper observer would be more robust than a sleep)
    Thread.sleep(1000);
    //assert the HTTP protocol data passed through the tunnel both ways
    assertTcpStream(upLogger, "expected_up1.txt");
    assertTcpStream(downLogger, "expected_down1.txt");
  }

  private void assertTcpStream(InMemoryLogger logger, String filename) throws Exception {
    //get the actual data that was passed through the tunnel in one direction (upstream or downstream, depending on which memory logger was passed in)
    String actual = logger.getString("UTF8");
    //the rest is just making sure the test runs the same across platforms and with the varying date-times in HTTP headers
    actual = TestUtils.unifyLineSeparators(actual, "\n");
    String expected = TestUtils.getResource(CaptureTests.class, filename);

    String[] replaced = TestUtils.replace("##", expected, actual);
    actual = replaced[0];
    expected = replaced[1];

    expected = TestUtils.unifyLineSeparators(expected, "\n");
    assertEquals(actual, expected, "Request full content");
  }
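For illustration, the expected_up1.txt resource would hold the bytes we expect to pass upstream. I am assuming here, based on the TestUtils.replace call above, that “##” in the expected file marks spots whose values vary between runs or platforms (such as the client version in the User-Agent header, or date headers in responses), so it might look roughly like this:

POST /my_index1/_search HTTP/1.1
Content-Length: 288
Content-Type: application/json; charset=UTF-8
Host: localhost:5566
Connection: Keep-Alive
User-Agent: ##
Accept-Encoding: gzip,deflate

{
  "query" : {
    "bool" : {
      "must" : {
        "term" : {
          "msg" : "hello"
        }
      },
      "must_not" : {
        "term" : {
          "msg" : "bob"
        }
      },
      "should" : {
        "term" : {
          "msg" : "world"
        }
      }
    }
  }
}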

I have found this helpful for documenting my understanding of how a protocol actually works, for making sure I notice if the protocol changes when something is updated, and so on. For example, in another project (https://github.com/mukatee/kafka-consumer) I implemented parsers for the InfluxDB line protocol (a sample is shown below) to capture data that the Telegraf tool sends over a Kafka stream. If the line protocol used by Influx/Telegraf changes, I want to catch it right away (has happened once already..). We could run the same type of test against our own REST interfaces, so that if some refactoring or other change produces unwanted side effects, we catch it immediately at the protocol level (instead of wondering why some call is not giving the correct response).
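For reference, an InfluxDB line protocol message is a single line of text: a measurement name with optional tags, then one or more fields, then a timestamp. Roughly like this (the values here are my own example):

cpu,host=server01 usage_user=4.1,usage_idle=92.6 1465839830100400200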

So that was the ElasticSearch example. I have also used the tunnel to debug binary protocols such as Avro and Protocol Buffers over Kafka. Those are slightly different in that, being binary protocols, they do not produce readable data on the console. However, the tunnel has options to write the raw bytes to the console or to a file. From the file, the message content could then be parsed using the Avro/ProtoBuf/Thrift schemas and so on. I have not needed to go there yet, so I have not done it. It might also require splitting the stream into parts to separate the individual messages from the binary stream. This could be done, for example, by extending the tunnel code with a specific Observer type.
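To sketch the idea (this is not the tunnel project’s actual observer API, just a hypothetical illustration, and it assumes messages framed with a 4-byte big-endian length prefix, which depends entirely on the protocol in question):

import java.io.ByteArrayOutputStream;

//hypothetical example: buffer tunneled bytes and split out messages that are
//framed with a 4-byte big-endian length prefix (a common binary framing style)
public class MessageSplitter {
  private ByteArrayOutputStream pending = new ByteArrayOutputStream();

  //would be called for every chunk of bytes the tunnel passes through
  public void observe(byte[] chunk, int length) {
    pending.write(chunk, 0, length);
    byte[] data = pending.toByteArray();
    int pos = 0;
    //as long as a full length prefix and message body are buffered, slice a message out
    while (data.length - pos >= 4) {
      int msgLen = ((data[pos] & 0xFF) << 24) | ((data[pos + 1] & 0xFF) << 16)
                 | ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
      if (data.length - pos - 4 < msgLen) break;
      byte[] message = new byte[msgLen];
      System.arraycopy(data, pos + 4, message, 0, msgLen);
      handleMessage(message);
      pos += 4 + msgLen;
    }
    //keep any incomplete trailing bytes for the next chunk
    pending = new ByteArrayOutputStream();
    pending.write(data, pos, data.length - pos);
  }

  private void handleMessage(byte[] message) {
    //here the message could be decoded with the appropriate Avro/ProtoBuf/Thrift schema
    System.out.println("captured a message of " + message.length + " bytes");
  }
}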

So some potential improvements would be SSL support, GZIP support, and maybe specific support for some binary protocols. If I ever need them, maybe I will add them.. Until then.

And you can also go to GitHub and search for “tcp tunnel”, which gives a long list of somewhat similar projects..

 
