Harharhar said the Santa when capturing browser test metrics with Webdriver and LittleMobProxy

In the modern age of big data and all that, it is trendy to capture as much data as we can. So this is an attempt at capturing data on a set of web browsing tests I run on Selenium WebDriver. This is done with Java, using Selenium WebDriver and LittleMobProxy.

What I do here is configure an instance of LittleMobProxy to capture the traffic to/from a web server in our tests. The captured data is written to a HAR (HTTP Archive) file, the HAR file is parsed, and the contents are used for whatever you like. In this case I dump them to InfluxDB and show some graphs of the generated sessions using Grafana.
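For reference, the HAR format is plain JSON, so the file can also be inspected by hand or fed to other tools. A heavily trimmed sketch of the structure the parser sees (field names follow the HAR 1.2 spec; the values here are made up):

```json
{
  "log": {
    "pages": [
      { "id": "page_1", "title": "Front page", "startedDateTime": "2016-01-01T12:00:00.000Z" }
    ],
    "entries": [
      {
        "pageref": "page_1",
        "time": 120,
        "request": { "url": "http://example.com/", "headersSize": 350, "bodySize": 0 },
        "response": { "status": 200, "headersSize": 400, "bodySize": 15000 }
      }
    ]
  }
}
```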

This can be useful for seeing how much bandwidth your website uses, how many requests end up being generated, which elements are slowest to load, how your server caching configuration affects everything, and so on. I have used it to provide data for overall network performance analysis by simulating a set of browsers and capturing the data on their sessions.

First up, start the LittleMobProxy instance, create a WebDriver instance, and configure the WebDriver instance to use the LittleMobProxy instance:

    // start the proxy on any free port
    BrowserMobProxy proxy = new BrowserMobProxyServer();
    proxy.start(0);
    // get the Selenium proxy object
    Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);

    // configure it as a desired capability
    DesiredCapabilities capabilities = new DesiredCapabilities();
    capabilities.setCapability(CapabilityType.PROXY, seleniumProxy);

    driver = new ChromeDriver(capabilities);

    // start recording a new HAR with an identifying label
    proxy.newHar("my_website");

Then we run some scripts against our website. The following is just a simple example, as I do not wish to post bots for browsing common websites here. I have prototyped some for browsing various videos on YouTube and for browsing different news portals. Generally this might be against the terms of service of public websites, so either use a service of your own that you are testing (with a test service instance) or download a Wikipedia dump or something similar to run your tests on. Example code:

  public List<WebElement> listArticles() {
    List<WebElement> elements = driver.findElements(By.className("news"));
    List<WebElement> articles = new ArrayList<>();
    for (WebElement element : elements) {
      //skip elements that are present in the DOM but not visible on the page
      if (!element.isDisplayed()) {
        System.out.println("Not displayed:" + element);
        continue;
      }
      List<WebElement> links = element.findElements(By.tagName("a"));
      for (WebElement link : links) {
        //we only take long links as this website has some "features" in the content portal causing pointless short links.
        //This also removes the "share" buttons for facebook, twitter, etc. which we do not want to hit.
        //A better alternative might be to avoid links leading out of the domain
        //(if you can figure them out..)
        //this is likely true for the ads as well..
        if (link.getText().length() > 20) {
          articles.add(link);
        }
      }
    }
    return articles;
  }

  public void openRandomArticle() throws Exception {
    List<WebElement> articles = listArticles();
    //sometimes our randomized user might hit a seemingly dead end on the article tree,
    //in which case we just go back to the news portal main page ("base" here)
    if (articles.size() == 0) {
      driver.get(base);
      articles = listArticles();
    }
    //this is a random choice of the previously filtered article list
    WebElement link = TestUtils.oneOf(articles);
    Actions actions = new Actions(driver);
    actions.moveToElement(link).click().perform();

    //grab the HAR data captured so far and write it to a file
    Har har = proxy.getHar();
    har.writeTo(new File("my_website.har"));
    //if we just want to print it, we can do this..
    //or to drop stats in a database, do something like this
  }

The code to access the data in the HAR file:

  public static void printHar(Har har) {
    HarLog log = har.getLog();
    List<HarPage> pages = log.getPages();
    for (HarPage page : pages) {
      String id = page.getId();
      String title = page.getTitle();
      Date started = page.getStartedDateTime();
      System.out.println("page: id=" + id + ", title=" + title + ", started=" + started);
    }
    List<HarEntry> entries = log.getEntries();
    for (HarEntry entry : entries) {
      String pageref = entry.getPageref();
      long time = entry.getTime();

      HarRequest request = entry.getRequest();
      long requestBodySize = request.getBodySize();
      long requestHeadersSize = request.getHeadersSize();
      String url = request.getUrl();

      HarResponse response = entry.getResponse();
      long responseBodySize = response.getBodySize();
      long responseHeadersSize = response.getHeadersSize();
      int status = response.getStatus();

      System.out.println("entry: pageref=" + pageref + ", time=" + time + ", reqBS=" + requestBodySize + ", reqHS=" + requestHeadersSize +
              ", resBS=" + responseBodySize + ", resHS=" + responseHeadersSize + ", status=" + status + ", url=" + url);
    }
  }

We can use this in many ways. Above I have just printed out some of the basic stats. An overview of the information available in a HAR file can be found on the internet, e.g. at https://confluence.atlassian.com/display/KB/Generating+HAR+files+and+Analysing+Web+Requests. In the following I show some simple data from browsing a local news site, visualized in Grafana using InfluxDB as a backend:

Here is some example code to write some of the HAR stats to InfluxDB:

  //getHar() returns all entries recorded so far, so we keep track of how many
  //we have already stored to avoid writing duplicates
  private static int index = 0;

  public static void influxHar(Har har) {
    HarLog harLog = har.getLog();
    List<HarPage> pages = harLog.getPages();
    for (HarPage page : pages) {
      String id = page.getId();
      String title = page.getTitle();
      Date started = page.getStartedDateTime();
      System.out.println("page: id=" + id + ", title=" + title + ", started=" + started);
    }
    List<HarEntry> entries = harLog.getEntries();
    long now = System.currentTimeMillis();
    int counter = 0;
    for (int i = index ; i < entries.size() ; i++) {
      HarEntry entry = entries.get(i);
      counter++;
      String pageref = entry.getPageref();
      long loadTime = entry.getTime();

      HarRequest request = entry.getRequest();
      if (request == null) {
        log.debug("Null request, skipping HAR entry");
        continue;
      }
      HarResponse response = entry.getResponse();
      if (response == null) {
        log.debug("Null response, skipping HAR entry");
        continue;
      }

      Map<String, Long> data = new HashMap<>();
      data.put("loadtime", loadTime);
      data.put("req_head", request.getHeadersSize());
      data.put("req_body", request.getBodySize());
      data.put("resp_head", response.getHeadersSize());
      data.put("resp_body", response.getBodySize());
      InFlux.store("browser_stat", now, data);
    }
    index += counter;
  }

And the code to write the data into InfluxDB..

  private static InfluxDB db;

  static {
    if (Config.INFLUX_ENABLED) {
      db = InfluxDBFactory.connect(Config.INFLUX_URL, Config.INFLUX_USER, Config.INFLUX_PW);
      //batch up writes to avoid hitting the database once per point
      db.enableBatch(2000, 1, TimeUnit.SECONDS);
    }
  }

  public static void store(String name, long time, Map<String, Long> data) {
    if (!Config.INFLUX_ENABLED) return;
    Point.Builder builder = Point.measurement(name)
            .time(time, TimeUnit.MILLISECONDS)
            .tag("tom", name);
    for (String key : data.keySet()) {
      builder.field(key, data.get(key));
    }
    Point point = builder.build();
    //you should have enabled batch mode (as shown above) on the driver or this will bottleneck
    db.write(Config.INFLUX_DB, "default", point);
  }

And here are some example data visualized with Grafana for some metrics I collected this way:


In the lower line chart, the number of elements loaded per click is shown. This refers to how many HTTP requests are generated per WebDriver click on the website. The upper chart shows the minimum, maximum, and average load times for the different requests/responses, that is, how much time it took for the server to send back the HTTP responses for the clicks.
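Computing the min/max/average load times shown in the upper chart is straightforward once the per-entry times are extracted. A minimal sketch in plain Java, using a made-up list of millisecond load times in place of the real values from HarEntry.getTime():

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;

public class LoadTimeStats {

  //summarize per-entry load times (in milliseconds) into min/max/avg
  public static LongSummaryStatistics summarize(List<Long> loadTimes) {
    return loadTimes.stream().mapToLong(Long::longValue).summaryStatistics();
  }

  public static void main(String[] args) {
    //hypothetical load times; the real ones come from HarEntry.getTime()
    List<Long> times = Arrays.asList(120L, 35L, 480L, 90L);
    LongSummaryStatistics stats = summarize(times);
    System.out.println("min=" + stats.getMin() + " max=" + stats.getMax()
        + " avg=" + stats.getAverage());
  }
}
```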

This shows how the first few page loads generate a high number of HTTP requests/responses. After this, the amount goes down and stays quite steady at a lower level. I assume this is because the browser has cached much of the static content and does not need to request it every time. Occasionally our simulated random browser user enters a slightly less explored path on the website, causing a small short-term spike.

This nicely shows how modern websites end up generating surprisingly large numbers of requests. It also shows that some requests are quite slow to get a response, so these might be useful points to investigate when optimizing overall response time.
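To find those slow responders, one can simply sort the captured (url, load time) pairs. A small sketch with hypothetical values; in the real case they would come from HarEntry.getRequest().getUrl() and HarEntry.getTime():

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SlowRequests {

  //return the n URLs with the longest load times, slowest first
  public static List<String> slowest(Map<String, Long> timeByUrl, int n) {
    return timeByUrl.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(n)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    //hypothetical data for illustration only
    Map<String, Long> times = new HashMap<>();
    times.put("/index.html", 120L);
    times.put("/big_banner.png", 950L);
    times.put("/style.css", 40L);
    System.out.println(slowest(times, 2));
  }
}
```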

That’s it. Not too complicated, but I find it rather interesting. It also does not require too many modifications to existing WebDriver tests: just take the proxy component into use, parse the HAR file, and write the results to the database.

