Using HtmlUnit on .NET for Headless Browser Automation

If you subscribe to this blog, you may have noticed that I’ve been writing about test automation methods a lot lately. You could even think of it as a series covering different technical approaches.

The reason I keep writing about this is that I still think it’s very much an unsolved problem. We all want to deliver more reliable software, we want better ways to design functionality and verify implementation… but we don’t want to get so caught up in the bureaucracy of test suite maintenance that it consumes all our time and obliterates productivity.

Yet another approach

Rails developers (and to a lesser extent Java web developers) commonly use yet another test automation technique: hosting the app in a real web server, and accessing it through a fast, invisible, simulated web browser rather than a real browser. This is known as headless browser automation.

How is this beneficial?

  • It’s faster than driving a real browser. The simulated browser is just a code library loaded into your test process, so it’s very quick to launch and shut down, doesn’t require interprocess communication with your test suite, and doesn’t waste time physically drawing things on your screen, opening and closing pop-up windows, etc. Nonetheless, the simulated browser is full-featured: it exposes the usual HTML DOM API, runs JavaScript, evaluates CSS rules, and so on, so it’s still an effective way to specify rich client-side behaviours.
  • It’s more reliable than a real browser. Real browsers are complicated. For example, if you launch and shut down Firefox many times in rapid succession, occasionally it will fail to launch because a previous instance is still locking some file. Simulated browser instances are totally independent of one another, so they don’t suffer from this kind of weirdness.
  • You can get more low-level control if you want it. For example, the simulated browser can easily offer an API to alter the HTTP headers it sends, or let you get or set the contents of its cache. A real browser wouldn’t usually make this easy.

And what drawbacks might you expect?

  • It can be harder to debug. Because you can’t physically see the browser on your desktop, it’s not as obvious what’s happening if a test fails (or passes) unexpectedly. Your debugger’s “immediate mode” will let you call the browser’s API to figure things out, but it can be a longer investigative process.
  • There’s no absolute guarantee that it faithfully replicates a real-world browser. For example, HtmlUnit can simulate multiple versions of Firefox, Internet Explorer, and Netscape, but it may not simulate every single quirk.

About HtmlUnit

HtmlUnit is a headless browser automation library for Java. It’s very well-developed and mature, as you can see from its extensive API. (Need to configure whether the user has JavaScript on or off? No problem.) It’s also the same technology that underlies Celerity, a Ruby library that exposes the Watir API but runs faster.

Unfortunately, in the .NET world, we don’t have any good headless browser automation libraries like HtmlUnit that I know of. But don’t let that stop you! We do have IKVM – a near-magic way of running Java code on .NET (either by hosting a JVM on the CLR at runtime, or by a one-time conversion process that generates a native .NET assembly directly from the Java bytecode). So, why not use HtmlUnit itself?

Converting HtmlUnit to .NET

It’s surprisingly easy to get HtmlUnit, a Java library, converted into a native .NET assembly (no Java Virtual Machine needed!) using IKVM.

First, download HtmlUnit (as a compiled JAR file) from SourceForge (I’m using version 2.7), and extract all its files from the ZIP archive.

Second, download IKVM binaries from ikvm.net/SourceForge (I’m using version 0.42.0.3), and again extract all its files from the ZIP archive.

Open a command prompt, ensure you’ve added IKVM’s /bin folder to your PATH variable, change directory to HtmlUnit’s /lib folder, and then run ikvmc to convert the Java bytecode of all of the HtmlUnit JAR files into .NET bytecode. I ran this command:

ikvmc -out:htmlunit-2.7.dll *.jar

This produced a lot of warnings and a large (~10 MB) .NET assembly called htmlunit-2.7.dll. If you don’t want to bother with this process, you can download my sample project (linked below), which contains the .NET assembly I generated.

Using HtmlUnit from .NET

Once you’ve got htmlunit-2.7.dll as a .NET assembly, you can use it with any .NET unit testing library such as NUnit or XUnit. You will also need to reference a few of the IKVM runtime assemblies (I narrowed it down to six additional IKVM assemblies that appear to be needed – these are all included in IKVM’s /bin folder).

As a trivial example, here’s how you can use HtmlUnit with NUnit to load the Google homepage:

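using com.gargoylesoftware.htmlunit;      // WebClient
using com.gargoylesoftware.htmlunit.html; // HtmlPage, HtmlInput, HtmlAnchor, HtmlElement
using NUnit.Framework;

[TestFixture]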
public class GoogleTests
{
    private WebClient webClient;
    [SetUp] public void Setup() { webClient = new WebClient(); }
 
    [Test]
    public void Can_Load_Google_Homepage()
    {
        var page = (HtmlPage)webClient.getPage("http://www.google.com");
        Assert.AreEqual("Google", page.getTitleText());
    }
}

Slightly more interesting, here’s how you can fill out a form (again, Google Search), click a button, and inspect the results:

[Test]
public void Google_Search_For_AspNetMvc_Yields_Link_To_Codeplex()
{
    var searchPage = (HtmlPage)webClient.getPage("http://www.google.com");
    ((HtmlInput)searchPage.getElementByName("q")).setValueAttribute("asp.net mvc");
    var resultsPage = (HtmlPage)searchPage.getElementByName("btnG").click();
 
    var linksToCodeplex = from tag in resultsPage.getElementsByTagName("a").toArray().Cast<HtmlAnchor>()
                          let href = tag.getHrefAttribute()
                          where href.StartsWith("http://")
                          let uri = new Uri(href)
                          where uri.Host.ToLower().EndsWith("codeplex.com")
                          select uri;
    CollectionAssert.IsNotEmpty(linksToCodeplex);
}

As you can see, the API is full of Java idioms and feels a bit odd to a .NET developer. It would be great if someone decided to create a .NET wrapper library to expose a nicer API, .NET-style PascalCase, better use of IEnumerable and generics so LINQ queries were simpler, etc.
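As a rough illustration of what one small piece of such a wrapper might look like, here is a minimal sketch. The class and method names (JavaCollectionExtensions, OfJavaType) are hypothetical and not part of HtmlUnit or IKVM; the sketch only assumes that a Java collection's toArray() surfaces in .NET as an object[], as it does in the tests above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JavaCollectionExtensions
{
    // Hypothetical helper, not part of HtmlUnit or IKVM: takes the object[]
    // returned by a Java collection's toArray() and exposes it as a typed,
    // LINQ-friendly sequence. Elements that are not of type T are skipped.
    public static IEnumerable<T> OfJavaType<T>(this object[] javaItems)
    {
        if (javaItems == null) throw new ArgumentNullException("javaItems");
        return javaItems.OfType<T>();
    }
}
```

With a helper like this, the query in the previous test could start from resultsPage.getElementsByTagName("a").toArray().OfJavaType<HtmlAnchor>(). Note that OfType silently drops elements of other types, whereas the Cast call used in the tests throws on a mismatch, so which one you want depends on how strict the test should be.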

The other side of it is that HtmlUnit is very powerful. You can trivially scan the DOM with XPath, search for child elements, and invoke events, all much more easily than with WatiN.

To show that HtmlUnit has great JavaScript and Ajax support, here’s an example of automating a jQuery AutoComplete plugin to check its suggestions:

[Test]
public void jQuery_Autocomplete_Lon_Suggests_London()
{
    // Arrange: Load the demo page
    var autocompleteDemoPage = (HtmlPage)webClient.getPage("http://jquery.bassistance.de/autocomplete/demo/");
 
    // Act: Type "lon" into the input box
    autocompleteDemoPage.getElementById("suggest1").type("lon");
    webClient.waitForBackgroundJavaScript(1000);
 
    // Assert: Suggestions should include "London"
    var suggestions = autocompleteDemoPage.getByXPath("//div[@class='ac_results']/ul/li").toArray().Cast<HtmlElement>().Select(x => x.asText());
    CollectionAssert.Contains(suggestions, "London");
}
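If you would rather wait for a specific condition (such as the suggestion list becoming non-empty) than for background scripts in general, one option is a small polling helper. The sketch below is an assumption of mine rather than anything provided by HtmlUnit or included in the demo project, and the names Wait and Until are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Wait
{
    // Hypothetical helper (not part of HtmlUnit): calls 'condition' repeatedly
    // until it returns true or 'timeout' elapses, sleeping 'pollInterval'
    // between attempts. Returns true if the condition was met in time.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        if (condition == null) throw new ArgumentNullException("condition");
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (stopwatch.Elapsed >= timeout) return false;
            Thread.Sleep(pollInterval);
        }
    }
}

// Possible usage inside the test above (assumes the same page and XPath):
//   bool appeared = Wait.Until(
//       () => autocompleteDemoPage.getByXPath("//div[@class='ac_results']/ul/li").size() > 0,
//       TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(100));
```

The condition is always checked at least once, so a zero timeout still gives a single attempt.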

In case you want to download all this code as a demo project you can run without needing IKVM or Java or anything weird, I’ve put it on GitHub.

Summary

This is experimental! I haven’t used this on any real project, though it was pretty effortless so far, so I’d definitely consider it. I don’t know how much faster HtmlUnit would be than a real browser, but it does bypass the trickiness of real browsers, which may be worth it alone. If I was going to use it seriously, I’d definitely make some .NET wrapper classes to hide the Java naming and idioms. HtmlUnit on .NET feels robust, but I haven’t pushed it hard.

52 Responses to Using HtmlUnit on .NET for Headless Browser Automation

  1. Have a look at WebAii. (http://www.artoftest.com/products/webaii.aspx, or Telerik also host a version) Their framework does the standard ‘control a browser’ style testing that WatiN does (and they do it a lot better; much more stable than WatiN in my experience) but they also have a browser type of ‘AspNetHost’. When using this you host your ASP.NET code in process and then use a simulated browser for your tests, in a similar way to HtmlUnit. However it doesn’t seem as powerful as HtmlUnit; for example I don’t believe it has JavaScript support.

  2. Hi Steve:

    Thanks for the excellent blog entry! I am the maintainer for PushToTest TestMaker, an open source platform that repurposes Web/RIA/BPM tests into functional tests, load and performance tests, and business service monitors. TestMaker integrates Selenium and HtmlUnit. For example, we run Selenium tests as load tests.

    We recently added support for .NET and VB scripts. We make a simple command-line call to execute the scripts. I am looking around for .NET test examples that I can write about in our tutorials. I would be glad to partner with you if you have interest.

    Thanks.

    -Frank

  3. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #571

  4. You might also consider Selenium. The nice thing about this is you actually click around in Firefox and record your clicks. Then you can replay this either in your browser, or package it up and automate on a server for playback in IE and other browsers.

  5. Pingback: Using HtmlUnit on .NET for Headless Browser Automation « Steve Sanderson’s blog | Head.SmackOnTable();

  6. Travis Laborde

    Just as an FYI, if anyone is using this and gets an error with SSL pages, just also include the IKVM.OpenJDK.Security.dll file from IKVM.

  7. John

    I’m still having a difficult time understanding how these tests add value or maybe I’m not understanding how these are testing anything additional beyond the normal unit tests.

    A couple of months ago, I heard Uncle Bob Martin give a demo on Fitnesse, which his company supports, and when asked about UI testing, his comment was that it was generally a bad idea, because 1) most UI tests being written were actually domain tests disguised as UI tests (for which he said that Fitnesse was the answer) and 2) UI tests are incredibly brittle, which makes it hard to keep a process automated.

    Maybe I just didn’t understand how the examples are applicable to the real world. Why are you testing what Google returns, since it isn’t within your power to change the application? Does the fact that you’re testing an external link mean something in this test? If not, I would have thought testing a controller action would be appropriate. If so, then this doesn’t actually appear to be a test, as much as it seems more like an example of a screenscraper lookup that might be used in a dashboard app.

    In the last example, couldn’t this have been written as a YUI JavaScript unit test, eliminating the dependency on creating a web client?

  8. Travis Laborde

    For me what makes it awesome is that it is much easier to integrate “another set of NUnit tests” into my CI process than it would be to learn another testing methodology and also integrate that into my CI.

    With this, we write some more tests using the already familiar xUnit process, and add another step to the CI that runs after deployment. Super plain and simple.

    Thanks Steve!

  9. Troy

    @John I think you’re missing that this isn’t intended to test Google. This is intended to test your own pages. Google is just being used as an example of the process in this sample code.

    In the ‘real world’ it is very important that validation is as expected ON the page. Not just in the view model. This even allows completely client side validation to be tested. It allows links to actually be verified ‘in use’ not just that the model has the correct url string.

    These tests become more and more significant in complex dynamically generated html.

  10. John

    Troy,

    Help me to understand. What about my own page am I testing? Suppose that I write a test

    HomeController_DefaultPage_ReturnsWithTitle()

    Wouldn’t that test the exact same thing without involving the WebClient library? Integration tests are typically done in addition to unit tests, so you would also test the controller in exactly the same way, right? Also, it seems that the only extra thing this would be doing is testing the MVC framework itself, which appears to me to be a lot of extra work with very little benefit.

  11. @John

    The very nature of integration tests is that they exercise the complete stack: everything from the view to the database to even your JavaScript. So instead of HomeController_DefaultPage_ReturnsWithTitle(), you’d write a test like
    User_Dashboard_Shows_Unfinished_Tasks().

    What this test would do is to visit http://www.yourapplication.com/dashboard/user/1 and make sure that this page correctly displays all the unfinished tasks for the respective user. If you think about it, it’s basically providing an assurance that your tests are testing EXACTLY what your client is interested in.

    I’m a BIG proponent of Integration tests since they in essence provide you an automated testing system that comes as close to the real world as possible, namely by visiting a link, filling out a form, submitting it to a database and then making sure that it shows up correctly on the page.

  12. John

    Praveen,

    Sorry, but the example that you cite isn’t compelling enough to convince me that UI integration tests have value. In the example that you cite:

    The big difference of why I think unit tests and acceptance tests are very, very good, but integration tests are dubious in value is how the tests are created. In Fitnesse, an acceptance testing tool, the business analyst / tester creates the test BEFORE the code is written, much in the same way that unit tests are written before code in TDD. Then, the developer codes the module and runs the tests. The acceptance tests written by the BA / tester actually serve as requirements for the application. It’s also an excellent progress indicator of a project: if you’re a manager and want to know where a project is at, just check what % of tests are passing, because the tests are the requirements.

    Compare that with traditional integration tests. Every system that the application relies on is built first. Then, the tester creates the automated tests. This potentially involves quite a bit of configuration. And let’s be honest, how many teams have said that “the work is done, all that’s left is the testing” only to then watch the project get delayed as fixing the tests “took longer than expected”?

    I’m not an expert on testing, just someone trying to learn when, or if, UI integration testing is needed. My gut instinct, along with some agile sources, is telling me that UI integration testing is a step backwards and anti-agile in most cases.

  13. Intriguing, I’ve not noticed this…

  14. Ziss

    Hi Steve,

    I feel like we have to discuss this slogan – “Now with 100% extra MVP”.

    Surely, if there used to be 0 MVP, an increase by 100% would still be 0 MVP. Are you saying that there previously was 1 MVP, this MVP was increased by 100%, and now there is 2 MVP? Your readers are confused, and inquiring minds demand to know.

    Best regards,

  15. Steve

    Ziss, there is a certain amount of MVP-ness (careful how you pronounce that) on this blog now, and 100% of it is “extra” to what was here before. Ah, trixy percentages, they always have multiple interpretations.

  16. Miguel

    Steve,
    I’ve been testing HtmlUnit for .NET. One thing I’ve noticed is that HtmlUnit requests are very slow. Am I doing something wrong?

  17. Charlie

    I’m trying to use this library with https pages, and I get this exception:

    java.net.MalformedURLException: unknown protocol: https

    Any idea what I could do?
    Travis Laborde, above at comment #6, mentioned to include the IKVM.OpenJDK.Security.dll file. I did this as well, and added a reference to the solution, but I’m still getting the same error…

    Any idea?

  18. Mehdi Chaouachi

    HttpUnit is an excellent library with multiple very beneficial uses.
    I personally used it around 5 years ago to build a program for a web marketing company: they wanted software to capture Google results and analyze them to better advise their clients.
    It made this task so easy that I became a huge fan of that library.

  19. Mehdi Chaouachi

    oops, I meant HtmlUnit, sorry for the spelling mistake.

  20. Mark Pawelek

    Steve thanks for all your articles on testing via the UI. Very exciting.

    Have you had a look at Window Licker [http://code.google.com/p/windowlicker/] ?

    Right now, I’m reading “Continuous Delivery” [Jez Humble and David Farley]. Haven’t got to reading it yet, but chapter 8 deals with ‘Automated Acceptance Testing’. They mention the Window Driver pattern, which looks like an interesting alternative.

    Now that you told me how to do it, I’ll be using HttpUnit and HtmlUnit too – and probably Window Licker!

  21. John Tarbox

    Wow, this is great stuff. I just watched your presentation at the 2nd mvcConf on MVC Scaffolding (great talk!) and discovered your blog. In reading your posts I discovered this one and downloaded your code from GitHub.

    This is the best solution I have ever used for testing and scraping web pages. Highly recommended!

    Congratulations on your job at Microsoft and keep up the good work.

  22. Vinayak

    Nice blog, Steve. It helped me a lot.

    I have one query. Is there any clean way to access elements of the HTML page apart from the getElementById call you mentioned in the example?
    The issue is that .NET mangles the ID, and we have pages to test where the controls don’t have names, so I cannot use getElementByName either.

    Thanks,
    Vinayak

  23. Ranjith Venkatesh

    Is it possible that using htmlunit in Windows might cause blue screens?

    I have been using it successfully, but since then I have been having multiple blue screens.

  24. Pingback: Web Scraping Ajax and Javascript Sites « Data Big Bang Blog

  25. Thanks for the article, except it doesn’t seem like this is a practical solution. I’m running into errors, not to mention the fact that the getPage call takes on average 30 seconds. Try the test case below:

    [Test]
    public void TestHtmlUnit()
    {
        var html = String.Empty;
        try
        {
            var webClient = new WebClient();
            webClient.setThrowExceptionOnScriptError(false);
            var demoPage = (HtmlPage)webClient.getPage("http://realtor.com/realestateandhomes-search/Geneva_NY#/listingType-any");
            html = demoPage.asXml();
        }
        catch (com.gargoylesoftware.htmlunit.ScriptException exc)
        {
            html = exc.getPage().asXml();
        }
        Console.WriteLine(html);
        Assert.IsNotEmpty(html, "No html returned");
    }

    Half the time it gives this useless error message:

    Test ‘Housters.Test.ScrapeTest.TestHtmlUnit’ failed: net.sourceforge.htmlunit.corejs.javascript.EcmaError :
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(WebResponse , WebWindow , HtmlPage )

    Other times it executes but like I said, it takes 30 seconds.

    I can’t see how this could be used in a real-world solution, let me know if I’m missing something.

  26. Glad I’ve finally found something I agree with!

  27. Pingback: SEO spidering for javascript/AJAX/client-scripts « Karuna Sorate

  28. Pingback: Automated Browserless OAuth Authentication for Twitter « Data Big Bang Blog

  29. Wow cool site! I’ll try to spread the word.

  38. Thanks for sharing. I had tried the conversion process with the latest binaries (HtmlUnit 2.9, IKVM 0.46.x), and ended up with a missing class or something; the provided sample code worked beautifully!

  47. Pingback: Switching from Selenium 1.x to WebDriver/Selenium 2 and HtmlUnit – Simple-Talk