
ASP.NET MVC: Prevent XSS with automatic HTML encoding

There’s an interesting (and sometimes heated) debate on the ASP.NET MVC forums about HTML encoding.

Published Dec 19, 2007

It started with a proposal for a helper method to HTML-encode strings as soon as they are received from the visitor, so they’d be stored HTML-encoded in the database. That way, you don’t have to HTML-encode them for display to prevent cross-site scripting. If that were the default behaviour for the UpdateFrom() method, the idea of encoding for storage would no doubt be widely adopted.

Almost everyone else on the forum, though, has a strong preference for not encoding anything until the moment of display. There are some obvious benefits to this approach – you don’t have to remember which strings were pre-encoded (according to their origin), and you don’t have to un-encode them when outputting to any non-HTML format. But it does mean you have to remember to encode things wherever you output them.

Sadly the two methods are incompatible, and you will have to choose one side or the other. I am very definitely in the encode-when-displaying camp.

Another solution

What I’d really like is to change the behaviour of ASPX’s <%= … %> syntax so that it HTML-encodes the result by default. That’s what you want 95% of the time, so why should you have to write <%= HttpUtility.HtmlEncode(…) %> every time?

                        | Current reality                        | In my ideal world
Output unencoded string | <%= value %>                           | <%= (RawHtml) value %>
Output encoded string   | <%= HttpUtility.HtmlEncode(value) %>   | <%= … %>

This would give us the best of both worlds. You wouldn’t need to remember to HTML-encode your strings (since that happens by default), so there’d be no need to store things pre-encoded in the database and then worry about double-escaping, sharing data with external systems, un-encoding for output to non-HTML formats and all that other nonsense.

Spike implementation

It’s a great credit to the ASP.NET architecture that we can actually implement that change of behaviour ourselves, and with not much code either. The idea is to intercept the code generation phase that happens when an ASPX file is compiled.

You can specify your own compiler implementation by editing this section of the web.config:

<system.codedom>
   <compilers>
      <compiler language="c#;cs;csharp" type="Microsoft.CSharp.CSharpCodeProvider .. etc" extension=".cs" warningLevel="4" />
   </compilers>
</system.codedom>

… and, helpfully, you can subclass CSharpCodeProvider, override the GenerateCodeFromStatement() method, and redirect all the <%= … %> evaluations through a suitable helper function.
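
To make that concrete, here is a minimal sketch of such a subclass. It is not the SafeEncodingHelper source. It assumes that each <%= … %> block reaches the provider as a single-argument Write(…) call whose argument is a CodeSnippetExpression, and the names EncodingCodeProvider and EncodingHelper.Encode are hypothetical, chosen only for illustration.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical provider name; a sketch, not the real SafeEncodingHelper class.
public class EncodingCodeProvider : CSharpCodeProvider
{
    // Hypothetical helper; it must be resolvable from the compiled page.
    private const string Helper = "EncodingHelper.Encode";

    public override void GenerateCodeFromStatement(
        CodeStatement statement, TextWriter writer, CodeGeneratorOptions options)
    {
        // Assumption: <%= expr %> is represented as writer.Write(<snippet>).
        var expressionStatement = statement as CodeExpressionStatement;
        var invoke = expressionStatement == null
            ? null
            : expressionStatement.Expression as CodeMethodInvokeExpression;

        if (invoke != null
            && invoke.Method.MethodName == "Write"
            && invoke.Parameters.Count == 1)
        {
            var snippet = invoke.Parameters[0] as CodeSnippetExpression;
            if (snippet != null && !string.IsNullOrEmpty(snippet.Value))
            {
                // Rewrite Write(expr) into Write(EncodingHelper.Encode(expr)).
                snippet.Value = Helper + "(" + snippet.Value + ")";
            }
        }

        // Let the standard C# generator emit the (possibly rewritten) statement.
        base.GenerateCodeFromStatement(statement, writer, options);
    }
}
```

Note that this sketch rewrites the snippet in place, so generating code from the same statement object twice would wrap the expression twice. Matching on the method name "Write" alone is also a blunt test, and a real implementation would want to be more selective.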

Demonstration

You can download a demonstration project to see this in action. To install the behaviour into your own project, follow these steps:

  1. Download the SafeEncodingHelper assembly (or build it yourself – the demo project includes sources), and add a reference to it in your project.

  2. In your web.config, edit the system.codedom.compilers element, to look like this:

<compiler language="c#;cs;csharp" type="SafeEncodingHelper.SafeEncodingCSharpCodeProvider, SafeEncodingHelper" extension=".cs" warningLevel="4">
	<providerOption name="CompilerVersion" value="v3.5" />
	<providerOption name="WarnAsError" value="false" />
</compiler>

  3. Also in web.config, under pages/namespaces, add a reference to the SafeEncodingHelper namespace:

<namespaces>
	<add namespace="System.Web.Mvc" />
	<add namespace="System.Linq" />
	<add namespace="SafeEncodingHelper" />
</namespaces>

 

That’s all! You will now find that <%=…%> encodes its output, or you can get unencoded output by casting your value to the RawHtml type, i.e. <%= (RawHtml)myValue %>.

What about MVCToolkit?

You might be thinking that this is going to break the MVC toolkit, since you use it to build HTML controls with a syntax like this:

<%= Html.TextBox("myinput", "It's nice") %>

You might, reasonably, expect this now to render a bunch of useless HTML-encoded nonsense. There’s a neat solution, though – the MVC toolkit could return values of the RawHtml type (which is merely a wrapper around System.String that adds no functionality). The SafeEncodingHelper compiler recognises this type specially and skips the HTML encoding for it. So, you can keep your clean syntax for any methods that you specifically want to render unencoded HTML.

Also, if someone isn’t using SafeEncodingHelper, no problem! The RawHtml type has a .ToString() method that simply returns the underlying value, so the MVC toolkit methods would still work just as well.
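
As a rough illustration of how little is involved, a wrapper type and an encoding helper along these lines might look like the sketch below. This is not the actual SafeEncodingHelper code. The EncodingHelper class and its Encode method are hypothetical names, and the explicit conversion operator is my assumption about how the (RawHtml) cast from a string could be made to work.

```csharp
using System.Web;

// A thin wrapper that marks a string as HTML that should be output as-is.
public sealed class RawHtml
{
    private readonly string value;

    public RawHtml(string value)
    {
        this.value = value ?? string.Empty;
    }

    // Allows <%= (RawHtml)myValue %> when myValue is a string.
    public static explicit operator RawHtml(string value)
    {
        return new RawHtml(value);
    }

    // Pages that don't use the custom compiler just see the plain string.
    public override string ToString()
    {
        return value;
    }
}

// Hypothetical helper that every <%= ... %> expression would be routed through.
public static class EncodingHelper
{
    public static string Encode(object value)
    {
        if (value == null)
        {
            return string.Empty;
        }

        // RawHtml bypasses encoding; everything else is HTML-encoded.
        var raw = value as RawHtml;
        return raw != null
            ? raw.ToString()
            : HttpUtility.HtmlEncode(value.ToString());
    }
}
```

With this sketch, EncodingHelper.Encode("<b>") yields the encoded text, while EncodingHelper.Encode((RawHtml)"<b>") returns the markup untouched. Checking the type at runtime, rather than relying on overloads, means a RawHtml value is still recognised when it arrives typed as object.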

The demonstration project contains an alternative MVC toolkit that behaves this way. Actually, it only has a single facility (TextBox), but it’s enough to give you the idea.

Should I really use this then?

Firstly, this code comes with no warranties at all. Use it if you want, but beware – I just cooked it up on impulse and there may be any number of special cases I haven’t accounted for. It’s a proof of concept, that’s all.

Unless Microsoft chooses to support the RawHtml type in their MVC toolkit and related methods, you would have to remember to cast all MVC toolkit output to RawHtml, or write your own wrapper methods or something. Not much fun, sorry.
