Geolocation is easier (and cheaper) than you think
Most of the time when you’re surfing the web, or creating web applications, you don’t expect real geography to be involved. Historically it’s been tricky to identify with any accuracy or reliability the physical location of your visitors, and might even be said to contravene the spirit of the web. But if what if you really do want to take action based on where they’re coming from?
I discovered today that it’s really easy. Geolocation is the term used for the conversion of an IP address to a real-world geographic location. For example:
216.239.59.103 maps to: |
|
There are a whole bunch of companies selling geolocation services – usually either as web services, or as downloadable databases containing IP addresses and geographic data.
Free geolocation data
If you’re looking for something free, check out MaxMind GeoLite City. You can download a huge pair of CSV files (one for IP blocks, one for corresponding location data) from their website and import them into SQL Server, and you’ll end up with a schema like this:
Now, there are 4 billion possible IP addresses, so rather than explicitly listing every one as a separate row (which really would hurt SQL Server) the MaxMind people have split up the IP address space into blocks which correspond to a single geographic location. To help SQL Server find which block an IP address is in, they have defined a one-to-one mapping from IP addresses to BIGINT values, as such:
A.B.C.D <=> A*256^3 + B*256^2 + C*256 + D
So now you can (almost) instantly obtain geographic data for any IP address by defining a SQL user defined function (UDF) such as:
CREATE FUNCTION [GetGeocodingData] (@a tinyint, @b tinyint, @c tinyint, @d tinyint) RETURNS @ReturnTable TABLE ( City VARCHAR(255), Region VARCHAR(255), Country VARCHAR(255), Latitude FLOAT, Longitude FLOAT ) AS BEGIN INSERT @ReturnTable SELECT lo.City, lo.Region, lo.Country, lo.Latitude, lo.Longitude FROM Blocks bl JOIN Locations lo on lo.LocationID = bl.LocationID WHERE (CAST(@a AS BIGINT)*256*256*256 + @b*256*256 + @c*256 + @d) BETWEEN bl.[BlockStart] AND bl.[BlockEnd] RETURN END GO
In this example, the parameters @a, @b, @c, @d correspond to the four parts of the IP address (and should therefore be integers in the range 0-255).