tl;dr:

Adding geohash-based spatial indexing and search capabilities to DokuWiki.

DokuWikiSpatial: spatial indexing and search for your wiki

Indexing

Now that we can add location to our wiki pages using the mapping and geotagging plugins discussed in previous posts, let's start making use of that data. Besides the SEO benefits, it would be nice to use that data to generate information. To start, the data needs to be made quickly searchable, which is done by creating a (spatial) index.

The chosen index algorithm is based on a value calculated from the coordinate pair, the geohash. This hash provides a one-dimensional representation of a coordinate pair, with the length of the hash being a measure of the accuracy of the coordinates. For example, the coordinates (57.64911 10.40744) give a hash of u4pruydqqvj, while the five-character prefix u4pru only pins the location down to roughly (57.6 10.4). A big advantage is that this can easily be stored in an array with the geohash as key and the resource(s) as value. Using PHP’s built-in serialisation it is stored on disk in plain text, just like the other indexes and wiki metadata.
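To make the hashing step concrete, here is a minimal Python sketch of the standard geohash encoding, followed by a dictionary used as a geohash-keyed index. It is not the plugin's PHP code; the function name geohash_encode and the page id in the index are assumptions made up for this illustration.

```python
# Geohash alphabet: digits plus lowercase letters without a, i, l and o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True     # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


# Hypothetical index: geohash as key, list of wiki resources as value.
index = {}
index.setdefault(geohash_encode(57.64911, 10.40744), []).append("places:skagen")
```

Calling geohash_encode(57.64911, 10.40744) reproduces the u4pruydqqvj hash from the example, and asking for a precision of 5 returns its prefix u4pru.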

As part of the indexing, both a KML and a GeoRSS file are generated. These can be served up as a spatial sitemap but may also be used in a map on the wiki. Also, as DokuWiki supports EXIF data in images, JPEG or TIFF media uploaded into the wiki’s media store are added to the spatial index as well if they carry the proper EXIF GPS tags.
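As a rough idea of what such a generated file could contain, the following Python sketch writes indexed points as KML placemarks. The layout of the file the plugin actually produces is not described here, so the structure, the function name build_kml and the sample page id are assumptions.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(points):
    """Return a KML document string with one Placemark per (name, lat, lon)."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = build_kml([("places:skagen", 57.64911, 10.40744)])
```

Note the coordinate order: KML stores longitude before latitude, the reverse of how the coordinates are written in this post.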

Searching

The most typical question for generating information is “What is near (to)…?” For the wiki resources this would yield one or more pages or images.

Looking back at the example above, note that the two hashes share a prefix; this is another benefit of using the geohash. It allows selective matching, implicitly enabling a bounding box search, i.e. longer geohashes specify a more accurate location but, conversely, also a smaller search area.
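The prefix matching can be sketched in a few lines of Python. This is a simplified illustration under the assumption that the search starts with the full hash and shortens it until something matches; the function names and the index contents are made up for the example.

```python
def find_by_prefix(index, prefix):
    """Return all resources whose geohash key starts with `prefix`."""
    results = []
    for key in sorted(index):
        if key.startswith(prefix):
            results.extend(index[key])
    return results


def find_nearby(index, geohash, min_length=1):
    """Shorten the geohash one character at a time until something matches.

    Each dropped character enlarges the implied bounding box.
    Returns the matching prefix and the resources found.
    """
    for length in range(len(geohash), min_length - 1, -1):
        prefix = geohash[:length]
        hits = find_by_prefix(index, prefix)
        if hits:
            return prefix, hits
    return "", []


# Illustrative index; keys and page ids are invented for this sketch.
index = {
    "u4pruydqqvj": ["places:skagen"],
    "u4pruvh": ["places:harbour"],
    "u2ednt67js": ["places:vienna"],
}

prefix, hits = find_nearby(index, "u4pruydqqvz")
```

Here the full hash u4pruydqqvz has no entry, so the search falls back to u4pruydqqv and finds the first page; searching for the prefix u4pru directly returns both pages in that cell.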

A small drawback of using a geohash this way is that the positional error and bounding box grow rapidly as the geohash gets shorter. For example, a geohash of five characters (u4pru) has an error of ±2.4km, whereas a geohash that is three characters shorter (u4) has a positional error of ±630km. This means that with hash-based lookups the actual search location is not static: it shifts as the hash is shortened and the bounds are enlarged. This requires some more work, but for the initial 0.1 release of the plugin it is sufficient.
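The growth of the error follows directly from the number of bits in the hash: every character adds five bits, split alternately between longitude and latitude. A small Python sketch of that arithmetic, assuming roughly 111.32 km per degree (which for longitude only holds at the equator) and using function names invented for the example:

```python
KM_PER_DEGREE = 111.32  # approximate; for longitude this holds at the equator


def geohash_error_degrees(length):
    """Return the (latitude, longitude) half-widths of a cell in degrees."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    lat_err = 90.0 / 2 ** lat_bits
    lon_err = 180.0 / 2 ** lon_bits
    return lat_err, lon_err


def geohash_error_km(length):
    """Return the larger of the two half-widths, converted to kilometres."""
    lat_err, lon_err = geohash_error_degrees(length)
    return max(lat_err, lon_err) * KM_PER_DEGREE


for length in (2, 5):
    print(length, round(geohash_error_km(length), 1))
```

For five characters this gives about 2.4km and for two characters a little over 620km, in line with the rounded figures quoted above.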

The plugin integrates with the geotag plugin to link to a page with dynamic search results. So a geotag with coordinates 48º11'36.384"N;16º27'39.06"E links to ?do=findnearby&lat=48.19344&lon=16.46085, which initially does a lookup in the index for a geohash of u2ednt67js. This renders as an HTML page with a list of results. The search interface will be extended to support other output formats, e.g. a complete map or a GeoRSS document.
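The step from the geotag's degrees, minutes and seconds to the decimal values in the link is plain arithmetic. A short Python sketch, with helper names that are assumptions for this illustration rather than anything the plugin exposes:

```python
from urllib.parse import urlencode


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative.
    return -value if hemisphere in ("S", "W") else value


def findnearby_query(lat, lon):
    """Build the query string for a findnearby lookup, with 5 decimals."""
    params = {"do": "findnearby", "lat": f"{lat:.5f}", "lon": f"{lon:.5f}"}
    return "?" + urlencode(params)


lat = dms_to_decimal(48, 11, 36.384, "N")
lon = dms_to_decimal(16, 27, 39.06, "E")
query = findnearby_query(lat, lon)
```

With the coordinates from the geotag above, query ends up as ?do=findnearby&lat=48.19344&lon=16.46085.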

Screen capture: an example list of search results on a wiki page for a findnearby search at 50.62ºN;6.04ºE.

Some more examples are available on the sample site and wild-water.nl.


If you've published a reaction to this blog, let me know the url by twitter or mail and I will add your link here.
