My BBC colleagues Tom Scott and Roo Reynolds were recently kicking around ideas for what to do with the 12,000 or so comments (a BBC record) that a blog post for the BBC TV show Springwatch received after asking whether people had heard a cuckoo. They had lots of interesting ideas, but I immediately connected this with the recent release of the Yahoo Placemaker API. This service lets you submit some text and get back a list of recognised places: the place name, a unique Where-on-Earth ID (WOEID) and a latitude/longitude for each. So I've written some Python scripts to extract place names from the cuckoo comments and plot the sightings (hearings?) on a map.
I submitted the comment text to the Yahoo Placemaker API in batches, because Placemaker has a 50,000-byte limit for posted documents. This found 16,078 place mentions covering 5,386 unique places. The top 20 are:
- New Forest, England, GB, 99 mentions
- Scotland, GB, 94
- Woodland, England, GB, 83
- Suffolk, England, GB, 80
- Yorkshire, England, GB, 79
- Norfolk, England, GB, 77
- Lake District National Park, England, GB, 77
- Dartmoor National Park, England, GB, 77
- Wales, GB, 74
- Essex, England, GB, 72
- Sussex, England, GB, 66
- Surrey, England, GB, 66
- Kent, England, GB, 59
- Somerset, England, GB, 51
- Hampshire, England, GB, 50
- Cumbria, England, GB, 48
- Norwich, England, GB, 47
- Island of Skye, Scotland, GB, 47
- Perthshire, Scotland, GB, 46
- Dorset, England, GB, 45
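Going back to the batching step for a moment: this is a rough sketch of how comments could be grouped to stay under the 50,000-byte limit. It is not my actual script. The function name, the newline separator and the decision to truncate a single oversized comment are all assumptions made for illustration, and the call to Placemaker itself is left out.

```python
MAX_BYTES = 50000  # the Placemaker limit for posted documents quoted above


def batch_comments(comments, max_bytes=MAX_BYTES, separator="\n"):
    """Group comments into batches whose UTF-8 size stays within max_bytes.

    Hypothetical helper: the name, separator and truncation rule are assumptions.
    """
    sep_len = len(separator.encode("utf-8"))
    batches, current, size = [], [], 0
    for comment in comments:
        encoded = comment.encode("utf-8")
        if len(encoded) > max_bytes:
            # A single oversized comment is truncated (assumption); ignoring
            # decode errors drops any character cut in half by the slice.
            comment = encoded[:max_bytes].decode("utf-8", errors="ignore")
            encoded = comment.encode("utf-8")
        extra = len(encoded) + (sep_len if current else 0)
        if current and size + extra > max_bytes:
            # The current batch is full: close it and start a new one.
            batches.append(separator.join(current))
            current, size = [], 0
            extra = len(encoded)
        current.append(comment)
        size += extra
    if current:
        batches.append(separator.join(current))
    return batches
```

Measuring the encoded length rather than the character count matters here, because the limit is in bytes and comments can contain non-ASCII characters.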
The top places are mainly counties or regions, as you'd expect, but further down the list are towns and villages. There are some false positives, like "Woodland" above, but I removed many of these by keeping only places whose names include "GB". Another problem is that some comments generate multiple locations: "Basingstoke, Hampshire", for example, gets two places extracted.
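The counting and filtering could look something like the sketch below. It assumes each extracted place arrives as a string in the same "name, region, country" form as the list above; the function name and the sample data (including the non-GB "Woodland") are made up for illustration.

```python
from collections import Counter


def top_places(place_names, country_code="GB", n=20):
    """Count mentions per place, keeping only names ending in country_code."""
    counts = Counter(
        name for name in place_names
        if name.split(", ")[-1] == country_code
    )
    return counts.most_common(n)


# Hypothetical sample of extracted place names
mentions = [
    "New Forest, England, GB",
    "Scotland, GB",
    "New Forest, England, GB",
    "Woodland, CA, US",  # invented false positive outside GB
]
print(top_places(mentions))
```

A filter like this only drops places outside GB, which is why "Woodland, England, GB" still makes it into the top 20.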
Then I converted these places into KML, the data format used by Google Earth, and generated a compressed KML file of the sightings that you can download. You can load this into Google Earth and it seems to cope, though a map full of pins is pretty useless until you zoom in. You can also paste that URL into the Google Maps search box to see the sightings, but it will only load 1000 points and only plot about 80 points at one time, so zoom in for more detail.
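For anyone curious about the KML step, here is a minimal sketch of writing placemarks and zipping them into a compressed KML (KMZ) file. It assumes the places are available as (name, latitude, longitude) tuples; the function names are hypothetical and this is not the script I actually used.

```python
import zipfile
from xml.sax.saxutils import escape


def build_kml(places):
    """Build a KML document from (name, latitude, longitude) tuples."""
    placemarks = []
    for name, lat, lon in places:
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%f,%f</coordinates></Point></Placemark>"
            % (escape(name), lon, lat)  # KML order is longitude,latitude
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>\n"
    )


def write_kmz(places, path):
    """Write a KMZ: a zip archive holding the KML as doc.kml."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", build_kml(places))


# Approximate coordinates, for illustration only
write_kmz([("Norwich, England, GB", 52.63, 1.30)], "cuckoos.kmz")
```

The one thing that is easy to get wrong is the coordinate order: KML wants longitude first, then latitude.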
Finally, as originally suggested by Tom, I drew a heatmap-type image using Nodebox and Python to plot translucent dots onto a map of the UK (courtesy of OpenStreetMap). I fiddled around with the size and opacity until I was happy with the result.
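The translucent-dot idea can be sketched without Nodebox. The code below is plain Python rather than my Nodebox script: it maps latitude/longitude to pixels with a simple linear mapping over an assumed bounding box, then layers dots so that overlapping sightings build up opacity. The bounding box, radius and opacity values are placeholders, and the linear mapping is only an approximation, since OpenStreetMap tiles are drawn in a Mercator projection.

```python
def latlon_to_pixel(lat, lon, bounds, width, height):
    """Linear mapping from lat/lon to pixel coordinates.

    bounds is (south, west, north, east) for the base map image (assumed).
    """
    south, west, north, east = bounds
    x = (lon - west) / (east - west) * (width - 1)
    y = (north - lat) / (north - south) * (height - 1)  # y grows downwards
    return int(round(x)), int(round(y))


def draw_dots(points, bounds, width, height, radius=3, opacity=0.2):
    """Return a height x width grid of accumulated dot opacity in [0, 1]."""
    grid = [[0.0] * width for _ in range(height)]
    for lat, lon in points:
        cx, cy = latlon_to_pixel(lat, lon, bounds, width, height)
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    # Composite one translucent dot over what is already there
                    grid[y][x] += opacity * (1.0 - grid[y][x])
    return grid


# Rough, hypothetical bounding box around the UK
UK_BOUNDS = (49.0, -11.0, 61.0, 2.0)
grid = draw_dots([(55.0, -4.5), (55.0, -4.5)], UK_BOUNDS, 131, 121,
                 radius=2, opacity=0.5)
```

Because each dot is composited over the previous ones, a pixel never exceeds full opacity, and busy areas come out darker than isolated sightings. That is what gives the heatmap effect.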
Tom got in contact with Springwatch and I was phoned by Paul, their web producer. It turns out they had already hired a data entry company to extract the place names from the comments, using people to do what I'm doing with code, and hopefully getting higher-quality results. They are also sharing the data with the British Trust for Ornithology (BTO), who will work on producing more accurate results from it. I redrew the map using their manual data, and they also asked for closer views of London, Birmingham, Manchester and Scotland, which I generated too, though these aren't quite as pretty.
It's not scientifically accurate by any means but it is interesting and was a good experiment in extracting useful data from large numbers of comments.
I've just been watching Springwatch and they featured a map, but I'm not sure it was mine: it looks like it was this one from the BTO.
Disclaimer: These are my thoughts and opinions and not those of my employer