Monday 25 November 2019

Filling the White Holes

I'm growing increasingly obsessed with data that isn't there... Earlier this month Dom Greaves used the phrase White Holes on Twitter:



It stuck in my mind and I haven't been able to shift it. Previously, I've used R to plot heatmaps of VC55 (Leicestershire and Rutland) spider records. The maps below use all VC55 spider records (more than 43,000) to the end of 2018; data copyright Leicestershire and Rutland Environmental Records Centre (click images for larger versions):



However, the problem with heatmaps is that they inevitably focus attention on where the data is, rather than where it is missing. To try to switch the emphasis I tried quadrat mapping: arbitrarily dividing VC55 into a grid and counting the records within each tile. Initially I tried a 50x50 grid, but with 43,000 records that choked my computer. A 25x25 grid worked, but the tiles were a bit small; a 10x10 grid is more informative:
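For anyone wanting to reproduce the tile counts outside R, the counting step can be sketched as follows. This is a minimal Python illustration, not the code behind the maps: the function name quadrat_counts, the 0-100 bounding box and the randomly generated points are all made up for the example, and the real records are not included.

```python
import random


def quadrat_counts(points, bounds, nx=10, ny=10):
    """Count points in each cell of an nx-by-ny grid laid over a bounding box.

    points: iterable of (x, y) pairs
    bounds: (xmin, ymin, xmax, ymax) of the grid
    Returns a list of ny rows, each a list of nx counts.
    """
    xmin, ymin, xmax, ymax = bounds
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # ignore records that fall outside the grid
        # Points exactly on the top or right edge go into the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[row][col] += 1
    return counts


# Hypothetical clustered "records", standing in for the real data.
random.seed(1)
points = [(random.gauss(50, 12), random.gauss(50, 12)) for _ in range(1000)]

counts = quadrat_counts(points, (0, 0, 100, 100), nx=10, ny=10)
empty_tiles = sum(1 for row in counts for c in row if c == 0)
print("tiles with no records:", empty_tiles)
for row in reversed(counts):  # print with north at the top
    print(" ".join(f"{c:4d}" for c in row))
```

The empty tiles are the interesting part here: they are the White Holes that a heatmap tends to hide.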



Conveniently, R gives the record counts for each tile:



and plotting a histogram of the counts draws attention to how skewed the distribution of records is. Result!
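To put rough numbers on that skew, a summary along these lines could be run on the tile counts. It is a sketch in Python rather than R, and the ten counts in it are invented for illustration, not taken from the VC55 data; the function names are hypothetical too.

```python
from collections import Counter
from statistics import mean, median


def summarise_counts(counts):
    """Summarise how unevenly records are spread across tiles."""
    ordered = sorted(counts)
    total = sum(ordered)
    top_n = max(1, len(ordered) // 10)  # the busiest 10% of tiles
    return {
        "tiles": len(ordered),
        "empty": sum(1 for c in ordered if c == 0),
        "mean": mean(ordered),
        "median": median(ordered),
        "top10_share": sum(ordered[-top_n:]) / total if total else 0.0,
    }


def magnitude_histogram(counts):
    """Bin non-negative integer counts by order of magnitude.

    Keys are the lower bound of each bin: 0 (empty tiles), 1 (1-9),
    10 (10-99), 100 (100-999) and so on.
    """
    hist = Counter()
    for c in counts:
        lower = 0 if c == 0 else 10 ** (len(str(c)) - 1)
        hist[lower] += 1
    return dict(sorted(hist.items()))


# Invented tile counts: a few empty tiles and one very busy one.
tile_counts = [0, 0, 0, 1, 2, 3, 5, 8, 40, 941]

summary = summarise_counts(tile_counts)
print(summary)
for lower, n_tiles in magnitude_histogram(tile_counts).items():
    print(f"{lower:>5}+ {'#' * n_tiles}")
```

A mean far above the median, together with a large share of records in the busiest tiles, is the same message as the histogram: most of the records come from a handful of places.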



The method isn't perfect. Tiles that overlap the VC55 boundary cover unequal areas of the county, so their counts aren't directly comparable. I don't know how you get around that, although plotting the data by parish rather than by quadrat might be better?
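One possible workaround for the unequal areas, offered only as an untested idea: divide each tile's count by the area of the tile that actually lies inside the boundary, giving records per unit area instead of raw counts. The Python sketch below estimates that area by sampling a sub-grid of points in each tile. The triangular "boundary", the tile coordinates and the function names are all hypothetical, and a real VC55 outline would need to be supplied as a list of vertices.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fraction_inside(tile, polygon, samples=20):
    """Estimate the fraction of a tile lying inside the polygon.

    tile: (xmin, ymin, xmax, ymax). Uses a samples-by-samples grid of
    midpoints, so the answer is approximate.
    """
    xmin, ymin, xmax, ymax = tile
    hits = 0
    for i in range(samples):
        for j in range(samples):
            x = xmin + (i + 0.5) * (xmax - xmin) / samples
            y = ymin + (j + 0.5) * (ymax - ymin) / samples
            hits += point_in_polygon(x, y, polygon)
    return hits / samples ** 2


def records_per_unit_area(count, tile, polygon, samples=20):
    """Count divided by the in-boundary area of the tile (None if no overlap)."""
    xmin, ymin, xmax, ymax = tile
    area = (xmax - xmin) * (ymax - ymin) * fraction_inside(tile, polygon, samples)
    return count / area if area > 0 else None


# Hypothetical boundary: a triangle covering half of a 10 x 10 tile.
boundary = [(0, 0), (10, 0), (0, 10)]
edge_tile = (0, 0, 10, 10)

print("fraction inside:", fraction_inside(edge_tile, boundary))
print("density:", records_per_unit_area(50, edge_tile, boundary))
```

This only corrects for area; it says nothing about why a tile has few records, and tiles with only a sliver inside the boundary will give noisy densities.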

The other place I'm currently stuck is trying to turn unstructured data into occupancy models. This is another White Hole Problem, but one that continues to defeat me.


Acknowledgements:
  • All data Copyright Leicestershire and Rutland Environmental Records Centre.
  • Data visualization performed using the R platform, v. 3.6.1 (R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org).
  • J. Cann for assistance with data visualization.
