Yesterday was World Digital Preservation Day and saw the publication of the Digital Preservation Coalition’s Bitlist – their global list of Digitally Endangered Species. Interestingly, under their ‘Practically Extinct’ category (“when the few known examples are inaccessible by most practical means and methods”) sits Unpublished Research Data, which they define as
“research data which has not been shared or published by any means and is thus in contravention of the ‘FAIR’ principles which require data to be Findable Accessible, Interoperable and Reusable”.
Although the DPC jury hopes that this is a small group, I rather suspect that there is an unseen mountain of unpublished research data in archaeology (and in the interest of full disclosure: reader, I have some).
This crossed my screen at the same time as a paper published in the Harvard Data Science Review by Stephen Stigler: ‘Data Have a Limited Shelf Life’, in which he argues that data, unlike wines, do not improve with age. He suggests that old data are “Often … no more than decoration; sometimes they may be misleading in ways that cannot easily be discovered”, while emphasising this is not the same as saying they have no value. Using three examples of old statistical data, he shows how misleading and incomplete they can be if their full background is not known. In each case, the data were selected from a prior source, not always accurately referenced if at all. In some instances, uncovering the original data flagged problems with the sample that had been taken, in others it revealed a greater breadth and depth of information which had gone un-used because the particular research question had stripped them away.