Data Archives as Digital Platforms

From Cory Doctorow’s article, based on a 1936 original drawing by Wanda Gag for ‘Hansel and Gretel’ by the Brothers Grimm.

Cory Doctorow recently coined the term ‘enshittification’ in relation to digital platforms, which he defines as the way in which a platform starts by maximising benefits for its users and then, once they are locked in, switches attention to building profit for its shareholders at the expense of the users, before (often) entering a death-spiral (Doctorow 2023). He sees this applying to platforms such as Amazon, Facebook, Twitter, TikTok, Reddit, and Steam as they monetise their services and become less user-focused in a form of late-stage capitalism (Doctorow 2022; 2023). As he puts it:

… first, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves. Then, they die. (Doctorow 2023).

For instance, Harford (2023) points to the way that platforms like Amazon run at a loss for years in order to grow as fast as possible and make their users dependent upon the platform. Subsequent monetisation of a platform can be a delicate affair, as currently evidenced by the travails of Musk’s Twitter and the increasing numbers of people overcoming the inertia of the walled garden and moving to other free alternatives such as Mastodon, Bluesky, and, most recently, Threads. The vast amounts of personal data collected by commercial social media platforms strengthen their hold over their users, a key feature of advanced capitalism (e.g., Srnicek 2017), making it difficult for users to move elsewhere and also raising concerns about privacy and the uses to which such data may be put. Harford (2023) emphasises the undesirability of such monopolisation and the importance of building in interoperability between competing systems, so that users can switch away, as a means of combatting enshittification.

Grey Data

In recent years, digital access to unpublished archaeological reports (so-called ‘grey literature’) has become increasingly transformational in archaeological practice. Besides being important as a reference source for new archaeological investigations including pre-development assessments (the origin of many of the grey literature reports themselves), they also provide a resource for regional and national synthetic studies, and for automated data mining to extract information about periods of sites, locations of sites, types of evidence, and so on. Despite this, archaeological grey literature itself has not yet been closely evaluated as a resource for the creation of new archaeological knowledge. Can the data embedded within the reports (‘grey data’) be re-used in full knowledge of their origination, their strategies of recovery, the procedures applied, and the constraints experienced? Can grey data be securely repurposed, and if not, what measures need to be taken to ensure that it can be reliably reused?

The Death of Data

“Dead Data” by Stinging Eyes, CC BY-SA 2.0

Yesterday was World Digital Preservation Day and saw the publication of the Digital Preservation Coalition’s Bitlist – their global list of Digitally Endangered Species. Interestingly, under their ‘Practically Extinct’ category (“when the few known examples are inaccessible by most practical means and methods”) sits Unpublished Research Data, which they define as

“research data which has not been shared or published by any means and is thus in contravention of the ‘FAIR’ principles which require data to be Findable, Accessible, Interoperable and Reusable”.

Although the DPC jury hopes that this is a small group, I rather suspect that there is an unseen mountain of unpublished research data in archaeology (and in the interest of full disclosure: reader, I have some).

This crossed my screen at the same time as a paper published in the Harvard Data Science Review by Stephen Stigler: ‘Data Have a Limited Shelf Life’, in which he argues that data, unlike wines, do not improve with age. He suggests that old data are “Often … no more than decoration; sometimes they may be misleading in ways that cannot easily be discovered”, while emphasising this is not the same as saying they have no value. Using three examples of old statistical data, he shows how misleading and incomplete they can be if their full background is not known. In each case, the data were selected from a prior source, not always accurately referenced if at all. In some instances, uncovering the original data flagged problems with the sample that had been taken, in others it revealed a greater breadth and depth of information which had gone un-used because the particular research question had stripped them away.

Delving into Data Reuse

Given the years, the money, expertise and energy we’ve spent on creating and managing archaeological data archives, the relative lack of evidence of reuse is a problem. Making our data open and available doesn’t equate to reusing it, nor does making it accessible necessarily correspond to making it usable. But if we’re not reusing data, how can we justify these resources? In their reflections on large-scale online research infrastructures Holly Wright and Julian Richards (2018) have recently suggested that we need to understand how to optimize archives and their interfaces in order to maximize the use and reuse of archaeological data, and explore how archaeological archives can better respond to user needs alongside ways to document and understand both quantitative and qualitative reuse.

However, I would argue that all these kinds of issues (alongside those of citation, recognition, training, etc.) while not resolved are at least known and mostly acknowledged. The real challenges to data reuse lie elsewhere and entail a much deeper understanding and appreciation of what reuse entails: issues associated with the re-presentation and interpretation of old data, the nature and purpose of reuse, and the opportunities and risks presented by reuse. Such questions are not specific to digital data; however, digital data change the terms of engagement with their near-instant access, volume, and flexibility, and their potentially transformative effects on the practice of archaeology now and in the future.

Digital Data Relations

Data is the new oil
(adapted from original by Gerd Leonhard, CC-BY-SA 2.0)

We sometimes underestimate the impact of digital data on archaeology because we have become so accustomed to the capture, processing, and analysis of data using our digital tools. Of course, archaeology is by no means alone in this respect. For example, Sandra Rendgren, who writes about data visualisation, infographics and interactive media, recently pointed to the creation of a new genre of journalism that has arisen from the availability of digital data and the means to analyse them (2018a). But this growth in reliance on digital data should lead to a re-consideration of what we actually mean by data. Indeed, Sandra Rendgren suggests that the term ‘data’ can be likened to a transparent fluid – “always used but never much reflected upon” – because of its ubiquity and apparent lack of ambiguity (2018b).

Is there a digital File Drawer problem?

by Sailko via Wikimedia Commons CC BY-SA 3.0

There has been a dramatic growth in the development of autonomous vehicles, with consequent competition between different companies and different methodologies. Despite this, and despite the complexities of the task, the number of incidents remains remarkably small, though no less tragic where the death of the occupants or other road users is involved. Of course, at present autonomous cars are not literally autonomous in the sense that a human agent is still required to be available to intervene, and accidents involving such vehicles are usually a consequence of the human component of the equation failing to react as it should. A recent fatal accident involving a Tesla Model X (e.g. Hruska 2018) has resulted in some push-back by Tesla, who have sought to emphasise that the blame lies with the deceased driver rather than with the technology. One of the company’s key concerns in this instance appears to be the defence of the functionality of their Autopilot system, and in relation to this, a rather startling comment on the Tesla blog recently stood out:

No one knows about the accidents that didn’t happen, only the ones that did. The consequences of the public not using Autopilot, because of an inaccurate belief that it is less safe, would be extremely severe. (Tesla 2018).

Data Citation Reprised

CC0 by Tama66 via Pixabay

So here’s a thing. A while ago, I asked whether there was any way to quantify the extent to which archaeologists were citing their reuse of data. I used the Thomson Reuters/Clarivate Analytics Data Citation Index (DCI) as a starting point, but it didn’t go too well … Back then, the DCI indicated that 56 of the 476 data studies derived from the UK’s Archaeology Data Service repository had apparently been cited elsewhere in the Web of Science databases (the figure is currently 58 out of 515). But I also found that the citations themselves were problematic: the citation of the published paper/volume was frequently incomplete or abbreviated, many appeared to be self-citations from within interim or final reports, in some cases the citations preceded the dates of the project being referenced, and in many instances it was possible to demonstrate that the data had been cited (in some form or other) but this had not been captured in the DCI. At that point I concluded that the DCI was of little value at present. So what was going on?

Opening up Open Archaeology

Recent years have seen a flurry of publications and statements concerning the importance and value of the open science movement in archaeology. Examples include the collection of papers published in 2012 in World Archaeology (see Lake 2012), the volume on Open Source Archaeology edited by Andrew Wilson and Ben Edwards (2015), and, most recently, a series of papers by Ben Marwick (2016; Marwick et al 2017). The idea that publications, data, and methods (including code) should be freely accessible in order to make archaeological research more reproducible is evidently a ‘good thing’ and very much in vogue.

As Tom Brughmans has recently written:

“Our very diverse work ranging from excavation, over lab tests, to interpretations is often only made available through a summarising publication that is rarely accessible to anyone other than institutions paying huge amounts of money. This is just not the way science works anymore. In such a system, how can we find out all the details of excavation results? How can we reproduce lab tests? How can we evaluate the empirical and historical background to a published interpretation in exhaustive detail? The answer is: we can’t.”

Rob Barrett has recently said something similar specifically in relation to 3D reconstruction. The value of opening up archaeological research seems undeniable, and the set of practices put forward by the new Open Science Interest Group (Marwick et al 2017, 12-13) makes a great deal of sense and is highly desirable. But there are some implicit underlying assumptions behind all this which don’t seem to have been addressed. They don’t detract from the importance of pursuing a truly open archaeology, but failing to recognise them risks not learning from past experience.

Citing Data Reuse

I’ve commented here and here about the question of data reuse (or more accurately, the lack of it) and the implications for archaeological digital repositories. It’s frequently argued that the key incentive for making data available for reuse is providing credit through citation. So how’s that going? I’ve not seen any attempt to actually quantify this, so out of curiosity I thought I’d have a go.

A logical starting point is the Thomson Reuters Data Citation Index – according to its owners (it’s a licensed rather than public resource), this indexes the contents of a large number of the world’s leading data repositories, and, on checking, the UK’s Archaeology Data Service (ADS) appears among them. So far so good.

The idle archive?

We often hear of the active archive, but what about an idle one? In a post on Digital Data Realities, I suggested that, although we might wish otherwise, our digital archaeological data repositories seemed relatively little-used. The Archaeology Data Service access statistics did not suggest a large uptake for the project archives it holds, and the ADS had not found it easy to attract entries to its Digital Data Reuse Awards in the past. In that light, I commented that it would be interesting to see how the OpenContext & Carleton Prize for Archaeological Visualization would get on. Well, the jury is now in, and the winner is … the ‘Poggio Civitate VR Data Viewer’, an impressive-looking data viewer, though as it requires an HTC Vive to use, I can sadly only watch the video rather than experience it myself …

However, as interesting are Shawn Graham’s reflections on the experience of organising the contest:

“We offered real money – up to a $1000 in prizes. We promoted the hang out of it. We made films, we wrote tutorials, we contacted professors across the anglosphere. We had very little uptake.”

(accompanied in his presentation by an image of tumbleweed) … Indeed, only the one winner was announced for the team prize – no individual or student prizes were awarded as was originally intended. So what’s going on?
