Faith, Trust, and Pixie Dust

Trust - broken egg
Adapted from original image by Kumar’s Edit, CC BY 2.0 Deed

It’s been some time since I last blogged, largely because my focus has lain elsewhere in recent months, writing long-form pieces for more traditional outlets. The most recent of these considers the question of trust in digital things, a topic spurred by the recent (and ongoing) scandal surrounding the Post Office Horizon computer system here in the UK, which saw hundreds of people falsely convicted of theft, fraud, and false accounting. One of the things that came to the fore as a result of the scandal was the way that English law presumes the reliability of a computer system:

In effect, the ‘word’ of a computational system was considered to be of a higher evidential value than the opinion of legal professionals or the testimony of witnesses. This was not merely therefore a problem with digital evidence per se, but also the response to it. (McGuire and Renaud 2023: 453)

Continue reading

Data Archives as Digital Platforms

From Cory Doctorow’s article, based on a 1936 original drawing by Wanda Gag for ‘Hansel and Gretel’ by the Brothers Grimm.

Cory Doctorow recently coined the term ‘enshittification’ in relation to digital platforms, which he defines as the way in which a platform starts by maximising benefits for its users and then, once they are locked in, switches its attention to building profit for its shareholders at the expense of those users, before (often) entering a death-spiral (Doctorow 2023). He sees this applying to everything from Amazon and Facebook to Twitter, TikTok, Reddit, and Steam as they monetise their platforms and become less user-focused in a form of late-stage capitalism (Doctorow 2022; 2023). As he puts it:

… first, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves. Then, they die. (Doctorow 2023).

For instance, Harford (2023) points to the way that platforms like Amazon run at a loss for years in order to grow as fast as possible and make their users dependent upon the platform. Subsequent monetisation of a platform can be a delicate affair, as currently evidenced by the travails of Musk’s Twitter and the increasing numbers of people overcoming the inertia of the walled garden and moving to other free alternatives such as Mastodon, Bluesky, and, most recently, Threads. The vast amounts of personal data collected by commercial social media platforms strengthen their hold over their users, a key feature of advanced capitalism (e.g., Srnicek 2017), making it difficult for users to move elsewhere and also raising concerns about privacy and the uses to which such data may be put. Harford (2023) emphasises the undesirability of such monopolisation and the importance of building in interoperability between competing systems to allow users to switch away as a means of combating enshittification.

Continue reading

Digital Twins

Adapted from an original by MikeRun; CC BY-SA 4.0

Sometimes words or phrases are coined that seem very apposite in that they appear to capture the essence of a thing or concept and quickly become a shorthand for the phenomenon. ‘Digital twin’ is one such term, increasingly appearing in both popular and academic use with its meaning seemingly self-evident. The idea of a ‘digital twin’ carries connotations of a replica, a duplicate, a facsimile, the digital equivalent of a material entity, and conveniently summons up the impression of a virtual exact copy of something that exists in the real world.

For example, there was a great deal of publicity surrounding the latest 3D digital scan of the Titanic, created from 16 terabytes of data, 715,000 digital images, and 4K video footage, with a resolution capable of reading the serial number on one of the propellers. The term ‘digital twin’ was bandied around in the news coverage, and you’d be forgiven for thinking it simply means a high-resolution digital model of a physical object, although the Ars Technica article hints at the possibility of using it in simulations to better understand the breakup and sinking of the ship. The impression gained is that a digital twin can simply be seen as a digital duplicate of a real-world object, and the casual use of the term would seem to imply little more than that. By this definition, photogrammetric models of excavated archaeological sections and surfaces would presumably qualify as digital twins of the original material encountered during the excavation, for instance.

Continue reading

Data as Mutable Mobiles

How Standards Proliferate
(c) Randall Munroe https://xkcd.com/927/ (CC-BY-NC)

As archaeologists, we frequently celebrate the diversity and messiness of archaeological data: its literal fragmentary nature, inevitable incompleteness, variable recovery and capture, multiple temporalities, and so on. However, the tools and technologies that we have developed for recording, locating, analysing, and reusing those data do their best to disguise or remove that messiness, generally by reducing its complexity. Of course, there is nothing new in this – by definition, we always tend to simplify in order to make data analysable. However, those technical structures assume that data are static things, whereas they are in reality highly volatile as they move from their initial creation through to subsequent reuse, and as we select elements from the data to address the particular research question in hand. This data instability is something we often lose sight of.

The illusion of data as a fixed, stable resource is a commonplace – or, if not specifically acknowledged, we often treat data as if this were the case. In that sense, we subscribe to Latour’s perspective of data as “immutable mobiles” (Latour 1986; see also Leonelli 2020, 6). Data travels, but it is not essentially changed by those travels. For instance, books are immutable mobiles, their immutability acquired through the printing of multiple identical copies, their mobility through their portability and the availability of copies etc. (Latour 1986, 11). The immutability of data is seen to give it its evidential status, while its mobility enables it to be taken and reused by others elsewhere. This perspective underlies, implicitly at least, much of our approach to digital archaeological data and underpins the data infrastructures that we have created over the years.

Continue reading

Productive Friction

Mastodon vs Twitter meme (via https://mastodon.nz/@TheAtheistAlien/109331847144353101)

Right now, the great #TwitterMigration to Mastodon is in full flood. The initial trickle of migrants when Elon Musk first indicated he was going to acquire Twitter surged when he finally followed through, sacked a large proportion of staff and contract workers, turned off various microservices including SMS two-factor authentication (accidentally or otherwise), and announced that Twitter might go bankrupt. Growing numbers of archaeologists opened accounts on Mastodon, and even a specific archaeology-focussed instance (server) was created at archaeo.social by Joe Roe.

Something most Twitter migrants experienced on first encounter with Mastodon was that it worked in a manner just different enough from Twitter to be somewhat disconcerting. This was nothing to do with tweets being called ‘toots’ (recently changed to posts following the influx of new users), or retweets being called ‘boosts’, or the absence of a direct equivalent to quote tweets. It had a lot to do with the federated model, with its host of different instances serving different communities, which meant that the first decision for any new user was which server to sign up with; many struggled with this after the centralised models of Twitter (and Facebook, Instagram, etc.), though older hands welcomed it as a reminder of how the internet used to be. It also had a lot to do with the feeds (be they Home, Local, or Federated) no longer being determined by algorithms that automatically promoted tweets, but simply presenting posts in reverse chronological order. And it had to do with anti-harassment features which meant you could only find people on Mastodon if you knew their username and server, and the inability to search text other than hashtags. These were deliberately built into Mastodon, together with other, perhaps more obviously useful, features like Content Warnings on text and Sensitive Content on images, and simple alt-text handling for images.

Continue reading

Data Detachment

‘Data detachment’ via Craiyon

A couple of interesting but unrelated articles around the subject of digital data in the humanities recently appeared: a guest post in The Scholarly Kitchen by Chris Houghton on data and digital humanities, and an Aeon essay by Claire Lemercier and Claire Zalc on historical data analysis.

Houghton’s article emphasises the benefits of mass digitisation and large-scale analysis in the context of the increasing availability of digital data resources provided through digital archives and others. According to Houghton, “The more databases and sources available to the scholar, the more power they will have to ask new questions, discover previously unknown trends, or simply strengthen an argument by adding more proof.” (Houghton 2022). The challenge he highlights is that although digital archives increasingly provide access to large bodies of data, the work entailed in exploring, refining, checking, and cleaning the data for subsequent analysis can be considerable.

An academic who runs a large digital humanities research group explained to me recently, “You can spend 80 percent of your time curating and cleaning the data, and another 80 percent of your time creating exploratory tools to understand it.” … the more data sources and data formats there are, the more complex this process becomes. (Houghton 2022).
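The scale of that effort is easy to underestimate. By way of illustration, here is a minimal sketch (in Python, with entirely invented sources, field names, and formats) of the kind of harmonisation involved in combining just two records that describe the same thing differently; every additional source brings its own schema, encodings, and quirks to be reconciled before any analysis can begin.

# A hypothetical illustration of cleaning data across sources: the records and
# field names are invented, but the mismatches are typical.
from datetime import datetime

source_a = [{"site": "vindolanda", "excavated": "2021-07-14", "finds": "27"}]
source_b = [{"Site Name": "VINDOLANDA ", "Date": "14/07/2021", "Find count": 27}]

def normalise_a(record):
    return {
        "site": record["site"].strip().title(),
        "date": datetime.strptime(record["excavated"], "%Y-%m-%d").date(),
        "finds": int(record["finds"]),
    }

def normalise_b(record):
    return {
        "site": record["Site Name"].strip().title(),
        "date": datetime.strptime(record["Date"], "%d/%m/%Y").date(),
        "finds": int(record["Find count"]),
    }

combined = [normalise_a(r) for r in source_a] + [normalise_b(r) for r in source_b]
# Each new source or format needs its own mapping, checking, and cleaning rules,
# which is why the effort grows with the number of sources and formats involved.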

Continue reading

HARKing to Big Data?

Aircraft Detection Before Radar
A 1920s aircraft detector

Big Data has been described as a revolutionary new scientific paradigm, one in which data-intensive approaches supersede more traditional scientific hypothesis testing. Conventional scientific practice entails the development of a research design with one or more falsifiable theories, followed by the collection of data which allows those theories to be tested and confirmed or rejected. In a Big Data world, the relationship between theory and data is reversed: data are collected first, and hypotheses arise from the subsequent analysis of that data (e.g., Smith and Cordes 2020, 102-3). Lohr described this as “listening to the data” to find correlations that appear to be linked to real-world behaviours (2015, 104). Classically this is associated with Anderson’s (in)famous declaration of the “end of theory”:

With enough data, the numbers speak for themselves … Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. (Anderson 2008).

Such an approach to investigation has traditionally been seen as questionable scientific practice, since patterns will always be found in even the most random data if there are enough data and sufficiently powerful computers to process them.
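The point can be demonstrated with a minimal sketch (in Python, assuming numpy is available, and using purely invented random numbers rather than real data): correlate a random ‘outcome’ against a thousand equally random ‘variables’ and an apparently convincing correlation always emerges, despite there being nothing to find.

# Spurious correlation in pure noise: nothing here is real data.
import numpy as np

rng = np.random.default_rng(42)
n_observations = 50
n_variables = 1000

variables = rng.normal(size=(n_variables, n_observations))  # random 'data'
outcome = rng.normal(size=n_observations)                   # random 'behaviour'

# Correlate every random variable with the random outcome
correlations = np.array([np.corrcoef(v, outcome)[0, 1] for v in variables])

strongest = np.abs(correlations).max()
print(f"Strongest correlation found in pure noise: r = {strongest:.2f}")
# Typically around 0.4-0.5: apparently 'meaningful', yet entirely spurious.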

Continue reading

Nothing is Something

The black hole at the centre of Messier 87, via the Event Horizon Telescope (Wikimedia CC-BY)

Shannon Mattern has recently written about mapping nothing: from the ‘here be dragons’ on old maps marking the limits of knowledge and the promise of new discoveries, to the perception of the Amazon rainforest as an unpeopled wilderness until satellite imagery revealed pre-Columbian geoglyphs which had been largely invisible on the ground. In her wide-ranging essay, she makes the point that nothingness is always something: “A map of nothing demonstrates that an experiential nothingness depends upon a robust ecology of somethingness to enable its occurrence” (Mattern 2021). The question, of course, is what that something actually is.

Nothingness is something that has long been an issue in databases. Null is traditionally used to represent something missing. As null is not a value, it is technically and meaningfully distinct from zeros and empty strings, which are values and hence indicators of something. Although this seems straightforward, the boundaries begin to blur when some guides to SQL, for instance, define null in terms of both missing and unknown values. After all, if something is missing, then we know we are missing it; if something is unknown, then we don’t know whether or not it was ever something. Indeed, Codd, in his classic book on relational databases, argued that null should also indicate why the data is missing, distinguishing between a null that is ‘missing but applicable’ and a null that is ‘missing but inapplicable’ (Codd 1990, 173), but this was never adopted. Consequently, nulls tend to have a bad reputation because of the ways they may variously be used (mostly in error) to represent ‘nothing’, ‘unknown’, ‘value not yet entered’, ‘default value’, etc., in part because of messy implementations in database management systems.
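The distinction is easy to see in practice. Here is a minimal sketch (in Python, using the built-in sqlite3 module and an invented table) showing how a null behaves quite differently from a zero or an empty string: it is skipped by aggregates and cannot even be compared with itself.

# Invented example table: weights and notes for two finds, one recorded, one not.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE finds (id INTEGER, weight REAL, notes TEXT)")
con.executemany(
    "INSERT INTO finds VALUES (?, ?, ?)",
    [
        (1, 0.0, ""),      # zero and empty string are values, i.e. 'something'
        (2, None, None),   # NULL is the absence of a value
    ],
)

# COUNT(column) silently skips NULLs, while COUNT(*) counts rows
print(con.execute("SELECT COUNT(*), COUNT(weight) FROM finds").fetchone())  # (2, 1)

# NULL is not equal to anything, not even itself, so the comparison yields NULL...
print(con.execute("SELECT NULL = NULL").fetchone())  # (None,)

# ...and rows with NULLs have to be found with IS NULL rather than = NULL
print(con.execute("SELECT id FROM finds WHERE weight IS NULL").fetchall())  # [(2,)]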

Continue reading

Fictive Realism in Visualisation

Ray Harryhausen by David Voigt (1999). Adapted from original image by Bob Sinclair (CC BY-NC 2.0)

Michael Shanks has recently blogged about Ray Harryhausen and his stop-motion animation (Shanks 2020), sparked by an exhibition at the Scottish National Gallery of Modern Art (currently shut as a result of coronavirus restrictions). Harryhausen’s work proved inspirational to many film directors over the years, but might his technique also be inspirational for archaeological visualisation?

For example, Shanks draws a sharp distinction between Harryhausen’s stop-motion creations and computer-generated imagery: the technique of stop-motion animation never quite disappears into the background, which is part of both its charm and its effect, unlike the emphasis on photorealistic models in CGI.

In CGI the objective is often to have the imagery fabricated by the computer blend in so one doesn’t notice where the fabrication begins or ends. The rhetorical purpose of CGI is to fool, to deceive. Harryhausen’s models don’t look “real”. More precisely, they don’t look “natural”. No one need be fooled. One admires the craft in their making. (Shanks 2020)

Continue reading

The Digital Derangement of Archives

Modified from original by Michael Schwarzenberger via Pixabay

Bill Caraher has recently been considering the nature of ‘legacy data’ in archaeology (Caraher 2019) (with a commentary by Andrew Reinhard). Amongst other things, he suggests there has been a shift from paper-based archives designed with an emphasis on the future to digital archives which often seem more concerned with present utility. Coincidentally, Bill’s post landed just as I was pondering the nature of the relationship between digital archives and our use of data.

So do digital archives represent a paradigm shift from traditional archives and archival practice, or are they simply a technological development of them? Digital archives are commonly understood to be a means of storing, organising, maintaining, and making data accessible in digital format. Relative to traditional archives they are therefore not limited by physical space or its associated costs, and so can make much more information available more easily, cheaply, and widely. But a consequence of this can be a kind of ‘storage mania’, in which data become easier to accumulate than to delete because of digitalisation, and in which data are released from the limitations of time and space through their dematerialisation (Sluis 2017, 28). This is akin to David Berry’s “infinite archives” (2017, 107); Berry suggests that “One way of thinking about computational archives and new forms of abstraction they produce is the specific ways in which they manage the ‘derangement’ of knowledge through distance.” (Berry 2017, 119). At the same time, digital archives represent new technological material structures built on the performativity of the software which delivers large-scale processing of these apparently dematerialised data (Sluis 2017, 28).

Continue reading