The Data Interface

Detail from Woman at a Window, by Caspar David Friedrich (1822)

We understand knowledge construction to be social and combinatorial: we build on the knowledge of others, we create knowledge from data collected by ourselves and others, and so on. Although we pay a lot of attention to the processes behind the collection, recording, and archiving of our data, and are concerned about ensuring its findability, accessibility, interoperability, and reusability into the future, we pay much less attention to the technological mediation between ourselves and those same data. How do the search interfaces which we customarily employ in our archaeological data portals influence our use of them, and consequently affect the knowledge we create through them? How do they both enable and constrain us? And what are the implications for future interface designs?

As if to underline the lack of attention paid to interfaces, their history and development are often difficult to trace. It’s not something that infrastructure providers tend to be particularly interested in, and the Internet Archive’s Wayback Machine doesn’t capture interfaces built from dynamically scripted pages; this effectively writes off the visual history of the first ten years or more of the Archaeology Data Service’s ArchSearch interface, for example. The focus is, perhaps inevitably, on maintaining the interfaces we do have and looking forward to developing the next ones, with relatively little sense of their history. Interfaces are all too often treated as transparent, transient – almost disposable – windows on the data they provide access to.

Discovery Machines

A model robot reading a Kindle, adapted from the original by Brian J. Matis (CC BY-NC-SA 2.0)

Michael Brian Schiffer is perhaps best known (amongst archaeologists of a certain age in the UK, at least) for developing behavioural archaeology, a response to the processual archaeology of Binford et al. that looked at the changing relationships between people and things (Schiffer 1976; 2010), and for his work on the formation processes of the archaeological record (Schiffer 1987). But Schiffer also has an extensive track record of work on archaeological (and behavioural) approaches to modern technologies and technological change (e.g., Schiffer 1992; 2011), which receives little attention in the digital archaeology arena. This is partly because, despite his interest in a host of other electrical devices involved in knowledge creation (e.g., Schiffer 2013, 81ff), he has little to say about computers beyond observing their use in modelling and simulation, or citing them as an example of an aggregate technology constructed from multiple technologies and having a generalised functionality (Schiffer 2011, 167-171).

In his book The Archaeology of Science, Schiffer introduces the idea of the ‘discovery machine’. In applying such an apparatus,

Digital Tools or Knowledge Devices?

A characterization of humanities knowledge based on the @gapingvoid information-knowledge visualization by Deb Verhoeven (CC-BY 4.0)

Digital tools increasingly permeate our world, supporting, enhancing, or replacing many of our day-to-day activities in archaeology as elsewhere. Many of these devices lay claim to being ‘smart’, even intelligent, though more often than not this has more to do with sleight of hand and invisible software functionality than any actual intelligence. As Ian Bogost has recently observed, the key characteristic of these so-called smart devices is not intelligence so much as online connectivity, the realisation of which brings with it external surveillance and data-gathering (Bogost 2022). Such perceptions of ‘smartness’ might also point to a tendency for us to overestimate the capabilities of digital tools while at the same time minimising their influence.

In this light, it is interesting to consider a quotation from an anonymous archaeologist cited in Smiljana Antonijević’s book Amongst Digital Humanists: An Ethnographic Study of Digital Knowledge Production:

In archaeology, digital technologies such as GIS applications, laser scanning, or databases have been used for decades, and they are as common as a trowel or any other archaeological tool. (Antonijević 2016, 49).

Mining the Grey

Text mining icon by Julie McMurray (via Pixabay)

Archaeological grey literature reports were primarily a response to the explosion of archaeological work from the 1970s onwards (e.g. Thomas 1991). The resulting backlog quickly outstripped the capacity of archaeologists, funders, and publishers to create traditional outputs, and it became accepted that the vast majority of fieldwork would never be published in any form other than a client report or summary. This in turn frequently raised concerns (especially in academic circles) over the quality of the reports as well as their accessibility: indeed, Cunliffe suggested that some reports were barely worth the paper they were printed on (cited in Ford 2010, 827). Elsewhere, it was argued that the schematisation of reports could make it easier to hide shortcomings and lead to lower standards (e.g. Andersson et al. 2010, 23). On the other hand, it was increasingly recognised that such reports had become the essential building blocks of archaeological knowledge, to the extent that labelling them ‘grey’ was something of a misnomer (e.g. Evans 2015, sec 5). Moreover, the majority of archaeological interventions across Europe were being carried out within the framework of development-led archaeology rather than through the much smaller number of more traditional research excavations (e.g. Beck 2022, 3).

Data in a Crisis

One of the features of the worldwide COVID-19 pandemic over the past eighteen months has been the central role of data, and of the predictive models built on them, in governing public policy. At the same time, we have inevitably seen the spread of misinformation (false or inaccurate information that is believed to be true) and disinformation (information that is known to be false but is nevertheless spread deliberately), stimulating an infodemic alongside the pandemic. The ability to distinguish between information that can be trusted and information that can’t is key to managing the pandemic, and failure to do so lies behind many of the surges and waves that we have witnessed and experienced. Yet distinguishing information from mis/disinformation can be difficult. The problem is all too often fuelled by algorithmic amplification across social media and compounded by a frequent shortage of solid, reliable, comprehensive, and unambiguous data, which leads experts to couch their opinions in cautious terms, dependent on probabilities and degrees of freedom, and frustratingly short on firm, absolute outcomes. Archaeological data is clearly not in the same league as pandemic health data, but it still suffers from conclusions drawn on often weak, always incomplete data, and is consequently open to challenge, misinformation, and disinformation.

Nothing is Something

The black hole at the centre of Messier 87, via the Event Horizon Telescope (Wikimedia CC-BY)

Shannon Mattern has recently written about mapping nothing: from the ‘here be dragons’ on old maps marking the limits of knowledge and the promise of new discoveries, to the perception of the Amazon rainforest as an unpeopled wilderness until satellite imagery revealed pre-Columbian geoglyphs which had been largely invisible on the ground. In her wide-ranging essay, she makes the point that nothingness is always something: “A map of nothing demonstrates that an experiential nothingness depends upon a robust ecology of somethingness to enable its occurrence” (Mattern 2021). The question, of course, is what that something actually is.

Nothingness is something that has long been an issue in databases. Null is traditionally used to represent something missing. As null is not a value, it is technically and meaningfully distinct from zeros and empty strings, which are values and hence indicators of something. Although this seems straightforward, the boundaries begin to blur when some guides to SQL, for instance, define null in terms of both missing and unknown values. After all, if something is missing, then we know we are missing it; if something is unknown, then we don’t know whether or not it was ever something. Indeed, Codd, in his classic book on relational databases, argued that null should also indicate why the data is missing, distinguishing between a null that is ‘missing but applicable’ and a null that is ‘missing but inapplicable’ (Codd 1990, 173), but this was never adopted. Consequently, nulls tend to have a bad reputation: they may variously be used (mostly in error) to represent ‘nothing’, ‘unknown’, ‘value not yet entered’, ‘default value’, and so on, partly because of messy implementations in database management systems.
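To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The contexts table, its columns, and especially the missing_reason column are hypothetical, invented for illustration; SQL itself has no marker for why a value is missing. The sketch shows that zero and the empty string behave as values while NULL does not, that aggregates silently skip NULLs, and how a companion column might record Codd's applicable/inapplicable distinction that a bare NULL cannot carry.

```python
import sqlite3

# Hypothetical 'contexts' table, for illustration only. The missing_reason column is an
# assumed convention for recording Codd's distinction; SQL itself has no such marker.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contexts ("
    " id INTEGER PRIMARY KEY,"
    " find_count INTEGER,"
    " notes TEXT,"
    " missing_reason TEXT CHECK (missing_reason IN ('applicable', 'inapplicable')))"
)
conn.executemany(
    "INSERT INTO contexts (find_count, notes, missing_reason) VALUES (?, ?, ?)",
    [
        (0, "", None),                 # zero and empty string: values meaning 'none found'
        (None, None, "applicable"),    # a count exists but was never entered
        (None, None, "inapplicable"),  # no count can exist (e.g. a cut, which holds no deposit)
        (12, "rim sherds", None),
    ],
)

# NULL is not a value: NULL = NULL yields NULL (unknown), and only IS NULL detects it.
null_eq, null_is = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()

# Zero is a value, so an ordinary comparison finds it; the NULLs need IS NULL.
zeros = conn.execute("SELECT COUNT(*) FROM contexts WHERE find_count = 0").fetchone()[0]
nulls = conn.execute("SELECT COUNT(*) FROM contexts WHERE find_count IS NULL").fetchone()[0]

# Aggregates skip NULLs: COUNT(*) counts rows, COUNT(col) and AVG(col) only see values.
rows, counted, mean = conn.execute(
    "SELECT COUNT(*), COUNT(find_count), AVG(find_count) FROM contexts"
).fetchone()

# The companion column recovers what a bare NULL cannot say: why the value is missing.
reasons = conn.execute(
    "SELECT missing_reason, COUNT(*) FROM contexts WHERE find_count IS NULL "
    "GROUP BY missing_reason ORDER BY missing_reason"
).fetchall()

print(null_eq, null_is)     # None 1
print(zeros, nulls)         # 1 2
print(rows, counted, mean)  # 4 2 6.0
print(reasons)              # [('applicable', 1), ('inapplicable', 1)]
```

Note that the average of 6.0 is computed over only the two recorded counts, so the two kinds of absence quietly vanish from the result. Behaviour also varies between systems: Oracle, for instance, treats an empty string as NULL, one source of the messy implementations mentioned above.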

Digital Recall

We’ve all experienced that rush of recollection when we uncover some long-hidden or long-lost object from our past in the bottom of a drawer or box, triggering memories of encounters, activities, people, and places. We’re accustomed to the idea that we use evocative things as stored memories, deliberately or inadvertently, and as distributed extensions of our embodied memory (e.g. Heersmink 2018). Is it the same with digital objects? For example, van Dijck asks:

Are analog and digital objects interchangeable in the making, storing, and recalling of memories? Do digital objects change our inscription and remembrance of lived experience, and do they affect the memory process in our brains? (2007, xii).

Perhaps it’s a neurosis brought on by the contemplation of my excavation backlog, but I think there is a difference: not all analog objects are equally interchangeable with digital equivalents in terms of their functioning as distributed memories, and this difference is significant when we consider the archaeological narratives we are able to construct from our digital records. This perspective may be coloured by the physical nature of my backlog from the 1980s and 1990s, which for various reasons sits on the cusp of analog/digital recording. Ruth Tringham recalls how in the 1980s the digital recording of hitherto paper records was distrusted (Tringham 2010, 87), not least due to concerns about the fragility of the hardware and the impermanence of the product; my own reasons were rather more prosaic. As someone working with computers full-time in my day job, I had no desire to turn my excavation experience into a busman’s holiday as the on-site computer technician. The downside was that I subsequently gave myself the monumental task of manually entering the record sheets into the database and scanning/digitising the plans and sections in the off-season. In retrospect, however, this provides the opportunity to consider the different affordances of the two sets of analog and digital records, a contrast reinforced by the pre-pandemic experience of packing up my office, which took two days of sorting and moving the physical archive and about five minutes to transfer the digital files.

Dark Data

There are quite a few metaphors associated with archaeological data, many of which relate to its apparent mystery. For example, Gavin Lucas has described the archaeological record as being “haunted by absences” created by decay and destruction (Lucas 2012, 178). In a similar vein, Alison Wylie has described archaeological data as “shadowy” and argued that archaeology is defined “by the challenges of working with gaps and absences in its primary data” (Wylie 2017, 204). In a special issue of the journal Science, Technology, & Human Values on ‘Data Shadows’, Leonelli et al. describe data in terms of its presence, but also in terms of its unavailability, inaccessibility, or absence, defining absence as a descriptor of how “data are missing, incomplete, unreliable, ignored, unwanted, or untagged” (Leonelli et al. 2017, 192). As Chris Chippindale described it,

Archaeology is plagued in many an instance with poorly defined variables (usually thought of as ‘data’) drawn from ill-understood populations, and with uncertain articulations between the entities whose logical relations we seek to understand. (2000, 611)

Intrinsic Digitality

One might imagine that a claim that

“The archaeological record is intrinsically digital, not in the sense that it turns digital once the data have been entered and processed, but, more radically, in the sense that it is by its very nature digital, in its genesis and its structure.” (Buccellati 2017, 232)

would pique the interest of any digital archaeologist. But strangely, that seems not to be the case: Giorgio Buccellati’s book appears to be currently unreviewed and largely unremarked upon. Two exceptions to this generalisation are Gavin Lucas and Bill Caraher. In his latest book, Lucas suggests that Buccellati’s characterisation of archaeology as natively digital is problematic (2019, 91), but the critique is limited as the book’s focus lies elsewhere, on textuality. In his response to Sara Perry and James Taylor’s ‘Theorising the Digital’ paper (2018), which points to the disconnect between the demonstrable impact of digital archaeology on archaeological method and its comparative lack of effect on archaeological theory, Caraher (2018) suggests that Buccellati’s book represents a rare example of the interplay between digital theory and broader archaeological theory. So why does Buccellati argue that archaeology is natively digital? And is his characterisation of digitality useful to digital archaeology, as well as to archaeology more broadly?

Explainability in digital systems

Created via http://www.hetemeel.com/

Some time ago, I suggested that machine-learning systems in archaeology ought to be able to provide human-scale explanations in support of their conclusions, noting that many of the techniques used in ML were filtering down into automated methods used to classify, extract and abstract archaeological data. I concluded: “We would expect an archaeologist to explain their reasoning in arriving at a conclusion; why should we not expect the same of a computer system?”.
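As an illustration only, and not a method proposed in the original piece, the sketch below shows what a system that explains its reasoning might look like in the simplest possible case: a linear scoring model that returns, alongside its conclusion, the contribution each input made to it. The classify function, the Explanation class, the feature names, the weights, and the pottery labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    label: str
    score: float
    # (feature name, weight * value) pairs, largest absolute effect first
    contributions: list[tuple[str, float]]

    def __str__(self) -> str:
        reasons = "; ".join(f"{name} contributed {c:+.2f}" for name, c in self.contributions)
        return f"{self.label} (score {self.score:+.2f}) because {reasons}"


def classify(features: dict[str, float], weights: dict[str, float],
             labels: tuple[str, str], bias: float = 0.0) -> Explanation:
    """Score with a linear model and return the decision together with its reasons.

    The score is bias + sum(weight * value), so each term is an exact, additive
    account of how far that feature pushed the decision one way or the other.
    """
    contributions = [(name, weights[name] * value)
                     for name, value in features.items() if name in weights]
    score = bias + sum(c for _, c in contributions)
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    label = labels[1] if score > 0 else labels[0]
    return Explanation(label, score, contributions)


# Hypothetical weights and measurements, invented purely for illustration.
weights = {"rim_symmetry": 2.0, "wall_thickness_mm": -0.5, "rilling_present": 1.5}
sherd = {"rim_symmetry": 0.75, "wall_thickness_mm": 4.0, "rilling_present": 1.0}

result = classify(sherd, weights, labels=("hand-made", "wheel-thrown"))
print(result)
# wheel-thrown (score +1.00) because wall_thickness_mm contributed -2.00; rim_symmetry contributed +1.50; rilling_present contributed +1.50
```

A linear model is used here because its explanation is exact: the contributions plus the bias sum to the score. The deep-learning systems at issue in this debate offer no such direct decomposition, which is why explanations of their outputs tend to rely on post hoc approximations.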

This seemed fair enough at the time, if admittedly challenging. What I hadn’t appreciated, though, was the controversial nature of such a claim. For sure, in that piece I referred to Yoshua Bengio’s argument that we don’t understand human experts and yet we trust them, so why should we not extend the same degree of trust to an expert computer (Pearson 2016)? But it transpires that this is quite a common argument advanced against claims that systems should be capable of explaining themselves, not least among high-level Google scientists. For example, Geoff Hinton recently suggested in an interview that requiring you to be able to explain how your AI system works (as, for example, the GDPR does) would be a disaster:
