Frozen Data

Original image by Noel Bauza from Pixabay

There was a flurry of interest in the technical press during the summer with the news that GitHub had placed much of the open source code it held into an almost improbably long-term Arctic archive (e.g. Kimball 2020; Metcalf 2020; Vaughan 2020). GitHub’s timing seemed propitious: in the midst of a global pandemic, with wildfires burning out of control on the west coast of the USA and elsewhere, and with upgrades to the nearby Global Seed Vault recently finished after it was flooded as a consequence of global warming.

The Arctic World Archive was set up by Piql in 2017 and is situated in a decommissioned mineshaft deep within the permafrost near Longyearbyen on the Svalbard archipelago. The data are stored on reels of piqlFilm (see Piql 2019, Piql nd), a high-resolution photosensitive film claimed to be secure for 750 years (and over 1000 years in cold low-oxygen conditions) and hence to require no cycle of refresh and migrate, unlike all other forms of digital archive. The film holds both analog (text, images etc.) and digital information, with digital data stored as high resolution QR codes. Explanations of how to decode and retrieve the information are included as text at the beginning of each reel, which can be read simply by holding the film up to a light source with a magnifying glass. Piql claim that only a camera or scanner and a computer of some kind will be required to restore the information in the future, which means that the archive outlives any technology used to store the data in the first place.


Digital Recall

We’ve all experienced that rush of recollection when we uncover some long-hidden or long-lost object from our past in the bottom of a drawer or box, triggering memories of encounters, activities, people, and places. We’re accustomed to the idea that we use evocative things as stored memories, deliberately or inadvertently, and as distributed extensions of our embodied memory (e.g. Heersmink 2018). Is it the same with digital objects? For example, van Dijck asks:

Are analog and digital objects interchangeable in the making, storing, and recalling of memories? Do digital objects change our inscription and remembrance of lived experience, and do they affect the memory process in our brains? (2007, xii).

Perhaps it’s a neurosis brought on by the contemplation of my excavation backlog, but I think there is a difference: that not all analog objects are equally interchangeable with digital equivalents in terms of their functioning as distributed memories, and that this difference is significant when we consider the archaeological narratives we are able to construct from our digital records. It may be that this perspective is coloured by the physical nature of my backlog from the 1980s and 1990s which for various reasons sits on the cusp of analog/digital recording. Although Ruth Tringham recalls how in the 1980s the digital recording of hitherto paper records was distrusted (Tringham 2010, 87), not least due to concerns about the fragility of the hardware and impermanence of the product, in my case it was rather more prosaic: as someone working with computers full-time in my day job I had no desire to turn my excavation experience into a busman’s holiday as the on-site computer technician. The downside was that I subsequently gave myself the monumental task of manually entering the record sheets into the database and scanning/digitising the plans and sections in the off-season. In retrospect, however, this provides the opportunity to consider the different affordances of the two sets of analog and digital records, a perception that is reinforced by the pre-pandemic experience of packing my office which incorporated two days of sorting and moving the physical archive and about five minutes transferring the digital files.


Dark Data

There are quite a few metaphors associated with archaeological data, many of which relate to its apparent mystery. For example, Gavin Lucas has described the archaeological record as being “haunted by absences” created by decay and destruction (Lucas 2012, 178). In a similar vein, Alison Wylie has described archaeological data as “shadowy” and archaeology as defined “by the challenges of working with gaps and absences in its primary data” (Wylie 2017, 204). In a special issue of the Science, Technology, & Human Values journal on ‘Data Shadows’, Leonelli et al. describe data in terms of its presence, but also in terms of its unavailability, inaccessibility, or its absence, defining absence as a descriptor of how “data are missing, incomplete, unreliable, ignored, unwanted, or untagged” (Leonelli et al. 2017, 192). As Chris Chippindale described it,

Archaeology is plagued in many an instance with poorly defined variables (usually thought of as ‘data’) drawn from ill-understood populations, and with uncertain articulations between the entities whose logical relations we seek to understand. (2000, 611)


The Digital Derangement of Archives

Modified from original by Michael Schwarzenberger via Pixabay

Bill Caraher has recently been considering the nature of ‘legacy data’ in archaeology (Caraher 2019) (with a commentary by Andrew Reinhard). Amongst other things, he suggests there has been a shift from paper-based archives designed with an emphasis on the future to digital archives which often seem more concerned with present utility. Coincidentally, Bill’s post landed just as I was pondering the nature of the relationship between digital archives and our use of data.

So do digital archives represent a paradigm shift from traditional archives and archival practice, or are they simply a technological development of them? Digital archives are commonly understood to be a means of storing, organising, maintaining, and making data accessible in digital format. Relative to traditional archives they are therefore not limited by physical space or its associated costs and so can make much more information available more easily, cheaply, and widely. But a consequence of this can be a kind of ‘storage mania’, in which data become easier to accumulate than to delete because of digitalisation, and where data are released from the limitations of time and space through their dematerialisation (Sluis 2017, 28). This is akin to the “infinite archives” described by David Berry (2017, 107), who suggests that “One way of thinking about computational archives and new forms of abstraction they produce is the specific ways in which they manage the ‘derangement’ of knowledge through distance” (Berry 2017, 119). At the same time, digital archives represent new technological material structures built on the performativity of the software which delivers large-scale processing of these apparently dematerialised data (Sluis 2017, 28).


The Death of Data

“Dead Data” by Stinging Eyes, CC BY-SA 2.0

Yesterday was World Digital Preservation Day and saw the publication of the Digital Preservation Coalition’s Bitlist – their global list of Digitally Endangered Species. Interestingly, under their ‘Practically Extinct’ category (“when the few known examples are inaccessible by most practical means and methods”) sits Unpublished Research Data, which they define as

“research data which has not been shared or published by any means and is thus in contravention of the ‘FAIR’ principles which require data to be Findable, Accessible, Interoperable and Reusable”.

Although the DPC jury hopes that this is a small group, I rather suspect that there is an unseen mountain of unpublished research data in archaeology (and in the interest of full disclosure: reader, I have some).

This crossed my screen at the same time as a paper published in the Harvard Data Science Review by Stephen Stigler: ‘Data Have a Limited Shelf Life’, in which he argues that data, unlike wines, do not improve with age. He suggests that old data are “Often … no more than decoration; sometimes they may be misleading in ways that cannot easily be discovered”, while emphasising this is not the same as saying they have no value. Using three examples of old statistical data, he shows how misleading and incomplete they can be if their full background is not known. In each case, the data were selected from a prior source, not always accurately referenced if at all. In some instances, uncovering the original data flagged problems with the sample that had been taken; in others it revealed a greater breadth and depth of information which had gone unused because the particular research question had stripped them away.


Delving into Data Reuse

Given the years, the money, expertise and energy we’ve spent on creating and managing archaeological data archives, the relative lack of evidence of reuse is a problem. Making our data open and available doesn’t equate to reusing it, nor does making it accessible necessarily correspond to making it usable. But if we’re not reusing data, how can we justify these resources? In their reflections on large-scale online research infrastructures Holly Wright and Julian Richards (2018) have recently suggested that we need to understand how to optimize archives and their interfaces in order to maximize the use and reuse of archaeological data, and explore how archaeological archives can better respond to user needs alongside ways to document and understand both quantitative and qualitative reuse.

However, I would argue that all these kinds of issues (alongside those of citation, recognition, training, etc.) while not resolved are at least known and mostly acknowledged. The real challenges to data reuse lie elsewhere and entail a much deeper understanding and appreciation of what reuse entails: issues associated with the re-presentation and interpretation of old data, the nature and purpose of reuse, and the opportunities and risks presented by reuse. Such questions are not specific to digital data; however, digital data change the terms of engagement with their near-instant access, volume, and flexibility, and their potentially transformative effects on the practice of archaeology now and in the future.


On Digital Scholarship

I recently published a paper, ‘Resilient Scholarship in the Digital Age’, which looked at the tensions between digital practice and academic labour (Huggett 2019). My focus was on the nature of academic experience within the modern university and the way in which the professional and personal life of the university academic is influenced by the digital technologies which enable and support the neoliberal commodification and commercialisation of universities (at least in the UK, North America and Australasia). It was a difficult paper to write, not least because of a strong personal interest and involvement, but also because of the way it ranged across digital sociology, the sociality of labour, resilience theory, management theory, feminist and Marxist theory, and so on, most of which was entirely new to me.

The referees were very positive in their comments (thankfully!), but one particular observation they made was that in focussing on university academia, I overlooked the implications for archaeological scholarship more widely, given that much of it occurs within the realms of Cultural Resource Management and related contract work, within governmental departments and non-governmental agencies, as well as within community initiatives. This is certainly true, as is underlined in the periodic surveys of archaeological employment in the UK (e.g. Aitchison 2019). However, in my response to the editors I argued that this was too broad a definition of scholarship for the scope of this particular paper, and, perhaps more importantly, would require a level of knowledge about the scholarly experience outside the university environment that I simply didn’t have – it’s some 30 years since I worked in contract archaeology, for example. Other people are better qualified than I to discuss scholarship in these working contexts.


Dipping in Data Lakes

We’re becoming increasingly accustomed to talk of Big Data in archaeology and at the same time beginning to see the resurgence of Artificial Intelligence in the shape of machine learning. And we’ve spent the last 20 years or so assembling mountains of data in digital repositories which are becoming big data resources for mining in the pursuit of machine learning training data. At the same time we are increasingly aware of the restrictions that those same repositories impose upon us – the use of pre-cooked ‘what/where/when’ queries, the need to (re)structure data in order to integrate different data sources and suppliers, and their largely siloed nature which limits cross-repository connections, for example. More generally, we are accustomed to the need to organise our data in specific ways in order to fit the structures imposed by database management systems, or indeed, to fit our data into the structures predefined by archaeological recording systems, both of which shape subsequent analysis. But what if it doesn’t need to be this way?


Towards a digital ethics of agential devices

Image by Rawpixel CC0 1.0 via Creative Commons

Discussion of digital ethics is very much on trend: for example, the Proceedings of the IEEE special issue on ‘Ethical Considerations in the Design of Autonomous Systems’ has just been published (Volume 107 Issue 3), and the Philosophical Transactions of the Royal Society A published a special issue on ‘Governing Artificial Intelligence – ethical, legal and technical opportunities and challenges’ late in 2018. In that issue, Corinne Cath (2018, 3) draws attention to the growing body of literature surrounding AI and ethical frameworks and to debates over laws governing AI and robotics across the world, and points to an explosion of activity in 2018 with a dozen national strategies published and billions in government grants allocated. She also notes that many of the leaders in both the debates and the technologies are based in the USA, which itself presents an ethical issue in terms of the extent to which AI systems mirror US culture rather than socio-cultural systems elsewhere around the world (Cath 2018, 4).

Agential devices, whether software or hardware, essentially extend the human mind by scaffolding or supporting our cognition. This broad definition therefore runs the gamut of digital tools and technologies, from digital cameras to survey devices (e.g. Huggett 2017), through software supporting data-driven meta-analyses and their incorporation in machine-learning tools, to remotely controlled terrestrial and aerial drones, remotely operated vehicles, autonomous surface and underwater vehicles, and lab-based robotic devices and semi-autonomous bio-mimetic or anthropomorphic robots. Many of these devices augment archaeological practice, reducing routinised and repetitive work in the office environment and in the field. Others augment work by developing data-driven methods which represent, store, and manipulate information in order to undertake tasks previously thought to be uncomputable or incapable of being automated. In the process, each raises ethical issues of various kinds. Whether agency can be associated with such devices can be questioned on the basis that they have no intent, responsibility or liability, but I would simply suggest that anything we ascribe agency to acquires agency, especially bearing in mind the human tendency to anthropomorphize our tools and devices. What I am not suggesting, however, is that these systems have a mind or consciousness themselves, which would raise a whole different set of ethical questions.


Intrinsic Digitality

One might imagine that a claim that

“The archaeological record is intrinsically digital, not in the sense that it turns digital once the data have been entered and processed, but, more radically, in the sense that it is by its very nature digital, in its genesis and its structure.” (Buccellati 2017, 232)

would pique the interest of any digital archaeologist. But strangely, that seems not to be the case: Giorgio Buccellati’s book appears to be currently unreviewed and largely, it seems, unremarked upon. Two exceptions to this generalisation are Gavin Lucas and Bill Caraher. In his latest book, Gavin Lucas suggests that Buccellati’s characterisation of archaeology as natively digital is problematic (2019, 91), but the critique is limited as the book’s focus lies elsewhere, on textuality. In his response to Sara Perry and James Taylor’s ‘Theorising the Digital’ paper (2018), in which they point to the disconnect between the demonstrable impact of digital archaeology on archaeological method and its comparative lack of effect on archaeological theory, Bill Caraher suggests (2018) that Buccellati’s book represents a rare example of the interplay between digital theory and broader archaeological theory. So why does Buccellati argue that archaeology is natively digital? And is his characterisation of digitality useful to digital archaeology, as well as to archaeology more broadly?
