A couple of interesting but unrelated articles around the subject of humanities digital data recently appeared: a guest post in The Scholarly Kitchen by Chris Houghton on data and digital humanities, and an Aeon essay by Claire Lemercier and Clair Zalc on historical data analysis.
Houghton’s article emphasises the benefits of mass digitisation and large-scale analysis in the context of the increasing availability of digital data resources provided through digital archives and others. According to Houghton, “The more databases and sources available to the scholar, the more power they will have to ask new questions, discover previously unknown trends, or simply strengthen an argument by adding more proof.” (Houghton 2022). The challenge he highlights is that although digital archives increasingly provide access to large bodies of data, the work entailed in exploring, refining, checking, and cleaning the data for subsequent analysis can be considerable.
An academic who runs a large digital humanities research group explained to me recently, “You can spend 80 percent of your time curating and cleaning the data, and another 80 percent of your time creating exploratory tools to understand it.” … the more data sources and data formats there are, the more complex this process becomes. (Houghton 2022).