Histoire numérique et l’historiographie

DHBenelux 2017 - Infrastructures everywhere

Last week I was at the fourth DHBenelux conference, held in Utrecht, the Netherlands. With over 200 participants attending two days of almost 100 presentations, demos, and posters, this conference has become an interesting snapshot of the state of digital humanities in the Benelux.

When I summarised the first conference in 2014, I considered how the presentations argued for the necessity of technology: to appropriate it for humanities research, but also to learn to understand how it works. For the second conference in 2015, I considered the limitations of tools and the need for critical reflection on the results, as well as on HCI methods to study how tools should function for humanities research. This year’s DHBenelux showed that many of these debates are still relevant today. I could not possibly describe the entire conference within a single blogpost (if only because I could not attend every presentation), but for me this year’s conference framed these debates within the approach of infrastructures.

Carlos Martinez-Ortiz declared in his presentation[1] that (media) scholars need two things: 1) access to collections, and 2) tools to make sense of this data, notably for searching, filtering, aggregation, and contextualisation. One approach to these two requirements is to provide infrastructures, including virtual research environments (VREs). Infrastructures are the technology underlying research, with the goal of enabling research through low-threshold access to data and tools. Arguably these two features, underlying research and lowering thresholds, explain why there were only a few presentations about research results, and only a few about high-investment aspects of DH such as learning to code.

Underlying research

As a first requirement, the infrastructures provide access to collections. Before the conference, several workshops were offered. One workshop that aimed to help scholars invest in the methods needed to use a collection was the Delpher workshop, where scholars could learn how to work with the KB newspaper dataset. I attended the Tool Criticism workshop, where we discussed the continuous interaction between research question, tool, and dataset in the exploratory phase of a research project[2]. The data limits what the tool can do, the tool determines to some extent what can be asked, and the research question shapes what a scholar wants to do, so none of these three can be said to be the sole determining factor of a research project. Laura Hollink demonstrated[3] the importance of knowing what a dataset consists of. Investigating the corpus of transcriptions of the European Parliament, she compared the topics found in the different languages. Interestingly, she found very large differences, so that a scholar working on one language might get entirely different results from a scholar working on another language. Whether this discrepancy was a result of the corpus, or an artefact of the topic-detection tool working better in one language than in another, was still an open question.

The underlying aspect of DH work was also brought up as an issue in career development during a panel on text mining.[4] Pim Huijnen stated that while doing computationally advanced work does not earn you any applause from historians, doing historical research with stable and known digital methods does not get you into DH conferences. This comment is related to the second aspect of infrastructures: who should invest in the new methods?

Low thresholds and high investments

The second requirement is enabling scholars to perform analyses on the collections, usually in the form of tools. During the opening, Frans Wiering brought up the paradox of technology:

The same technology that simplifies life by providing more functions in each device also complicates life by making the device harder to learn, harder to use. This is the paradox of technology.

— Donald Norman, The Design of Everyday Things (1988)

A recurring debate during the conference was the extent to which scholars must understand the underlying technology. During the aforementioned text mining panel, Ralf Futselaar compared the new digital methods with the introduction of new cars, when owners did not have the time to invest in learning the technology and instead hired drivers to do that for them. Serge ter Braake speculated in his presentation[5] that there is a trade-off in investing in understanding a tool: at some point a scholar may decide that the tool is too opaque and that it is better to just close read the material. During the Q&A I replied that scholars might also go in the opposite direction: could the tool at some point be so opaque that the scholar has to accept not fully understanding their methods and trust the developers? It is simply not always feasible to explain everything, as was demonstrated by Sally Chambers et al.[6] In their presentation on developing a framework for tool and data criticism, they discussed the case of TRACER, a tool that has an extensive user manual which nonetheless does not explain how the underlying algorithms work, because there are over 700 of them. This tension also became clear in the workshop on tool criticism: there are so many tools out there (as illustrated by the daunting list by the DMI that was presented as a starting point for choosing a tool) that it is hard to decide upfront what to use, so that one might stick with what one knows from colleagues, or with what is simply offered within a VRE. On the other hand, other participants decided that a tool made too many assumptions, so they preferred to export a CSV, as they felt more in control in Excel.

Infrastructures everywhere!

Arguably the emphasis on infrastructures was steered by the strong presence of the many infrastructure projects: CLARIN, CLARIAH, DARIAH and PARTHENOS were all present with flyers, papers, workshops, and events. It could also be an artefact of my own research on digital history infrastructuring work as negotiations of incentives, which I presented at the conference. After the presentation of Carlos Martinez-Ortiz, Julia Noordegraaf commented that access to collections through infrastructures and VREs is necessary for reasons of copyright, which I took to Twitter as a possibly decisive argument against small open source tools in DH. Does DH necessarily require infrastructures?

Perhaps this is a question to discuss in more depth next year. The next DHBenelux will be held in Amsterdam from 6 to 8 June 2018, hosted by the new KNAW Humanities Cluster.

  • 1. From Tools to “Recipes”: Building a Media Suite within the Dutch Digital Humanities Infrastructure CLARIAH. Carlos Martinez-Ortiz, Roeland Ordelman, Marijn Koolen, Julia Noordegraaf, Liliana Melgar, Lora Aroyo, Jaap Blom, Victor de Boer, Willem Melder, Jasmijn van Gorp, Eva Baaren, Kaspar Beelen, Norah Karrouche, Oana Inel, Rosita Kiewik, Themis Karavellas and Thomas Poell
  • 2. This discussion was largely based on the blogpost by Trevor Owens Where to Start? On Research Questions in The Digital Humanities
  • 3. Bias in the analysis of multilingual legislative speech. Laura Hollink, Astrid van Aggelen and Jacco van Ossenbruggen
  • 4. Text mining in practice: A discussion on user-applied text mining techniques in historical research. Jesse de Does, Yasuto Nakano, Melvin Wevers, Pim Huijnen and Milan van Lange
  • 5. The Pyramid of Conscientious Digital Humanities Research: how to get a ‘general idea of what you should be seeing’. Serge ter Braake
  • 6. Towards a tool and data criticism framework: a developer’s and user’s perspective. Sally Chambers, Greta Franzini, Joke Daems and Marco Büchler