Digital History and Historiography

impresso - Media Monitoring of the Past II. Beyond Borders: Connecting Historical Newspapers and Radio

impresso - Media Monitoring of the Past is an interdisciplinary research project in which a team of computational linguists, designers and historians collaborates on the datafication of a multilingual corpus of historical media. The primary goals of the project are to improve text mining tools for historical text, to enrich historical documents with (semi-)automatically generated data, and to integrate such data into historical research workflows by means of newly developed user interfaces.

In its first phase (2017-2020), impresso enriched a corpus of 76 newspapers from Luxembourg and Switzerland using text mining techniques, including named entity recognition, topic modeling, content type detection, text reuse detection and image similarity detection. In parallel, impresso developed a research platform for the exploration and critical study of this enriched data, which is freely available at https://impresso-project.ch/app/.

In its second phase (2023-2027), impresso will expand its corpus to include newspaper and radio collections from 20 Western European libraries, archives and private partners. One objective is to create a dense vector space that will enable new comparative views of these document collections across time, languages and media types. In addition, impresso will develop new interfaces for the computational analysis of this data by means of user-facing APIs and executable notebooks.

The presentation will reflect on how the existing application has been used, its added value for historians and the challenges we have observed over the past years. Finally, we will give an outlook on the work planned for the coming years.

