C²DH receives funding for a project on critical text mining in historical newspapers

Media Monitoring of the Past
The aim of the project “Media monitoring of the past. Mining 200 years of historical newspapers” is to link digitised corpora of newspapers from Switzerland, Luxembourg, France and Germany and to develop new methods to analyse them.

Over the next three years, the Luxembourg Centre for Contemporary and Digital History (C²DH) will work on this project with the DHLAB at the École polytechnique fédérale de Lausanne (EPFL) and the Institute for Computational Linguistics at the University of Zurich. The project will receive 1.7 million Swiss francs in funding from the Swiss National Science Foundation (SNSF). Associated project partners include the Luxembourg National Library, the Swiss National Library, the Swiss newspapers Le Temps and Neue Zürcher Zeitung, Swiss archives, and researchers from the University of Lausanne. In Luxembourg, the project will be coordinated by Dr Marten Düring, Dr Lars Wieneke and Prof. Dr Andreas Fickers, working with Daniele Guido and Estelle Bunout.

Historical newspapers represent a wealth of archival material, and many have already been digitised. However, conducting research using these sources raises a number of problems, including a lack of text searchability as a result of poor text recognition and missing metadata, the relative isolation of digitised newspapers within their respective archives, search functions that are difficult to use, and poorly designed user interfaces. Recent progress in text analysis has also opened up new possibilities for conducting research on large collections of texts.

The project will develop new deep learning methods with the aim of correcting errors in text recognition, improving the identification of people, institutions and places, and enhancing this entity recognition using external data repositories. The C²DH will be responsible for developing a user interface that will incorporate new search functions and facilitate the critical analysis of the newspaper corpora. This may include providing information on the provenance of the data and the quality of automatically generated annotations, as well as indicating any gaps in the inventory.

To boost the relevance of the project for history, the humanities and the social sciences in general, the C²DH will coordinate a series of workshops that will provide a forum for users and developers to exchange ideas. Further links between history, computer science and design will be developed via an associated C²DH-based research project on resistance to European unification in the late 19th and early 20th centuries. Finally, the project will also be used in university teaching, giving young scholars the opportunity to explore automated methods for the extraction and representation of information from historical sources.

The project will not only lead to academic publications; at the end of the project, the individual processing, analysis and storage systems will also be made available on an open source basis for others to reuse and develop.

The SNSF’s Sinergia Programme exclusively supports interdisciplinary collaborative research groups working on pioneering research. Projects are eligible for funding under Sinergia if they draw on theories and methods from two or more disciplines, with similar importance attached to each discipline involved, and if the respective partners can provide complementary skills and knowledge.