Digital history & historiography

Machine Learning to Geographically Enrich Understudied Sources: A Conceptual Approach

written by

Lorella Viola

published on

1 January 2020

This paper discusses the added value of applying machine learning (ML) to contextually enrich digital collections. In this study, we employed ML as a method to geographically enrich historical datasets. Specifically, we used a sequence tagging tool (Riedl and Padó 2018), implemented in TensorFlow, to perform named entity recognition (NER) on a corpus of historical immigrant newspapers. The recognised entities were then extracted and geocoded. The aim was to prepare large quantities of unstructured data for a conceptual historical analysis of geographical references, and thereby to develop a method that would assist researchers working in spatial humanities, a recently emerged interdisciplinary field focused on geographic and conceptual space. Here we describe the ML methodology and the geocoding phase of the project, focusing on the advantages and challenges of this approach, particularly for humanities scholars. We also argue that, by choosing to use largely neglected sources such as immigrant newspapers (also known as ethnic newspapers), this study contributes to the debate about the representation of diversity and archival biases in digital practices.
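The tag, extract and geocode steps summarised above can be illustrated with a minimal Python sketch. It assumes that the tagger outputs one label per token in the BIO scheme with a `LOC` type (as in CoNLL-style NER output). The functions `extract_entities` and `geocode` and the toy `GAZETTEER` are hypothetical: they do not reproduce the project's actual code, the tagger's real output format, or the geocoding service used in the study.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Coords = Tuple[float, float]

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
# A real pipeline would query a geocoding service or a historical gazetteer instead.
GAZETTEER: Dict[str, Coords] = {
    "New York": (40.7128, -74.0060),
    "Rome": (41.9028, 12.4964),
}


def extract_entities(tokens: List[str], tags: List[str], label: str = "LOC") -> List[str]:
    """Join BIO-tagged tokens into entity strings of one type (assumed BIO labels)."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{label}" and current:
            # Continuation of the open entity (stray I- tags are ignored).
            current.append(token)
        else:
            # Any other tag ends the open entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def geocode(entities: List[str]) -> Dict[str, Tuple[int, Optional[Coords]]]:
    """Map each distinct entity to (mention count, coordinates or None if unresolved)."""
    counts = Counter(entities)
    return {name: (n, GAZETTEER.get(name)) for name, n in counts.items()}


if __name__ == "__main__":
    tokens = ["Rome", ",", "New", "York", ",", "Springfield", "and", "Rome"]
    tags = ["B-LOC", "O", "B-LOC", "I-LOC", "O", "B-LOC", "O", "B-LOC"]
    places = extract_entities(tokens, tags)
    print(places)  # ['Rome', 'New York', 'Springfield', 'Rome']
    print(geocode(places))
```

Names that the gazetteer cannot resolve, such as the ambiguous "Springfield", come back as `None` and would need manual review; counting mentions per place is one simple way to prepare the extracted references for later analysis.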

Show this publication on our institutional repository (orbi.lu).

Author(s)

Lorella Viola

Lorella is a Postdoctoral Research Associate working on the DHARPA project.
