Digital history & historiography

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898-1920

This article offers a methodological contribution to digital humanities by exploring the value of a mixed-method approach for uncovering and understanding historical patterns in large quantities of textual data. It refines the distant reading technique of topic modelling (TM) by using the discourse-historical approach (DHA; Wodak, 2001) in order to analyse the mechanisms underlying discursive practices in historical newspapers. Specifically, we investigate public discourses produced by Italian minorities and test the methodology on a corpus of digitized Italian ethnic newspapers published in the USA between 1898 and 1920 (ChroniclItaly; Viola, 2018). This combined methodology, which we propose to label ‘discourse-driven topic modelling’ (DDTM), enabled us to triangulate linguistic, social, and historical data and to examine how the changing experience of migration, identity construction, and assimilation was reflected over time in the accounts of the minorities themselves. The results showed DDTM to be effective in obtaining a categorization of the topics discussed in the immigrant press. The changing distribution of topics over time revealed how the Italian immigrant community negotiated its sense of connectedness with both the host country and the homeland. At the same time, without jeopardizing the analytical depth of the findings, the method proved valuable in minimizing the risk of bias in topic identification, since the topics stemmed from the results rather than from preconceived assumptions.
