Created for the Library of Congress, Newspaper Navigator re-imagines how we search the rich visual content in historic newspapers. The first phase of the project used machine learning techniques to extract visual content from 16.3 million digitized newspaper pages in Chronicling America. This resulted in the Newspaper Navigator dataset, released in May 2020. The dataset and fine-tuned machine learning model are in the public domain. A paper on the dataset was presented at the 2020 ACM International Conference on Information and Knowledge Management (CIKM).
The second phase consisted of building a search application for 1.5 million photos from the dataset. The search application was launched in September 2020. In addition to supporting faceted and keyword search, it lets users search by visual similarity by training an interactive machine learning model called an “AI navigator.” With an AI navigator, users can retrieve photos of topics such as “baseball players” or “sailboats” even if the captions do not contain these keywords. An AI navigator can train and predict over all 1.5 million photos in a couple of seconds. This new search affordance forms the basis for Benjamin Lee’s Ph.D. dissertation research, which re-imagines standard faceted search as “open faceted search.” A demo of the search application was presented at the 2020 ACM Symposium on User Interface Software and Technology (UIST).