The U-CORE team (Researching the Collecting, Preserving, Analyzing, and Disclosing of Ukrainian Testimonies of the War, project leader Prof. Machteld Venken) has developed a workflow that can be applied by others collecting personal memories in conflict zones. The aim is to ensure not only that testimonies are preserved for future generations of historians, but also that the eyewitnesses are protected now.
Humans as a Valuable Source of Information
The current conflicts around the world and the spread of manipulated or AI-generated visual information bring oral history to the forefront as a method of documenting war through the eyes of witnesses. Relying on firsthand experiences is a well-established and proven method in historiographical research. However, the situation changes when people speak about ongoing wars. Their testimonies could create serious risks for themselves, either now or in the future, depending on how the war concludes. All testimonies should therefore be handled in a way that prevents data breaches. How can this be achieved? The U-CORE team is designing a workflow.
Securing Collected Information in Three Locations
First, we divide the data into three categories: the personal data of the narrator, a pseudonymized audio testimony file, and the pseudonymization key. Our workflow design proposes storing each type of information in a different location, so that a potential hacker would need to access three separate servers to connect all the dots. Knowing what a person said in the pseudonymized interview won't reveal their identity, just as having their personal data won't disclose what they said, and the pseudonymization key alone discloses neither.
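The three-way separation can be illustrated with a minimal Python sketch. It is not the U-CORE implementation: the function `split_record`, the field names, and the identifier formats are hypothetical, chosen only to show that each store on its own links nothing.

```python
import secrets
from dataclasses import dataclass


@dataclass
class SplitRecord:
    personal_data: dict  # would go to location A (hypothetical)
    testimony: dict      # would go to location B (hypothetical)
    key_entry: dict      # would go to location C (hypothetical)


def split_record(name: str, contact: str, audio_path: str) -> SplitRecord:
    """Split one interview record into three parts that share no direct link.

    Assumption: random identifiers are enough for illustration; a real
    workflow would define its own identifier scheme and access controls.
    """
    person_id = secrets.token_hex(8)          # only in stores A and C
    pseudonym = "N-" + secrets.token_hex(4)   # only in stores B and C
    return SplitRecord(
        personal_data={"person_id": person_id, "name": name, "contact": contact},
        testimony={"pseudonym": pseudonym, "audio_path": audio_path},
        key_entry={"person_id": person_id, "pseudonym": pseudonym},
    )
```

In this sketch, the personal data and the testimony have no field in common, so re-identifying a narrator requires the key entry in addition to both other stores.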
Additional Protection Measures
To ensure that the data cannot be leaked before it reaches the secure locations, interviews are recorded on an encrypted recorder operated by the interviewer, Dr. Kateryna Zakharchuk. The audio file is then transferred to the encrypted server of the FHSE-MediaCentre, before being managed in the CatDV catalog for audiovisual products. By this point, the eyewitness's identity has already been pseudonymized, so the audio testimony is disconnected from the narrator's personal data.
Making the Testimony Available to the Public
The current U-CORE collection contains over 400 interviews conducted with Ukrainians in Luxembourg, Poland, and Ukraine, gathered in a central digital environment according to the data model developed by Dr. Inna Ganschow. A part of the collection is intended to be accessible not only to the research team but also to the general public, students, and journalists on the Oral History Digital platform. Some interviews are as long as 9 hours, with the average duration being around 2 hours. To simplify navigation within the interview collection, developer Pin Zhu and student assistant Vladyslav Siulhin have worked on connecting interview transcriptions with audio files using locally run tools or secured web services. The idea is to use the transcription for full-text search while allowing users to listen to specific segments through synchronized subtitles. The management of audiovisual files, including storage on the encrypted server, pseudonymization, and metadata insertion, is handled by multimedia technician Alexandre Germain of the FHSE-MediaCentre.
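How synchronized subtitles support this kind of navigation can be sketched in a few lines of Python. This is an illustrative assumption, not the platform's actual code: the helper names `parse_srt` and `search` are hypothetical, and the parser only handles plain, well-formed SRT text.

```python
import re

# One SRT cue: index line, time range line, then text up to a blank line.
CUE = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n(.*?)(?:\n\n|\Z)",
    re.S,
)


def parse_srt(srt_text: str) -> list[tuple[str, str, str]]:
    """Return (start, end, text) for each cue in a well-formed SRT string."""
    return [(m.group(2), m.group(3), m.group(4).strip()) for m in CUE.finditer(srt_text)]


def search(cues: list[tuple[str, str, str]], query: str) -> list[tuple[str, str]]:
    """Case-insensitive full-text search; returns (start time, cue text) hits."""
    q = query.casefold()
    return [(start, text) for start, _end, text in cues if q in text.casefold()]
```

Each hit carries the start time of its cue, which a player could use to jump to the matching segment of the audio.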
The following visualization shows the process of converting human-made interview transcriptions from Poland and Ukraine into subtitles. The Luxembourg team uses the Automatic Speech Recognition AI tool HappyScribe, which can operate offline after a data-sharing agreement was signed with the company. HappyScribe delivers the transcription in subtitle format. For the interview transcriptions of the Polish and Ukrainian teams, the following algorithm was developed to generate timestamps and align them with the transcription:
1. Modify the Interview Transcript: Reformat the document so that each sentence starts on a new line.
2. Generate a TextGrid File: Use the BAS service to create a '.TextGrid' file from the reformatted text and audio.
3. Extract Timecodes: Process the '.TextGrid' file generated in step 2 to produce a '.txt' file with timecodes for each sentence.
4. Convert to SRT Format to create subtitles: Transform the '.txt' file from step 3 into a '.srt' subtitle file.
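The first, third, and fourth steps listed above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the team's actual scripts: it assumes the alignment has already been read into word-level intervals of the form (start in seconds, end in seconds, label), that empty labels mark pauses, and that the aligned words match the transcript's whitespace-separated words one to one. The call to the BAS service and the parsing of the '.TextGrid' file are left out, and all function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Step 1: one sentence per line (naive split on ., !, ?, or ellipsis)."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def sentence_timecodes(sentences, word_intervals):
    """Step 3: derive (start, end, sentence) from word-level intervals.

    Assumes each transcript word corresponds to exactly one non-empty interval.
    """
    words = [iv for iv in word_intervals if iv[2].strip()]  # drop pauses
    cues, pos = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        chunk = words[pos:pos + n]
        if len(chunk) < n:
            raise ValueError("fewer aligned words than transcript words")
        cues.append((chunk[0][0], chunk[-1][1], sentence))
        pos += n
    return cues


def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Step 4: numbered cues separated by blank lines, as in a '.srt' file."""
    blocks = [
        f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

In practice, the one-to-one word matching is the fragile part: numbers, abbreviations, or words skipped by the aligner would shift the counts, so a real script would need a more tolerant matching strategy.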
BAS, the Bayerisches Archiv für Sprachsignale, is a potential partner for the long-term preservation of the future Luxembourgish interview archive of U-CORE, which could be connected to the Oral History Digital platform. A signed joint processing agreement with this platform facilitates cooperation with tools offered by BAS.
All this work by researchers, technicians, developers, and students is driven by a humanitarian approach to this precious and potentially sensitive collection of war testimonies about the Russian full-scale invasion of Ukraine.