Eitan Wagner, Renana Keydar, Amit Pinchevski, Omri Abend
In recent decades, efforts have been made to gather and digitize the testimonies of living
Holocaust survivors. The challenge we now face is attending to those thousands of human
stories, which, while safely stored in archives, may disappear into oblivion. Despite recent
advances in narrative analysis in the fields of Computational Literature (CL) and Natural
Language Processing (NLP), existing language model technology still faces challenges in
analyzing elaborate narratives and long texts. One such challenge is text segmentation, a longstanding
issue in CL and NLP. Our work hypothesizes that boundary points between
segments correspond to low mutual information between the sentences preceding and
following the boundary. Based on this hypothesis, we explore a range of algorithmic
approaches to the task, building on previous work on segmentation that uses generative
Bayesian modeling and state-of-the-art neural machinery. We find that the methods we developed
show considerable improvements over previous work. Our research uses testimony
transcripts from the Shoah Foundation (SF) Holocaust archive to train a supervised topic
classifier, whose output then serves as topic guidance for automatic segmentation.
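As a rough illustration of the boundary hypothesis, the following scores each gap between adjacent sentences by how much the text on either side shares, then places boundaries at the lowest-scoring gaps. This is a minimal sketch, not the authors' method. It substitutes a simple vocabulary-overlap (Jaccard) score, in the spirit of TextTiling, for mutual information. The function names `window_overlap` and `segment`, the window size `w`, the fixed boundary count `k`, and the toy sentences are all hypothetical.

```python
from typing import List


def window_overlap(sentences: List[List[str]], i: int, w: int) -> float:
    """Jaccard overlap between the vocabulary of the w sentences before gap i
    and the w sentences after it. Assumption: this lexical overlap is only a
    crude stand-in for mutual information, not an estimate of it."""
    left = {tok for s in sentences[max(0, i - w):i] for tok in s}
    right = {tok for s in sentences[i:i + w] for tok in s}
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def segment(sentences: List[List[str]], k: int, w: int = 2) -> List[int]:
    """Return the k gaps with the lowest cross-gap overlap, as indices of the
    sentence that starts each new segment (hypothetical helper)."""
    gaps = range(1, len(sentences))
    # Sort by score, breaking ties by position so the result is deterministic.
    ranked = sorted(gaps, key=lambda i: (window_overlap(sentences, i, w), i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Invented toy sentences, already tokenized by whitespace.
    doc = [
        "my family lived in a small village".split(),
        "the village had a market and a school".split(),
        "in the spring we were sent to the camp".split(),
        "the camp was crowded and cold".split(),
    ]
    print(segment(doc, k=1, w=1))  # [2]
```

In the toy example, the gap before the third sentence shares the fewest words across it, so a new segment starts there. A closer match to the hypothesis would replace `window_overlap` with an actual estimate of mutual information, for example one derived from a language model, and would not fix the number of boundaries in advance.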
• Topical Segmentation of Spoken Narratives: A Test Case on Holocaust Survivor Testimonies
(EMNLP 2022)
• Automatic Topic-Guided Segmentation of Holocaust Survivor Testimonies (JCLS)
• Poster presentation, TADA 2021 conference