diff --git a/_posts/2026-03-01-LREC-gmichel.md b/_posts/2026-03-01-LREC-gmichel.md
new file mode 100644
index 0000000..2740ec6
--- /dev/null
+++ b/_posts/2026-03-01-LREC-gmichel.md
@@ -0,0 +1,22 @@
+---
+layout: post
+title: "S-VoCAL: A Dataset and Evaluation Framework for Inferring Speaking Voice Character Attributes in Literature"
+date: 2026-03-01 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+ - NLP
+people:
+ - gmichel
+ - eepure
+publication_type: conference
+publication_title: "S-VoCAL: A Dataset and Evaluation Framework for Inferring Speaking Voice Character Attributes in Literature"
+publication_year: 2026
+publication_authors: Abigail Berthe-Pardo, Gaspard Michel, Elena V. Epure, Christophe Cerisara
+publication_conference: LREC
+publication_code: "https://github.com/AbigailBerthe/S-VoCAL"
+publication_preprint: "https://arxiv.org/pdf/2603.00958"
+---
+
+With recent advances in Text-to-Speech (TTS) systems, synthetic audiobook narration has seen increased interest, reaching unprecedented levels of naturalness. However, large gaps remain in synthetic narration systems' ability to impersonate fictional characters and to convey complex emotions or prosody. A promising direction to enhance character identification is the assignment of plausible voices to each fictional character in a book. This step typically requires complex inference of attributes in book-length contexts, such as a character's age, gender, origin or physical health, which in turn requires dedicated benchmark datasets to evaluate extraction systems' performance. We present S-VoCAL (Speaking Voice Character Attributes in Literature), the first dataset and evaluation framework dedicated to evaluating the inference of voice-related fictional character attributes. S-VoCAL entails 8 attributes grounded in sociophonetic studies, and 952 character-book pairs derived from Project Gutenberg. Its evaluation framework addresses the particularities of each attribute, and includes a novel similarity metric based on recent Large Language Model embeddings. We demonstrate the applicability of S-VoCAL by applying a simple Retrieval-Augmented Generation (RAG) pipeline to the task of inferring character attributes. Our results suggest that the RAG pipeline reliably infers attributes such as Age or Gender, but struggles with others such as Origin or Physical Health.
\ No newline at end of file
diff --git a/_posts/2026-04-21-ACL-gmichel.md b/_posts/2026-04-21-ACL-gmichel.md
new file mode 100644
index 0000000..6a66718
--- /dev/null
+++ b/_posts/2026-04-21-ACL-gmichel.md
@@ -0,0 +1,22 @@
+---
+layout: post
+title: "Computational Narrative Understanding for Expressive Text-to-Speech"
+date: 2026-04-21 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+ - NLP
+people:
+ - gmichel
+ - eepure
+publication_type: conference
+publication_title: "Computational Narrative Understanding for Expressive Text-to-Speech"
+publication_year: 2026
+publication_authors: Gaspard Michel, Elena V. Epure, Christophe Cerisara
+publication_conference: ACL
+publication_code: "https://github.com/deezer/libriquote"
+publication_preprint: "https://arxiv.org/pdf/2509.04072"
+---
+
+Recent advances in text-to-speech (TTS) have been driven by large, multi-domain speech corpora, yet the expressive potential of audiobook data remains underexamined. We argue that human-narrated audiobooks, particularly fictional works, contain rich and diverse prosodic cues arising from the natural alternation between neutral narration and expressive character dialogue. Building on this observation, we introduce LibriQuote, a large-scale dataset of 5.3K hours of expressive speech drawn from character quotations. Each quote is supplemented with contextual pseudo-labels for speech verbs and adverbs that characterize the intended delivery of direct speech (e.g., "he whispered softly"). We find that fine-tuning a flow-matching model on LibriQuote yields substantial improvements in expressivity and intelligibility, while training from scratch enhances the expressiveness of an autoregressive TTS model. Benchmarking on LibriQuote-test highlights significant variability across systems in generating expressive speech. We publicly release the dataset, code, and evaluation resources to facilitate reproducibility. Audio samples can be found at this [url](https://libriquote.github.io/).
\ No newline at end of file
diff --git a/_posts/2026-05-28-PrePrint-gmichel.md b/_posts/2026-05-28-PrePrint-gmichel.md
new file mode 100644
index 0000000..20e3b6f
--- /dev/null
+++ b/_posts/2026-05-28-PrePrint-gmichel.md
@@ -0,0 +1,23 @@
+---
+layout: post
+title: "GraphLit: Learning Text-Enriched Dynamic Character Network Representations for Literary Study"
+date: 2026-05-28 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+ - NLP
+people:
+ - gmichel
+ - eepure
+ - rhennequin
+publication_type: conference
+publication_title: "GraphLit: Learning Text-Enriched Dynamic Character Network Representations for Literary Study"
+publication_year: 2026
+publication_authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara, Mirella Lapata
+publication_conference: Preprint
+publication_code: "https://github.com/gasmichel/GraphLit"
+publication_preprint: "https://arxiv.org/pdf/2605.28643"
+---
+
+Methods to represent literary texts as graphs or sequences of graphs mainly focus on representing character interactions, and often overlook another crucial aspect: the textual context in which characters interact. We introduce Dynamic Heterogeneous Character Networks (DHCNs), which organize long novels into temporally localized heterogeneous graphs that align characters with their textual contexts. We extract around 20,000 DHCNs from Project Gutenberg, and propose GraphLit, a self-supervised learning framework that learns rich literary representations through a masked graph autoencoder objective. Across a wide range of 12 character-related tasks, GraphLit improves over text-only and graph-only baselines, particularly on tasks requiring contextual understanding. Finally, we demonstrate the applicability of DHCNs and GraphLit for literary analysis by studying the link between narrative non-linearity and dynamic social features.
\ No newline at end of file