benty-fields - Search paper

5575. SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval

Bhavin Jawade, Joao V. B. Soares, Kapil Thadani, Deen Dayal Mohan, Amir Erfan Eshratifar, Benjamin Culpepper, Paloma de Juan, Srirangaraj Setlur et al.

arXiv:2501.08347v1

5576. Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis

Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li

arXiv:2501.06467v1

Conversational speech synthesis (CSS) aims to take the current dialogue (CD) history as a reference to synthesize expressive speech that aligns with the conversational style. Unlike CD, stored dialogue (SD) contains preserved dialogue fragments from earlier stages of user-agent interaction, which include style expression knowledge relevant to scenarios similar to those in CD. This knowledge plays a significant role in enabling the agent to synthesize expressive conversational speech that conveys empathetic feedback, yet prior research has overlooked it. To address this issue, we propose a novel Retrieval-Augmented Dialogue Knowledge Aggregation scheme for expressive CSS, termed RADKA-CSS, which includes three main components: 1) To effectively retrieve dialogues from SD that are similar to CD in both semantics and style, we first build a stored dialogue semantic-style database (SDSSD) that includes text and audio samples. We then design a multi-attribute retrieval scheme that matches the dialogue semantic and style vectors of the CD against the stored dialogue semantic and style vectors in the SDSSD, retrieving the most similar dialogues. 2) To effectively utilize the style knowledge from CD and SD, we adopt a multi-granularity graph structure to encode the dialogue and introduce a multi-source style knowledge aggregation mechanism. 3) Finally, the aggregated style knowledge is fed into the speech synthesizer to help the agent synthesize expressive speech that aligns with the conversational style. We conducted comprehensive experiments on the DailyTalk dataset, a benchmark dataset for the CSS task. Both objective and subjective evaluations demonstrate that RADKA-CSS outperforms baseline models in expressiveness rendering. Code and audio samples can be found at: https://github.com/Coder-jzq/RADKA-CSS.
Authors' comments: Accepted by Information Fusion 2025
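
The multi-attribute retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the RADKA-CSS implementation: the cosine similarity, the weighted-sum scoring, the `alpha` weight, and the names `retrieve` and `database` are all assumptions chosen for illustration, and the vectors are toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_sem, query_sty, database, k=1, alpha=0.5):
    """Rank stored dialogues by a weighted sum of semantic and style similarity.

    The weighted sum and `alpha` are assumptions; the abstract only says that
    semantic and style vectors are both matched.
    """
    scored = []
    for name, (sem, sty) in database.items():
        score = alpha * cosine(query_sem, sem) + (1 - alpha) * cosine(query_sty, sty)
        scored.append((score, name))
    scored.sort(reverse=True)  # highest combined score first
    return [name for _, name in scored[:k]]


# Hypothetical stored dialogues: name -> (semantic vector, style vector).
database = {
    "sd_1": ([1.0, 0.0], [0.6, 0.8]),
    "sd_2": ([1.0, 0.0], [1.0, 0.0]),
    "sd_3": ([0.0, 1.0], [1.0, 0.0]),
}
top = retrieve([1.0, 0.0], [1.0, 0.0], database, k=2)
```

Here `sd_2` matches the query on both attributes, so it ranks first; `sd_1` matches on semantics and partly on style, so it ranks above `sd_3`, which matches on style only.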

5577. ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting

Steven H. Wang, Maksim Zubkov, Kexin Fan, Sarah Harrell, Yuyang Sun, Wei Chen, Andreas Plesner, Roger Wattenhofer

arXiv:2501.06582v2

5578. Convergence analysis of Wirtinger Flow for Poisson phase retrieval

Bing Gao, Ran Gu, Shigui Ma

arXiv:2501.06402v1

5579. Deep Reversible Consistency Learning for Cross-modal Retrieval

Ruitao Pu, Yang Qin, Dezhong Peng, Xiaomin Song, Huiming Zheng

arXiv:2501.05686v1

Cross-modal retrieval (CMR) typically involves learning common representations to directly measure similarities between multimodal samples. Most existing CMR methods assume that multimodal samples come in pairs and employ joint training to learn common representations, which limits the flexibility of CMR. Although some methods adopt independent training strategies for each modality to improve flexibility, they use randomly initialized orthogonal matrices to guide representation learning. This is suboptimal because it assumes that inter-class samples are independent of each other, limiting the potential for semantic alignment between sample representations and ground-truth labels. To address these issues, we propose a novel method termed Deep Reversible Consistency Learning (DRCL) for cross-modal retrieval. DRCL includes two core modules, i.e., Selective Prior Learning (SPL) and Reversible Semantic Consistency learning (RSC). More specifically, SPL first learns a transformation weight matrix on each modality and selects the best one, based on a quality score, as the Prior, which avoids blind selection of priors learned from low-quality modalities. Then, RSC employs a Modality-invariant Representation Recasting mechanism (MRR) to recast the potential modality-invariant representations from sample semantic labels using the generalized inverse matrix of the prior. Since labels are devoid of modality-specific information, we use the recast features to guide representation learning, thus maintaining semantic consistency to the fullest extent possible. In addition, a feature augmentation mechanism (FA) is introduced in RSC to encourage the model to learn over a wider data distribution for diversity. Finally, extensive experiments conducted on five widely used datasets and comparisons with 15 state-of-the-art baselines demonstrate the effectiveness and superiority of our DRCL.
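
The recasting idea in the abstract (recovering representations from labels through the generalized inverse of the prior) can be sketched on a toy example. This is only an illustration under stated assumptions, not the DRCL code: it assumes the prior is a feature-to-label matrix `prior` of shape (features x classes) with full column rank and exactly two classes, so the Moore-Penrose inverse reduces to (WᵀW)⁻¹Wᵀ. All names and values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def inv2(m):
    """Invert a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def pinv_full_column_rank(w):
    """Moore-Penrose inverse of a (features x 2) matrix with full column rank."""
    wt = transpose(w)
    return matmul(inv2(matmul(wt, w)), wt)


# Assumed prior: maps 3-dimensional representations to 2 class scores.
prior = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One-hot semantic labels for two samples.
labels = [[1.0, 0.0], [0.0, 1.0]]

# Recast label-derived representations: labels times the generalized inverse.
recast = matmul(labels, pinv_full_column_rank(prior))
# Mapping the recast representations through the prior recovers the labels.
roundtrip = matmul(recast, prior)
```

Under these assumptions the recast features are consistent with the labels by construction: passing them back through the prior returns the original one-hot labels up to floating-point error.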

5580. Re-ranking the Context for Multimodal Retrieval Augmented Generation

Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus

arXiv:2501.04695v1

Benty-search

5561. Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
arXiv:2501.13017v1

5562. Ptychographic retrieval for complete ultrashort pulse amplitude swing reconstruction
arXiv:2501.13184v1

5563. Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
arXiv:2501.12835v2

5564. Improved Coded Caching Scheme for Multi-User Information Retrieval System
arXiv:2501.12528v1

5565. A Linear Programming Approach to Private Information Retrieval
arXiv:2501.12286v1

5566. Assisting Mathematical Formalization with A Learning-based Premise Retriever
arXiv:2501.13959v2

5567. Sun-Jafar-Type Schemes for Weak Private Information Retrieval
arXiv:2501.11505v1

5568. MechIR: A Mechanistic Interpretability Framework for Information Retrieval
arXiv:2501.10165v1

5569. Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching
arXiv:2501.08686v1

5570. Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
arXiv:2501.09136v3

5571. MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents
arXiv:2501.08828v2

5572. ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
arXiv:2501.07861v1

5573. Enhancing Retrieval-Augmented Generation: A Study of Best Practices
arXiv:2501.07391v1

5574. Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective
arXiv:2501.07040v1

5575. SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval
arXiv:2501.08347v1

5576. Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis
arXiv:2501.06467v1

5577. ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
arXiv:2501.06582v2

5578. Convergence analysis of Wirtinger Flow for Poisson phase retrieval
arXiv:2501.06402v1

5579. Deep Reversible Consistency Learning for Cross-modal Retrieval
arXiv:2501.05686v1

5580. Re-ranking the Context for Multimodal Retrieval Augmented Generation
arXiv:2501.04695v1
