Text-to-Speech

835 directly classified papers

Papers

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models ICML 2023

VC-T: Streaming Voice Conversion Based on Neural Transducer INTERSPEECH 2023

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model INTERSPEECH 2023

DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer INTERSPEECH 2023

Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 INTERSPEECH 2023

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed INTERSPEECH 2023

FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters INTERSPEECH 2023

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus INTERSPEECH 2023

RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech INTERSPEECH 2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions ACL 2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech INTERSPEECH 2023

Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer INTERSPEECH 2023

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design INTERSPEECH 2023

ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks ACL 2023

Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers INTERSPEECH 2023

STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework INTERSPEECH 2023

SASPEECH: A Hebrew Single Speaker Dataset for Text To Speech and Voice Conversion INTERSPEECH 2023

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis AAAI 2023

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech AAAI 2023

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech INTERSPEECH 2023

Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages INTERSPEECH 2023

RWEN-TTS: Relation-Aware Word Encoding Network for Natural Text-to-Speech Synthesis AAAI 2023

Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation INTERSPEECH 2023

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center INTERSPEECH 2023

Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer INTERSPEECH 2023