SMIIP-NV
Open-sourced: August 2025
In natural spoken communication, emotions are often conveyed through non-verbal sounds (NVs) such as laughter, crying, and coughing. However, most existing text-to-speech (TTS) corpora lack annotations for these sounds, so few systems are able to generate them. To address this gap, we introduce SMIIP-NV, a speech synthesis corpus annotated with both emotions and non-verbal sounds, including laughter, crying, and coughing. To the best of our knowledge, SMIIP-NV is the largest open-source expressive speech corpus that includes non-verbal sounds. It comprises 33 hours of speech data covering five distinct emotions and three types of non-verbal sounds, with transcriptions and precise timestamps for each occurrence of a non-verbal sound. In addition, the corpus provides dedicated annotations for speech segments that contain laughter or crying. To demonstrate the utility of the dataset, we establish a baseline for non-verbal speech synthesis using a lightweight large language model (LLM).
Paper

Data download

Baseline system

License: CC BY-NC-SA 4.0
Contact us
Business cooperation: bd@aishelldata.com
Technical support: tech@aishelldata.com
Phone: +86-010-80225006
Address:
D5-A501, Haidian Dayue Information Technology Park, Haidian District, Beijing
