AISHELL-Stammertalk
Open-source data, powering the development of artificial intelligence
AISHELL-Stammertalk Mandarin Stuttered Speech Database
A Mandarin stuttered speech dataset
The AISHELL-Stammertalk dataset consists of recordings from 70 native Mandarin-speaking adults who stutter (AWS), including 46 males and 24 females. The total duration is 48.8 hours. Each participant took part in a recording session lasting up to one hour, comprising two parts: conversation and voice command reading. Conversations were conducted as online interviews on platforms such as Zoom or Tencent Meeting, with the aim of capturing spontaneous speech on diverse topics. The interviewer, one of the two authors, asked questions from a prepared list and was free to add impromptu questions as needed.
In the voice command reading part, participants read a set of 200 commands covering two categories: car navigation and smart home device interaction. To ensure variety, a new set of 200 commands was introduced for every 25 participants, giving the dataset a total of 600 unique commands. Participants were encouraged to use the Voluntary Stuttering technique, that is, to introduce stuttering deliberately.
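The command-set arithmetic above can be checked with a short sketch. It assumes, purely for illustration, that participants are numbered from 0 in recording order and that each block of 25 consecutive participants reads one set; the function name `command_set_index` and the constants are hypothetical and do not come from the dataset release.

```python
# Figures taken from the description above.
COMMANDS_PER_SET = 200
PARTICIPANTS_PER_SET = 25
NUM_PARTICIPANTS = 70


def command_set_index(participant_index: int) -> int:
    """Return the 0-based command set for a 0-based participant index.

    Assumption: participants are numbered in recording order, so a new
    set starts at every multiple of PARTICIPANTS_PER_SET.
    """
    if not 0 <= participant_index < NUM_PARTICIPANTS:
        raise ValueError("participant_index out of range")
    return participant_index // PARTICIPANTS_PER_SET


# Ceiling division: 70 participants in blocks of 25 need 3 sets.
num_sets = -(-NUM_PARTICIPANTS // PARTICIPANTS_PER_SET)
total_unique_commands = num_sets * COMMANDS_PER_SET
```

Under these assumptions the third set is read by only 20 participants, and three sets of 200 commands give the stated total of 600.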
The annotation guidelines specify five types of stuttering:
[]: Word/phrase repetition. Marks an entire repeated character or phrase.
/b: Block. Gasps for air or stuttered pauses.
/p: Prolongation. An elongated phoneme.
/r: Sound repetition. Repeated phonemes that do not constitute an entire character.
/i: Interjection. Filler characters caused by stuttering, e.g., ‘嗯’, ‘啊’, or ‘呃’. Naturally occurring interjections that do not disrupt the flow of speech are excluded.
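As a rough illustration of how these labels might be tallied, the sketch below counts each marker in an annotated transcript string. The exact placement of markers in the released transcripts is not described here, so the assumptions are hypothetical: each `[...]` span counts as one word/phrase repetition, and each `/b`, `/p`, `/r`, or `/i` marker counts as one event. The function name `count_stutter_events` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Label names taken from the annotation guidelines above.
SLASH_LABELS = {
    "b": "block",
    "p": "prolongation",
    "r": "sound repetition",
    "i": "interjection",
}


def count_stutter_events(transcript: str) -> Counter:
    """Count stuttering markers in one annotated transcript string."""
    counts = Counter()
    # Assumption: every bracketed span marks one word/phrase repetition.
    counts["word/phrase repetition"] = len(
        re.findall(r"\[[^\[\]]*\]", transcript)
    )
    # Assumption: every slash marker stands for exactly one event.
    for tag in re.findall(r"/([bpri])", transcript):
        counts[SLASH_LABELS[tag]] += 1
    return counts


# Hypothetical annotated sentence, not taken from the dataset.
example = "嗯/i 我[我]想/p 去/b 北京"
example_counts = count_stutter_events(example)
```

Per-utterance counts like these could feed a label-distribution summary. The real transcript format should be checked against the dataset's annotation files before relying on this pattern.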
The StutteringSpeech Challenge targets two tasks on stuttered speech: 1) stuttering event detection and 2) automatic speech recognition (ASR) of stuttered speech.
The challenge dataset is a re-partition of the training and test sets of the AISHELL-Stammertalk dataset; the data itself is identical. For details, see https://stutteringspeech.org/
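Contact us
Business cooperation: bd@aishelldata.com
Technical service: tech@aishelldata.com
Phone: +86-010-80225006
Company address:
Room 316, 3rd Floor, Emerging Industry Alliance Building, Building 10, East Zone, No. 10 Xibeiwang East Road, Haidian District, Beijing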