AISHELL-Stammertalk Chinese Stuttering Database

A Mandarin stuttered speech dataset

The AISHELL-Stammertalk dataset consists of recordings from 70 native Mandarin-speaking adults who stutter (AWS), 46 male and 24 female. The total duration is 48.8 hours. Each participant took part in a recording session of up to one hour with two parts: conversation and voice command reading. Conversations were conducted as online interviews on platforms such as Zoom or Tencent Meet, with the aim of capturing spontaneous speech on diverse topics. The interviewer, one of the two authors, asked questions from a prepared list and was free to add impromptu questions as needed.

 

In the voice command reading part, participants were tasked with reading a set of 200 commands, categorized into car navigation and smart home device interaction. To ensure variety, a new set of 200 commands was introduced for every 25 participants, resulting in a dataset featuring a total of 600 unique commands. Participants were encouraged to employ the Voluntary Stuttering technique, deliberately introducing stuttering.
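The command-set arithmetic can be checked with a short sketch. The counts come from the description above; the 1-based participant numbering and the `command_set_index` helper are hypothetical, introduced only for illustration, and the dataset may assign sets differently.

```python
import math

PARTICIPANTS = 70        # speakers, from the dataset description
COMMANDS_PER_SET = 200   # commands read by each participant
ROTATION = 25            # a new command set is introduced every 25 participants


def command_set_index(participant_number: int) -> int:
    """Return the 0-based command set for a 1-based participant number.

    Assumes participants are numbered in recording order (hypothetical).
    """
    if not 1 <= participant_number <= PARTICIPANTS:
        raise ValueError("participant_number out of range")
    return (participant_number - 1) // ROTATION


# 70 participants at 25 per rotation need 3 sets, hence 3 * 200 unique commands.
num_sets = math.ceil(PARTICIPANTS / ROTATION)
total_unique_commands = num_sets * COMMANDS_PER_SET
```

Under this assumed numbering, participants 1 to 25 share the first set, 26 to 50 the second, and 51 to 70 the third, which matches the stated total of 600 unique commands.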

 

The annotation guidelines specify five types of stuttering events:
[]: word/phrase repetition. Marks an entire repeated character or phrase.
/b: block. Gasps for air or stuttered pauses.
/p: prolongation. An elongated phoneme.
/r: sound repetition. Repeated phonemes that do not constitute an entire character.
/i: interjection. Filler characters caused by stuttering, e.g., ‘嗯’, ‘啊’, or ‘呃’. Naturally occurring interjections that do not disrupt the flow of speech are excluded.
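As a rough illustration of how these labels might be tallied, the sketch below counts each event type in an annotated transcript string. It assumes the markers appear inline in the transcript text, with `[]` wrapping the repeated material. The exact placement of markers in the released files is not described here, so treat the format, the `count_events` helper, and the sample string as hypothetical.

```python
import re
from collections import Counter

# Slash markers and their event names, as listed in the annotation guidelines.
SLASH_LABELS = {
    "/b": "block",
    "/p": "prolongation",
    "/r": "sound repetition",
    "/i": "interjection",
}

_SLASH_RE = re.compile(r"/[bpri]")
# Assumption: a word/phrase repetition is written as one non-nested [...] span.
_BRACKET_RE = re.compile(r"\[[^\[\]]*\]")


def count_events(transcript: str) -> Counter:
    """Count stuttering event markers in one annotated transcript string."""
    counts = Counter()
    counts["word/phrase repetition"] = len(_BRACKET_RE.findall(transcript))
    for tag in _SLASH_RE.findall(transcript):
        counts[SLASH_LABELS[tag]] += 1
    return counts


# Hypothetical example, not taken from the dataset.
sample = "我[我]想/p去/b北京嗯/i"
events = count_events(sample)
```

For the made-up sample, `events` holds one word/phrase repetition, one prolongation, one block, and one interjection. Event types that do not occur read as zero because `Counter` returns 0 for missing keys.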

The StutteringSpeech Challenge covers two tasks on stuttered speech: 1) stuttering event detection and 2) automatic speech recognition (ASR) of stuttered speech.


The challenge dataset is a re-split of the training and test sets of the AISHELL-Stammertalk dataset; the data itself is identical. For details, see https://stutteringspeech.org/



