MDSC contains 18,630 recordings totaling 17 hours: 10,125 recordings (7.6 hours) from non-dysarthric speakers (Control) and 8,505 recordings (9.4 hours) from dysarthric speakers (Dysarthria). We recorded utterances from 21 dysarthric speakers (12 female, 9 male) and 25 non-dysarthric speakers (13 female, 12 male). The dysarthric participants have the following characteristics:

 • Native speakers of Mandarin
 • A broad age range (18 to 48 years) and gender balance
 • Diverse etiologies of dysarthria, including cerebral palsy and hepatolenticular degeneration

The recordings cover 10 wake-up words, each repeated five times at varying speeds. MDSC also includes 355 non-wake-up words, covering fixed command words, free command words, household instructions, and other phrases. Each speaker's text list contains 295 non-repeated sentences. Recordings were made in a quiet indoor environment at a 16 kHz sampling rate, with participants positioned approximately 20 cm from the mobile microphone.
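The corpus figures above are internally consistent, and a quick arithmetic check makes the structure visible. The sketch below assumes (this is not stated explicitly above) that every speaker reads the full script exactly once: 10 wake-up words × 5 repetitions + 355 non-wake-up items = 405 utterances. Under that assumption, both groups work out to exactly 405 recordings per speaker. The script also derives the mean duration per recording for each group.

```python
# Figures copied from the MDSC description; the "full script once per
# speaker" reading is an assumption used only for this sanity check.
WAKE_WORDS = 10        # distinct wake-up words
WAKE_REPEATS = 5       # each wake-up word is read five times
NON_WAKE_ITEMS = 355   # non-wake-up words and phrases

SPEAKERS = {"Control": 25, "Dysarthria": 21}
RECORDINGS = {"Control": 10_125, "Dysarthria": 8_505}
HOURS = {"Control": 7.6, "Dysarthria": 9.4}


def per_speaker_script() -> int:
    """Utterances per speaker if the whole script is read once (assumption)."""
    return WAKE_WORDS * WAKE_REPEATS + NON_WAKE_ITEMS


def observed_per_speaker() -> dict:
    """Recordings divided by speakers, per group."""
    return {g: RECORDINGS[g] / SPEAKERS[g] for g in SPEAKERS}


def mean_seconds_per_recording() -> dict:
    """Average recording length in seconds, per group."""
    return {g: HOURS[g] * 3600 / RECORDINGS[g] for g in SPEAKERS}


if __name__ == "__main__":
    print("script length:", per_speaker_script())         # 405
    print("observed per speaker:", observed_per_speaker())
    print("total recordings:", sum(RECORDINGS.values()))  # 18630
    print("total hours:", round(sum(HOURS.values()), 1))  # 17.0
    print("mean seconds/recording:", mean_seconds_per_recording())
```

The last line shows that dysarthric recordings average roughly 4.0 s, compared with roughly 2.7 s for control recordings.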

MDSC: A Mandarin Dysarthria Speech Corpus

Data download: Dataset
Baseline system: Recipe
Paper: arXiv

License: CC BY-NC 4.0

The LRDWWS Challenge addresses wake-up word spotting for individuals with dysarthria, with the ultimate goal of enabling broader use of wake-up word systems by these speakers in real-world applications.


The challenge uses MDSC as its training and development sets, together with a newly recorded test set from 20 dysarthric speakers, named MDSC-Eval. MDSC-Eval contains 8,760 recordings totaling 9 hours. It was recorded with the same method as MDSC, except that each speaker also reads 11 additional negative words, three times each. For details, see https://lrdwws.org/
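The MDSC-Eval totals can be checked in the same way. The sketch below rests on one assumption: each MDSC-Eval speaker reads the 405-utterance MDSC script plus the 11 negative words three times each. That gives 405 + 33 = 438 utterances per speaker, and 20 speakers × 438 = 8,760, which matches the stated total.

```python
# Per-speaker MDSC script: 10 wake-up words x 5 repeats + 355 non-wake-up items.
MDSC_SCRIPT = 10 * 5 + 355
NEGATIVE_WORDS = 11     # extra negative words per MDSC-Eval speaker
NEGATIVE_REPEATS = 3    # each negative word is read three times
EVAL_SPEAKERS = 20
EVAL_RECORDINGS = 8_760
EVAL_HOURS = 9


def eval_script_length() -> int:
    """Utterances per MDSC-Eval speaker, assuming the full script is read once."""
    return MDSC_SCRIPT + NEGATIVE_WORDS * NEGATIVE_REPEATS


if __name__ == "__main__":
    per_speaker = eval_script_length()
    print("per speaker:", per_speaker)                     # 438
    print("expected total:", per_speaker * EVAL_SPEAKERS)  # 8760
    print("mean seconds/recording:", EVAL_HOURS * 3600 / EVAL_RECORDINGS)
```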

Downloads:
 • Train set
 • Dev set
 • Test-A set
 • Test-B set