AISHELL-MDSC
Open-source data, advancing the development of artificial intelligence
MDSC includes 18,630 recordings totaling 17 hours: 10,125 recordings (7.6 hours) from non-dysarthric speakers (Control) and 8,505 recordings (9.4 hours) from dysarthric speakers (Dysarthria). The utterances were recorded from 21 dysarthric speakers (12 females, 9 males) and 25 non-dysarthric speakers (13 females, 12 males). The dysarthric participants have the following characteristics:
• Native Mandarin speakers;
• Broad age distribution (from 18 to 48) and gender balance;
• Diverse etiologies of dysarthria, including cerebral palsy and hepatolenticular degeneration.
The recordings consist of 10 wake-up words, each repeated five times at varying speeds. MDSC also includes 355 non-wake-up items, covering fixed command words, free command words, household instructions, and other phrases. Each speaker's text list contains 295 unique sentences. The recordings are sampled at 16 kHz and were made in a quiet indoor environment, with the participant positioned approximately 20 cm from the mobile microphone.
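The recording counts quoted above can be cross-checked with a little arithmetic. The sketch below assumes that every speaker recorded the full list once, that is, each wake-up word five times and each of the 355 non-wake-up items exactly once. This page does not state that assumption explicitly, but it reproduces the published totals. All constant and function names are illustrative only.

```python
# Sanity check of the MDSC recording counts quoted above.
# Assumption (not stated in the source): every speaker records each wake-up
# word 5 times and each of the 355 non-wake-up items exactly once.

WAKE_UP_WORDS = 10
REPEATS_PER_WAKE_UP_WORD = 5
NON_WAKE_UP_ITEMS = 355

# Speaker counts and rounded durations as given in the corpus description.
SPEAKERS = {"Control": 25, "Dysarthria": 21}
HOURS = {"Control": 7.6, "Dysarthria": 9.4}


def recordings_per_speaker() -> int:
    """Number of recordings one speaker contributes under the assumption above."""
    return WAKE_UP_WORDS * REPEATS_PER_WAKE_UP_WORD + NON_WAKE_UP_ITEMS


def recordings_per_group() -> dict:
    """Recordings per speaker group (Control / Dysarthria)."""
    per_speaker = recordings_per_speaker()
    return {group: n * per_speaker for group, n in SPEAKERS.items()}


def mean_duration_seconds() -> dict:
    """Approximate mean recording length per group, from the rounded hour totals."""
    counts = recordings_per_group()
    return {group: HOURS[group] * 3600 / counts[group] for group in counts}


if __name__ == "__main__":
    counts = recordings_per_group()
    print(recordings_per_speaker())  # 405
    print(counts)                    # {'Control': 10125, 'Dysarthria': 8505}
    print(sum(counts.values()))      # 18630
    for group, seconds in mean_duration_seconds().items():
        print(f"{group}: about {seconds:.1f} s per recording")
```

Under this assumption the mean recording length comes out at roughly 2.7 s for Control speakers and roughly 4.0 s for dysarthric speakers. Because the hour totals are rounded, these averages are only approximate.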
MDSC: Chinese Dysarthria Speech Database
A Mandarin Dysarthria Speech Corpus
Data download
Baseline system
Paper
License: CC BY-NC 4.0
The LRDWWS Challenge is designed to tackle the wake-up word spotting task for individuals with dysarthria, with the ultimate goal of facilitating broader integration in real-world applications.
The challenge uses the MDSC database as its training and development sets. A new test set, named MDSC-Eval, was recorded from 20 dysarthric speakers and includes 8,760 recordings totaling 9 hours. MDSC-Eval was recorded in the same way as MDSC, except that each speaker in the set has 11 additional negative words, each read 3 times. See https://lrdwws.org/ for details.
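The MDSC-Eval total is consistent with the per-speaker layout described for MDSC. The sketch below is a minimal check. It assumes that each of the 20 speakers records the same 405 items as an MDSC speaker (10 wake-up words read 5 times each, plus 355 non-wake-up items read once) and additionally reads the 11 negative words 3 times each. The page does not spell out the once-per-item assumption, and the names used here are illustrative only.

```python
# Cross-check of the MDSC-Eval recording count.
# Assumption (not stated in the source): an MDSC-Eval speaker records the same
# 405 items as an MDSC speaker, plus the additional negative words.

MDSC_ITEMS_PER_SPEAKER = 10 * 5 + 355  # wake-up repetitions + non-wake-up items
NEGATIVE_WORDS = 11
READINGS_PER_NEGATIVE_WORD = 3
EVAL_SPEAKERS = 20


def eval_recordings_per_speaker() -> int:
    """Recordings contributed by one MDSC-Eval speaker under the assumption above."""
    return MDSC_ITEMS_PER_SPEAKER + NEGATIVE_WORDS * READINGS_PER_NEGATIVE_WORD


def eval_total_recordings() -> int:
    """Total number of MDSC-Eval recordings across all speakers."""
    return EVAL_SPEAKERS * eval_recordings_per_speaker()


if __name__ == "__main__":
    print(eval_recordings_per_speaker())  # 438
    print(eval_total_recordings())        # 8760
```

The result matches the 8,760 recordings reported for MDSC-Eval.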
Test set download
Training set download
Validation set download
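Contact us
Business cooperation: bd@aishelldata.com
Technical support: tech@aishelldata.com
Phone: +86-010-80225006
Address: Room 316, 3rd Floor, Emerging Industry Alliance Building, Building 10, East Zone, No. 10 Xibeiwang East Road, Haidian District, Beijing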