AISHELL-DMASH
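Open-source data, empowering the development of artificial intelligence
AISHELL-DMASH: Mandarin Chinese Microphone Array Speech Database for Smart Home Scenarios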
Distributed Microphone Arrays in Smart Home (DMASH) Dataset
The AISHELL-DMASH dataset was recorded in real smart home scenarios in two different rooms. It contains 30,000 hours of speech data. The recording devices comprise one close-talking microphone and seven groups of devices placed at seven different positions in the room. Each group consists of one iPhone, one Android phone, one iPad, one microphone, and one circular microphone array with a radius of 5 cm. The dataset covers 511 speakers, each of whom visited three times at intervals of 7 to 15 days. The recordings were transcribed by professional speech annotators under a strict quality assurance process, with a word accuracy of 98%. The dataset can be used for research on voiceprint recognition, speech recognition, wake-up word recognition, and related tasks.
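The device layout described above can be enumerated with a short sketch. The labels (`close_talk_microphone`, `pos1_iphone`, and so on) are hypothetical names chosen for illustration and are not the dataset's actual file or channel names. The number of channels inside each circular array is not stated here, so each array is counted as a single device.

```python
from itertools import product

# Device types in each positional group, as listed in the description.
# The label strings are illustrative assumptions, not official names.
GROUP_DEVICES = ["iphone", "android_phone", "ipad", "microphone", "circular_array"]
NUM_POSITIONS = 7  # seven groups at seven positions in the room


def list_recording_devices():
    """Return hypothetical labels for every recording device in one session."""
    devices = ["close_talk_microphone"]
    for position, device in product(range(1, NUM_POSITIONS + 1), GROUP_DEVICES):
        devices.append(f"pos{position}_{device}")
    return devices


devices = list_recording_devices()
print(len(devices))  # 1 close-talking microphone + 7 groups * 5 devices
```

Under these assumptions a session involves 36 recording devices in total: one close-talking microphone plus 35 far-field devices.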
The FFSVC 2020 challenge is designed to advance speaker verification research, with a special focus on far-field distributed microphone arrays under noisy conditions in real scenes. The objectives of this challenge are to: 1) benchmark current speaker verification technology under these challenging conditions; 2) promote the development of new ideas and technologies in speaker verification; 3) provide an open, free, large-scale speech database to the community that exhibits far-field characteristics in real scenes.
The FFSVC20 challenge dataset is a subset of the DMASH dataset. It includes recordings from the close-talking microphone, the iPhone at a distance of 25 cm, and three randomly selected circular microphone arrays. In FFSVC20, the training partition includes 120 speakers and the development partition includes 35 speakers. For each task, the evaluation data includes 80 speakers.
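As a rough illustration of the FFSVC20 subset, the sketch below records the speaker counts stated above and picks three of the seven circular arrays at random. How the organizers actually performed the random selection (per utterance, per speaker, or per session) is not specified here, so `select_channels` and all channel labels are assumptions made for this example only.

```python
import random

# Speaker counts per partition, taken from the description above.
SPEAKERS = {"train": 120, "dev": 35, "eval_per_task": 80}


def select_channels(rng, num_arrays=7, num_selected=3):
    """Return hypothetical channel labels for one FFSVC20-style recording.

    Assumption: the three circular arrays are drawn uniformly without
    replacement from the seven array positions.
    """
    arrays = sorted(rng.sample(range(1, num_arrays + 1), num_selected))
    fixed = ["close_talk_microphone", "iphone_25cm"]
    return fixed + [f"circular_array_pos{i}" for i in arrays]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(select_channels(rng))
print(SPEAKERS["train"] + SPEAKERS["dev"])  # speakers available for system building
```

Each call yields five sources: the two fixed devices and three distinct circular arrays.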
To download the full challenge data and trial files, please email aishell.foundation@gmail.com with "Apply for the FFSVC2020 Challenge data" in the subject line.
The setup of the recording environment.
Data Samples
Data Description
File Structure
Download training set samples
Download test set samples
Paper
Non-Open Source
Data Usage Application
Company: bd@aishelldata.com
Academic Institution:
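Contact Us
Business cooperation: bd@aishelldata.com
Technical support: tech@aishelldata.com
Phone: +86-010-80225006
Company address:
Room 316, 3rd Floor, Emerging Industry Alliance Building, Building 10, East Area, No. 10 Xibeiwang East Road, Haidian District, Beijing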