Open Source Data Products              AISHELL-2


Sharing data to advance artificial intelligence.




AISHELL-2 Open Source Mandarin Speech Corpus


       AISHELL-2 is a 1000-hour Mandarin Chinese speech corpus: 718 hours come from AISHELL-ASR0009-[ZH-CN] and 282 hours from AISHELL-ASR0010-[ZH-CN]. The recording scripts cover 12 domains, including wake words, voice commands, smart home, autonomous driving, and industrial production. Recording took place in a quiet indoor environment, using three devices simultaneously: a high-fidelity microphone (44.1 kHz, 16-bit), an Android phone (16 kHz, 16-bit), and an iOS phone (16 kHz, 16-bit). AISHELL-2 uses the audio recorded on the iOS phone. A total of 1991 speakers from different accent regions of China took part in the recording. The transcripts were produced by professional speech annotators and passed strict quality inspection; transcription accuracy is above 96%. The corpus is free for academic research; commercial use is prohibited without permission.
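The recording format stated above (16 kHz, 16-bit for the iOS audio) can be checked on a downloaded file before running experiments. The following is a minimal sketch using only the Python standard library. It assumes the audio is stored as uncompressed PCM WAV files, which this page does not state; the function names `describe_wav` and `matches_corpus_format` are hypothetical helpers, not part of any AISHELL tooling.

```python
import wave

# Values taken from the corpus description: iOS recordings at 16 kHz, 16-bit.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def describe_wav(path):
    """Return (sample_rate_hz, sample_width_bytes, duration_seconds).

    Assumes `path` points to an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    return rate, width, duration


def matches_corpus_format(path):
    """True if the file has the sample rate and bit depth described above."""
    rate, width, _ = describe_wav(path)
    return rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
```

Summing the durations returned by `describe_wav` over all files is one way to compare a local copy against the stated 1000 hours.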



1000 Hours

1991 Mandarin Chinese speakers in the recording

Speech recognition and speaker recognition (voiceprint) experiments

Speech & Speaker Recognition evaluation

Use with the Kaldi system

    Merged with the Kaldi system

               Kaldi recipe

    • Customer service
    • Phone: 010-80225006
    • Email: bd@aishelldata.com