AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus

The AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus contains 178 hours of recordings. It is part of AISHELL-ASR0009, AISHELL's Mandarin speech corpus, whose transcripts cover 11 domains, including smart home, autonomous driving, and industrial production. All recordings were made in a quiet indoor environment with 3 devices recording simultaneously: a high-fidelity microphone (44.1 kHz, 16-bit), an Android mobile phone (16 kHz, 16-bit), and an iOS mobile phone (16 kHz, 16-bit). AISHELL-ASR0009-OS1 was built from the high-fidelity microphone audio, downsampled to 16 kHz. A total of 400 speakers from different accent regions of China took part in the recording. After transcription and annotation by professional speech proofreaders and strict quality inspection, the transcription accuracy of the corpus is above 95%. The corpus is divided into training, development, and test sets.
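The 44.1 kHz to 16 kHz conversion described above is a rational change of sample rate. The corpus description does not say which tool was used; as a minimal sketch, assuming a polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up, down)`, the upsampling and downsampling factors and the resulting number of samples can be computed as follows. The helper names `resample_factors` and `resampled_length` are hypothetical.

```python
import math
from fractions import Fraction
from typing import Tuple


def resample_factors(src_rate: int, dst_rate: int) -> Tuple[int, int]:
    """Return (up, down) in lowest terms such that dst_rate / src_rate == up / down."""
    ratio = Fraction(dst_rate, src_rate)  # Fraction reduces to lowest terms automatically
    return ratio.numerator, ratio.denominator


def resampled_length(num_samples: int, up: int, down: int) -> int:
    """Output length of an up/down polyphase resampler: ceil(num_samples * up / down)."""
    # Integer ceiling division avoids floating-point rounding on long recordings.
    return (num_samples * up + down - 1) // down


if __name__ == "__main__":
    up, down = resample_factors(44100, 16000)
    print(up, down)  # 160 441
    # One second of 44.1 kHz audio becomes one second of 16 kHz audio.
    print(resampled_length(44100, up, down))  # 16000
```

A polyphase resampler upsamples by `up`, low-pass filters to suppress aliasing, and then keeps every `down`-th sample; simply dropping samples without the filter would fold high-frequency content into the 0 to 8 kHz band.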

Speech & Speaker Recognition Experiments

178 Hours

400 Mandarin Speakers

Kaldi Integration

Kaldi recipe

Data Download

Non-Open Source Dataset

License: Apache License, Version 2.0

Related Courses

AISHELL-1: Hands-on Speech Recognition

 

论 文

 

arxiv
IEEE

Baseline System

Recipe

Data Use Application

Company: bd@aishelldata.com
Academic Institution: aishell.foundation@gmail.com