AISHELL-4
Open-source data, advancing artificial intelligence
AISHELL-4 Open Source Mandarin Multi-channel Meeting Speech Corpus
AISHELL-4 is a sizable, real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each with 4 to 8 speakers, totaling approximately 120 hours. It aims to bridge advanced research on multi-speaker processing and practical application scenarios in three aspects. First, because the meetings are real recordings, AISHELL-4 provides realistic acoustics and rich natural conversational speech characteristics such as short pauses, speech overlap, quick speaker turns, and noise. Second, accurate transcriptions and speaker voice activity with timestamps are provided for each meeting, allowing researchers to explore different aspects of meeting processing, from individual tasks such as speech front-end processing, speech recognition, and speaker diarization to multi-modality modeling and joint optimization of related tasks. Third, a PyTorch-based training and evaluation framework is released as a baseline system to promote reproducible research in this field.
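Because each meeting comes with per-speaker voice activity, statistics such as the amount of overlapped speech can be derived directly from the annotations. The sketch below is a minimal illustration, not part of the official baseline: it assumes the annotations have already been parsed into a hypothetical dict mapping speaker IDs to lists of `(start, end)` times in seconds (the release's actual file format is not described here), and that one speaker's own segments do not overlap each other. The function name `speech_and_overlap_seconds` is likewise hypothetical. It uses a sweep line over segment boundaries to measure total speech time and the time during which two or more speakers talk at once.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical annotation layout: speaker ID -> list of (start_sec, end_sec).
Segment = Tuple[float, float]


def speech_and_overlap_seconds(activity: Dict[str, List[Segment]]) -> Tuple[float, float]:
    """Return (speech_sec, overlap_sec) for one meeting.

    speech_sec:  time during which at least one speaker is active.
    overlap_sec: time during which two or more speakers are active.
    Assumes each speaker's own segments do not overlap one another.
    """
    events: List[Tuple[float, int]] = []
    for segments in activity.values():
        for start, end in segments:
            if end > start:  # skip empty or inverted segments
                events.append((start, +1))
                events.append((end, -1))

    # Sort by time; at equal times, segment ends (-1) come before starts (+1),
    # so back-to-back segments are not counted as overlapping.
    events.sort()

    active = 0
    speech = 0.0
    overlap = 0.0
    prev_t: Optional[float] = None
    for t, delta in events:
        if prev_t is not None:
            span = t - prev_t
            if active >= 1:
                speech += span
            if active >= 2:
                overlap += span
        active += delta
        prev_t = t
    return speech, overlap


if __name__ == "__main__":
    meeting = {
        "spk1": [(0.0, 5.0)],
        "spk2": [(4.0, 6.0)],
        "spk3": [(5.5, 7.0)],
    }
    speech, overlap = speech_and_overlap_seconds(meeting)
    print(f"speech={speech:.1f}s overlap={overlap:.1f}s ratio={overlap / speech:.3f}")
```

For the toy meeting above, speakers are active from 0 to 7 s (7.0 s of speech), and two speakers overlap during 4 to 5 s and 5.5 to 6 s (1.5 s of overlap). The sweep line runs in O(n log n) in the number of segments, which scales comfortably to meetings with thousands of annotated turns.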
The setup of the recording environment.
120 Hours
211 Meeting Sessions
10 Meeting Rooms
60 Speakers
Speech Front-End Processing
Speech Recognition
Speaker Diarization
Open-Source System
AISHELL-4 is part of the AISHELL-ASR0055 Corpus
29 Meeting Rooms
956 Meeting Sessions
666 Hours per Single Channel
261 Speakers
Not Open Source
Data Use Application
Companies: bd@aishelldata.com
Academic institutions: aishell.foundation@gmail.com
License: Apache License 2.0
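Contact Us
Business cooperation: bd@aishelldata.com
Technical services: tech@aishelldata.com
Phone: +86-010-80225006
Address:
Room 316, 3rd Floor, Emerging Industry Alliance Building (新兴产业联盟大厦), Building 10, East District, No. 10 Xibeiwang East Road, Haidian District, Beijing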