AISHELL-5
Open-source data, empowering the development of artificial intelligence
Release date: May 2025
The AISHELL-5 dataset is recorded inside a hybrid electric car. A far-field microphone is mounted above the door handle of each of the four doors to capture far-field audio from different areas of the car, and each speaker additionally wears a headset microphone that collects clean near-field audio used for data annotation. A total of 260 participants with no notable accents take part in the recording. In each session, 2-4 speakers are randomly seated in the four positions inside the car and engage in free conversation without content restrictions, ensuring the naturalness and authenticity of the audio data; the average session lasts 10 minutes. All transcriptions are manually proofread, and the scripts for all speech data are provided in TextGrid format.
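TextGrid is Praat's plain-text annotation format, in which each tier lists timed intervals. As a rough illustration (not an official AISHELL-5 tool), the interval entries of a long-format TextGrid can be pulled out with the Python standard library alone; the field names below follow Praat's long TextGrid layout:

```python
import re

# Minimal long-format TextGrid fragment (illustrative, not taken from the corpus).
SAMPLE = '''
intervals [1]:
    xmin = 0.00
    xmax = 1.52
    text = "ni hao"
intervals [2]:
    xmin = 1.52
    xmax = 3.10
    text = "qing wen"
'''

def parse_intervals(textgrid_text):
    """Return a list of (xmin, xmax, text) tuples from long-format TextGrid text."""
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*'
        r'xmax = ([\d.]+)\s*'
        r'text = "([^"]*)"')
    return [(float(a), float(b), t) for a, b, t in pattern.findall(textgrid_text)]

segments = parse_intervals(SAMPLE)
print(segments)  # → [(0.0, 1.52, 'ni hao'), (1.52, 3.1, 'qing wen')]
```

A production parser should additionally skip the file-level `xmin`/`xmax` header and handle escaped quotes; the regex above only sketches the core idea.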
In normal driving scenarios, a car contains various noises from both inside and outside. External noises include environmental sounds, wind noise, and tire noise, while internal noises come from sources such as the music player and air conditioning. These noises significantly degrade the accuracy of in-car speech recognition systems. To comprehensively cover the noise types encountered in real-world in-car scenarios, we carefully design the recording scenes. For environmental noise, recordings are made on different road segments (urban streets and highways) during both daytime and nighttime. For wind and tire noise, we control how far the car windows are open (fully closed, half open, and one-third open) and the car's speed (stationary, low, medium, and high). For noise inside the car, we set the music player and the air conditioning to different levels to cover a variety of in-car conditions. These sub-scenes are numbered and combined in various ways to form the final recording scenarios, yielding over 60 recording scenarios in total.
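The corpus does not publish its exact numbering or pairing scheme, but the idea of combining numbered sub-scenes can be sketched with a cross-product over the factor values named above (all factor lists here are taken from the description; the full-product combination itself is an assumption):

```python
from itertools import product

# Factor values as listed in the description; the corpus's actual pairing
# scheme is not published here, so this cross-product is only a sketch.
time_of_day = ["daytime", "nighttime"]
road        = ["urban", "highway"]
window      = ["closed", "half-open", "one-third-open"]
speed       = ["stationary", "low", "medium", "high"]

scenarios = [
    {"id": i, "time": t, "road": r, "window": w, "speed": s}
    for i, (t, r, w, s) in enumerate(
        product(time_of_day, road, window, speed), start=1)
]
print(len(scenarios))  # 2 * 2 * 3 * 4 = 48 combinations in this sketch
```

In practice, physically inconsistent combinations (e.g. stationary on a highway) would be filtered out, and music/air-conditioning levels would add further sub-scenes, which is how the real scheme reaches over 60 scenarios.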
The AISHELL-5 dataset contains more than 100 hours of speech data, divided into 94 hours of training data (Train), 3.3 hours of validation data (Dev), and two test sets (Eval1 and Eval2) of 3.3 and 3.58 hours, respectively. Every subset includes 4-channel far-field audio; only the training set additionally contains near-field audio. Furthermore, to promote research on speech simulation techniques, we also provide a large-scale noise dataset (Noise), recorded with the same setup as the far-field data but containing no speech, lasting approximately 40 hours.
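Since the far-field recordings are 4-channel, a first processing step is usually to split them into per-microphone signals. The sketch below (file name and layout are hypothetical, not from the corpus) writes a tiny synthetic 4-channel 16-bit WAV and de-interleaves it using only the standard library:

```python
import struct
import wave

# Hypothetical example: AISHELL-5 far-field audio is described as 4-channel;
# here we synthesize a tiny 4-channel, 16-bit PCM WAV and split it back into
# one sample list per microphone channel.

def write_4ch(path, frames):
    """Write interleaved frames (one 4-tuple of samples per frame) as a WAV."""
    with wave.open(path, "wb") as w:
        w.setnchannels(4)
        w.setsampwidth(2)          # 16-bit PCM
        w.setframerate(16000)
        for frame in frames:
            w.writeframes(struct.pack("<4h", *frame))

def split_channels(path):
    """Return a list of per-channel sample lists from an interleaved WAV."""
    with wave.open(path, "rb") as w:
        n_ch, n_frames = w.getnchannels(), w.getnframes()
        raw = w.readframes(n_frames)
    samples = struct.unpack("<%dh" % (n_ch * n_frames), raw)
    return [list(samples[c::n_ch]) for c in range(n_ch)]

write_4ch("demo_4ch.wav", [(100, 200, 300, 400), (101, 201, 301, 401)])
channels = split_channels("demo_4ch.wav")
print(channels[0])  # → [100, 101]  (all samples from the first microphone)
```

For real corpus files, a library such as soundfile or scipy would do the same in one call; the point here is only the interleaved channel layout of multi-channel PCM.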
AISHELL-5 In-Car Multi-Channel Multi-Speaker Speech Dataset
Data Download

License: CC BY NC 4.0
Paper

Baseline System

AISHELL-ASR0051

Data Application (Company): bd@aishelldata.com
Data Application (Academic Institution): aishell.foundation@gmail.com
Sample Download

AISHELL-5 is part of the AISHELL-ASR0051 Corpus
3164 Hours
Data Description

Non-Open Source

11 Pickup Points
260 Speakers
60 Scenes

The ICMC-ASR challenge is dedicated to speech recognition under complex driving conditions. The challenge comprises two tracks: 1) automatic speech recognition (ASR) and 2) automatic speech diarization and recognition (ASDR).
The challenge data and the AISHELL-5 dataset are completely identical. See https://icmcasr.org/ for details.
Contact Us
Business Cooperation: bd@aishelldata.com
Technical Services: tech@aishelldata.com
Phone: +86-010-80225006
Address:
D5-A501, Haidian Dayue Information Technology Park, Haidian District, Beijing