CNSRC 2022
CN-Celeb Speaker Recognition Challenge 2022
Welcome to the first CN-Celeb speaker recognition challenge, CNSRC 2022! The challenge aims to probe how well current speaker recognition methods work in real-world scenarios, covering both in-the-wild complexity and real-time processing speed.
Tasks
CNSRC 2022 defines two tasks: speaker verification (SV) and speaker retrieval (SR).
Task 1. Speaker Verification (SV)
The objective of this task is to improve performance on the standard CN-Celeb evaluation set. Depending on the data allowed for system development, two tracks are defined for the SV task:
Fixed Track, where only the CN-Celeb training set is allowed for training/tuning the system.
Open Track, where any data sources can be used for developing the system, except the CN-Celeb evaluation set.
Task 2. Speaker Retrieval (SR)
The purpose of this task is to find the utterances spoken by a target speaker in a large data pool, given enrollment data for that speaker. Each target speaker forms a retrieval request, with 1 enrollment utterance and 10 test utterances. The non-target set contains a large number of utterances from multiple sources. The target and non-target utterances are pooled together, and participants are required to design a retrieval system that finds the top-10 candidates for each target speaker and lists them in descending order of LLR score. Participants can use any data sources to train their system, except the CN-Celeb evaluation set.
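As an illustration of the required output, the following sketch (hypothetical: it assumes speaker embeddings have already been extracted by some front-end model, and uses cosine similarity as a stand-in for a calibrated LLR score) shows how the top-10 candidates for each target speaker could be ranked.

```python
import numpy as np

def retrieve_top_candidates(enroll_embs, pool_embs, pool_utt_ids, top_n=10):
    """For each target speaker, rank pool utterances by similarity to the
    enrollment embedding and return the top-N candidates in descending order.

    enroll_embs:  dict mapping target speaker id -> enrollment embedding (1-D array)
    pool_embs:    2-D array, one row per utterance in the pool
    pool_utt_ids: list of utterance ids, aligned with the rows of pool_embs
    """
    # L2-normalise so that the dot product equals cosine similarity
    pool = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    results = {}
    for spk, emb in enroll_embs.items():
        emb = emb / np.linalg.norm(emb)
        scores = pool @ emb                      # one score per pool utterance
        order = np.argsort(-scores)[:top_n]      # indices of the top-N scores
        results[spk] = [(pool_utt_ids[i], float(scores[i])) for i in order]
    return results
```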
Evaluation
Task 1. Speaker Verification (SV)
The primary metric for SV performance evaluation is the minimum Detection Cost Function (minDCF).
First, the detection cost function is defined as:

$$C_{Det}(\theta) = C_{Miss} \cdot P_{Miss}(\theta) \cdot P_{Target} + C_{FalseAlarm} \cdot P_{FalseAlarm}(\theta) \cdot (1 - P_{Target})$$

where $P_{Miss}(\theta)$ is the miss rate and $P_{FalseAlarm}(\theta)$ is the false alarm rate when the decision threshold is set to $\theta$; $C_{Miss}$ and $C_{FalseAlarm}$ are the costs of a missed detection and a spurious detection, respectively; and $P_{Target}$ is the prior probability of the specified target speaker. minDCF is then obtained by minimizing $C_{Det}(\theta)$ with respect to $\theta$, with $C_{Miss} = C_{FalseAlarm} = 1$ and $P_{Target} = 0.01$:

$$\text{minDCF} = \min_{\theta} C_{Det}(\theta)$$
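As a rough illustration of the metric (not the official scoring tool; the function and argument names are ours), minDCF can be computed from a list of trial scores and target/non-target labels by sweeping the decision threshold:

```python
import numpy as np

def min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Sweep the decision threshold over all scores and return the minimum
    detection cost.  labels: 1 for target trials, 0 for non-target trials."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores)
    labels = labels[order]
    n_tgt = labels.sum()
    n_non = len(labels) - n_tgt
    # After sorting, placing the threshold just above the i-th lowest score
    # rejects the first i trials: misses are the targets among the rejected,
    # false alarms are the non-targets among the accepted.
    misses = np.concatenate(([0], np.cumsum(labels)))
    fas = n_non - np.concatenate(([0], np.cumsum(1 - labels)))
    p_miss = misses / n_tgt
    p_fa = fas / n_non
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return float(dcf.min())
```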
Besides minDCF, the SV performance is also analyzed with two auxiliary measures:
• Equal Error Rate (EER). EER is defined as the error rate at the operating point where the miss rate and the false alarm rate are balanced, i.e., $P_{Miss}(\theta^*) = P_{FalseAlarm}(\theta^*)$, where $\theta^*$ is the decision threshold that achieves this balance. EER is used as an auxiliary metric and should be reported in the system description (see the sketch after this list).
• Detection Error Tradeoff (DET) curve. The DET curve is drawn in a two-dimensional space whose axes are $P_{Miss}$ and $P_{FalseAlarm}$. It reflects the trade-off between misses and false alarms, and presents the performance of the system at the various operating points determined by $\theta$.
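A minimal sketch of how EER could be estimated from the same kind of score/label arrays (an approximation; official toolkits interpolate the DET curve more carefully). The $(P_{Miss}, P_{FalseAlarm})$ pairs computed here are also the operating points that trace the DET curve.

```python
import numpy as np

def eer(scores, labels):
    """Approximate the Equal Error Rate: the point where P_Miss and
    P_FalseAlarm are equal.  labels: 1 = target trial, 0 = non-target trial."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(np.unique(scores))
    tgt = scores[labels == 1]
    non = scores[labels == 0]
    # Error rates at every candidate threshold; these pairs trace the DET curve.
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(p_miss - p_fa))       # closest balanced operating point
    return float((p_miss[idx] + p_fa[idx]) / 2)
```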
Task 2. Speaker Retrieval (SR)
The performance of the SR system will be measured in terms of mean Average Precision (mAP). For a single target speaker $i$, suppose there are $M$ test utterances in total and the system outputs at most $N$ candidates for each retrieval request. For the top-$k$ case, the Precision is defined as:

$$P_i(k) = \frac{\text{number of target utterances of speaker } i \text{ among the top-}k \text{ candidates}}{k}$$

The AP of the top-$N$ list is defined as the Precision averaged over the top-$k$ ($k = 1, 2, \ldots, N$) cases:

$$AP_i = \frac{1}{N} \sum_{k=1}^{N} P_i(k)$$

mAP is then computed as the AP averaged over all target speakers:

$$\text{mAP} = \frac{1}{S} \sum_{i=1}^{S} AP_i$$
where $S$ is the number of target speakers. For the evaluation sets of CNSRC 2022, the parameters are as follows: the number of target speakers is $S = 5$ in SR.dev and $S = 25$ in SR.eval, the number of test utterances per target speaker is $M = 10$, and the maximum number of candidates output by the SR system is $N = 10$.
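Following the definitions above, a small sketch of how mAP could be computed from each speaker's ranked candidate list (names are illustrative; the official scorer's tie-breaking and edge-case handling may differ):

```python
def average_precision(ranked_utt_ids, target_utt_ids, n=10):
    """AP over the top-k (k = 1..N) cases for one target speaker."""
    hits = 0
    precisions = []
    for k, utt in enumerate(ranked_utt_ids[:n], start=1):
        if utt in target_utt_ids:
            hits += 1
        precisions.append(hits / k)              # Precision of the top-k case
    return sum(precisions) / n

def mean_average_precision(ranked_lists, target_sets, n=10):
    """mAP: AP averaged over all target speakers.

    ranked_lists: dict speaker id -> list of candidate utterance ids (ranked)
    target_sets:  dict speaker id -> set of true target utterance ids
    """
    speakers = list(ranked_lists)
    return sum(average_precision(ranked_lists[s], target_sets[s], n)
               for s in speakers) / len(speakers)
```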
Data
Task 1. Speaker Verification (SV)
In the fixed track, only the CN-Celeb training set is allowed for system development. It contains 797 speakers from CN-Celeb1.dev and 1,996 speakers from CN-Celeb2. The dataset can be obtained from OpenSLR.
In the open track, any data sources and tools can be used to develop the system.
Task 2. Speaker Retrieval (SR)
Two datasets will be released: SR.dev and SR.eval. Each dataset contains two parts:
(1) Target speakers and associated enrollment data;
(2) An utterance pool that contains utterances of the target speakers as well as a large number of non-target utterances.
SR.dev will be provided to participants for system development, while SR.eval will be released for system evaluation.
The SR.dev and SR.eval will be released soon.
CNSRC 2022 Website
The challenge will be based on CN-Celeb, a free multi-genre speaker recognition dataset with the richest real-world complexity to date. The dataset covers multiple genres of speech, including entertainment, interview, singing, play, movie, vlog, live broadcast, speech, drama, recitation and advertisement, and exhibits real-world noise, strong and overlapping background speakers, significant variation in speaking styles, time-varying and cross-channel effects, and long/short-duration test conditions. CNSRC 2022 is now open. Please check the detailed information about the challenge below.
Registration
Participants must sign up for an evaluation account, through which they can register for the evaluation, sign the data license agreement, and upload their submissions and system description.
Once the account has been created, registration can be performed online. Registration is free to all individuals and institutions. Normally, registration takes effect immediately, but the organizers may review the registration information and ask participants to provide additional information to validate it.
To sign up for an evaluation account, please click Quick Registration
Baseline
The organizers have prepared multiple baseline systems to demonstrate the training/evaluation process required by the challenge. All baseline systems are open-sourced.
For the fixed-track SV system, three baselines are provided [1] [2] [3]. These baseline recipes can be easily adapted to develop an open-track system by adding more training data, with the exception of CN-Celeb.E.
For speaker retrieval, two baselines are provided [1] [2]. Based on these baseline recipes, participants can use any data sources to train their system, except CN-Celeb.E.
Submission and Leaderboard
Participants should submit their results via the submission system. Once the submission is completed, it will be shown in the Leaderboard, and all participants can check their positions. For each task and each track, participants can submit their results no more than 10 times.
The submission and leaderboard server will be open soon.
Dates
Mid Feb | Registration system opens.
Late Feb | Development set for Task 2 (SR) released.
Mid Mar | Evaluation set for Task 2 (SR) released.
Mid Mar | Submission system and leaderboard open.
Mid May | Deadline for submission of results.
Late May | Deadline for technical description.
29 Jun | CNSRC 2022 workshop at Odyssey 2022.
Organization Committees
Dong Wang, Tsinghua University, Beijing, China
Qingyang Hong, Xiamen University, Xiamen, China
Lantian Li, Tsinghua University, Beijing, China
Wenqiang Du, Tsinghua University, Beijing, China
Yang Zhang, Tsinghua University, Shenzhen, China
Tao Jiang, TalentedSoft, Xiamen, China
Hui Bu, AISHELL, Beijing, China
Xin Xu, AISHELL, Beijing, China
Please contact e-mail cnsrc@cslt.org if you have any queries.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No.61633013 and No.62171250.