Sub3Project

This paper introduces the sub-3sec problem in speaker verification, a short-duration task that has rarely been explored. The problem stems from the labor-intensive annotation and costly recording required to build text-dependent speaker verification (TD-SV) corpora. To address it, we propose an automatic pipeline that extracts short phrases from text-independent speaker verification (TI-SV) corpora.

Sub3Vox Corpus

This work introduces Sub3Vox, a novel English corpus for TD-SV. It was generated from a TI-SV dataset by a novel automated pipeline and is larger than any existing TD-SV corpus. Notably, this is the first time a TD-SV corpus has been created from a TI-SV corpus. We further analyze the characteristics of Sub3Vox and report its baseline performance. The proposed pipeline can be applied to other TI-SV datasets, offering a scalable way to generate large TD-SV corpora.

How and when you can access the database

The data in Sub3Vox1 will be available soon.

News and Updates

May 2025: Our paper “The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus” has been accepted to the main track of Interspeech 2025. See you in Rotterdam, the Netherlands!
Feb 2025: Our paper “The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus” has been submitted to the main track of Interspeech 2025.
May 2024: We curated our first Sub3Vox corpus from VoxCeleb1!