Sub3Project
This paper introduces the sub-3sec problem in speaker verification, a short-duration task that has rarely been explored. This gap arises from the labor-intensive annotation and costly recording required to build text-dependent speaker verification (TD-SV) corpora. To address this issue, we propose an automatic pipeline that extracts short phrases from text-independent speaker verification (TI-SV) corpora.

Sub3Vox Corpus
This work introduces Sub3Vox, a novel English corpus for TD-SV. It was generated from a TI-SV dataset by a novel automated pipeline and is larger than any existing TD-SV corpus. Notably, this is the first time that a TD-SV corpus has been created from a TI-SV corpus. We further analyze the characteristics of Sub3Vox and report its baseline performance. The proposed pipeline can be applied to other TI-SV datasets, offering a scalable solution for generating large TD-SV corpora.
How and when you can access the database
The data in Sub3Vox1 will be available soon.
News and Updates
- May 2025: Our paper “The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus” has been accepted to the Main Track of Interspeech 2025. See you in Rotterdam, the Netherlands!
- Feb 2025: Our paper “The Sub-3Sec Problem: From Text-Independent to Text-Dependent Corpus” has been submitted to the Main Track of Interspeech 2025.
- May 2024: We curated our first Sub3Vox corpus from VoxCeleb1!