METHOD AND DEVICE FOR VOICEPRINT RECOGNITION
First Claim
1. A method, comprising:
at a device having one or more processors and memory:
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data;
establishing a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features;
using the second-level DNN model, registering a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and
performing speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
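The two model-building steps recited above can be illustrated with a minimal sketch. The claim does not fix a network architecture or a training objective, so everything below is an assumption made for illustration only: a single tanh hidden layer, an autoencoder reconstruction loss for the unlabeled stage, a softmax speaker classifier for the tuning stage, and the hypothetical names `pretrain`, `fine_tune`, `encode`, and `update_encoder`. The toy vectors stand in for acoustic frames.

```python
import math
import random

random.seed(0)  # reproducible toy run

def init(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def encode(W, b, x):
    """Hidden activations h = tanh(W x + b)."""
    return [math.tanh(z + bj) for z, bj in zip(matvec(W, x), b)]

def update_encoder(W, b, x, h, dh, lr):
    """Backpropagate dL/dh through tanh and take one SGD step on W and b."""
    for j in range(len(W)):
        g = dh[j] * (1.0 - h[j] ** 2)
        b[j] -= lr * g
        for k in range(len(x)):
            W[j][k] -= lr * g * x[k]

def pretrain(frames, hidden, epochs=50, lr=0.05):
    """Stage 1 (assumed form): autoencoder trained on frames without speaker labels."""
    dim = len(frames[0])
    W, b = init(hidden, dim), [0.0] * hidden   # encoder
    V = init(dim, hidden)                      # linear decoder
    for _ in range(epochs):
        for x in frames:
            h = encode(W, b, x)
            err = [r - xi for r, xi in zip(matvec(V, h), x)]  # reconstruction error
            dh = [sum(err[i] * V[i][j] for i in range(dim)) for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    V[i][j] -= lr * err[i] * h[j]
            update_encoder(W, b, x, h, dh, lr)
    return W, b

def fine_tune(W, b, frames, labels, n_speakers, epochs=50, lr=0.05):
    """Stage 2 (assumed form): tune a copy of the stage-1 encoder with speaker labels."""
    W, b = [row[:] for row in W], b[:]
    hidden = len(W)
    U = init(n_speakers, hidden)               # softmax speaker classifier
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            h = encode(W, b, x)
            z = matvec(U, h)
            m = max(z)
            e = [math.exp(v - m) for v in z]
            total = sum(e)
            p = [v / total for v in e]
            dz = [p[c] - (1.0 if c == y else 0.0) for c in range(n_speakers)]
            dh = [sum(dz[c] * U[c][j] for c in range(n_speakers)) for j in range(hidden)]
            for c in range(n_speakers):
                for j in range(hidden):
                    U[c][j] -= lr * dz[c] * h[j]
            update_encoder(W, b, x, h, dh, lr)
    return W, b, U

# Toy data: random unlabeled frames, plus four labeled frames from two speakers.
unlabeled = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
labeled = [[1.0, 0.0, 0.2, 0.0], [0.9, 0.1, 0.0, 0.1],
           [0.0, 1.0, 0.0, 0.3], [0.1, 0.9, 0.1, 0.0]]
speakers = [0, 0, 1, 1]

W1, b1 = pretrain(unlabeled, hidden=3)                            # first-level model
W2, b2, U2 = fine_tune(W1, b1, labeled, speakers, n_speakers=2)   # second-level model
high_level = [encode(W2, b2, x) for x in labeled]                 # one feature vector per frame
```

In this sketch the hidden activations of the tuned encoder play the role of the high-level voiceprint features; that mapping is an illustrative choice, not something the claim language specifies.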
Abstract
A method is performed at a device having one or more processors and memory. The device establishes a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data. The device establishes a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features. Using the second-level DNN model, the device registers a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user. The device performs speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
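The registration and verification steps summarized above might look roughly like the following sketch. This section does not state how a test sample is scored against the registered sequence, so the mean-vector comparison, the cosine similarity score, the 0.8 threshold, and the names `register`, `verify`, and `registry` are all assumptions introduced for illustration. The feature sequences are taken as already produced by the second-level DNN model.

```python
import math

registry = {}  # hypothetical in-memory store: user id -> registered feature sequence

def mean_vector(seq):
    """Average a sequence of equal-length feature vectors into one vector."""
    n = len(seq)
    return [sum(frame[i] for frame in seq) / n for i in range(len(seq[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def register(user_id, feature_seq):
    """Store the high-level feature sequence extracted from the registration sample."""
    registry[user_id] = [list(frame) for frame in feature_seq]

def verify(user_id, test_seq, threshold=0.8):
    """Accept the claimed identity if the test features are close to the registered ones."""
    enrolled = registry.get(user_id)
    if enrolled is None:
        return False  # nothing registered for this user
    return cosine(mean_vector(enrolled), mean_vector(test_seq)) >= threshold

# Toy usage with made-up two-dimensional feature vectors.
register("alice", [[1.0, 0.0], [0.9, 0.1]])
same_speaker = verify("alice", [[0.95, 0.05]])
other_speaker = verify("alice", [[0.0, 1.0]])
```

A deployed system would likely calibrate the threshold on held-out data; the fixed value here only keeps the example short.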
20 Claims
1. A method, comprising:
at a device having one or more processors and memory:
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data;
establishing a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features;
using the second-level DNN model, registering a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and
performing speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A voiceprint recognition system, comprising:
one or more processors; and
memory storing instructions that, when executed by the one or more processors, cause the processors to perform operations comprising:
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data;
establishing a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features;
using the second-level DNN model, registering a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and
performing speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium storing instructions that, when executed by a computer system with one or more processors, cause the processors to perform operations comprising:
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data;
establishing a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features;
using the second-level DNN model, registering a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and
performing speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
Dependent claims: 16, 17, 18, 19, 20
Specification