METHOD AND DEVICE FOR VOICEPRINT RECOGNITION
First Claim
1. A method of voiceprint recognition, comprising:
- at a device having one or more processors and memory;
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data;
obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features;
based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and
performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and device for voiceprint recognition, include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
52 Citations
20 Claims
-
1. A method of voiceprint recognition, comprising:
at a device having one or more processors and memory; establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
8. A voiceprint recognition system, comprising:
-
one or more processors; and memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising; establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
-
establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification