Speaker verification using neural networks
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
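The comparison step summarized above can be illustrated with a minimal sketch. The abstract does not say how the evaluation vector and the reference vector are compared; cosine similarity and the 0.8 threshold used here are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


def likely_same_speaker(evaluation_vector, reference_vector, threshold=0.8):
    """Decide whether the utterance was likely spoken by the enrolled speaker.

    The threshold is a hypothetical value; a real system would tune it
    on held-out data.
    """
    return cosine_similarity(evaluation_vector, reference_vector) >= threshold
```

Under these assumptions, two vectors pointing in the same direction are accepted regardless of magnitude, while orthogonal vectors are rejected.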
55 Citations
18 Claims
1. A method comprising:
- inputting, by a computing device, speech data that corresponds to a particular utterance of a particular speaker to a neural network having parameters trained based on propagation between an input layer and an output layer through one or more hidden layers located between the input layer and the output layer, wherein the one or more hidden layers were trained using utterances of multiple speakers, and wherein the multiple speakers do not include the particular speaker;
- generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, a representation of activations occurring at a particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer;
- comparing, by the computing device, the generated representation of activations occurring at the particular layer of the neural network in response to the speech data that corresponds to the particular utterance with a reference representation of activations occurring at the particular layer of the neural network in response to speech data that corresponds to one or more past utterances of the particular speaker;
- based on comparing the generated representation and the reference representation, determining, by the computing device, that the particular utterance was likely spoken by the particular speaker; and
- providing, by the computing device, access to the computing device based on determining that the particular utterance was likely spoken by the particular speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium storing software having stored thereon instructions, which, when executed by one or more computers, cause the one or more computers to perform operations of:
- inputting, by a computing device, speech data that corresponds to a particular utterance of a particular speaker to a neural network having parameters trained based on propagation between an input layer and an output layer through one or more hidden layers located between the input layer and the output layer, wherein the one or more hidden layers were trained using utterances of multiple speakers, and wherein the multiple speakers do not include the particular speaker;
- generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, a representation of activations occurring at a particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer;
- comparing, by the computing device, the generated representation of activations occurring at the particular layer of the neural network in response to the speech data that corresponds to the particular utterance with a reference representation of activations occurring at the particular layer of the neural network in response to speech data that corresponds to one or more past utterances of the particular speaker;
- based on comparing the generated representation and the reference representation, determining, by the computing device, that the particular utterance was likely spoken by the particular speaker; and
- providing, by the computing device, access to the computing device based on determining that the particular utterance was likely spoken by the particular speaker.
Dependent claims: 16, 17
18. A system comprising:
one or more processors and one or more computer storage media storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising:
- inputting, by a computing device, speech data that corresponds to a particular utterance of a particular speaker to a neural network having parameters trained based on propagation between an input layer and an output layer through one or more hidden layers located between the input layer and the output layer, wherein the one or more hidden layers were trained using utterances of multiple speakers, and wherein the multiple speakers do not include the particular speaker;
- generating, by the computing device and in response to inputting the speech data that corresponds to the particular utterance to the neural network, a representation of activations occurring at a particular layer of the neural network that was trained as one of the hidden layers located between the input layer and the output layer;
- comparing, by the computing device, the generated representation of activations occurring at the particular layer of the neural network in response to the speech data that corresponds to the particular utterance with a reference representation of activations occurring at the particular layer of the neural network in response to speech data that corresponds to one or more past utterances of the particular speaker;
- based on comparing the generated representation and the reference representation, determining, by the computing device, that the particular utterance was likely spoken by the particular speaker; and
- providing, by the computing device, access to the computing device based on determining that the particular utterance was likely spoken by the particular speaker.
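The "generating" element recited in the independent claims, a representation of activations occurring at a hidden layer, can be sketched as a forward pass that stops before the output layer. This is an illustrative sketch only: the ReLU nonlinearity, the averaging of per-frame activations into one vector, the averaging of past utterances into a reference, and every function name and weight below are assumptions not stated in the claims.

```python
def dense(inputs, weights, biases):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise rectifier (an assumed activation function)."""
    return [max(0.0, v) for v in values]


def hidden_activations(frame, hidden_layers):
    """Propagate one feature frame through the hidden layers only.

    The output layer used during training is never evaluated here.
    """
    activations = frame
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    return activations


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(len(vectors[0]))]


def utterance_representation(frames, hidden_layers):
    """Average last-hidden-layer activations over all frames (assumption)."""
    return mean_vector([hidden_activations(f, hidden_layers) for f in frames])


def reference_representation(past_utterances, hidden_layers):
    """Average the representations of past utterances (assumption)."""
    return mean_vector([utterance_representation(u, hidden_layers)
                        for u in past_utterances])


# Toy network: one hidden layer with three units and two input features.
toy_layers = [([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0])]
toy_frames = [[1.0, 2.0], [3.0, 0.0]]
print(utterance_representation(toy_frames, toy_layers))  # [2.0, 1.0, 1.5]
```

Because the hidden layers are trained on other speakers, enrolling a new speaker in this sketch requires only forward passes over that speaker's past utterances, with no retraining.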
Specification