Joint speaker authentication and key phrase identification
First Claim
1. A spoken command analyzer module comprising instructions embodied in one or more non-transitory machine accessible storage media, the spoken command analyzer module configured to cause a computing system comprising one or more computing devices to:
extract acoustic features from a speech sample;
in response to input of the acoustic features to a neural network, receive, from the neural network, a temporal sequence of bottleneck features;
wherein the neural network is trained to discriminate between classes of phonetic units;
compute statistics using a combination of the acoustic features and the temporal sequence of bottleneck features;
using the statistics, identify a command contained in the speech sample;
using the statistics, identify a speaker of the command;
in response to a comparison of the command and the speaker to a stored model, output, to a device, data that is used by the device to execute an action.
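The claim does not say which statistics are computed or how the two feature streams are combined. As one illustrative reading only, the following sketch assumes that each acoustic frame is concatenated with the bottleneck frame at the same time index and that the utterance is then summarized by per-dimension mean and standard-deviation pooling. The function names `concatenate_frames` and `pooled_statistics` are hypothetical and do not come from the patent.

```python
import math
from typing import List, Sequence

Frame = List[float]


def concatenate_frames(acoustic: Sequence[Frame], bottleneck: Sequence[Frame]) -> List[Frame]:
    """Join each acoustic frame with the bottleneck frame at the same time index.

    Hypothetical combination step: the claim only says the statistics use
    "a combination" of the two feature streams.
    """
    if len(acoustic) != len(bottleneck):
        raise ValueError("acoustic and bottleneck sequences must be time-aligned")
    return [list(a) + list(b) for a, b in zip(acoustic, bottleneck)]


def pooled_statistics(frames: Sequence[Frame]) -> List[float]:
    """Return per-dimension means followed by per-dimension standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each dimension over all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds
```

For two frames with acoustic vectors `[1.0, 2.0]` and `[3.0, 4.0]` and bottleneck vectors `[0.5]` and `[1.5]`, the pooled vector is `[2.0, 3.0, 1.0, 1.0, 1.0, 0.5]`: three means followed by three standard deviations. A fixed-length summary like this can then be scored against stored models regardless of how long the utterance was.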
1 Assignment
0 Petitions
Abstract
A spoken command analyzer computing system includes technologies configured to analyze information extracted from a speech sample and, using a joint speaker and phonetic content model, both determine whether the analyzed speech includes certain content (e.g., a command) and identify the human speaker of the speech. In response to determining that the speaker's identity matches the authorized user's identity and that the analyzed speech includes the modeled content (e.g., the command), an associated device performs an action corresponding to the verified content.
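The joint decision described in the abstract can be sketched as follows, under loudly stated assumptions: the stored model is taken to hold one embedding per enrolled (speaker, command) pair, and an utterance is assumed to have already been reduced to a single embedding of the same length. Cosine scoring, the `0.8` threshold, and the names `EnrolledModel` and `verify` are illustrative choices, not the patented method.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


@dataclass
class EnrolledModel:
    # Hypothetical joint model entry: one embedding per (speaker, command) pair.
    speaker: str
    command: str
    embedding: Sequence[float]


def verify(
    utterance: Sequence[float],
    models: Sequence[EnrolledModel],
    authorized: str,
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the command to execute, or None if content or speaker is rejected."""
    best = max(models, key=lambda m: cosine(utterance, m.embedding), default=None)
    if best is None:
        return None  # nothing enrolled
    if cosine(utterance, best.embedding) < threshold:
        return None  # no enrolled (speaker, command) pair matches well enough
    if best.speaker != authorized:
        return None  # content recognized, but the speaker is not the authorized user
    return best.command
```

Because every enrolled embedding carries both a speaker label and a command label, a single best match answers both questions at once, and the command is released only when that match is close enough and belongs to the authorized user.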
15 Citations
33 Claims
12. A method, comprising:
extracting acoustic features from a speech sample;
in response to inputting of the acoustic features to a neural network, receiving, from the neural network, bottleneck features;
wherein the neural network is trained to discriminate between different classes of phonetic units;
computing statistics using a combination of the acoustic features and the bottleneck features;
using the statistics, identifying a command contained in the speech sample;
using the statistics, identifying a speaker of the command;
in response to a comparison of the command and the speaker to a stored model, outputting, to a device, data that is used by the device to execute an action;
wherein the method is performed by one or more computing devices.
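Bottleneck features are the activations of a deliberately narrow hidden layer inside a network trained to classify phonetic units. The minimal sketch below shows only where those features are read out; its weights are random rather than trained, and the layer sizes, the `tanh` hidden activation, the linear bottleneck, and the `BottleneckNet` class are all hypothetical simplifications.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Affine layer: w has one row of input weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def random_layer(rng: random.Random, n_in: int, n_out: int) -> Layer:
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class BottleneckNet:
    """Untrained stand-in for a phonetic classifier with a narrow bottleneck layer."""

    def __init__(self, n_in: int, n_hidden: int, n_bottleneck: int,
                 n_phone_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden = random_layer(rng, n_in, n_hidden)
        self.narrow = random_layer(rng, n_hidden, n_bottleneck)
        self.output = random_layer(rng, n_bottleneck, n_phone_classes)

    def bottleneck(self, frame: Sequence[float]) -> List[float]:
        h = [math.tanh(v) for v in dense(frame, *self.hidden)]
        return dense(h, *self.narrow)  # linear bottleneck activations

    def phone_posteriors(self, frame: Sequence[float]) -> List[float]:
        """Softmax over phonetic classes: the output the network would be trained on."""
        z = dense(self.bottleneck(frame), *self.output)
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def bottleneck_sequence(self, frames: Sequence[Sequence[float]]) -> List[List[float]]:
        """One bottleneck vector per input frame, preserving temporal order."""
        return [self.bottleneck(f) for f in frames]
```

Training would adjust the weights so that `phone_posteriors` separates phonetic classes; after training, the output layer is set aside and `bottleneck_sequence` supplies compact, phonetically informed features for each frame.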
23. An apparatus, comprising:
at least one computing device;
wherein the at least one computing device is coupled to a sound capture device;
wherein the at least one computing device is configured to:
extract time-aligned acoustic features from a speech sample captured by the sound capture device;
in response to input of the time-aligned acoustic features to a neural network, receive, from the neural network, bottleneck features;
wherein the neural network is trained to discriminate between classes of phonetic units;
compute statistics using a combination of the acoustic features and the bottleneck features;
using the statistics, identify a command contained in the speech sample;
using the statistics, identify a speaker of the command;
in response to a comparison of the command and the speaker to a stored model, output, to a device, data that is used by the at least one computing device to execute an action.
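Claim 23 refers to time-aligned acoustic features. A common way to obtain them is to split the captured signal into short overlapping frames and keep each frame's start time. The sketch below assumes 16 kHz audio with 25 ms windows and a 10 ms hop, and uses per-frame log energy purely as a stand-in feature; practical systems typically use richer features such as MFCCs or filterbank energies. The function name `frame_log_energy` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_log_energy(
    samples: Sequence[float],
    sample_rate: int = 16000,
    win_ms: float = 25.0,
    hop_ms: float = 10.0,
) -> List[Tuple[float, float]]:
    """Return (start_time_seconds, log_energy) for each full analysis frame."""
    win = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    frames = []
    # Only frames that fit entirely inside the signal are emitted.
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        energy = sum(s * s for s in chunk)
        # Small floor avoids log(0) on digital silence.
        frames.append((start / sample_rate, math.log(energy + 1e-10)))
    return frames
```

One second of 16 kHz audio yields 98 frames starting at 0.00 s, 0.01 s, 0.02 s, and so on. Keeping the start time with each feature lets a per-frame network output, such as a bottleneck vector, be paired with the acoustic frame it was computed from.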
Specification