Joint Speaker Authentication and Key Phrase Identification

US 20160248768A1
Filed: 02/02/2016
Published: 08/25/2016
Est. Priority Date: 02/20/2015
Status: Active Grant

Abstract

A spoken command analyzer computing system includes technologies configured to analyze information extracted from a speech sample and, using a joint speaker and phonetic content model, both determine whether the analyzed speech includes certain content (e.g., a command) and identify the human speaker of the speech. In response to determining that the speaker's identity matches an authorized user's identity and that the analyzed speech includes the modeled content (e.g., a command), an associated device performs the action corresponding to the verified content.
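
The gating step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes some upstream joint model has already produced a speaker hypothesis and a command hypothesis, and the names `JointResult`, `handle_speech`, `authorized`, and `actions` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass(frozen=True)
class JointResult:
    """Hypothetical output of a joint speaker and content model."""
    speaker_id: Optional[str]  # None if no enrolled speaker matched
    command: Optional[str]     # None if no modeled command was detected


def handle_speech(
    result: JointResult,
    authorized: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], None]],
) -> bool:
    """Run the action only if the speaker matched AND the command was
    detected AND that speaker is authorized to issue that command."""
    if result.speaker_id is None or result.command is None:
        return False
    # `authorized` maps a speaker id to the commands that speaker may issue.
    if result.command not in authorized.get(result.speaker_id, set()):
        return False
    action = actions.get(result.command)
    if action is None:
        return False
    action()  # stands in for "issue an instruction to the associated device"
    return True
```

All three conditions are checked before the device is instructed, so a recognized command from an unrecognized or unauthorized speaker has no effect.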

64 Citations

33 Claims

1. A spoken command analyzer module comprising instructions embodied in one or more non-transitory machine accessible storage media, the spoken command analyzer module configured to cause a computing system comprising one or more computing devices to perform operations comprising:
- receive data representative of a current speech sample captured by a sound capture device;
  
  with a model that models both user-specific acoustic properties of one or more prior speech samples and command-specific acoustic properties of the one or more prior speech samples;
  
  analyze the data to determine substantive content of the speech and whether the substantive content includes a command to effect an action by an associated device, and analyze the data to determine identity of a human speaker of the speech and whether the identity matches an identity of a user who is authorized to issue the command; and
  
  in response to determining that the identity matches the authorized user's identity and determining that the data includes the command and determining that the identified user is authorized to issue the command, issue an instruction to effect performance of the action by the associated device.
  - 2. The spoken command analyzer module of claim 1, further configured to cause performance of operations comprising:
    - analyze the data to determine the substantive content of the speech independently of a language of the speech.
  - 3. The spoken command analyzer module of claim 1, further configured to cause performance of operations comprising:
    - analyze the data to determine whether the substantive content includes the command to effect the action by the associated device, wherein the command comprises any combination of speech based phonetics.
  - 4. The spoken command analyzer module of claim 1, further configured to cause performance of operations comprising:
    - analyze the data to determine whether the substantive content includes one of a plurality of commands wherein individual ones of the plurality of commands correspond to respective different actions by the associated device;
      
      in response to determining that the identity matches the authorized user's identity and determining that the data includes the one of the plurality of commands, issue an instruction to effect performance of an action by the associated device associated with the one of the plurality of commands in substantially real time relative to capturing the speech.
  - 5. The spoken command analyzer module of claim 1, further configured to cause performance of operations comprising:
    - analyze the data to determine that the speech was non-recorded live speech from a living being human speaker.
  - 6. The spoken command analyzer module of claim 1, further configured to cause performance of operations comprising:
    - analyze the data to separate the authorized user's speech data from contemporaneously captured speech from other human speakers.
  - 7. The spoken command analyzer module of claim 1, wherein the operation of analyze the data to determine whether the substantive content includes the command and the operation of analyze the data to determine the identity of the human speaker of the speech and whether the identity matches the authorized user's identity comprises comparing a joint speaker and content model of the data to a stored joint speaker and content model derived from previously analyzed speech of the authorized user to determine both that the speech contains the command and the identity matches the authorized user's identity.
  - 8. The spoken command analyzer module of claim 7 wherein the comparing the joint speaker and content model to the stored joint speaker and content model comprises using a speaker identification i-vector analysis including a probabilistic linear discriminant analysis.
  - 9. The spoken command analyzer module of claim 1, wherein the operation of analyze the data to determine whether the substantive content includes the command and the operation of analyze the data to determine the identity of the human speaker of the speech and whether the identity matches the authorized user's identity comprises using a bottleneck in a layer located at a middle layer of or layer in a half-portion of a deep neural network closer to the deep neural network's outer layer.
  - 10. The spoken command analyzer module of claim 9 wherein the using the bottleneck comprises appending cepstral features to bottleneck features extracted from the deep neural network bottleneck layer.
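
The feature combination named in claim 10 can be illustrated as a per-frame concatenation. This is only a sketch under stated assumptions: the claims do not specify feature dimensions or how frames are aligned, and the function name `append_cepstral` and the list-of-lists frame layout are hypothetical.

```python
from typing import List, Sequence


def append_cepstral(
    bottleneck: Sequence[Sequence[float]],
    cepstral: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append each frame's cepstral features to that frame's bottleneck
    features, giving one longer feature vector per frame.

    Assumes both inputs are frame-aligned (one row per frame, same count).
    """
    if len(bottleneck) != len(cepstral):
        raise ValueError("feature streams must have the same number of frames")
    # Bottleneck features come first; cepstral features are appended after.
    return [list(b) + list(c) for b, c in zip(bottleneck, cepstral)]
```

For two frames with 2 bottleneck dimensions and 3 cepstral dimensions each, the result holds two frames of 5 dimensions.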

11. A method of effecting an action by a device using human speech, the method comprising:
- receiving data representative of human speech captured by a sound capture device associated with the device;
  
  with at least one computing device;
  
  using a model of both user-specific acoustic properties and command-specific acoustic properties of a user's speech;
  
  analyzing the data to determine substantive content of the speech and whether the substantive content includes a command to effect an action by the device, and analyzing the data to determine identity of a human speaker of the speech and whether the identity matches an authorized user's identity; and
  
  in response to determining that the identity matches the authorized user's identity and determining that the data includes the command, issuing an instruction to effect performance of the action by the associated device.
  - 12. The method of claim 11 further comprising analyzing the data to determine the substantive content of the speech independently of a language of the speech.
  - 13. The method of claim 11, further comprising analyzing the data to determine whether the substantive content includes the command to effect the action by the associated device, wherein the command comprises any combination of speech based phonetics.
  - 14. The method of claim 11, further comprising:
    - analyzing the data to determine whether the substantive content includes one of a plurality of commands wherein individual ones of the plurality of commands correspond to respective different actions by the associated device;
      
      in response to determining that the identity matches the authorized user's identity and determining that the data includes the one of the plurality of commands, issuing an instruction to effect performance of an action by the associated device associated with the one of the plurality of commands in substantially real time relative to capturing the speech.
  - 15. The method of claim 11, further comprising analyzing the data to determine that the speech was non-recorded live speech from a living being human speaker.
  - 16. The method of claim 11, further comprising analyzing the data to separate the authorized user's speech data from contemporaneously captured speech from other human speakers.
  - 17. The method of claim 11, wherein the analyzing the data to determine whether the substantive content includes the command and the analyzing the data to determine the identity of the human speaker of the speech and whether the identity matches the authorized user's identity comprises comparing a joint content and speaker model of the data to a stored joint content and speaker model derived from previously analyzed speech from the authorized user to determine both that the speech contains the command and the identity matches the authorized user's identity.
  - 18. The method of claim 11, wherein the analyzing the data to determine whether the substantive content includes the command and the analyzing the data to determine the identity of the human speaker of the speech and whether the identity matches the authorized user's identity are performed by the at least one computing device locally to the device that performs the action associated with the command.
  - 19. The method of claim 17 wherein the comparing the joint content and speaker model to the stored joint content and speaker model comprises using a speaker identification i-vector analysis including probabilistic linear discriminant analysis.
  - 20. The method of claim 11, further comprising using a bottleneck in a layer located at a middle layer of or layer in a half-portion of a deep neural network closer to the deep neural network's outer layer.
  - 21. The method of claim 20 further comprising appending cepstral features to bottleneck features of the bottleneck.
  - 22. The method of claim 11, further comprising:
    - receiving at least three samples of training data representative of speech by the authorized user including the command;
      
      analyzing the at least three samples of training data to determine respective stored phonetic models for the command including both content recognition features and speaker recognition features.
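
The enrollment requirement in claim 22 (at least three training samples of the authorized user speaking the command) can be sketched as below. This is an illustration only. It assumes each sample has already been reduced to a fixed-length vector, it averages those vectors into one stored model, and it scores with cosine similarity as a simple stand-in for the probabilistic linear discriminant analysis named in claim 19. The names `enroll` and `cosine_score` are hypothetical, and none of these choices is confirmed by the claims.

```python
import math
from typing import List, Sequence


def enroll(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average at least three fixed-length sample vectors into one
    stored model vector (averaging is an assumption of this sketch)."""
    if len(samples) < 3:
        raise ValueError("at least three training samples are required")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all sample vectors must have the same length")
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (stand-in for PLDA scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A verification step would compare the score of a new sample against the stored model to a threshold tuned on held-out data; the threshold value is outside the scope of this sketch.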

23. An apparatus for performing computing device effected actions, the apparatus comprising:
- a sound capture device configured to output data representative of human speech captured by the sound capture device;
  
  at least one computing device configured to effect performance of an action response to receipt of a command;
  
  wherein the at least one computing device is configured to jointly:
  
  analyze the data to determine substantive content of the speech and whether the substantive content includes the command to effect the action;
  
  analyze the data to determine identity of a human speaker of the speech and whether the identity matches an authorized user's identity; and
  
  in response to determining that the identity matches the authorized user's identity and determining that the data includes the command, issue an instruction to effect performance of the action.
  - 24. The apparatus of claim 23, wherein the at least one computing device is further configured to analyze the data to determine the substantive content of the speech independently of a language of the speech.
  - 25. The apparatus of claim 23, wherein the at least one computing device is further configured to analyze the data to determine whether the substantive content includes the command to effect the action by the associated device, wherein the command comprises any combination of speech based phonetics.
  - 26. The apparatus of claim 23, wherein the at least one computing device is further configured to:
    - analyze the data to determine whether the substantive content includes one of a plurality of commands wherein individual ones of the plurality of commands correspond to respective different actions by the associated device;
      
      in response to determining that the identity matches the authorized user's identity and determining that the data includes the one of the plurality of commands, issue an instruction to effect performance of an action by the associated device associated with the one of the plurality of commands in substantially real time relative to capturing the speech.
  - 27. The apparatus of claim 23, wherein the at least one computing device is further configured to analyze the data to determine that the speech was non-recorded live speech from a living being human speaker.
  - 28. The apparatus of claim 23, wherein the at least one computing device is further configured to analyze the data to separate the authorized user's speech data from contemporaneously captured speech from other human speakers.
  - 29. The apparatus of claim 23, wherein the at least one computing device or at least one another computing not located with the apparatus but in communication with the at least one computing device are configured to:
    - receive at least three samples of training data representative of speech by the authorized user including the command;
      
      analyze the at least three samples of training data to determine respective stored phonetic models for the command including both content recognition features and speaker recognition features.
  - 30. The apparatus of claim 23, wherein the analyzing the data to determine whether the substantive content includes the command and the analyzing the data to determine the identity of the human speaker of the speech and whether the identity matches the authorized user's identity comprises comparing an i-vector of the data to an i-vector derived from previously analyzed speech from the authorized user to determine both that the speech contains the command and the identity matches the authorized user's identity.
  - 31. The apparatus of claim 30 wherein the comparing the i-vector comprises using a speaker identification i-vector analysis using probabilistic linear discriminant analysis.
  - 32. The apparatus of claim 23, wherein the at least one computing device is further configured to use a bottleneck in a layer located at a middle layer of or layer in a half-portion of a deep neural network closer to the deep neural network's outer layer.
  - 33. The apparatus of claim 32 wherein the at least one computing device is further configured to append cepstral features to bottleneck features of the bottleneck.

Current Assignee
SRI International, Inc.
Original Assignee
SRI International, Inc.
Inventors
McLaren, Mitchell Leigh; Lawson, Aaron Dennis

Granted Patent

US 10,476,872 B2
US Class Current

1/1
CPC Class Codes

G10L 15/16   using artificial neural net...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 17/18   Artificial neural networks;...

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

H04L 63/0861   using biometrical features,...

H04L 63/10   for controlling access to d...

H04L 63/102   Entity profiles
