DYNAMIC THRESHOLD FOR SPEAKER VERIFICATION

US 20150371639A1
Filed: 07/25/2014
Published: 12/24/2015
Est. Priority Date: 06/24/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a dynamic threshold for speaker verification are disclosed. In one aspect, a method includes the actions of receiving, for each of multiple utterances of a hotword, a data set including at least a speaker verification confidence score, and environmental context data. The actions further include selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context. The actions further include selecting a particular data set from among the subset of data sets based on one or more selection criteria. The actions further include selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score. The actions further include providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
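The actions in the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the `HotwordDataSet` type, the `noise_db` field, the half-open noise range, and the use of a rejection rate as the selection criterion (ranking the subset and taking the score at that rank) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HotwordDataSet:
    """One hotword utterance (hypothetical record layout)."""
    confidence_score: float  # speaker verification confidence score
    noise_db: float          # assumed environmental context: background noise level


def select_threshold(
    data_sets: List[HotwordDataSet],
    noise_lo: float,
    noise_hi: float,
    rejection_rate: float,
) -> Optional[float]:
    """Pick a threshold for the context noise_lo <= noise_db < noise_hi.

    Returns None when no data set falls in the context.
    """
    # Step 1: keep only the data sets associated with the particular context.
    subset = [d for d in data_sets if noise_lo <= d.noise_db < noise_hi]
    if not subset:
        return None

    # Step 2: rank by score and pick the data set at the rank implied by the
    # rejection rate (assumed selection criterion). Data sets ranked below it
    # would be rejected if its score were used as the threshold.
    ranked = sorted(subset, key=lambda d: d.confidence_score)
    index = min(int(rejection_rate * len(ranked)), len(ranked) - 1)

    # Step 3: the selected data set's own score becomes the threshold.
    return ranked[index].confidence_score


data = [
    HotwordDataSet(0.8, 55.0),
    HotwordDataSet(0.5, 52.0),
    HotwordDataSet(0.7, 58.0),
    HotwordDataSet(0.6, 51.0),
    HotwordDataSet(0.1, 20.0),  # quiet context, excluded from the 50-60 dB subset
]
threshold = select_threshold(data, 50.0, 60.0, rejection_rate=0.25)
```

Note that the threshold is always a score that was actually observed in the subset, rather than an interpolated value, which matches the abstract's wording of selecting "the speaker verification confidence score" of a particular data set.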

230 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, for each of multiple utterances of a hotword, a data set including at least (i) a speaker verification confidence score associated with the utterance, and (ii) environmental context data associated with the utterance;
  
  selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context;
  
  selecting a particular data set from among the subset of data sets based on one or more selection criteria;
  
  selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score included in the particular data set; and
  
  providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
  - 2. The method of claim 1, wherein the environmental context data specifies an amount of noise detected immediately preceding receipt of the utterance.
  - 3. The method of claim 1, wherein the environmental context data specifies a loudness of the utterance.
  - 4. The method of claim 1, wherein the environmental context data specifies a signal-to-noise ratio of a loudness of an audio signal that encodes the utterance.
  - 5. The method of claim 1, wherein the one or more selection criteria is an empirically defined rejection rate.
  - 6. The method of claim 1, comprising:
    - labeling the data sets with a post trigger accuracy indicator associated with the utterance.
  - 7. The method of claim 1, comprising:
    - labeling the data sets with a different, second speaker verification confidence score.
  - 8. The method of claim 1, wherein the data sets each further includes an audio signal that encodes the utterance.
  - 9. The method of claim 1, wherein selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context comprises:
    - determining an environmental context data range; and
      
      selecting the subset of the data sets that includes the environmental context data associated with the utterance within the environmental context data range.
  - 10. The method of claim 1, wherein selecting a particular data set from among the subset of data sets based on one or more selection criteria comprises:
    - determining a threshold based on the one or more selection criteria; and
      
      identifying the particular data set from among the subset of data sets that satisfies the threshold by less than other data sets in the subset of data sets.
  - 11. The method of claim 1, comprising:
    - selecting from among the data sets, a plurality of subsets of the data sets that are each associated with a respective particular environmental context;
      
      selecting, based on the one or more selection criteria, a plurality of particular data sets, each particular data set being from among a respective subset of the data sets;
      
      selecting, as a plurality of speaker verification thresholds, each of the speaker verification thresholds being for the respective particular environmental context, a plurality of speaker verification confidence scores included in each particular data set; and
      
      providing the plurality of speaker verification thresholds for use in performing speaker verification of utterances that are associated with the respective particular environmental context.
  - 12. The method of claim 1, comprising:
    - selecting from among the data sets, a plurality of subsets of the data sets that are each associated with a respective user;
      
      selecting, based on the one or more selection criteria, a plurality of particular data sets, each particular data set being from among a respective subset of the data sets;
      
      selecting, as a plurality of speaker verification thresholds, each of the speaker verification thresholds being for the respective user, a plurality of speaker verification confidence scores included in each particular data set; and
      
      providing the plurality of speaker verification thresholds for use in performing speaker verification of utterances that are associated with the respective user.
  - 13. The method of claim 1, wherein providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context comprises:
    - providing, to a user device, an environmental context data range and a speaker verification threshold for the environmental context data range.
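Claims 9, 11, and 13 together suggest a table that maps environmental context data ranges to thresholds, which a user device can then consult at verification time. The sketch below is one hedged way to express that idea; the `(score, noise_db)` tuple layout, the half-open dB ranges, the rank-based selection rule, and the function names `thresholds_by_context` and `verify` are assumptions made for illustration and do not come from the patent.

```python
from typing import Dict, List, Optional, Tuple

NoiseRange = Tuple[float, float]  # assumed half-open range [low, high) in dB
Sample = Tuple[float, float]      # assumed layout: (confidence score, noise in dB)


def thresholds_by_context(
    samples: List[Sample],
    ranges: List[NoiseRange],
    rejection_rate: float,
) -> Dict[NoiseRange, float]:
    """Build one threshold per context range (server side in this sketch)."""
    table: Dict[NoiseRange, float] = {}
    for lo, hi in ranges:
        # Subset of data sets whose context falls inside this range.
        scores = sorted(score for score, noise in samples if lo <= noise < hi)
        if scores:
            # Assumed selection criterion: score at the rejection-rate rank.
            index = min(int(rejection_rate * len(scores)), len(scores) - 1)
            table[(lo, hi)] = scores[index]
    return table


def verify(score: float, noise_db: float,
           table: Dict[NoiseRange, float]) -> Optional[bool]:
    """Device-side check: look up the range, then compare against its threshold."""
    for (lo, hi), threshold in table.items():
        if lo <= noise_db < hi:
            return score >= threshold
    return None  # no threshold is known for this context


samples = [(0.9, 30.0), (0.8, 35.0), (0.5, 70.0), (0.4, 75.0)]
table = thresholds_by_context(samples, [(0.0, 50.0), (50.0, 100.0)], 0.5)
accepted_in_noise = verify(0.6, 72.0, table)
accepted_in_quiet = verify(0.6, 32.0, table)
```

With this made-up data the same score of 0.6 is accepted in the noisy range and rejected in the quiet one. Whether noisy contexts actually end up with lower thresholds depends entirely on the collected data sets.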

14. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, for each of multiple utterances of a hotword, a data set including at least (i) a speaker verification confidence score associated with the utterance, and (ii) environmental context data associated with the utterance;
  
  selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context;
  
  selecting a particular data set from among the subset of data sets based on one or more selection criteria;
  
  selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score included in the particular data set; and
  
  providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
  - 15. The system of claim 14, wherein selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context comprises:
    - determining an environmental context data range; and
      
      selecting the subset of the data sets that includes the environmental context data associated with the utterance within the environmental context data range.
  - 16. The system of claim 14, wherein selecting a particular data set from among the subset of data sets based on one or more selection criteria comprises:
    - determining a threshold based on the one or more selection criteria; and
      
      identifying the particular data set from among the subset of data sets that satisfies the threshold by less than other data sets in the subset of data sets.
  - 17. The system of claim 14, wherein the operations further comprise:
    - selecting from among the data sets, a plurality of subsets of the data sets that are each associated with a respective particular environmental context;
      
      selecting, based on the one or more selection criteria, a plurality of particular data sets, each particular data set being from among a respective subset of the data sets;
      
      selecting, as a plurality of speaker verification thresholds, each of the speaker verification thresholds being for the respective particular environmental context, a plurality of speaker verification confidence scores included in each particular data set; and
      
      providing the plurality of speaker verification thresholds for use in performing speaker verification of utterances that are associated with the respective particular environmental context.
  - 18. The system of claim 14, wherein the operations further comprise:
    - selecting from among the data sets, a plurality of subsets of the data sets that are each associated with a respective user;
      
      selecting, based on the one or more selection criteria, a plurality of particular data sets, each particular data set being from among a respective subset of the data sets;
      
      selecting, as a plurality of speaker verification thresholds, each of the speaker verification thresholds being for the respective user, a plurality of speaker verification confidence scores included in each particular data set; and
      
      providing the plurality of speaker verification thresholds for use in performing speaker verification of utterances that are associated with the respective user.
  - 19. The system of claim 14, wherein providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context comprises:
    - providing, to a user device, an environmental context data range and a speaker verification threshold for the environmental context data range.

20. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, for each of multiple utterances of a hotword, a data set including at least (i) a speaker verification confidence score associated with the utterance, and (ii) environmental context data associated with the utterance;
  
  selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context;
  
  selecting a particular data set from among the subset of data sets based on one or more selection criteria;
  
  selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score included in the particular data set; and
  
  providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
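Claims 2 to 4 name three kinds of environmental context data: noise detected immediately before the utterance, loudness of the utterance, and a signal-to-noise ratio. The claims do not say how these are computed. As a rough sketch, assuming audio is available as floating-point samples in the range -1.0 to 1.0, that level is measured as RMS in dB relative to full scale, and that the signal-to-noise ratio is the difference of the two levels, the values could be derived as follows. The function names and dictionary keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms_db(samples: Sequence[float]) -> float:
    """Root-mean-square level in dB relative to a full-scale value of 1.0."""
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # A small floor avoids log10(0) for completely silent audio.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def environmental_context(
    pre_utterance: Sequence[float],
    utterance: Sequence[float],
) -> Dict[str, float]:
    """Derive the three context values from audio before and during the hotword."""
    noise_db = rms_db(pre_utterance)   # noise just before the utterance
    loudness_db = rms_db(utterance)    # loudness of the utterance itself
    return {
        "noise_db": noise_db,
        "loudness_db": loudness_db,
        "snr_db": loudness_db - noise_db,  # assumed definition of the ratio
    }


context = environmental_context([0.01, -0.01, 0.01, -0.01], [0.1, -0.1, 0.1, -0.1])
```

Any of these values could then serve as the key that assigns an utterance to an environmental context data range.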

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Foerster, Jakob; Casado, Diego Melendo

Granted Patent: US 9,384,738 B2
US Class Current: 1/1

CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 17/00   Speaker identification or v...

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 17/06   Decision making techniques;...

G10L 17/08   Use of distortion metrics o...

G10L 17/12   Score normalisation

G10L 17/14   Use of phonemic categorisat...

G10L 17/20   Pattern transformations or ...

G10L 17/22   Interactive procedures; Man...

G10L 17/24   the user being prompted to ...

G10L 25/84   for discriminating voice fr...

H04M 3/385   using speech signals
