Dynamic threshold for speaker verification

US 9,972,323 B2
Filed: 05/19/2017
Issued: 05/15/2018
Est. Priority Date: 06/24/2014
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a dynamic threshold for speaker verification are disclosed. In one aspect, a method includes the actions of receiving, for each of multiple utterances of a hotword, a data set including at least a speaker verification confidence score, and environmental context data. The actions further include selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context. The actions further include selecting a particular data set from among the subset of data sets based on one or more selection criteria. The actions further include selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score. The actions further include providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
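
The selection steps in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the patent: the `UtteranceRecord` type, the use of background noise in dB as the environmental context, the 10 dB bucket width, and the selection criterion (the score below which roughly `reject_fraction` of the context's scores fall).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UtteranceRecord:
    """One data set: a confidence score plus environmental context (hypothetical type)."""
    confidence: float  # speaker verification confidence score
    noise_db: float    # assumed context measure: background noise level in dB


def noise_bucket(noise_db: float, width_db: float = 10.0) -> int:
    """Map a noise level to a coarse context bucket (assumed 10 dB wide)."""
    return int(noise_db // width_db)


def select_threshold(records: List[UtteranceRecord],
                     context_bucket: int,
                     reject_fraction: float = 0.1) -> Optional[float]:
    """Return a threshold for one environmental context, or None if no data exists."""
    # Step 1: keep only the data sets associated with the requested context.
    subset = [r for r in records if noise_bucket(r.noise_db) == context_bucket]
    if not subset:
        return None
    # Step 2: select one data set by a criterion. Assumed here: rank by score
    # and take the entry that about `reject_fraction` of the scores fall below.
    ranked = sorted(subset, key=lambda r: r.confidence)
    index = min(int(len(ranked) * reject_fraction), len(ranked) - 1)
    # Step 3: the selected data set's score becomes the threshold for this context.
    return ranked[index].confidence


records = [
    UtteranceRecord(confidence=0.9, noise_db=42.0),
    UtteranceRecord(confidence=0.6, noise_db=45.0),
    UtteranceRecord(confidence=0.8, noise_db=48.0),
    UtteranceRecord(confidence=0.3, noise_db=71.0),
]
quiet_threshold = select_threshold(records, context_bucket=4)
loud_threshold = select_threshold(records, context_bucket=7)
```

With this sample data the noisier context ends up with a lower threshold than the quieter one, which is the kind of per-context adjustment the abstract describes. A real system would need its own context definition and selection criteria.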

19 Claims

1. A computer-implemented method comprising:
- receiving, by a computing device that uses voice-based speaker identification, data identifying an utterance previously received by the computing device and data indicating that a user likely did speak the utterance;
  
  prompting the user to confirm that the user did speak the utterance;
  
  receiving, from the user, data indicating that the user has confirmed that the user did speak the utterance; and
  
  in response to receiving the data indicating that the user has confirmed that the user did speak the utterance, using audio data corresponding to the utterance previously received by the computing device to perform voice-based speaker identification on a subsequently received utterance that has a shared characteristic with the utterance previously received, wherein the shared characteristic is (i) an amount of background noise within a same background noise range, (ii) an amount of loudness within a same loudness range, or (iii) a signal-to-noise ratio within a same signal-to-noise ratio range.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, comprising:
    - recognizing an identity of the user using a technique other than voice-based speaker identification.
  - 3. The method of claim 2, wherein recognizing the identity of the user using the technique other than voice-based speaker identification comprises prompting the user for a passcode.
  - 4. The method of claim 1, wherein the utterance previously received by the computing device and the subsequently received utterance each include a predefined hotword.
  - 5. The method of claim 1, wherein the amount of background noise is measured prior to receipt of the previously received utterance and the subsequently received utterance.
  - 6. The method of claim 1, wherein prompting the user to confirm that the user did speak the utterance comprises:
    - providing, for display, data indicating a date and time when the utterance was received.
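
The flow recited in claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Utterance` and `SpeakerProfile` types, the `confirm` callback standing in for the user prompt, and the 10 dB signal-to-noise ranges are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Utterance:
    """A received utterance with one measured characteristic (hypothetical type)."""
    audio: bytes
    snr_db: float  # signal-to-noise ratio in dB


def snr_range(snr_db: float, width_db: float = 10.0) -> int:
    """Map an SNR value to a range index (assumed 10 dB wide ranges)."""
    return int(snr_db // width_db)


@dataclass
class SpeakerProfile:
    """Confirmed audio samples for one user, keyed by SNR range."""
    samples: Dict[int, List[bytes]] = field(default_factory=dict)

    def review(self, utterance: Utterance,
               confirm: Callable[[Utterance], bool]) -> bool:
        """Prompt the user about a past utterance; store its audio only if confirmed."""
        if not confirm(utterance):
            return False
        key = snr_range(utterance.snr_db)
        self.samples.setdefault(key, []).append(utterance.audio)
        return True

    def references_for(self, later: Utterance) -> List[bytes]:
        """Return confirmed audio whose SNR range matches the later utterance."""
        return list(self.samples.get(snr_range(later.snr_db), []))


profile = SpeakerProfile()
past = Utterance(audio=b"earlier-audio", snr_db=23.0)
accepted = profile.review(past, confirm=lambda u: True)  # user confirms
declined = profile.review(Utterance(b"other-audio", 24.0), confirm=lambda u: False)

same_range = profile.references_for(Utterance(b"new-audio", 27.5))
other_range = profile.references_for(Utterance(b"new-audio", 31.0))
```

Only the confirmed audio is kept, and it is returned as reference material only for a later utterance in the same SNR range. The same keying could be done on background noise or loudness ranges, the other two shared characteristics the claim lists.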

7. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a computing device that uses voice-based speaker identification, data identifying an utterance previously received by the computing device and data indicating that a user likely did speak the utterance;
  
  prompting the user to confirm that the user did speak the utterance;
  
  receiving, from the user, data indicating that the user has confirmed that the user did speak the utterance; and
  
  in response to receiving the data indicating that the user has confirmed that the user did speak the utterance, using audio data corresponding to the utterance previously received by the computing device to perform voice-based speaker identification on a subsequently received utterance that has a shared characteristic with the utterance previously received, wherein the shared characteristic is (i) an amount of background noise within a same background noise range, (ii) an amount of loudness within a same loudness range, or (iii) a signal-to-noise ratio within a same signal-to-noise ratio range.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the operations further comprise:
    - recognizing an identity of the user using a technique other than voice-based speaker identification.
  - 9. The system of claim 8, wherein recognizing the identity of the user using the technique other than voice-based speaker identification comprises prompting the user for a passcode.
  - 10. The system of claim 7, wherein the utterance previously received by the computing device and the subsequently received utterance each include a predefined hotword.
  - 11. The system of claim 7, wherein the amount of background noise is measured prior to receipt of the previously received utterance and the subsequently received utterance.
  - 12. The system of claim 7, wherein prompting the user to confirm that the user did speak the utterance comprises:
    - providing, for display, data indicating a date and time when the utterance was received.

13. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a computing device that uses voice-based speaker identification, data identifying an utterance previously received by the computing device and data indicating that a user likely did speak the utterance;
  
  prompting the user to confirm that the user did speak the utterance;
  
  receiving, from the user, data indicating that the user has confirmed that the user did speak the utterance; and
  
  in response to receiving the data indicating that the user has confirmed that the user did speak the utterance, using audio data corresponding to the utterance previously received by the computing device to perform voice-based speaker identification on a subsequently received utterance that has a shared characteristic with the utterance previously received, wherein the shared characteristic is (i) an amount of background noise within a same background noise range, (ii) an amount of loudness within a same loudness range, or (iii) a signal-to-noise ratio within a same signal-to-noise ratio range.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The medium of claim 13, wherein the utterance previously received by the computing device and the subsequently received utterance each include a predefined hotword.
  - 15. The medium of claim 13, wherein the operations further comprise:
    - recognizing an identity of the user using a technique other than voice-based speaker identification.
  - 16. The medium of claim 15, wherein recognizing the identity of the user using the technique other than voice-based speaker identification comprises prompting the user for a passcode.
  - 17. The medium of claim 13, wherein the amount of background noise is measured prior to receipt of the previously received utterance and the subsequently received utterance.
  - 18. The medium of claim 13, wherein prompting the user to confirm that the user did speak the utterance comprises:
    - providing, for display, data indicating a date and time when the utterance was received.

19. A computer-implemented method comprising:
- receiving, by a computing device that uses voice-based speaker identification, data identifying an utterance previously received by the computing device and data indicating that a user likely did speak the utterance;
  
  providing, for display, a prompt for the user to confirm that the user did speak the utterance, the prompt indicating a date and time that the utterance was received;
  
  receiving, from the user, data indicating that the user has confirmed that the user did speak the utterance; and
  
  in response to receiving the data indicating that the user has confirmed that the user did speak the utterance, using audio data corresponding to the utterance previously received by the computing device to perform voice-based speaker identification on a subsequently received utterance.
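
Claim 19 differs from claim 1 mainly in that the displayed prompt indicates when the utterance was received. A minimal sketch of building such a prompt follows; the function name, wording, and timestamp format are assumptions for illustration only.

```python
from datetime import datetime


def confirmation_prompt(received_at: datetime) -> str:
    """Build display text asking the user to confirm a past utterance."""
    # Assumed format: ISO-style date followed by 24-hour time.
    stamp = received_at.strftime("%Y-%m-%d at %H:%M")
    return f"Did you speak the utterance received on {stamp}?"


prompt = confirmation_prompt(datetime(2017, 5, 19, 14, 30))
```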

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Foerster, Jakob Nicolaus; Melendo Casado, Diego
Primary Examiner(s)
Baker, Charlotte M

Application Number

US15/599,578
Publication Number

US 20170345430A1
Time in Patent Office

361 Days
Field of Search

704250, 715735, 715741, 715743
US Class Current
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 17/00   Speaker identification or v...

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 17/06   Decision making techniques;...

G10L 17/08   Use of distortion metrics o...

G10L 17/12   Score normalisation

G10L 17/14   Use of phonemic categorisat...

G10L 17/20   Pattern transformations or ...

G10L 17/22   Interactive procedures; Man...

G10L 17/24   the user being prompted to ...

G10L 25/84   for discriminating voice fr...

H04M 3/385   using speech signals
