Method and system for dual scoring for text-dependent speaker verification

US 9,489,950 B2
Filed: 05/23/2013
Issued: 11/08/2016
Est. Priority Date: 05/31/2012
Status: Active Grant



Abstract

Embodiments of systems and methods for speaker verification are provided. In various embodiments, a method includes receiving an utterance from a speaker and determining a text-independent speaker verification score and a text-dependent speaker verification score in response to the utterance. Various embodiments include a system for speaker verification, the system comprising an audio receiving device for receiving an utterance from a speaker and converting the utterance to an utterance signal, and a processor coupled to the audio receiving device for determining speaker verification in response to the utterance signal, wherein the processor determines speaker verification in response to a UBM-independent speaker-normalized score.


16 Claims

1. A speaker verification method comprising:
- receiving an utterance from a speaker by an audio receiving device;
  
  determining a text-independent speaker verification score in response to the utterance using a processor coupled to the audio receiving device to determine the text-independent speaker verification score in response to a speaker-dependent text-independent Gaussian Mixture Model (GMM) of the utterance;
  
  determining a text-dependent speaker verification score in response to the utterance using the processor to determine the text-dependent speaker verification score in response to a continuous density Hidden Markov Model (HMM) of the utterance aligned by a Viterbi decoding;
  
  determining a Universal Background Model (UBM)-independent speaker-dependent normalized score in response to a relationship between the text-dependent speaker verification score and the text-independent speaker verification score using the processor, the relationship being based on a difference between the text-dependent speaker verification score and the text-independent speaker verification score; and
  
  determining speaker verification in response to the UBM-independent speaker-normalized score.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method in accordance with claim 1 wherein the step of determining speaker verification in response to the UBM-independent speaker-normalized score comprises determining speaker verification in response to a dual-scoring soft decision margin combination of the UBM-independent speaker-normalized score and the text-dependent speaker verification score.
  - 3. The method in accordance with claim 1 further comprising:
    - determining a first threshold defined in response to a speaker-normalized score minimizing the False Acceptance (PFA); and
      
      determining a second threshold defined in response to a text-dependent speaker verification score minimizing the False Rejection (PFR).
  - 4. The method in accordance with claim 3 further comprising determining a decision tree classification scoring function in response to the first threshold and the second threshold as applied to a plurality of speaker scores of the speaker in a scoring trial.
  - 5. The method in accordance with claim 4 wherein the decision tree classification scoring function comprises a single dimensional confidence interval with three decision regions, and wherein the step of determining speaker verification in response to the UBM-independent speaker-normalized score comprises determining speaker verification in response to mapping the UBM-independent speaker-normalized score and a text-dependent speaker verification score to the three decision regions.
  - 6. The method in accordance with claim 5 wherein the three decision regions comprise an accept decision region, an indecisive decision region and a reject decision region.
  - 7. The method in accordance with claim 6 further comprising requesting a further speaker utterance in response to the speaker verification mapping the speaker UBM-independent speaker-normalized score and the text-dependent speaker verification score to the indecisive decision region.
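
Claims 3 to 7 describe two thresholds, three decision regions, and a request for a further utterance when the result is indecisive. The following is a minimal sketch of that flow, not the patented decision function: the rule inside `decide`, the names `accept_threshold`, `reject_threshold`, and `max_attempts`, and the example score values are all assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    INDECISIVE = "indecisive"
    REJECT = "reject"


def decide(norm_score: float, td_score: float,
           accept_threshold: float, reject_threshold: float) -> Decision:
    """Map a (normalized score, text-dependent score) pair to one of three regions.

    Assumed rule (not taken from the claims): a low text-dependent score
    rejects outright, a high normalized score accepts, and anything else
    is indecisive.
    """
    if td_score < reject_threshold:
        return Decision.REJECT
    if norm_score >= accept_threshold:
        return Decision.ACCEPT
    return Decision.INDECISIVE


def verify(score_pairs, accept_threshold: float, reject_threshold: float,
           max_attempts: int = 3) -> Decision:
    """Consume (norm_score, td_score) pairs from successive utterances.

    Stops at the first decisive result, or after max_attempts utterances
    (a hypothetical cap; the claims only say a further utterance is requested).
    """
    decision = Decision.INDECISIVE
    for attempt, (norm, td) in enumerate(score_pairs, start=1):
        decision = decide(norm, td, accept_threshold, reject_threshold)
        if decision is not Decision.INDECISIVE or attempt >= max_attempts:
            break
    return decision
```

For example, `verify([(0.5, -50.0), (3.0, -40.0)], accept_threshold=2.0, reject_threshold=-60.0)` treats the first utterance as indecisive and decides on the second one.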

8. A Universal Background Model (UBM) independent speaker verification method comprising:
- receiving an utterance from a speaker by an audio receiving device;
  
  determining a text-independent speaker verification score in response to the utterance using a processor coupled to the audio receiving device;
  
  determining a text-dependent speaker verification score in response to the utterance using the processor;
  
  determining a UBM-independent speaker-normalized score in response to a difference between the text-independent speaker verification score and the text-dependent speaker verification score using the processor; and
  
  determining speaker verification in response to the UBM-independent speaker-normalized score.
- Dependent claims (9, 10, 11):
  - 9. The method in accordance with claim 8 wherein the step of determining the UBM-independent speaker-normalized score comprises determining the UBM-independent speaker-normalized score in response to the difference between the text-independent speaker verification score and the text-dependent speaker verification score by determining a likelihood ratio between the text-dependent speaker verification score and the text-independent speaker verification score.
  - 10. The method in accordance with claim 9 wherein the utterance comprises a prompted pass-phrase, and wherein step of determining the text-independent speaker verification score comprises determining the text-independent speaker verification score in response to the utterance and further in response to one or more pass-phrases different from the prompted pass-phrase and previously pronounced by the speaker as playback impostures.
  - 11. The method in accordance with claim 9 wherein the step of determining the UBM-independent speaker-normalized score comprises determining a likelihood ratio
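
Claim 9 ties the score difference to a likelihood ratio. The formula that completes claim 11 is not reproduced in this listing, so the sketch below does not attempt to restore it. It only illustrates the general log-domain identity, assuming both scores are log-likelihoods; the function name and the numeric likelihoods are hypothetical.

```python
import math


def log_likelihood_ratio(td_log_likelihood: float, ti_log_likelihood: float) -> float:
    """Log of the ratio p_TD / p_TI, computed as a difference of log-likelihoods."""
    return td_log_likelihood - ti_log_likelihood


# Hypothetical likelihoods, chosen only to show the identity
# log(p_td / p_ti) == log(p_td) - log(p_ti).
p_td, p_ti = 0.02, 0.005
llr = log_likelihood_ratio(math.log(p_td), math.log(p_ti))
```

Because both scores come from models of the same speaker, this ratio needs no background model, which is consistent with the "UBM-independent" wording of claim 8.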

12. A dual-scoring text-dependent speaker verification method comprising:
- receiving a plurality of test utterances by an audio receiving device;
  
  determining a text-independent speaker verification score in response to each of the plurality of utterances using a processor coupled to the audio receiving device;
  
  determining a text-dependent speaker verification score in response to each of the plurality of utterances using the processor;
  
  determining a Universal Background Model (UBM)-independent speaker-normalized score in response to a relationship between the text-dependent speaker verification score and the text-independent speaker verification score using the processor, the relationship being based on a difference between the text-dependent speaker verification score and the text-independent speaker verification score;
  
  mapping the UBM-independent speaker-normalized score and the text-dependent speaker verification score for each of the plurality of utterances into a two-dimensional score space in response to a score accept threshold and a score reject threshold;
  
  splitting the two-dimensional score space into three clusters, the three clusters corresponding to accept scores, indecisive scores and reject scores; and
  
  defining a binary decision tree for speaker verification confidence score generation by identifying a logistic function at each node of the binary decision tree.
- Dependent claims (13, 14):
  - 13. The method in accordance with claim 12 further comprising:
    - receiving an utterance from a speaker by the audio receiving device;
      
      determining the text-independent speaker verification score in response to the utterance using a processor coupled to the audio receiving device;
      
      determining the text-dependent speaker verification score in response to the utterance using the processor;
      
      determining a UBM-independent speaker-normalized score in response to a relationship between the text-dependent speaker verification score and the text-independent speaker verification score using the processor; and
      
      generating a speaker verification confidence score corresponding to the utterance in response to performing the logistic function at each node of the binary decision tree to map the text-dependent speaker verification score for the utterance and the UBM-independent speaker-normalized score for the utterance onto the binary decision tree.
  - 14. The method in accordance with claim 12 wherein the step of defining the binary decision tree comprises defining the binary decision tree for speaker verification confidence score generation, based on the HIerarchical multi-Layer Acoustic Model (HiLAM) binary tree modeling approach.
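
Claims 12 and 13 describe a binary decision tree with a logistic function at each node, used to turn the two scores into a confidence score over three clusters. One way this could look is sketched below, assuming a two-node tree (reject versus the rest, then accept versus indecisive). The tree shape, the `LogisticNode` class, and its weights are illustrative assumptions and are not taken from the patent or from HiLAM.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticNode:
    """One tree node: a logistic function over the two scores (weights are hypothetical)."""
    w_norm: float
    w_td: float
    bias: float

    def prob(self, norm_score: float, td_score: float) -> float:
        return sigmoid(self.w_norm * norm_score + self.w_td * td_score + self.bias)


def cluster_posteriors(root: LogisticNode, child: LogisticNode,
                       norm_score: float, td_score: float) -> dict:
    """Walk the assumed two-node tree and return a probability per cluster."""
    p_not_reject = root.prob(norm_score, td_score)   # root: reject vs. the rest
    p_accept = child.prob(norm_score, td_score)      # child: accept vs. indecisive
    return {
        "reject": 1.0 - p_not_reject,
        "indecisive": p_not_reject * (1.0 - p_accept),
        "accept": p_not_reject * p_accept,
    }
```

In this sketch the `"accept"` entry could serve as the confidence score for an utterance; in practice the node weights would be fitted on the score pairs of the test utterances named in claim 12.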

15. A system for speaker verification comprising:
- an audio receiving device for receiving an utterance from a speaker and converting the utterance to an utterance signal; and
  
  a processor coupled to the audio receiving device for determining speaker verification in response to the utterance signal, wherein the processor determines speaker verification in response to a Universal Background Model (UBM)-independent speaker-normalized score by
  
  determining a text-independent speaker verification score in response to the utterance signal, the text-independent speaker verification score determined in response to a speaker-dependent text-independent Gaussian Mixture Model (GMM) of the utterance;
  
  determining a text-dependent speaker verification score in response to the utterance signal, the text-dependent speaker verification score determined in response to a continuous density Hidden Markov Model (HMM) of the utterance signal aligned by a Viterbi decoding; and
  
  determining the UBM-independent speaker-normalized score in response to a relationship between the text-dependent speaker verification score and the text-independent speaker verification score, the relationship being based on a difference between the text-independent speaker verification score and the text-dependent speaker verification score.
- Dependent claims (16):
  - 16. The system in accordance with claim 15 wherein the processor determines the speaker verification in response to a dual-scoring soft decision margin combination of the UBM-independent speaker-normalized score and the text-dependent speaker verification score.
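
Claims 1 and 15 name the two underlying scores: a GMM score that ignores frame order and a continuous density HMM score aligned by Viterbi decoding. The sketch below shows how such scores could be computed in pure Python. It assumes diagonal covariances, a left-to-right HMM whose states are each a small GMM, fixed transition log-probabilities (`log_stay`, `log_next`), and per-frame averaging; none of these details come from the claims.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def log_gauss(x: Frame, mean: Frame, var: Frame) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Frame, weights, means, variances) -> float:
    """Log-likelihood of one frame under a GMM (log-sum-exp over components)."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def text_independent_score(frames, weights, means, variances) -> float:
    """Average per-frame log-likelihood under the speaker GMM; frame order is ignored."""
    return sum(gmm_loglik(x, weights, means, variances) for x in frames) / len(frames)


def text_dependent_score(frames, states, log_stay: float, log_next: float) -> float:
    """Average log-likelihood of the best Viterbi path through a left-to-right HMM.

    `states` is a list of (weights, means, variances) tuples, one GMM per state.
    The path must start in the first state and end in the last one.
    """
    neg_inf = float("-inf")
    n = len(states)
    best = [neg_inf] * n
    best[0] = gmm_loglik(frames[0], *states[0])
    for x in frames[1:]:
        new = [neg_inf] * n
        for j in range(n):
            stay = best[j] + log_stay
            move = best[j - 1] + log_next if j > 0 else neg_inf
            prev = max(stay, move)
            if prev > neg_inf:
                new[j] = prev + gmm_loglik(x, *states[j])
        best = new
    return best[-1] / len(frames)
```

With a single-state HMM and `log_stay = 0.0`, the two scores coincide, so their difference is zero. The difference only becomes informative once the HMM states encode the order of sounds in the pass-phrase.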


Current Assignee: Agency For Science Technology and Research
Original Assignee: Agency For Science Technology and Research
Inventors: Larcher, Anthony; Lee, Kong Aik; Ma, Bin; Huong, Thai Ngoc Thuy
Primary Examiner(s): Ogunbiyi, Oluwadamilola M
Application Number: US13/900,858
Publication Number: US 20130325473A1
Time in Patent Office: 1,265 Days
Field of Search: 704/249, 704/239, 704/256
US Class Current: 1/1
CPC Class Codes:
G10L 17/10 Multimodal systems, i.e. ba...
G10L 17/12 Score normalisation
