Method of and device for phone-based speaker recognition

US 6,618,702 B1
Filed: 06/14/2002
Issued: 09/09/2003
Est. Priority Date: 06/14/2002
Status: Expired due to Fees

Abstract

A language-independent speaker-recognition system based on parallel cumulative differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from many phone recognizers to perform text independent speaker recognition. A digitized speech signal from a speaker is converted to a sequence of phones by each phone recognizer. Each phone sequence is then modified based on the energy in the signal. The modified phone sequences are tokenized to produce phone n-grams that are compared against a speaker and a background model for each phone recognizer to produce log-likelihood ratio scores. The log-likelihood ratio scores from each phone recognizer are fused to produce a final recognition score for each speaker model. The recognition score for each speaker model is then evaluated to determine which of the modeled speakers, if any, produced the digitized speech signal.
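
The tokenizing step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tag strings `<s>`, `</s>`, and `sil`, and the function names `tag_stream` and `ngram_counts`, are assumptions made for illustration. The sketch wraps a phone stream in start and stop tags, replaces each run of silence tags with a stop/start pair (as claims 9 and 10 describe), and counts the resulting phone n-grams.

```python
from collections import Counter

# Hypothetical tag names; the patent does not specify the symbols used.
START, STOP, SIL = "<s>", "</s>", "sil"


def tag_stream(phones):
    """Add start/stop tags and turn each run of silence into a stop/start pair."""
    out = [START]
    in_silence = False
    for phone in phones:
        if phone == SIL:
            # Only the first silence tag of a run emits the stop/start pair.
            if not in_silence:
                out.extend([STOP, START])
            in_silence = True
        else:
            out.append(phone)
            in_silence = False
    out.append(STOP)
    return out


def ngram_counts(phones, n):
    """Return a Counter mapping each phone n-gram (a tuple) to its count."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


tagged = tag_stream(["ah", "b", "sil", "sil", "ah", "b"])
bigrams = ngram_counts(tagged, 2)
```

Here `bigrams` plays the role of the "set containing phone n-grams and the number of times each of said phone n-grams occurred" for one phone recognizer.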

58 Citations

15 Claims

1. A device for phone-based speaker recognition, comprising:
- at least one phone recognizer for converting input digitized voice signals into a time ordered stream of phones based on at least one linguistic characteristic, with each of said phone recognizers having a voice input, to receive said input digitized voice signals, and an output for transmitting said time ordered stream of phones;
  
  for each of said phone recognizers, a corresponding tokenizer, having an input for receiving said time ordered stream of phones, with each of said tokenizers creating a set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones, and having an output for transmitting said set containing phone n-grams and the number of times each of said phone n-grams occurred;
  
  for each of said tokenizers, a corresponding recognition scorer further comprising;
  
  (a) at least one speaker model scorer, each of said speaker model scorers receives the corresponding set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones and computes a speaker log-likelihood score for each of said phone n-grams in said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones using a corresponding speaker model which contains the number of occurrences of each of said phone n-grams that occurred in a speaker training speech set collected from a particular speaker;
  
  (b) a background model scorer for computing a background log-likelihood score for each of said phone n-grams in said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones using a corresponding backgrounds model which contains the number of occurrences of each of said phone n-grams that occurred in background training speech set collected from many speakers, excluding all of said particular speakers; and
  
  (c) for each of said speaker model scorers, a ratio scorer that produces a speaker log-likelihood ratio from said speaker log-likelihood score and said background log-likelihood score;
  
  for each of said recognition scorers, a corresponding fusion scorer which combines all of said corresponding speaker log-likelihood ratios from said corresponding ratio scorers to produce a single speaker score; and
  
  a speaker selector which evaluates all of said single speaker scores to determine a speaker identity for the speaker of said input digitized voice signals.
- Dependent claims:
  - 2. The device of claim 1 further comprising a speech activity detector having an input to receive said input digitized voice signals in parallel with said phone recognizers, wherein said speech activity detector detects the absence of speech in said input digitized voice signals and provides a no-speech signal to said phone recognizers when said absence of speech is detected and wherein said phones in said time ordered stream of phones are set to silence tags by said phone recognizers when said no-speech signal is received.
  - 3. The device of claim 1 wherein at least one of said linguistic characteristics is the language spoken by the speaker of said input digitized voice signal.
  - 4. The device of claim 1 wherein at least one of said linguistic characteristics is the sex of the speaker of said input digitized voice signal.
  - 5. The device of claim 1 wherein at least one of said linguistic characteristics is the dialect spoken by the speaker of said input digitized voice signal.
  - 6. The device of claim 1 wherein at least one of said linguistic characteristics is the age of the speaker of said input digitized voice signal.
  - 7. The device of claim 1 wherein at least one of said linguistic characteristics is the education of the speaker of said input digitized voice signal.
  - 8. The device of claim 1 wherein at least one of said linguistic characteristics is the social standing of the speaker of said input digitized voice signal.
  - 9. The device of claim 1 wherein said phone recognizers insert a start tag at the beginning of, and a stop tag at the end of, each of said time ordered stream of phones.
  - 10. The device of claim 2 wherein said phone recognizers replace each consecutive string of at least one of said silence tags by a stop and start tag pair.
  - 11. The device of claim 1 wherein said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones contains only n-grams where n is equal to a single positive integer.
  - 12. The device of claim 1 wherein said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones contains only n-grams where n is equal to one.
  - 13. The device of claim 1 wherein each of said speaker log-likelihood ratios is modified by a weighting function wherein said weighting function is dependent on the number of times a particular of said phones is found in said time ordered stream of phones.
  - 14. The device of claim 1 wherein said each of said corresponding fusion scorer use a Gaussian mixture model with k-means clustering to produce said single speaker score.
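
The recognition scorer of claim 1, elements (a) through (c), can be sketched roughly as follows. This is a hedged illustration rather than the claimed device: the add-alpha smoothing, the `vocab_size` parameter, and the weighting form `count ** (1 - d)` are assumptions introduced here. Claim 13 only states that the weighting depends on how often an item occurs in the phone stream, not what the function is.

```python
import math
from collections import Counter


def log_prob(ngram, model_counts, vocab_size, alpha=1.0):
    """Log-likelihood of one n-gram under a count model.

    Add-alpha smoothing is an assumption so unseen n-grams get nonzero mass.
    """
    total = sum(model_counts.values())
    return math.log((model_counts.get(ngram, 0) + alpha) / (total + alpha * vocab_size))


def llr_score(test_counts, speaker_counts, background_counts, vocab_size, d=0.0):
    """Weighted average of per-n-gram speaker/background log-likelihood ratios.

    The weight count ** (1 - d) is a hypothetical choice: d=0 weights each
    n-gram by its count in the test stream, d=1 weights all n-gram types equally.
    """
    numerator = 0.0
    denominator = 0.0
    for ngram, count in test_counts.items():
        weight = count ** (1.0 - d)
        ratio = (log_prob(ngram, speaker_counts, vocab_size)
                 - log_prob(ngram, background_counts, vocab_size))
        numerator += weight * ratio
        denominator += weight
    return numerator / denominator if denominator else 0.0


test_counts = Counter({("a", "b"): 3, ("b", "c"): 1})
speaker = Counter({("a", "b"): 8, ("b", "c"): 2})
background = Counter({("a", "b"): 2, ("b", "c"): 8})
score = llr_score(test_counts, speaker, background, vocab_size=4)
```

A positive `score` means the test stream's n-grams are, on balance, more typical of the speaker model than of the background model.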

15. A method of phone-based speaker recognition, comprising the steps of:
- converting input digitized voice signals into at least one time ordered stream of phones with each of said time ordered stream of phones based on at least one linguistic characteristic;
  
  creating a set containing phone n-grams and the number of times each of said phone n-grams occurred in each of said time ordered stream of phones;
  
  computing a speaker log-likelihood score for each of at least one possible particular speaker for each of said phone n-grams in said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones using a corresponding particular speaker model which contains the number of occurrences of each of said phone n-grams that occurred in a speaker training speech set collected from the particular speaker;
  
  computing a background log-likelihood score for each of said phone n-grams in said set containing phone n-grams and the number of times each of said phone n-grams occurred in said time ordered stream of phones using a corresponding backgrounds model which contains the number of occurrences of each of said phone n-grams that occurred in background training speech set collected from many speakers, excluding all of said particular speakers;
  
  producing a speaker log-likelihood ratio from each of said speaker log-likelihood scores and said background log-likelihood scores;
  
  combining all of said corresponding speaker log-likelihood ratios to produce a single speaker score; and
  
  determining a speaker identity based on an evaluation of all of said single speaker scores.
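
The last two steps of the method (combining the log-likelihood ratios, then determining a speaker identity) might look like the sketch below. Note the substitution: claim 14 names a Gaussian mixture model with k-means clustering as the fusion scorer, whereas this example swaps in a plain weighted average to stay short. The rejection `threshold`, the recognizer names, and the speaker names are likewise assumptions; the threshold reflects the abstract's wording that a modeled speaker is chosen "if any" produced the signal.

```python
def fuse(llrs_by_recognizer, weights=None):
    """Combine one speaker's per-recognizer LLRs into a single score.

    A weighted average is used here as a stand-in for the fusion scorer.
    """
    names = list(llrs_by_recognizer)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return sum(weights[name] * llrs_by_recognizer[name] for name in names) / total


def select_speaker(fused_scores, threshold=0.0):
    """Return the best-scoring speaker, or None if no score exceeds the threshold."""
    if not fused_scores:
        return None
    best = max(fused_scores, key=fused_scores.get)
    return best if fused_scores[best] > threshold else None


# Hypothetical LLRs for two modeled speakers from two phone recognizers.
llrs = {
    "alice": {"english": 0.6, "german": 0.2},
    "bob": {"english": -0.3, "german": 0.1},
}
fused = {speaker: fuse(scores) for speaker, scores in llrs.items()}
identity = select_speaker(fused)
```

Raising `threshold` makes the selector more willing to report that none of the modeled speakers matches.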

Current Assignee: The United States of America As Represented By The Director National Security Agency
Original Assignee: The United States of America As Represented By The Director National Security Agency
Inventors: Kohler, Mary Antoinette; Andrews, Walter Doyle III; Campbell, Joseph Paul Jr.
Primary Examiner(s): Foster, Roland G.
Application Number: US10/064,155
Time in Patent Office: 452 Days
Field of Search: 379/88-1-, 704/231, 704/236, 704/239, 704/240, 704/243, 704/246, 704/250, 704/251, 704/255, 704/257
US Class Current: 704/250
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/10   using distance or distortio...
G10L 15/14   using statistical models, e...
G10L 17/20   Pattern transformations or ...
G10L 2015/025   Phonemes, fenemes or fenone...
