Temporal decorrelation method for robust speaker verification

US 5,167,004 A
Filed: 02/28/1991
Issued: 11/24/1992
Est. Priority Date: 02/28/1991
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

A speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech features calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with the specific identity, a word-level vector scorer to calculate a similarity score between the word-level speech feature vectors received from the unknown speaker with those received from the word-level speech feature storage, and speaker verification decision circuitry for determining, based on the similarity score, whether the unknown speaker's identity is the same as that claimed. The word-level vector scorer further includes concatenation circuitry as well as a word-specific orthogonalizing linear transformer. Other systems and methods are also disclosed.

194 Citations

9 Claims

1. An automated temporal decorrelation system for speaker voice verification, comprising:
- a collector for receiving speech inputs from an unknown speaker claiming a specific identity into a plurality of input vectors for each word spoken;
  
  a word-level speech feature calculator operable to utilize a temporal decorrelation transformation for generating word-level speech feature vectors from said speech inputs received from said collector thereby creating whole-word vectors which are statistically uncorrelated over entire words with said speech inputs;
  
  word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with said specific identity;
  
  a word-level vector scorer to calculate a similarity score between said word-level speech feature vectors received from said word-level speech feature calculator with those received from said word-level speech feature storage; and
  
  speaker verification decision circuitry for determining, based on said similarity score received from said word-level vector scorer, whether said unknown speaker is said speaker with said specific identity.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, wherein said word-level speech feature calculator employs HMM alignment to map said input speech vectors to speaker independent reference model vectors which correspond to each word associated with the speaker with the claimed identity.
  - 3. The system of claim 1, wherein said word-level vector scorer further comprises concatenation circuitry for concatenating said plurality of input vectors making up a single word to form single vectors representing whole words in said speech inputs.
  - 4. The system of claim 1, wherein said similarity score is a sum, over all words, of Euclidean distances between said word-level speech feature vectors from said word-level speech feature calculator and those which were stored in said word-level speech feature storage.
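
The scoring path described in claims 1, 3 and 4 can be sketched roughly as follows: the input vectors of each word are concatenated into one whole-word vector, a word-specific linear transformation is applied, and the Euclidean distances to the stored vectors are summed over all words. This is only an illustrative sketch. The function names, the toy transformation matrices, the reference vectors and the acceptance threshold are all assumptions made for the example and do not come from the patent text; in particular, treating a smaller summed distance as a better match and comparing it with a fixed threshold is one plausible reading of the decision step.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def concatenate_frames(frames: Sequence[Sequence[float]]) -> Vector:
    """Join the input vectors of one word into a single whole-word vector."""
    return [value for frame in frames for value in frame]


def apply_transform(transform: Matrix, vector: Sequence[float]) -> Vector:
    """Apply a linear transformation, one matrix row per output feature."""
    return [sum(w * x for w, x in zip(row, vector)) for row in transform]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_score(
    word_frames: Dict[str, List[Vector]],
    transforms: Dict[str, Matrix],
    reference: Dict[str, Vector],
) -> float:
    """Sum, over all words, of distances between new and stored word vectors."""
    total = 0.0
    for word, frames in word_frames.items():
        features = apply_transform(transforms[word], concatenate_frames(frames))
        total += euclidean_distance(features, reference[word])
    return total


def accept_claim(score: float, threshold: float) -> bool:
    """Hypothetical decision rule: accept when the summed distance is small."""
    return score <= threshold


# Toy data (assumed): two words, two features per frame.
utterance = {"seven": [[1.0, 2.0], [3.0, 4.0]], "two": [[0.0, 3.0]]}
toy_transforms = {
    "seven": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    "two": [[1.0, 0.0], [0.0, 1.0]],
}
stored = {"seven": [1.0, 0.0], "two": [0.0, 0.0]}

score = similarity_score(utterance, toy_transforms, stored)
print(score, accept_claim(score, threshold=10.0))
```

Keeping one transformation per word mirrors the "word-specific" wording of the abstract; a real system would obtain those matrices from training data rather than hard-coding them.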

5. A temporal decorrelation method for speaker voice verification, comprising the steps of:
- collecting into a plurality of input vectors a verification utterance from an unknown speaker claiming a specific identity;
  
  transforming said plurality of input vectors using a temporal decorrelation transformation to establish a word-level speech feature vectors thereby creating whole-word vectors which are statistically uncorrelated over entire words with said utterance;
  
  retrieving previously stored word-level speech feature vectors known to belong to a speaker with said specific identity;
  
  scoring said word-level speech feature vectors generated during said step of establishing with said previously stored word-level speech feature vectors; and
  
  determining whether said unknown speaker is said speaker with said specific identity.
- Dependent claims (6, 7, 8):
  - 6. The method of claim 5, wherein said step of establishing word-level speech feature vectors further comprises the step of employing HMM alignment to map said input speech vectors to speaker independent reference model vectors which correspond to each word associated with said speaker with the claimed identity.
  - 7. The method of claim 5, wherein said step of scoring further comprises the step of concatenating said plurality of input vectors making up a single word to form single vectors representing whole words in said utterance.
  - 8. The method of claim 5, wherein said similarity score is a sum, over all words, of Euclidean distances between said word-level speech feature vectors from said word-level speech feature calculator and those which were stored in said word-level speech feature storage.
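
To make "statistically uncorrelated" concrete, the toy example below decorrelates a single feature observed in two consecutive frames of a word. The claims do not say how the temporal decorrelation transformation is derived, so the principal-axis rotation used here is an assumption chosen for illustration, as are the function names and the training values. Each training pair is (feature value in frame 1, feature value in frame 2) for one repetition of the word.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def covariance_2d(samples: Sequence[Pair]) -> Tuple[float, float, float]:
    """Return (variance of first value, variance of second value, covariance)."""
    n = len(samples)
    mean_a = sum(a for a, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in samples) / n
    var_b = sum((b - mean_b) ** 2 for _, b in samples) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in samples) / n
    return var_a, var_b, cov


def decorrelating_rotation(samples: Sequence[Pair]) -> List[List[float]]:
    """Rotation onto the principal axes of the 2x2 covariance matrix.

    The angle satisfies tan(2*theta) = 2*cov / (var_a - var_b), which makes
    the covariance of the rotated values vanish.
    """
    var_a, var_b, cov = covariance_2d(samples)
    theta = 0.5 * math.atan2(2.0 * cov, var_a - var_b)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def transform_pairs(rotation: List[List[float]], samples: Sequence[Pair]) -> List[Pair]:
    """Apply the 2x2 linear transformation to every pair."""
    return [
        (rotation[0][0] * a + rotation[0][1] * b,
         rotation[1][0] * a + rotation[1][1] * b)
        for a, b in samples
    ]


# Assumed training data: the two frames move together, so they are correlated.
training = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
rotation = decorrelating_rotation(training)
decorrelated = transform_pairs(rotation, training)

print("covariance before:", covariance_2d(training)[2])
print("covariance after: ", covariance_2d(decorrelated)[2])  # zero up to rounding
```

In a whole-word vector the same idea extends to many frames and features at once, where the rotation becomes a full matrix estimated from the covariance of the concatenated vectors.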

9. A temporal decorrelation method for reducing the amount of storage necessary for speaker specific speech information, comprising the steps of:
- establishing word-level speech feature vectors having a dimension from a spoken utterance;
  
  reducing the dimension of said word-level speech feature vectors by applying a temporal decorrelation linear transformation to said word-level feature vectors; and
  
  storing said word-level feature vectors.
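
The storage saving in claim 9 can be illustrated with a small sketch. It assumes, purely for the example, that the rows of each word's transformation are ordered so that the first rows carry the most useful information and that reduction means keeping only the first `keep` rows; the claim itself only states that the linear transformation reduces the dimension. All names and values are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def truncate_transform(full_transform: Matrix, keep: int) -> Matrix:
    """Keep only the first `keep` rows, so the output has `keep` features."""
    return [list(row) for row in full_transform[:keep]]


def reduce_vector(transform: Matrix, vector: Sequence[float]) -> Vector:
    """Project a whole-word vector onto the retained rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in transform]


def enroll(
    word_vectors: Dict[str, Vector],
    transforms: Dict[str, Matrix],
    keep: int,
) -> Dict[str, Vector]:
    """Build the per-speaker template that would be written to storage."""
    return {
        word: reduce_vector(truncate_transform(transforms[word], keep), vector)
        for word, vector in word_vectors.items()
    }


def stored_values(template: Dict[str, Vector]) -> int:
    """Count the numbers that have to be stored for one speaker."""
    return sum(len(vector) for vector in template.values())


# Assumed toy input: one word of 3 frames x 2 features = 6 values.
identity6 = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
full = {"one": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
template = enroll(full, {"one": identity6}, keep=2)

print(stored_values(full), "values before reduction")
print(stored_values(template), "values after reduction")
```

The per-word transformation matrices are shared across speakers in this sketch, so only the reduced vectors count toward speaker-specific storage.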

Current Assignee
The General Hospital Corporation (Mass General Brigham, Inc.), Texas Instruments, Inc.
Original Assignee
Texas Instruments, Inc.
Inventors
Netsch, Lorin P., Doddington, George R.
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Knepper, David D.

Application Number
US07/662,086
Time in Patent Office
635 Days
Field of Search
381/41-43
US Class Current
704/200
CPC Class Codes
G10L 17/02 Preprocessing operations, e...
G10L 17/20 Pattern transformations or ...
