Temporal decorrelation method for robust speaker verification
Abstract
A speaker voice verification system uses temporal decorrelation linear transformation and includes a collector for receiving speech inputs from an unknown speaker claiming a specific identity, a word-level speech feature calculator operable to use a temporal decorrelation linear transformation for generating word-level speech feature vectors from such speech inputs, word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with the specific identity, a word-level vector scorer to calculate a similarity score between the word-level speech feature vectors received from the unknown speaker with those received from the word-level speech feature storage, and speaker verification decision circuitry for determining, based on the similarity score, whether the unknown speaker's identity is the same as that claimed. The word-level vector scorer further includes concatenation circuitry as well as a word-specific orthogonalizing linear transformer. Other systems and methods are also disclosed.
9 Claims
1. An automated temporal decorrelation system for speaker voice verification, comprising:
a collector for receiving speech inputs from an unknown speaker claiming a specific identity into a plurality of input vectors for each word spoken;
a word-level speech feature calculator operable to utilize a temporal decorrelation transformation for generating word-level speech feature vectors from said speech inputs received from said collector thereby creating whole-word vectors which are statistically uncorrelated over entire words with said speech inputs;
word-level speech feature storage for storing word-level speech feature vectors known to belong to a speaker with said specific identity;
a word-level vector scorer to calculate a similarity score between said word-level speech feature vectors received from said word-level speech feature calculator with those received from said word-level speech feature storage; and
speaker verification decision circuitry for determining, based on said similarity score received from said word-level vector scorer, whether said unknown speaker is said speaker with said specific identity.
(Dependent claims: 2, 3, 4)
5. A temporal decorrelation method for speaker voice verification, comprising the steps of:
collecting into a plurality of input vectors a verification utterance from an unknown speaker claiming a specific identity;
transforming said plurality of input vectors using a temporal decorrelation transformation to establish a word-level speech feature vectors thereby creating whole-word vectors which are statistically uncorrelated over entire words with said utterance;
retrieving previously stored word-level speech feature vectors known to belong to a speaker with said specific identity;
scoring said word-level speech feature vectors generated during said step of establishing with said previously stored word-level speech feature vectors; and
determining whether said unknown speaker is said speaker with said specific identity.
(Dependent claims: 6, 7, 8)
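The steps recited in claims 1 and 5 (collect, transform, retrieve, score, decide) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claims do not specify the function names, the fixed number of frames per word, the Euclidean distance used as the score, the acceptance threshold, or the toy transformation matrix and reference vector, so all of these are hypothetical.

```python
from math import sqrt


def concatenate(frames):
    """Stack the per-frame input vectors of one word into one long vector."""
    return [x for frame in frames for x in frame]


def transform(matrix, vector):
    """Apply a linear transformation, given as a list of rows, to a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def distance(a, b):
    """Euclidean distance; smaller means more similar (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(utterance, word_transforms, reference, threshold):
    """Return (accepted, total_score) for an utterance given as {word: frames}."""
    total = 0.0
    for word, frames in utterance.items():
        # Word-level feature vector for the unknown speaker.
        feature = transform(word_transforms[word], concatenate(frames))
        # Compare with the stored vector for the claimed identity.
        total += distance(feature, reference[word])
    return total <= threshold, total


# Hypothetical word-specific transformation with orthonormal rows, assumed to
# have been derived beforehand, plus a stored reference for the claimed identity.
TRANSFORMS = {"seven": [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]}
REFERENCE = {"seven": [2.0, 0.0]}

claimant = {"seven": [[1.0, 1.0], [1.0, 1.0]]}
impostor = {"seven": [[3.0, 1.0], [3.0, 1.0]]}

accepted_claimant, score_claimant = verify(claimant, TRANSFORMS, REFERENCE, 1.0)
accepted_impostor, score_impostor = verify(impostor, TRANSFORMS, REFERENCE, 1.0)
```

In this toy setup the claimant's transformed vector matches the stored reference exactly and is accepted, while the impostor's vector lies farther away than the assumed threshold and is rejected.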
9. A temporal decorrelation method for reducing the amount of storage necessary for speaker specific speech information, comprising the steps of:
establishing word-level speech feature vectors having a dimension from a spoken utterance;
reducing the dimension of said word-level speech feature vectors by applying a temporal decorrelation linear transformation to said word-level feature vectors; and
storing said word-level feature vectors.
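A small numerical sketch may help show how a decorrelating linear transformation can reduce storage, as claim 9 recites. The claim does not say how the transformation is obtained. This example assumes it is built from the eigenvectors of the training covariance, which for the toy data below is a 45-degree rotation. The training vectors, the helper names, and the choice to keep a single component are illustrative assumptions only.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def project(matrix, vector, keep):
    """Apply the first `keep` rows of a linear transformation to a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix[:keep]]


# Toy word-level vectors: one feature at two successive frames, so the two
# components are correlated in time.
TRAINING = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

# Assumed decorrelating transformation: rows are the eigenvectors of the
# training covariance, with the higher-variance direction first.
R = 1.0 / sqrt(2.0)
DECORRELATE = [[R, R], [-R, R]]

cov_before = covariance([v[0] for v in TRAINING], [v[1] for v in TRAINING])

full = [project(DECORRELATE, v, keep=2) for v in TRAINING]
cov_after = covariance([f[0] for f in full], [f[1] for f in full])

# Store only the leading component: one number per vector instead of two.
stored = [project(DECORRELATE, v, keep=1) for v in TRAINING]
```

Before the transformation the two components have covariance 0.75. Afterwards their covariance is zero up to rounding, and the first component carries variance 2.0 against 0.5 for the second. Keeping only the first component halves the storage in this example, at the cost of discarding the lower-variance direction.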
Specification