Method and apparatus for automatic recognition using features encoded with product-space vector quantization

US 6,256,607 B1
Filed: 09/08/1998
Issued: 07/03/2001
Est. Priority Date: 09/08/1998
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

An automatic recognition system and method divides observation vectors into subvectors and determines a quantization index for the subvectors. Subvector indices can then be transmitted or otherwise stored and used to perform recognition. In a further embodiment, recognition probabilities are determined for subvectors separately and these probabilities are combined to generate probabilities for the observed vectors. An automatic system for assigning bits to subvector indices can be used to improve recognition.
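The subvector indexing described in the abstract can be illustrated with a minimal Python sketch. It assumes consecutive, non-overlapping subvectors and a nearest-neighbour search under squared Euclidean distance; the distance measure, the function names, and the toy codebooks are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Sequence

Vector = Sequence[float]


def split_into_subvectors(vector: Vector, sizes: Sequence[int]) -> List[List[float]]:
    """Split one observation vector into consecutive subvectors of the given sizes."""
    if sum(sizes) != len(vector):
        raise ValueError("subvector sizes must add up to the vector length")
    parts, start = [], 0
    for size in sizes:
        parts.append(list(vector[start:start + size]))
        start += size
    return parts


def nearest_codeword(subvector: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to the subvector.

    Squared Euclidean distance is an assumption of this sketch.
    """
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(subvector, codebook[k]))


def encode(vector: Vector, sizes: Sequence[int],
           codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize each subvector with its own codebook and return the indices."""
    parts = split_into_subvectors(vector, sizes)
    return [nearest_codeword(p, cb) for p, cb in zip(parts, codebooks)]


# Toy codebooks (hypothetical values): 1 bit for the first subvector, 2 bits for the second.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0], [0.5], [1.0], [1.5]],
]
indices = encode([0.9, 1.2, 0.4], sizes=[2, 1], codebooks=codebooks)
print(indices)
```

Only the list of indices needs to be transmitted or stored; with these toy codebooks that is 3 bits per observation vector.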

94 Citations

19 Claims

1. A system for assigning codeword bits among a number of feature vectors to be used in automatic recognition comprising:
- a front end encoder for receiving a physical signal;
  
  a feature extraction engine for converting said signal into a series of digitally encoded numerical feature vectors, said feature vectors selected in order to perform recognition, each of said feature vectors comprising at least two separable numerical parameters;
  
  a subvector quantizer for dividing said feature vectors into a number of subvectors and for performing vector quantization on said subvectors based on a first assignment of bit numbers to each subvector in order to assign a codeword to each subvector to approximate said each subvector;
  
  a recognition engine for performing recognition using said codewords representative of said quantized subvectors to produce a sequence of labels;
  
  memory for storing a plurality of statistical models with trained parameters;
  
  a tester for measuring recognition performance based on comparison of said labels with the corresponding pre-transcribed labels of said physical signal from a development set of the tester; and
  
  feedback means from said tester to said subvector quantizer, for feeding back performance criteria;
  
  wherein said subvector quantizer is further operative in response to said performance criteria to assign additional bits to said subvectors incrementally until the desired level of recognition performance is reached or a threshold of assigned bits is reached.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The system according to claim 1 wherein said stored statistical models are associated with a subunit of speech and represent that subunit of speech as a plurality of states, each state having associated with it a probability function, the probability functions having parameters determined from training data, the probability function producing a probability that a given set of speech data is representative of that particular state, the recognized known labels comprising words in the recognition database.
  - 3. The method according to claim 2 wherein the probability functions are stored in the system as a mixture of simple probability functions.
  - 4. The system according to claim 3 wherein the simple probability functions are discrete pre-computed probability values retrieved by a table look-up.
  - 5. The system according to claim 3 wherein the simple probability functions are Gaussians.
  - 6. The system according to claim 5 wherein the speaker independent probability functions are mixtures of Gaussians having the form:
    - $p_{SI}(y_{t} \mid s) = \sum_{i} \rho(\omega_{i} \mid s_{t}) \, N(y_{t}; \mu_{ig}, \Sigma_{ig}).$
  - 7. The system according to claim 1 wherein each of said stored statistical models is associated with a subunit of speech.
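The mixture-of-Gaussians state likelihood recited in claim 6 can be evaluated as in the following sketch. It assumes diagonal covariance matrices to keep the arithmetic short; the claim itself does not restrict the covariance structure, and the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def diag_gaussian_pdf(y: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Density N(y; mean, diag(var)) of a Gaussian with diagonal covariance (an assumption)."""
    log_p = 0.0
    for y_d, m_d, v_d in zip(y, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v_d) + (y_d - m_d) ** 2 / v_d)
    return math.exp(log_p)


def mixture_likelihood(y: Sequence[float],
                       weights: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]]) -> float:
    """Weighted sum over mixture components i of weight_i * N(y; mean_i, var_i)."""
    return sum(w * diag_gaussian_pdf(y, m, v)
               for w, m, v in zip(weights, means, variances))


# Two identical standard-normal components with equal weights (toy values).
p = mixture_likelihood([0.0], weights=[0.5, 0.5],
                       means=[[0.0], [0.0]], variances=[[1.0], [1.0]])
print(p)
```

In a practical recognizer the same sum would normally be computed in the log domain to avoid underflow on high-dimensional vectors.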

8. A recognition system for automatically recognizing physical signals and deriving known labels comprising:
- a front end encoder for receiving a physical signal;
  
  a feature extraction engine for converting said signal into a series of digitally encoded numerical feature vectors, said vectors selected in order to perform recognition, each of said vectors comprised of at least two separable numerical parameters;
  
  a subvector quantizer for separating said feature vectors into at least two subvectors and for determining a codeword for each subvector to approximate said subvector;
  
  a channel for transmitting codewords for said subvectors to a recognition engine;
  
  memory for storing a plurality of statistical models with trained parameters; and
  
  a recognition engine capable of using said stored statistical models to recognize known labels from a set of unidentified feature vectors wherein said recognition engine performs vector quantized subvector recognition using discrete HMMs having the form:
  
  $P_{s}(X_{t}) = \sum_{i = 1}^{32} \lambda_{i} \cdot P_{si}({VQ}_{1} = k_{1}) \cdot P_{si}({VQ}_{2} = k_{2}) \cdot P_{si}({VQ}_{3} = k_{3}) \cdot P_{si}({VQ}_{4} = k_{4}) \cdot P_{si}({VQ}_{N} = k_{N})$
  
  where $P_{s}(X_{t})$ is the probability for a particular model state $s$ that $X_{t}$ was produced by that state, $\lambda_{i}$ is the weight of the i-th mixture component, $k_{1}$ is the codebook index observed at time $t$ for the first subvector, and $P_{si}({VQ}_{1} = k_{1})$ is the probability that the first subvector index is $k_{1}$, derived from a table lookup for this model state and mixture component $i$.
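The discrete mixture form of claim 8 reduces to table lookups and multiplications. The sketch below assumes the tables for one state are held as nested lists indexed as `tables[i][l][k]`, meaning the probability that subvector `l` has index `k` under mixture component `i`; this layout, the names, and the toy numbers (two components rather than 32) are illustrative assumptions.

```python
from typing import Sequence


def state_probability(indices: Sequence[int],
                      mixture_weights: Sequence[float],
                      tables: Sequence[Sequence[Sequence[float]]]) -> float:
    """Sum over components of weight_i times the product of per-subvector table lookups."""
    total = 0.0
    for weight, component_tables in zip(mixture_weights, tables):
        product = weight
        for table, k in zip(component_tables, indices):
            product *= table[k]  # table lookup for this subvector's observed index
        total += product
    return total


# Toy model for one state: 2 mixture components, 2 subvectors, 1-bit codebooks.
weights = [0.6, 0.4]
tables = [
    [[0.9, 0.1], [0.5, 0.5]],    # component 0
    [[0.2, 0.8], [0.25, 0.75]],  # component 1
]
p_state = state_probability([1, 0], weights, tables)
print(p_state)
```

Because each factor is a stored value, no Gaussian needs to be evaluated at recognition time; the cost per state is one lookup per subvector per mixture component.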

9. A method for assigning codeword bits among a number of feature vectors to be used in automatic recognition comprising:
- dividing an observation vector into a number of subvectors;
  
  assigning a first set of bit numbers to each subvector;
  
  performing vector quantization on said subvectors;
  
  performing recognition using said quantized subvectors;
  
  measuring recognition performance;
  
  assigning additional bits to subvectors incrementally until the desired recognition performance is reached or a threshold of assigned bits is reached; and
  
  selecting the bit values that achieve the most desired performance.
- View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17)
- - 10. The method according to claim 9 wherein said recognition performing step comprises using stored statistical models with trained parameters for computing likelihoods of said quantized subvectors.
  - 11. The method according to claim 10 wherein said stored statistical models are associated with a subunit of speech and represent that subunit of speech as a plurality of states, each state having associated with it a probability function, the probability functions having parameters determined from training data, the probability function producing a probability that a given set of speech data is representative of that particular state, the recognized known labels comprising words in the recognition database.
  - 12. The method according to claim 11 wherein the probability functions are stored in the system as a mixture of simple probability functions.
  - 13. The method according to claim 12 wherein the simple probability functions are discrete pre-computed probability values retrieved by a table look-up.
  - 14. The system according to claim 12 wherein the simple probability functions are Gaussians.
  - 15. The method according to claim 14 wherein the speaker independent probability functions are mixtures of Gaussians having the form:
    - $p_{SI}(y_{t} \mid s) = \sum_{i} \rho(\omega_{i} \mid s_{t}) \, N(y_{t}; \mu_{ig}, \Sigma_{ig}).$
  - 16. The method according to claim 10 wherein each of said stored statistical models is associated with a subunit of speech.
  - 17. The method according to claim 10 wherein said recognition performing step is vector quantized subvector recognition using discrete HMMs having the form:
    - $P_{s}(X_{t}) = \sum_{i = 1}^{32} \lambda_{i} \cdot P_{si}({VQ}_{1} = k_{1}) \cdot P_{si}({VQ}_{2} = k_{2}) \cdot P_{si}({VQ}_{3} = k_{3}) \cdot P_{si}({VQ}_{4} = k_{4}) \cdot P_{si}({VQ}_{N} = k_{N})$
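The incremental bit assignment of claim 9 can be sketched as a loop around a recognition test. The version below assumes a greedy strategy that adds one bit at a time to whichever subvector lowers the measured error most; the claim does not say how the subvector is chosen, so that strategy is an assumption. The `evaluate` callback stands in for the whole quantize, recognize, and score cycle and is hypothetical, as is the toy error function.

```python
from typing import Callable, List, Tuple


def allocate_bits(initial_bits: List[int],
                  evaluate: Callable[[List[int]], float],
                  target_error: float,
                  max_total_bits: int) -> Tuple[List[int], float]:
    """Add bits one at a time until the target error or the bit budget is reached.

    Returns the best allocation seen and its error.
    """
    bits = list(initial_bits)
    best_bits, best_error = list(bits), evaluate(bits)
    while best_error > target_error and sum(bits) < max_total_bits:
        # Try one extra bit on each subvector and measure recognition error.
        trials = []
        for l in range(len(bits)):
            trial = list(bits)
            trial[l] += 1
            trials.append((evaluate(trial), l))
        error, chosen = min(trials)
        bits[chosen] += 1
        if error < best_error:
            best_bits, best_error = list(bits), error
    return best_bits, best_error


def toy_error(bits: List[int]) -> float:
    """Stand-in for a measured error rate: each extra bit halves a subvector's share."""
    return sum(w * 2.0 ** -b for w, b in zip((8.0, 2.0), bits))


final_bits, final_error = allocate_bits([1, 1], toy_error,
                                        target_error=2.0, max_total_bits=6)
print(final_bits, final_error)
```

With the toy error function both extra bits go to the first subvector, which contributes most of the error, and the loop stops once the target is met with 4 of the 6 allowed bits in use.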

18. A method for developing models in a recognition system for responding to data representative of captured physical speech, comprising the steps of:
- selecting a multi-state model with state probability functions, said state probability functions being of a general form with initially undetermined parameters, said models divided into subvector models for recognizing subparts of observation vectors;
  
  creating individual instances of a model for each subunit of speech to be processed;
  
  using training data from a plurality of speakers to determine acoustic features of states of said models and to estimate probability density functions for said models;
  
  clustering states based on their acoustic similarity;
  
  creating a plurality of cluster codebooks, said cluster codebooks consisting of probability density functions that are shared by each cluster's states; and
  
  reestimating the probability densities of each cluster codebook and the parameters of the probability equations in each cluster.

19. A method for developing models in a recognition system for responding to data representative of captured physical speech, comprising the steps of:
- selecting a multi-state model with state probability functions, said state probability functions being of a general form with initially undetermined parameters, said models divided into subvector models for recognizing subparts of observation vectors, wherein said observation computation is based on performing an iteration of a forward-backward algorithm on the training speech data and is of the following form for every state s and mixture component i and at every time t;
  
  $\gamma_{t}(s, i) = \frac{\alpha_{t}(s) \, \beta_{t}(s)}{\sum_{s'} \alpha_{t}(s') \, \beta_{t}(s')} \cdot \frac{\lambda_{i} \cdot P_{si}({VQ}_{1} = k_{1}) \cdot P_{si}({VQ}_{2} = k_{2}) \cdot P_{si}({VQ}_{3} = k_{3}) \cdot P_{si}({VQ}_{4} = k_{4}) \cdot P_{si}({VQ}_{N} = k_{N})}{\sum_{j = 1}^{32} \lambda_{j} \cdot P_{sj}({VQ}_{1} = k_{1}) \cdot P_{sj}({VQ}_{2} = k_{2}) \cdot P_{sj}({VQ}_{3} = k_{3}) \cdot P_{sj}({VQ}_{4} = k_{4}) \cdot P_{sj}({VQ}_{N} = k_{N})}$
  
  where the quantities $\alpha_{t}(s)$, $\beta_{t}(s)$ are the alpha and beta probabilities that are computed with the forward-backward algorithm, and the probabilities of the subvectors are computed using the previous estimates of the model parameters;
  
  thereafter computing new estimates for the subvector probabilities using the following formula;
  
  $P_{si}({VQ}_{1} = k_{1}) = \frac{\sum_{t \,:\, \text{index of first subvector is } k_{1}} \gamma_{t}(s, i)}{\sum_{t} \gamma_{t}(s, i)}$
  
  thereafter updating similarly the probabilities of all the subvectors for all states s and mixtures i;
  
  thereafter replacing previous values of said subvector probabilities with new estimates until a predefined convergence criterion is met;
  
  thereafter creating individual instances of a model for each subunit of speech to be processed;
  
  using training data from a plurality of speakers to determine acoustic features of states of said models and to estimate probability density functions for said models;
  
  clustering states based on their acoustic similarity;
  
  creating a plurality of cluster codebooks, said cluster codebooks consisting of probability density functions that are shared by each cluster's states; and
  
  reestimating the probability densities of each cluster codebook and the parameters of the probability equations in each cluster.
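The subvector probability update in claim 19 is a ratio of accumulated occupation weights. The sketch below assumes the values of gamma for one fixed state and mixture component have already been computed by a forward-backward pass and are supplied as a list, one per time frame; the function name and the toy data are hypothetical.

```python
from typing import List, Sequence


def reestimate_table(gammas: Sequence[float],
                     indices: Sequence[int],
                     codebook_size: int) -> List[float]:
    """New estimate of P_si(VQ_l = k) for every index k of one subvector.

    gammas[t]  : occupation weight gamma_t(s, i) at frame t (assumed precomputed)
    indices[t] : codebook index observed for this subvector at frame t
    """
    total = sum(gammas)
    if total <= 0.0:
        raise ValueError("no occupation mass for this state and mixture component")
    table = [0.0] * codebook_size
    for gamma, k in zip(gammas, indices):
        table[k] += gamma  # numerator: frames where the observed index is k
    return [value / total for value in table]


# Three frames, 1-bit codebook (toy values).
new_table = reestimate_table([0.2, 0.3, 0.5], [0, 1, 0], codebook_size=2)
print(new_table)
```

The same function would be called once per subvector, state, and mixture component in each iteration, and the resulting tables replace the previous estimates before the next forward-backward pass.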

Current Assignee
SRI International, Inc.
Original Assignee
SRI International, Inc.
Inventors
Digalakis, Vassilios; Neumeyer, Leonardo; Tsakalidis, Stavros; Perakakis, Manolis
Primary Examiner(s)
Korzuch, William R.
Assistant Examiner(s)
Lerner, Martin

Application Number

US09/149,844
Time in Patent Office

1,029 Days
Field of Search

704/201, 704/222, 704/229, 704/230, 704/240, 704/245, 704/254, 704/255, 704/243, 704/244
US Class Current

704/222
CPC Class Codes

G10L 15/02 Feature extraction for spee...

G10L 15/144 Training of HMMs
