Word model candidate preselection for speech recognition using precomputed matrix of thresholded distance values

US 5,682,464 A
Filed: 01/25/1995
Issued: 10/28/1997
Est. Priority Date: 06/29/1992
Status: Expired due to Term

Abstract

In the large vocabulary speech recognition system disclosed herein, a preliminary screening of vocabulary models is provided by applying high speed distance measuring functions. The distance measuring functions utilize subsampled or otherwise reduced representations of the unknown speech segment and the vocabulary models. The initial screening functions achieve very high speed by precalculating, for each utterance, a comparison table of distance values which can be used for all vocabulary models. The building of each comparison table is facilitated by a method which utilizes default values as initial entries and only adjusts entries which are meaningfully different from the default value.
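
As an illustration only (not taken from the patent specification), the default-value table building described in the abstract might be sketched as follows. The function names, the use of NumPy, and the rule that a distance is "meaningful" when it falls below a threshold are all assumptions made for this sketch.

```python
import numpy as np

def build_state_lists(dist, threshold, default):
    """Threshold a (prototype frames x prototype states) distance matrix.

    Assumption: a distance is 'meaningful' when it is below `threshold`;
    every other entry is replaced by `default`.
    """
    meaningful = dist < threshold
    thresholded = np.where(meaningful, dist, default)
    # For each prototype frame, the states whose distance is meaningful.
    lists = [np.flatnonzero(row) for row in meaningful]
    return thresholded, lists

def build_comparison_table(representatives, thresholded, lists, default):
    """Per-utterance table: one row per input representative, one column per state."""
    n_states = thresholded.shape[1]
    # Start with every entry at the default value.
    table = np.full((len(representatives), n_states), default, dtype=float)
    for row, frame_id in enumerate(representatives):
        states = lists[frame_id]
        # Only entries on the frame's list are adjusted.
        table[row, states] = thresholded[frame_id, states]
    return table

# Hypothetical data: 2 prototype frames, 3 prototype states.
dist = np.array([[0.1, 5.0, 0.4],
                 [3.0, 0.2, 9.0]])
thresholded, lists = build_state_lists(dist, threshold=1.0, default=1.0)
table = build_comparison_table([1, 0, 0], thresholded, lists, default=1.0)
```

Because most entries stay at the default, the per-utterance cost depends on the length of the lists rather than on the full number of prototype states.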

10 Claims

1. In a computer implemented system for recognizing spoken utterances which compares an unknown speech segment represented by a fine sequence of frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a fine sequence of prototype states selected from a preselected set of prototype states, a method of preselecting candidate models comprising:
- providing a precalculated matrix of distance metrics relating said prototype frames with said prototype states;
  
  thresholding said matrix by assigning a default value to metrics which do not meet a preselected criterion for being meaningful;
  
  for each prototype frame, forming a list of prototype states for which the distance metric is meaningful;
  
  for each input utterance, generating a fine sequence of prototype frames and a coarse set of input representative frames selected from said fine sequence, the number of representatives being a minor fraction of the number of frames in the corresponding fine sequence of frames and being distributed in position along said fine sequence;
  
  for each input utterance, generating a temporary matrix of distance metrics relating each of said sequence of input representatives to said states by performing the following steps;
  
  (a) setting all entries in said temporary matrix to the default value;
  
  (b) sequentially scanning said input representatives to locate the corresponding lists for included prototype states;
  
  (c) adjusting those entries in said temporary matrix which are included in said corresponding lists; and
  
  subsampling at least a selected portion of said vocabulary models and scoring the subsampled prototype states from said selected models using distance metrics obtained from said temporary matrix, the scoring providing a basis for preselection of candidate models for further processing.
- Dependent claims (2, 3, 4, 5)
  - 2. A method as set forth in claim 1 wherein the generation of said coarse set includes subsampling of the fine sequence at a number of spaced positions along said fine sequence.
  - 3. A method as set forth in claim 2 wherein the generation of said coarse set further includes the combining of distances obtained at said subsampled positions with distances obtained at positions adjacent to said subsampled positions thereby to effect a time averaging.
  - 4. A method as set forth in claim 1 wherein the scoring performs a time warping of the coarse set of representatives with the subsampled states.
  - 5. A method as set forth in claim 1 wherein said scoring includes determining, for each subsampled state, the input representative within a predetermined range of representatives which provides the best match.
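
The subsampling of claim 2 and the time averaging of claim 3 could be sketched roughly as below. This is a hedged illustration, not the patented implementation: the helper names, the choice of segment midpoints as sample positions, and the dense (list-free) averaging are assumptions made for brevity.

```python
import numpy as np

def subsample_positions(n_frames, n_samples):
    """Spaced positions along the fine sequence (assumed: midpoints of equal segments)."""
    return [int((i + 0.5) * n_frames / n_samples) for i in range(n_samples)]

def time_averaged_rows(fine_frames, positions, thresholded, radius):
    """One row per sample position: distances to every state, averaged over
    the sampled frame and up to `radius` neighbours on each side."""
    rows = []
    for pos in positions:
        lo = max(0, pos - radius)
        hi = min(len(fine_frames), pos + radius + 1)
        # Look up the thresholded distances for each nearby frame and average them.
        rows.append(thresholded[fine_frames[lo:hi]].mean(axis=0))
    return np.array(rows)

# Hypothetical thresholded matrix (2 prototype frames x 3 prototype states).
thresholded = np.array([[0.1, 1.0, 0.4],
                        [1.0, 0.2, 1.0]])
fine_frames = [0, 0, 1, 1, 1, 0]          # utterance as prototype-frame indices
positions = subsample_positions(len(fine_frames), 2)
averaged = time_averaged_rows(fine_frames, positions, thresholded, radius=1)
```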

6. In a computer implemented system for recognizing spoken utterances which compares an unknown speech segment represented by a fine sequence of frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a fine sequence of prototype states selected from a preselected set of prototype states, a method of preselecting candidate models comprising:
- providing a precalculated matrix of distance metrics relating said prototype frames with said prototype states;
  
  thresholding said matrix by assigning a default value to metrics which do not meet a preselected criterion for being meaningful;
  
  for each prototype frame, forming a list of prototype states for which the distance metric is meaningful;
  
  for each input utterance, generating a fine sequence of prototype frames;
  
  dividing said fine sequence into a series of equal segments thereby to obtain a coarse set of input sample positions along said fine sequence, the number of sample positions being a minor fraction of the number of frames in the corresponding fine sequence of frames;
  
  for each input utterance, generating a temporary matrix of distance metrics relating each of said sequence of input sample positions to said states by performing the following steps;
  
  (a) setting all entries in said temporary matrix to the default value;
  
  (b) sequentially scanning a predetermined number of input frames adjacent to and including each input sample position to locate the corresponding lists for included prototype states;
  
  (c) determining the one of said predetermined number of frames which best matches each included prototype state; and
  
  (d) adjusting those entries in said temporary matrix which correspond to said best matches; and
  
  subsampling at least a selected portion of said vocabulary models and scoring the subsampled prototype states from said selected models using distance metrics obtained from said temporary matrix, the scoring providing a basis for preselection of candidate models for further processing.
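
Steps (a) through (d) of claim 6 might look something like the following sketch. It assumes that smaller distances are better, that the default value is at least as large as any meaningful distance, and that sample positions sit at segment midpoints; none of these details are stated in the claim, and the names are hypothetical.

```python
import numpy as np

def build_best_match_table(fine_frames, n_segments, radius, dist, lists, default):
    """Rows are sample positions, columns are prototype states."""
    seg_len = len(fine_frames) / n_segments
    # (a) every entry starts at the default value
    table = np.full((n_segments, dist.shape[1]), default, dtype=float)
    for row in range(n_segments):
        pos = int((row + 0.5) * seg_len)
        lo = max(0, pos - radius)
        hi = min(len(fine_frames), pos + radius + 1)
        # (b) scan the frames adjacent to and including the sample position
        for frame_id in fine_frames[lo:hi]:
            states = lists[frame_id]
            # (c)+(d) keep the best (smallest) distance seen for each listed state
            table[row, states] = np.minimum(table[row, states], dist[frame_id, states])
    return table

# Hypothetical data: 3 prototype frames, 3 prototype states, threshold 1.0.
dist = np.array([[0.1, 5.0, 0.4],
                 [3.0, 0.2, 9.0],
                 [0.6, 0.7, 0.05]])
lists = [np.flatnonzero(row < 1.0) for row in dist]
best = build_best_match_table([0, 0, 1, 1, 2, 2], n_segments=2, radius=1,
                              dist=dist, lists=lists, default=1.0)
```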

7. In a computer implemented system for recognizing spoken utterances which compares an unknown speech segment represented by a fine sequence of frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a fine sequence of prototype states selected from a preselected set of prototype states, a method of preselecting candidate models comprising:
- precalculating a matrix of distance metrics relating said prototype frames with said prototype states;
  
  thresholding said matrix by assigning a default value to metrics which do not meet a preselected criterion for being meaningful;
  
  for each prototype frame, forming a list of prototype states for which the distance metric is meaningful;
  
  for each input utterance, generating a fine sequence of prototype frames and a coarse set of a predetermined number of input representative frames selected from said fine sequence, the predetermined number of representatives being a minor fraction of the number of frames in the corresponding fine sequence of frames;
  
  for each input utterance, generating a temporary matrix of distance metrics relating each of said sequence of input representatives to said states by performing the following steps;
  
  (a) setting all entries in said temporary matrix to the default value;
  
  (b) sequentially scanning said input representatives to locate the corresponding lists for included prototype states;
  
  (c) adjusting those entries in said temporary matrix which are included in said corresponding lists; and
  
  for each model to be considered, subsampling the corresponding fine sequence of states to obtain a respective coarse sequence comprising a predetermined number of states;
  
  said predetermined numbers together defining a comparison matrix, there being a preselected region within said matrix which is examined by said method;
  
  for each state in said limited collection, determining for each state position in said comparison matrix the input representative which provides the best match with that state, considering and examining only frames which lie within said preselected region, a measure of the match being stored in a table;
  
  calculating, using said table, for each model to be considered a value representing the overall match of said coarse sequence of frames with the respective coarse sequence of states;
  
  preselecting for accurate comparison those models with the better overall match values as so calculated.
- Dependent claims (8)
  - 8. The method as set forth in claim 7 wherein, in determining the input frame which provides the best match for each possible state in each possible matrix position, the method examines not only the respective subsampled frame but also a preselected number of frames which precede and follow the respective subsampled frame in said fine sequence of frames.
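
One possible reading of the table-driven scoring in claim 7 is sketched below. The "preselected region" is assumed here to be a diagonal band of fixed half-width, lower totals are assumed to mean better matches, and all names and data are hypothetical.

```python
import numpy as np

def best_match_table(table, n_positions, band):
    """best[k, s]: smallest distance for state s at coarse model position k,
    looking only at input representatives inside the band around the diagonal."""
    n_reps = table.shape[0]
    best = np.empty((n_positions, table.shape[1]))
    for k in range(n_positions):
        center = (k * (n_reps - 1)) // max(n_positions - 1, 1)
        lo, hi = max(0, center - band), min(n_reps, center + band + 1)
        best[k] = table[lo:hi].min(axis=0)
    return best

def preselect(best, coarse_models, keep):
    """Score each model by table lookups alone and keep the `keep` best."""
    scores = {name: float(sum(best[k, s] for k, s in enumerate(states)))
              for name, states in coarse_models.items()}
    return sorted(scores, key=scores.get)[:keep]

# Hypothetical per-utterance table: 4 input representatives x 3 states.
table = np.array([[0.1, 1.0, 1.0],
                  [1.0, 0.2, 1.0],
                  [1.0, 1.0, 0.3],
                  [1.0, 1.0, 1.0]])
best = best_match_table(table, n_positions=4, band=1)
models = {"a": [0, 1, 2, 2], "b": [2, 2, 0, 0], "c": [0, 0, 1, 2]}
candidates = preselect(best, models, keep=2)
```

The table is computed once per utterance, so scoring each model reduces to one lookup per coarse state.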

9. In a computer implemented system for recognizing spoken utterances which compares an unknown speech segment represented by a fine sequence of frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a fine sequence of prototype states selected from a preselected set of prototype states, a method of preselecting candidate models comprising:
- providing a precalculated matrix of distance metrics relating said prototype frames with said prototype states;
  
  thresholding said matrix by assigning a default value to metrics which do not meet a preselected criterion for being meaningful;
  
  for each prototype frame, forming a list of prototype states for which the distance metric is meaningful;
  
  for each input utterance, generating a fine sequence of prototype frames;
  
  dividing said fine sequence into a series of equal segments thereby to obtain a coarse set of input sample positions along said fine sequence, the number of sample positions being a minor fraction of the number of frames in the corresponding fine sequence of frames;
  
  for each input utterance, generating a first temporary matrix of distance metrics relating each of said sequence of input sample positions to said states by performing the following steps;
  
  (a) setting all entries in said first temporary matrix to a default value;
  
  (b) sequentially scanning said input representatives to locate the corresponding lists for included prototype states;
  
  (c) adjusting those entries in said temporary matrix which are included in said corresponding lists;
  
  for each input utterance, also generating a second temporary matrix of distance metrics relating each of said sequence of input sample positions to said states by performing the following steps;
  
  (d) setting all entries in said second temporary matrix to a default value;
  
  (e) sequentially scanning a predetermined number of input frames adjacent to and including each input sample position to locate the corresponding lists for included prototype states;
  
  (f) determining the one of said predetermined number of frames which best matches each included prototype state; and
  
  (g) adjusting those entries in said temporary matrix which correspond to said best matches; and
  
  subsampling at least a selected portion of said vocabulary models;
  
  scoring the subsampled prototype states from said selected models first using distance metrics obtained from said second temporary matrix; and
  
  selecting a group of the models scoring higher using said second matrix for scoring using distance metrics obtained from said first matrix.
- Dependent claims (10)
  - 10. A method as set forth in claim 9 wherein the scoring using distance metrics obtained from said first matrix follows a time warping of said sample positions against said subsampled prototyped states.
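
The two-stage selection of claim 9 can be illustrated with the sketch below. It is a simplification under stated assumptions: both stages use a plain linear alignment (the time warping of claim 10 is not modelled), lower totals are treated as better, and the matrices, model names, and cut-off sizes are invented for the example.

```python
import numpy as np

def linear_score(table, coarse_states):
    """Sum of distances with coarse state k aligned linearly to a table row."""
    n_reps = table.shape[0]
    rows = [(k * n_reps) // len(coarse_states) for k in range(len(coarse_states))]
    return float(table[rows, coarse_states].sum())

def two_stage_preselect(first, second, coarse_models, keep_stage1, keep_stage2):
    """Screen all models with the second matrix, then rescore survivors with the first."""
    stage1 = sorted(coarse_models,
                    key=lambda m: linear_score(second, coarse_models[m]))[:keep_stage1]
    return sorted(stage1,
                  key=lambda m: linear_score(first, coarse_models[m]))[:keep_stage2]

# Hypothetical matrices (4 sample positions x 3 states); `second` holds
# best-of-neighbourhood values, so it is never larger than `first`.
first = np.array([[0.1, 1.0, 1.0],
                  [1.0, 0.2, 1.0],
                  [1.0, 1.0, 0.3],
                  [1.0, 1.0, 1.0]])
second = np.array([[0.1, 0.2, 1.0],
                   [0.1, 0.2, 0.3],
                   [1.0, 0.2, 0.3],
                   [1.0, 1.0, 0.3]])
models = {"a": [0, 1, 2, 2], "b": [2, 2, 0, 0], "c": [0, 0, 1, 2]}
survivors = two_stage_preselect(first, second, models, keep_stage1=2, keep_stage2=1)
```

In this toy data, model "c" ranks first under the more forgiving second matrix but is overtaken by "a" once the stricter first matrix is applied.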

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Kurzweil Applied Intelligence, Inc. (Intel Corporation)
Inventors: Sejnoha, Vladimir
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/377,948
Time in Patent Office: 1,007 Days
Field of Search: 395/2.47, 395/2.48, 395/2.49, 395/2.5, 395/2.6, 395/2.61, 395/2.52
US Class Current: 704/238
CPC Class Codes:
G10L 15/10 using distance or distortio...
G10L 2015/085 Methods for reducing search...