Method and apparatus for recognizing spoken words in a speech signal by organizing the vocabulary in the form of a tree

US 5,995,930 A
Filed: 11/19/1996
Issued: 11/30/1999
Est. Priority Date: 09/14/1991
Status: Expired due to Fees

First Claim

1. A method for processing a sequence of words in a speech signal for speech recognition, said method comprising the steps of:

sampling, at recurrent instants, said speech signal for generating a series of test signals;

generating a signal-by-signal matching and scoring between said test signals and a series of reference signals, each of said series of reference signals forming one of a plurality of vocabulary words arranged as a vocabulary tree with a root, and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end;

determining at least one complete word for a particular test signal;

for each completed word, separately:

forming a word result including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words;

storing said aggregate score starting at said root with a reference to said completed word;

proceeding with said signal-by-signal matching and scoring between subsequent test signals and said series of reference signals for each of a plurality of words completed for a particular test signal.
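
The word-result step of the claim can be illustrated with a minimal Python sketch. It assumes that scores are additive costs (lower is better), that the "uniform-length string of prior completed words" has length one (a bigram history), and that each tree pass was entered with a cost carried over from its predecessor word. The names `WordResult`, `form_word_results`, `entry_cost` and `lm_value` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordResult:
    word: str          # the completed word
    predecessor: str   # the prior completed word (history of length 1 here)
    word_score: float  # cost accumulated while matching this word
    aggregate: float   # cost stored at the root for the next tree pass


def form_word_results(completed, entry_cost, lm_value):
    """Form one word result per completed word, each handled separately.

    completed:  list of (word, predecessor, word_score) tuples
    entry_cost: cost with which the tree pass after `predecessor` started
                (an assumption of this sketch)
    lm_value:   dict mapping (predecessor, word) to a language model cost
    """
    results = []
    for word, predecessor, word_score in completed:
        aggregate = (entry_cost[predecessor] + word_score
                     + lm_value[(predecessor, word)])
        results.append(WordResult(word, predecessor, word_score, aggregate))
    return results


completed = [("cat", "the", 4.0), ("cap", "the", 5.0)]
entry_cost = {"the": 10.0}
lm_value = {("the", "cat"): 1.0, ("the", "cap"): 3.0}

results = form_word_results(completed, entry_cost, lm_value)

# Each completed word restarts the search at the tree root with its own
# aggregate score and a reference back to that word.
root_entries = [(r.aggregate, r.word) for r in results]
```

Keeping one root entry per completed word, instead of merging them, is what lets the language model value depend on which word was just recognized.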

Abstract

A method and apparatus for processing a sequence of words in a speech signal for speech recognition. The method includes the steps of sampling, at recurrent instants, said speech signal for generating a series of test signals. Signal-by-signal matching and scoring is generated between the test signals and a series of reference signals, where each of the series of reference signals forms one of a plurality of vocabulary words arranged as a vocabulary tree. The vocabulary tree includes a root and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end. Acoustic recombination determines both continuations of branches and the most probable partial hypotheses within a word because of the use of a vocabulary built up as a tree with branches having reference signals. At least one complete word for a particular test signal is determined, and, separately, for each completed word, there is: I) a word result formed including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words.
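
As a rough illustration of the tree-organized vocabulary described in the abstract, the following Python sketch builds a prefix tree in which each branch stands for one speech element and words are attached to the junction or branch end where their element sequence stops. The class `Branch`, the functions `build_tree` and `count_branches`, the toy lexicon, and the choice of three reference signals per element are all assumptions made for the example.

```python
class Branch:
    """One tree branch: a speech element with a fixed number of reference signals."""

    def __init__(self, element=None, n_refs=0):
        self.element = element  # e.g. a phoneme label; None for the root
        self.n_refs = n_refs    # predetermined number of reference signals
        self.children = {}      # next speech element -> Branch
        self.words = []         # words assigned to this junction or branch end


def build_tree(lexicon, refs_per_element=3):
    """Merge the common beginnings of all words into shared branches."""
    root = Branch()
    for word, elements in lexicon.items():
        node = root
        for element in elements:
            if element not in node.children:
                node.children[element] = Branch(element, refs_per_element)
            node = node.children[element]
        node.words.append(word)
    return root


def count_branches(node):
    """Count all branches below `node` (the root itself is not a branch)."""
    return sum(1 + count_branches(child) for child in node.children.values())


lexicon = {
    "car": ["k", "aa", "r"],
    "card": ["k", "aa", "r", "d"],
    "cat": ["k", "ae", "t"],
}
root = build_tree(lexicon)
r_branch = root.children["k"].children["aa"].children["r"]
```

In this toy lexicon the three words need 10 branches when stored separately but only 6 in the tree, because shared beginnings are matched once. The word "car" sits at a branch junction (the `r` branch continues to `d`), while "card" and "cat" sit at branch ends.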

14 Claims

1. A method for processing a sequence of words in a speech signal for speech recognition, said method comprising the steps of:
- sampling, at recurrent instants, said speech signal for generating a series of test signals;
  
  generating a signal-by-signal matching and scoring between said test signals and a series of reference signals, each of said series of reference signals forming one of a plurality of vocabulary words arranged as a vocabulary tree with a root, and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end;
  
  determining at least one complete word for a particular test signal;
  
  for each completed word, separately:
  
  forming a word result including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words;
  
  storing said aggregate score starting at said root with a reference to said completed word;
  
  proceeding with said signal-by-signal matching and scoring between subsequent test signals and said series of reference signals for each of a plurality of words completed for a particular test signal.
- Dependent claims (2 to 12):
  - 2. The method as claimed in claim 1, wherein the step of storing includes intermediately storing a first list having a varying number of lines, each line including an indication to a new part of a first memory and an indication to said completed word.
  - 3. The method as claimed in claim 2, wherein each new part of said first memory includes a predetermined number of memory locations, each of said memory locations containing an indication to a reference signal in said vocabulary tree and a respective score.
  - 4. The method as claimed in claim 2 wherein:
    - each indication in the first list to said new part of said first memory, comprises an indication to a number of lines of a branch list;
      
      each branch list line comprises an indication to a number of lines of a search list;
      
      each search list line comprises an indication to at least one reference signal to be used for matching and scoring with a next test signal;
      
      an indication is given to a predecessor word and score; and
      
      addressing is via the first list and the branch list.
  - 5. The method as claimed in claim 4, further comprising the steps of:
    - after executing said matching and scoring for each of said test signals, reading out all lines of said search list and adding scores of each reference signal indicated by each line of the search list, comparing the aggregate score of the search list with a first threshold, and writing back only such lines of said search list for which the aggregate score does not surpass said first threshold; and
      
      discarding any line of said branch list indicating only discarded lines of said search list.
  - 6. The method as claimed in claim 5, wherein:
    - each time a search list line of the search list which has an indication to a last reference signal of a series of reference signals corresponding to a word is written back, an indication to this word and an indication to the corresponding at least one predecessor word and the respective score is written into a new location of a word end list;
      
      a language model value is added to each respective score to obtain a resulting score, thereby discarding all same ending words except an end word having a lowest resulting score; and
      
      not discarded locations of the word end list are transferred into new locations of a second memory.
  - 7. The method as claimed in claim 6, further comprising the steps of:
    - generating for each line written back into the search list and indicating a last reference signal of a first tree branch, a line of the branch list containing indications to a number of lines in the search list, each line of the search list containing indications to reference signals for further tree branches following said first tree branch; and
      
      generating for each new word contained in a location of the word end list transferred to the second memory, a new line in the first list containing indications to a number of new lines of the branch list, each line of the branch list containing indications to a number of lines of the search list, each line of the search list containing indications to reference values of the first branches of the vocabulary tree.
  - 8. The method as claimed in claim 5, further comprising the steps of:
    - generating for each line written back into the search list and indicating a last reference signal of a first tree branch, a line of the branch list containing indications to a number of lines in the search list, each line of the search list containing indications to reference signals of further tree branches following said first tree branch; and
      
      generating for each new word contained in a location of the word end list transferred to the second memory a new line in the first list containing indications to a number of lines of the branch list, each line of the branch list containing indications to a number of lines of the search list, each line of the search list containing indications to reference values of the first branches of the vocabulary tree.
  - 9. The method as claimed in claim 2, further including the step of comparing all scores with a first threshold derived from a minimum score and discarding stored scores for which the stored score surpasses said first threshold.
  - 10. The method as claimed in claim 1, further including the step of comparing all stored scores with a first threshold derived from a minimum score and discarding stored scores for which the stored score surpasses said first threshold.
  - 11. The method as claimed in claim 10, wherein:
    - each indication in the first list to said new part of said first memory comprises an indication to a number of lines of a branch list;
      
      each branch list line comprises an indication to a number of lines of a search list;
      
      each search list line comprises an indication to at least one reference signal to be used for matching and scoring with a next test signal;
      
      an indication is given to the predecessor word test signal; and
      
      addressing is via the first list and the branch list.
  - 12. The method as claimed in claim 10, wherein:
    - each indication in the first list to said new part of said first memory, comprises an indication to a number of lines of a branch list;
      
      each branch list line comprises an indication to a number of lines of a search list;
      
      each search list line comprises an indication to at least one reference signal to be used for matching and scoring with the next test signal;
      
      an indication is given to the predecessor word test signal; and
      
      addressing is via the first list and the branch list.
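
The threshold comparison of claims 5, 9 and 10 and the word-end handling of claim 6 can be sketched in Python as follows. This is a simplified illustration, not the list-based memory layout the claims describe: hypotheses are held in plain dictionaries, scores are assumed to be additive costs, and the names `prune`, `recombine_word_ends` and `beam_width` are hypothetical.

```python
def prune(hypotheses, beam_width):
    """Keep only hypotheses whose cost does not surpass the threshold.

    The threshold is derived from the minimum cost by adding `beam_width`
    (one possible derivation, assumed for this sketch).
    """
    if not hypotheses:
        return {}
    threshold = min(hypotheses.values()) + beam_width
    return {state: cost for state, cost in hypotheses.items()
            if cost <= threshold}


def recombine_word_ends(word_ends, lm_value):
    """Add the language model value, then keep the cheapest entry per word.

    word_ends: list of (word, predecessor, cost) tuples
    lm_value:  dict mapping (predecessor, word) to a language model cost
    Returns a dict mapping word -> (best predecessor, resulting cost).
    """
    best = {}
    for word, predecessor, cost in word_ends:
        total = cost + lm_value[(predecessor, word)]
        if word not in best or total < best[word][1]:
            best[word] = (predecessor, total)
    return best


# States are (speech element, reference signal index) pairs in this example.
hypotheses = {("k", 0): 12.0, ("k", 1): 14.5, ("b", 0): 30.0}
survivors = prune(hypotheses, beam_width=10.0)

word_ends = [("cat", "the", 20.0), ("cat", "a", 19.0), ("cap", "the", 21.0)]
lm_value = {("the", "cat"): 1.0, ("a", "cat"): 3.0, ("the", "cap"): 2.0}
best_ends = recombine_word_ends(word_ends, lm_value)
```

Note that "cat" after "a" has the lower cost before the language model value is added, but "cat" after "the" wins afterwards, which is why the comparison is made on the resulting score.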

13. An apparatus for processing a sequence of words in a speech signal for speech recognition, comprising:
- means for sampling, at recurrent instants, said speech signal for generating a series of test signals;
  
  means for generating a signal-by-signal matching and scoring between said test signals and a series of reference signals, each of said series of reference signals forming one of a plurality of vocabulary words arranged as a vocabulary tree with a root, and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end;
  
  means for determining at least one complete word for a particular test signal;
  
  for each completed word, means for separately:
  
  forming a word result including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words;
  
  storing said aggregate score starting at said root with a reference to said completed word;
  
  proceeding with said signal-by-signal matching and scoring between subsequent test signals and said series of reference signals for each of a plurality of completed words for a particular test signal.

14. An apparatus for processing a sequence of words in a speech signal for speech recognition, comprising:
- sampling means for, at recurrent instants, sampling said speech signal for generating a set of test signals;
  
  tree storage means for storing a vocabulary tree that has a root and a plurality of branches, any branch comprising a series of one or more reference signals and being assigned to a speech element and any vocabulary word being assigned to a particular branch junction or branch end as being represented by a string of series of reference signals from said root to the particular branch junction or particular branch end;
  
  model storage means for storing a plurality of language model values, each value uniquely assigned to a particular vocabulary word and a uniform-length string of prior completed words;
  
  matching-and-scoring means fed by said sampling means, by said tree storage means and by said model storage means for executing a signal-by-signal matching and scoring between subsequent test signals and various strings thus determining at least one complete word, deriving a word result comprising a word score, an aggregate score derived from said word score and the language model score assigned to the particular word and a respective string of prior completed words, and a reference to a last word of said respective string of prior completed words;
  
  copying means fed by said matching-and-scoring means for separately copying further non-identical word results from said matching-and-scoring means between subsequent test signals and said series of reference signals for each of a plurality of words completed for a particular test signal from the root of said tree into an intermediate memory;
  
  decision means fed by said matching-and-scoring means and by said intermediate memory for selectively continuing or not continuing said series of reference signals based on the derived aggregate scores; and
  
  recognition decision means fed by said matching-and-scoring means and by said intermediate memory for recognizing speech of said speech signal based on a minimum score among the derived aggregate score for each non-identical word from the root of the tree.
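
The recognition decision of claim 14 relies on each word result carrying a reference to the last word of its history. A minimal Python sketch of that final step is given below, assuming the word results are kept in a list and the reference is a list index. The tuple layout and the names `trace_back` and `recognize` are illustrative only.

```python
def trace_back(word_results, final_index):
    """Follow predecessor references from the final word back to the start."""
    words = []
    index = final_index
    while index is not None:
        word, _aggregate, predecessor_index = word_results[index]
        words.append(word)
        index = predecessor_index
    return list(reversed(words))


def recognize(word_results, final_indices):
    """Pick the final word result with the minimum aggregate score."""
    best = min(final_indices, key=lambda i: word_results[i][1])
    return trace_back(word_results, best)


# Each entry: (word, aggregate score, index of predecessor entry or None).
word_results = [
    ("the", 10.0, None),
    ("a", 12.0, None),
    ("cat", 21.0, 0),
    ("cap", 23.0, 0),
    ("cat", 22.0, 1),
]
sentence = recognize(word_results, final_indices=[2, 3, 4])
```

Here the entries at indices 2 and 4 are non-identical word results for the same word "cat" with different predecessors, and the decision is made on their aggregate scores.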

Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Ney, Hermann; Hab-Umbach, Reinhold
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US08/751,377
Time in Patent Office: 1,106 Days
Field of Search: 395/2.5, 395/2.54, 395/2.6-2.66, 395/2.4, 395/2.09, 395/2.49, 382/170
US Class Current: 704/254
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/187 Phonemic context, e.g. pron...