Method and apparatus for recognizing spoken words in a speech signal

US 5,613,034 A
Filed: 09/26/1994
Issued: 03/18/1997
Est. Priority Date: 09/14/1991
Status: Expired due to Fees

Abstract

In the recognition of coherent speech, language models are used to increase the reliability of recognition. Such models take into account, for example, the probabilities of word combinations, especially of word pairs. For this purpose, a language model value corresponding to this probability is added at the boundaries between words. In several recognition methods, for example when the vocabulary is built up from phonemes in the form of a tree, it is not known at the start of the continuation of a hypothesis after a word end which word will actually follow, so the language model value cannot be taken into account until the end of the next word. Measures are given for achieving this in such a manner that, as far as possible, the optimal preceding word or the optimal preceding word sequence is taken into account for the language model value, without the need to construct a copy of the search tree for each simultaneously ending preceding word sequence.
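The deferred use of the language model value can be illustrated with a small sketch. It assumes that scores are negative log probabilities, so that lower is better and values add, and that a bigram (word pair) model is used. The names `Ending`, `bigram_cost`, and `best_predecessor`, the probability floor, and the toy probabilities are hypothetical and are not taken from the patent.

```python
import math
from typing import NamedTuple

class Ending(NamedTuple):
    word: str     # word that ended at the boundary
    score: float  # accumulated score of the string ending in that word

# Hypothetical bigram probabilities P(next | prev); illustrative values only.
BIGRAM = {("the", "cat"): 0.2, ("a", "cat"): 0.05}

def bigram_cost(prev, nxt, floor=1e-4):
    """Language model value as a negative log probability (assumed convention)."""
    return -math.log(BIGRAM.get((prev, nxt), floor))

def best_predecessor(endings, next_word):
    """Pick the simultaneously ending word that is optimal once next_word is known."""
    return min(endings, key=lambda e: e.score + bigram_cost(e.word, next_word))

# Two words end at the same boundary; "a" has the better score so far.
endings = [Ending("the", 10.0), Ending("a", 9.5)]
best = best_predecessor(endings, "cat")
print(best.word)
```

In this toy case the predecessor with the lowest score at the boundary ("a") is not the one chosen after the language model value for the following word is added, which is why the choice is postponed to the end of the next word.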

8 Claims

1. A method for recognizing a sequence of words in a speech signal, said method comprising the steps of:
- at recurrent instants, sampling said speech signal for generating a series of test signals and executing a signal-by-signal matching and scoring between said test signals and various series of reference signals from a unitary set of reference signal series that each represent a vocabulary word;
  
  assigning a first score to any first string based on a first test signal of preliminarily recognized words terminating at said first test signal;
  
  as from a particular second test signal subsequent to said first test signal, continuing said signal-by-signal matching and scoring for appropriate further series of reference signals of the above set so as to take along a backpointer to said first test signal indicating the terminated words until attainment of a subsequent word termination at a further first test signal, each such further series representing said subsequent word so producing a sub-score;
  
  for each such further series, retrieving an n-gram language model score (n ≥ 2) determined through a combined identity of said subsequent word and of (n-1) most recent vocabulary words at the preliminary recognized word string indicated by said backpointer;
  
  adding said first score, said sub-score and said language model score for producing further said first score, an indication of said further first score, an indication of said subsequent word and said backpointer being stored in a region of a result list associated to said further first test signal;
  
  selecting at least one minimum first score for so preliminarily recognizing a word string associated with said minimum first score.
  - 2. A method as claimed in claim 1, characterized in that an indication of every region of the results list is stored in a separate list portion of an auxiliary list for each test signal.
  - 3. A method as claimed in claim 1, characterized in that with each further first score an indication as to the corresponding further first test signal is stored in the results list.
  - 4. A method as claimed in claim 1, characterized in that a list position within the region of the results list at which the indication of the minimum further first score is stored is carried along during continuing said matching and scoring as the backpointer.
  - 5. A method as claimed in claim 2, characterized in that the list position of the auxiliary list associated to the further first test signal is carried along as the backpointer.
  - 6. A method as claimed in claim 5, characterized in that to determine the sub-score, the minimum first score is subtracted from the score at the relevant subsequent word termination.
  - 7. A method as claimed in claim 5, characterized in that the indication of the further first score is the difference between the score obtained at the terminating particular word and the minimum score of all words ending at the further first test signal, and in that each further first score is formed from the sum of the score at the end of a sequence, from the first score, and from the relevant language model value.
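The bookkeeping described in the method claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a bigram model (n = 2), assumes the search tree was started with the minimum score of the preceding region so that the sub-score is recovered by subtraction as in claim 6, and uses the test-signal index as the backpointer. `Entry`, `ResultList`, `close_words`, and the toy values are hypothetical names and data.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    word: str         # word that ended at this test signal
    score: float      # accumulated score of the string ending in this word
    backpointer: int  # test-signal index at which the preceding string ended

class ResultList:
    """Result list plus an auxiliary list mapping each test signal to its region."""

    def __init__(self):
        self.entries = []
        self.regions = {}  # test-signal index -> (start, stop) positions in entries

    def add_region(self, t, new_entries):
        start = len(self.entries)
        self.entries.extend(new_entries)
        self.regions[t] = (start, len(self.entries))

    def region(self, t):
        start, stop = self.regions[t]
        return self.entries[start:stop]

def close_words(results, t, word_ends, lm_cost):
    """Store one entry per word ending at test signal t.

    word_ends holds (word, tree_score, backpointer) triples from the search.
    """
    new_entries = []
    for word, tree_score, bp in word_ends:
        preds = results.region(bp)
        # Assumption: the tree was started with the minimum score of region bp.
        sub_score = tree_score - min(e.score for e in preds)
        # First score + sub-score + language model score, minimized over predecessors.
        best = min(e.score + sub_score + lm_cost(e.word, word) for e in preds)
        new_entries.append(Entry(word, best, bp))
    results.add_region(t, new_entries)

# Toy language model values (already in score form) and a toy word end.
LM = {("the", "cat"): 1.0, ("a", "cat"): 3.0}
results = ResultList()
results.add_region(5, [Entry("the", 10.0, 0), Entry("a", 9.5, 0)])
close_words(results, 12, [("cat", 14.5, 5)], lambda prev, w: LM[(prev, w)])
print(results.region(12)[0].score)
```

Only one search hypothesis per word is carried through the tree here. The choice among the predecessors stored in the region at test signal 5 is made when "cat" ends, using the backpointer to find that region.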

8. A device recognizing a sequence of words in a speech signal comprising:
- input means for receiving a speech signal;
  
  sampling means fed by said input means for at recurrent sampling said speech signal and at each instant;
  
  matching and scoring means fed by said sampling means and provided with first storage means for storing a unitary set of series of reference signals, each such series representing a vocabulary word, and second storage means for storing a set of n-gram (n ≥ 2) language model scores, each score pertaining to a sub-string of (n) most recent vocabulary words;
  
  first score means fed by said matching and scoring means for, at a particular first test signal, assigning a first score to any string of preliminary recognized words terminating at said first test signal;
  
  second score means fed by said first score means and by said second storage means, for incrementing any such first score by the appropriate language model score pertaining to the n most recently recognized vocabulary words of said string;
  
  memory means for storing incremented scores and associated words and backpointers;
  
  selecting means fed by said first memory means for selecting minimal score strings among those coexistently assigned and incremented by said first and second score means; and
  
  output means fed by said selecting means for outputting a selected absolute minimal score string for further usage.
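The final selection and output step can be sketched as a traceback over stored entries. As a simplifying assumption that the claim does not state in this form, each entry here stores the list position of its predecessor entry (`prev`). `Entry`, `best_string`, and the toy data are hypothetical.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    word: str
    score: float          # accumulated score, lower is better
    prev: Optional[int]   # list position of the predecessor entry (assumption)

def best_string(result_list, final_region):
    """Return the word string with the minimum score among entries in final_region."""
    start, stop = final_region
    pos = min(range(start, stop), key=lambda i: result_list[i].score)
    words = []
    while pos is not None:
        words.append(result_list[pos].word)
        pos = result_list[pos].prev
    return words[::-1]

# Toy result list: two first words, then two competing second words.
result_list = [
    Entry("the", 10.0, None),
    Entry("a", 9.5, None),
    Entry("cat", 16.0, 0),
    Entry("cap", 16.8, 1),
]
print(best_string(result_list, (2, 4)))
```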

Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Steinbiss, Volker; Ney, Hermann
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Edouard, Patrick Nestor
Application Number: US08/312,495
Time in Patent Office: 904 Days
Field of Search: 395/2, 395/2.63, 395/2.6-2.62, 381/43
US Class Current: 704/251
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/197 Probabilistic grammars, e.g...
