Single-count backing-off method of determining N-gram language model values

US 5,745,876 A
Filed: 05/02/1996
Issued: 04/28/1998
Est. Priority Date: 05/05/1995
Status: Expired due to Fees



Abstract

For the recognition of coherently spoken speech with a large vocabulary, language model values, which take into account the probability of word sequences, are considered at word transitions. Prior to recognition, these language model values are derived from training speech signals. If the amount of training data is kept within sensible limits, not all word sequences will actually occur, so the language model values for the missing sequences of, for example, an N-gram language model must be determined from the word sequences of N-1 words that do occur. In accordance with the invention, the reduced word sequences from each different complete word sequence are counted only once, irrespective of how often the complete word sequence actually occurs; alternatively, only reduced word sequences from complete word sequences that occur exactly once in the training data are taken into account.
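
The counting rule described above can be written out for the bigram case (N = 2), where the reduced word sequence is a single word w. The notation below is an assumption made for illustration and does not appear in the source: N(v, w) is the number of times the word pair (v, w) occurs in the training data, and the dot marks the position of the dropped first word. The normalisation into a distribution β(w) is likewise only one plausible way to turn the counts into language model values.

```latex
\begin{align}
  % each different complete sequence (v, w) contributes exactly once
  N_{1+}(\bullet, w) &= \bigl|\{\, v : N(v, w) > 0 \,\}\bigr| \\
  % variant: only complete sequences seen exactly once contribute
  N_{1}(\bullet, w)  &= \bigl|\{\, v : N(v, w) = 1 \,\}\bigr| \\
  % assumed normalisation for the backing-off distribution
  \beta(w) &= \frac{N_{1+}(\bullet, w)}{\sum_{w'} N_{1+}(\bullet, w')}
\end{align}
```

Under this reading, a word that is frequent but always follows the same predecessor receives a count of 1, however often the pair itself occurs.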

3 Claims

1. A method of determining the language model values for deriving word sequences from a speech signal by deriving training signals from the speech signal, comparing the training signals with sequences of reference signals which each correspond to a respective word of a predetermined vocabulary in order to derive scores, and incrementing each score by a language model value for each transition from one word to another word, the language model value indicating the relative probability of word sequences of a predetermined number of defined, successive words, the method comprising:
  - (a) in a training phase, determining the language model values of at least a part of all feasible word sequences from a predetermined training speech signal by counting the frequency of occurrence of individual word sequences, and
  - (b) deriving the language model values for complete word sequences which are not present in the training speech signal from the frequencies of word sequences which have been reduced by the first word and which are present in complete word sequences which have occurred at least once in the training speech signal, in such a manner that each different, complete word sequence is taken into account no more than once for determining the frequency of the reduced word sequences present therein, irrespective of the actual frequency of occurrence.

2. A method as claimed in claim 1, wherein reduced word sequences exclusively from complete sequences having occurred exactly once in the training speech signals are taken into account for the language model values for word sequences which have not occurred.

3. A method as claimed in claim 1, wherein the reduced word sequences from each different complete word sequence occurring in the training speech signal are taken into account exactly once.
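
The counting step in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the training material is already available as a list of word tokens (the claims speak of a training speech signal), and the function name `reduced_counts` and the mode names `"raw"`, `"types"` and `"singletons"` are hypothetical. `"types"` follows the wording of claim 3, `"singletons"` follows claim 2, and `"raw"` shows conventional frequency-weighted counting for comparison.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all complete word sequences of length n, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def reduced_counts(tokens, n, mode="types"):
    """Count word sequences reduced by their first word.

    mode="raw":        each reduced sequence is weighted by the frequency
                       of the complete sequence (conventional counting).
    mode="types":      each different complete sequence counts once
                       (reading of claim 3).
    mode="singletons": only complete sequences seen exactly once count
                       (reading of claim 2).
    """
    full = Counter(ngrams(tokens, n))
    reduced = Counter()
    for gram, freq in full.items():
        tail = gram[1:]  # drop the first word of the complete sequence
        if mode == "raw":
            reduced[tail] += freq
        elif mode == "types":
            reduced[tail] += 1
        elif mode == "singletons":
            if freq == 1:
                reduced[tail] += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return reduced


# Toy data: "francisco" is frequent but always follows "san".
tokens = "san francisco san francisco san francisco a city the city".split()

raw = reduced_counts(tokens, 2, mode="raw")
types = reduced_counts(tokens, 2, mode="types")
single = reduced_counts(tokens, 2, mode="singletons")

print(raw[("francisco",)], types[("francisco",)], single[("francisco",)])
print(raw[("city",)], types[("city",)], single[("city",)])
```

In this toy data, "francisco" follows only one distinct predecessor, so its single-count frequency is lower than that of "city", which follows two different words, even though the raw frequency of "francisco" is higher. Turning these counts into language model values (normalisation, discounting) is left out of the sketch.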


Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Kneser, Reinhard; Ney, Hermann
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/642,012
Time in Patent Office: 726 Days
Field of Search: 395/2.64, 395/2.66, 704/255, 704/257
US Class Current: 704/255
CPC Class Codes: G10L 15/197 Probabilistic grammars, e.g...
