Method and system using separate context and constituent probabilities for speech recognition in languages with compound words

US 5,797,122 A
Filed: 11/18/1996
Issued: 08/18/1998
Est. Priority Date: 03/20/1995
Status: Expired due to Fees

Abstract

In a method and system for speech recognition in languages containing compound words, only the components of compound words are stored in a language model, and only these components are handled in the vocabulary.

In recognizing possible compound words, separate processing paths are set up for the corresponding components of compound words and for possible individual words, in which specific language model statistics are calculated. The language model statistics are based on a breakdown of the probabilities in which the context and the constituents of a compound word are taken into account separately. This exploits the fact, known from linguistics, that the grammar-determining component of a compound word is, as a rule, found at its end, where it provides information on the gender, case and number of the compound word.

The invention is particularly suitable for real-time speech recognition in discrete and continuous dictation.
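
The separate treatment of context and constituents can be illustrated with a small numeric sketch. It assumes the "standardized product" reading given later in claim 10: the probability of an end component W, given a context C and a start component A, is the product Pr(A/W) · Pr(W/C) normalized over all candidate end components. The table names, the German example words and all probability values below are hypothetical and are not taken from the patent.

```python
# Hypothetical toy tables; a real system would estimate these from a text corpus.

# Pr(W/C): distant n-gram, end component W given the two words C before the compound.
P_END_GIVEN_CONTEXT = {
    ("mit", "dem"): {"schiff": 0.6, "bad": 0.4},
}

# Pr(A/W): internal bigram in reverse time order, start component A given end component W.
P_START_GIVEN_END = {
    "schiff": {"dampf": 0.3, "segel": 0.7},
    "bad": {"dampf": 0.8, "schwimm": 0.2},
}


def end_component_probability(context, start):
    """Return Pr(W/CA) for every candidate end component W as a normalized product."""
    unnormalized = {
        end: P_START_GIVEN_END.get(end, {}).get(start, 0.0) * p_context
        for end, p_context in P_END_GIVEN_CONTEXT.get(context, {}).items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        return {}  # no candidate end component is compatible with this start
    return {end: score / total for end, score in unnormalized.items()}


# After "mit dem", which end component best completes the start "dampf"?
probs = end_component_probability(("mit", "dem"), "dampf")
print(probs)
```

In this toy setup the context alone favours "schiff", but the internal bigram shifts the balance towards "bad" once the start component "dampf" has been observed, which is the kind of interaction the separate statistics are meant to capture.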

36 Citations

16 Claims

1. A method for speech recognition in languages with compound words, comprising the following steps:
- storing phonetic transcriptions of words and components of compound words in a first storage area, calculating n-gram frequencies (language model) for the probability of a compound word within a sequence of N words with use of a previously processed body of text, and storing the frequencies in a second storage area;
  
  recording and digitizing the acoustic speech signal and storing the digitized speech signal in a third storage area, wherein by means of signal processing based on the phonetic transcriptions, approximately determining the words and boundaries of compound words and deriving hypothetical sequences of words or candidates for compound words therefrom;
  
  establishing separate processing paths for sequences of candidates for words and compound words;
  
  statistically evaluating the processing paths by means of the n-gram frequencies, where likelihood profiles are generated from the sequence of n-gram frequencies of words or components of compound words of each processing path; and
  
  fully evaluating the processing paths with regard to the goodness of acoustic fit and the statistical probability of the language model.
  - 2. A speech recognition method in accordance with claim 1, characterized in that distant N-gram frequencies Pr(W/C) in the language model of non-vicinal parts of a sequence of words are formed for a candidate compound word component W, given a context C.
  - 3. A speech recognition method in accordance with claim 2, characterized in that in the language model internal N-gram frequencies Pr(A/W) in inverse time sequence of compound word components are formed for a candidate compound word end component W, given a compound word start A.
  - 4. A speech recognition method in accordance with claim 3, characterized in that the evaluation of the language context takes into account both compound words and components of composite words.
  - 5. A speech recognition method in accordance with claim 4, characterized in that acoustic slurring or contractions of neighboring words are taken into account by means of a context function.
  - 6. A speech recognition method in accordance with claim 5, characterized in that a processing path is set up for candidate compound words if a potential starting component is observed on the basis of an evaluation of a specific path to a compound word hypothesis.
  - 7. A speech recognition method in accordance with claim 6, characterized in that the speech signal is evaluated by means of a coarse matching to determine the likelihood of word or compound word boundaries and a fine matching is subsequently carried out between the acoustic signal and the corresponding word or compound word candidates.
  - 8. A speech recognition method in accordance with claim 7, characterized in that for each processing path there are accesses to relevant language model data blocks.
  - 9. A speech recognition method in accordance with claim 8, characterized in that for calculating the probability of a component of a compound word use is made of the preceding context and the initial component of the compound word.
  - 10. A speech recognition method in accordance with claim 9, characterized in that a probability Pr(W/CA) of a constituent of a compound word W as an end component of a compound word behind a starting component of a compound word A is determined taking into account the preceding context C composed of two words or compound words, from the standardized product of the probability Pr(A/W) of an internal bigram formed within the compound word and the probability Pr(W/C) of a distant trigram formed outside the compound word.
  - 11. A speech recognition method in accordance with claim 9 for languages containing multiple compound words, characterized in that assuming that with a given termination the start of a compound word is independent of the context, that an initial component of a multiple compound word not standing at the beginning of a compound word is determined by the probability Pr(A_i/A_{i-1}) of its sequence on the immediately preceding starting component, and that the influence of the termination on all starting components of the compound word can be broken down into independent contributions of the terminal part on the last starting component and the remaining starting components on their corresponding predecessors, for calculating the standardized probability of the compound word termination on a processing path is multiplied by the path coefficients appearing through the compound word.
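
The "standardized product" of claim 10 can be read as an application of Bayes' rule combined with the independence assumption stated in claim 11 (for a given termination, the start of a compound word is independent of the context). The derivation below is a sketch under that reading, not text from the patent; it writes the claims' Pr(W/C) as \Pr(W \mid C), and the summation variable W' over all candidate end components is an assumed notation.

```latex
\begin{align}
\Pr(W \mid C, A)
  &= \frac{\Pr(A \mid W, C)\,\Pr(W \mid C)}{\Pr(A \mid C)}
  && \text{(Bayes' rule, conditioning on } C\text{)} \\
  &= \frac{\Pr(A \mid W)\,\Pr(W \mid C)}
          {\sum_{W'} \Pr(A \mid W')\,\Pr(W' \mid C)}
  && \text{(assuming } A \text{ is independent of } C \text{ given } W\text{)}
\end{align}
```

Here \Pr(A \mid W) corresponds to the internal bigram formed within the compound word and \Pr(W \mid C) to the distant trigram formed outside it; the denominator is the normalization that makes the values sum to one over all end components.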

12. A system for speech recognition in languages containing compound words comprising:
- recording means for recording acoustic speech signals;
  
  A/D converter means for digitizing the analog acoustic speech signal;
  
  phonetic transcription means for constructing a number of phonetic transcriptions of words and components of compound words;
  
  listing means for constructing lists relating to single words, beginnings of compound words and endings of compound words;
  
  probability means for determining the speech pattern probabilities for each on a processing path for the lists;
  
  profiling means for determining likelihood profiles for hypothetical word or compound word sequences; and
  
  processing path means for producing and cancelling processing paths and for deciding on the production and cancellation of processing paths.
  - 13. A speech recognition system in accordance with claim 12, including means for characterizing compound word constituents as starting or terminating components.
  - 14. A speech recognition system in accordance with claim 13 with means for setting up and loading of data blocks of language model probabilities.
  - 15. A speech recognition system in accordance with claim 14 including means for preparing any desired number of composite models in the form of language model classes.
  - 16. A speech recognition system in accordance with claim 15 including means for setting up a context function.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Spies, Marcus
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/737,840
Time in Patent Office: 638 Days
Field of Search: 704/251, 704/252, 704/255, 704/256, 704/257
US Class Current: 704/255
CPC Class Codes: G10L 15/197 Probabilistic grammars, e.g...
