System for building a language model network for speech recognition

US 5,765,133 A
Filed: 03/15/1996
Issued: 06/09/1998
Est. Priority Date: 03/17/1995
Status: Expired due to Fees


Abstract

A system for recognizing continuous speech, for example for automatic dictation applications, uses a bigramme language model organized as a network with finite probability states. The system also uses methods of estimating the probabilities associated with the bigrammes and of representing the model of the language in a tree-like probability network.
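
As a rough illustration of the bigramme probability estimation mentioned in the abstract, the Python sketch below evaluates an interpolated probability of the form recited in claim 2, Pr(z|y) = (1 - λ(y)) f(z|y) + λ(y) Pr(z). The function names, the toy word sequence, the fixed value λ(y) = 0.5, and the use of the unigram relative frequency as the a priori Pr(z) are assumptions made for illustration; they are not taken from the patent.

```python
from collections import Counter

def train_counts(words):
    """Count unigrams, bigram contexts c(y), and bigrams yz in a word list."""
    unigrams = Counter(words)
    contexts = Counter(words[:-1])            # c(y): times y starts a bigram
    bigrams = Counter(zip(words, words[1:]))  # c(yz)
    return unigrams, contexts, bigrams

def interpolated_prob(z, y, unigrams, contexts, bigrams, lam):
    """Pr(z|y) = (1 - lam(y)) * f(z|y) + lam(y) * Pr(z).

    lam maps each word y to lambda(y). When c(y) = 0 the result is Pr(z),
    which corresponds to lambda(y) = 1. Pr(z) is assumed to be the
    unigram relative frequency (an illustrative choice).
    """
    prior = unigrams[z] / sum(unigrams.values())
    c_y = contexts[y]
    if c_y == 0:
        return prior
    l = lam.get(y, 0.5)
    f = bigrams[(y, z)] / c_y  # relative frequency f(z|y)
    return (1.0 - l) * f + l * prior

# Toy data (hypothetical).
words = "the cat sat on the mat the cat ran".split()
unigrams, contexts, bigrams = train_counts(words)
lam = {w: 0.5 for w in unigrams}
p = interpolated_prob("cat", "the", unigrams, contexts, bigrams, lam)
```

Because both f(·|y) and Pr(·) sum to one over the lexicon, the interpolated values for a fixed context y also sum to one, whatever value λ(y) takes in (0, 1].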

17 Claims

1. A system for recognizing continuous speech configured so as to perform the following steps:
- to acquire an acoustic signal comprising words spoken by a speaker,
  to process the acoustic signal so as to generate a signal indicative of acoustic parameters present in the acoustic signal, and
  to decode the signal indicative of acoustic parameters so as to generate an output signal indicative of the words pronounced by the speaker, the decoding step comprising a step of comparing the signal indicative of acoustic parameters with a model of a language and with a lexicon relating to the words spoken by the speaker, the language model being represented by means of a tree-like probabilistic network of finite states of the lexicon,
  wherein said network is constructed, in a preliminary stage, with the use of a linear interpolated language model to assign the probabilities to the network,
  wherein said language model is based on bigrammes, and wherein said system uses the following function to assign the respective probability to each bigramme:
  
  ##EQU1##
  
  Pr(z/y) being the probability of a generic bigramme yz, γ(y) being the total probability assigned to the bigrammes with zero frequency in the context y, Pr(z) the a priori probability of z, f'(z/y) being given by:
  
  f'(z/y) = (1 - γ(y)) f(z/y)
  
  f(z/y) being the relative frequency of the bigramme yz and c(y) being the number of occurrences of y in a sample acoustic signal.
- - 2. A system according to claim 1, wherein the linear interpolated model is calculated using the following function:
    
    Pr(z|y) = (1 - λ(y)) f(z|y) + λ(y) Pr(z)
    
    where 0 < λ(y) ≤ 1 ∀ y and λ(y) = 1 if c(y) = 0.
  - 3. A system according to claim 2, wherein said linear interpolated model involves the estimation of the parameter λ(y) for each word y of the lexicon and uses a cross-validation estimation method and a stacked estimation method of interpolation between estimators to estimate the parameters λ(y).
  - 4. A system according to claim 3, wherein each parameter λ(y) is estimated in a manner such as to maximize a function of the "leaving-one-out likelihood" type, known as LL, defined by the following formula:
    
    on the learning text, indicated W, f*(z|y) being the relative frequency calculated on the sample signal W after one occurrence of yz has been subtracted, and V being the lexicon.
  - 5. A system according to claim 4, wherein said system uses the following iterative formula to calculate the values of the parameters λ(y) which maximize LL locally in comparison with initial values:
    
    in which S_y indicates the set of occurrences of the bigrammes which start with y in the learning text W.
  - 6. A system according to claim 5, wherein, before the estimation of the parameters λ(y) is started, the occurrences of bigrammes in the sample signal are divided randomly into two portions W₁ and W₂, substantially in the ratio 3:4, the maximization of LL taking place on W₁ and the iterations of a generic parameter λ(y) being interrupted if they lead to a decrease in the likelihood of the bigrammes which start with y in the portion W₂.
  - 7. A system according to claim 6, wherein said system uses an estimation method based on the interpolation of several estimators, in which m different interpolated linear models Pr¹, . . . , Pr^m, with m > 1, are estimated and are then combined as follows:
    
    each model of the language being estimated on a different random division of the learning text into the two sets W₁ and W₂ in the same proportions.
  - 8. A system according to claim 7, wherein the language models estimated are combined by calculating their average in a manner such that the resulting model is the following:
    
    in which λⁱ is a vector of parameters calculated with an i-th division of the learning text.
  - 9. A system according to claim 8, wherein the estimation of the parameter λ(y) comprises the steps of:
    - calculating a random division of W into two sets W₁ and W₂ in a proportion of 2:3,
    - calculating the vector of parameters λⁱ = {λ(y) : y ∈ V} by means of a cross-validation estimation method (W₁, W₂),
    - calculating the average vector λ = (1/m) Σ_{i=1}^{m} λⁱ, and
    - calculating the relative frequencies f(z|y) on W,
    
    where W is a random sample of bigrammes and i = 1, . . . , m,
    the cross-validation estimation method comprising the following steps, W₁ and W₂ being two random samples of bigrammes and W₂/y being a subset of the bigrammes in W₂ starting with y:
    - calculating the relative frequencies f(z|y) on W₁,
    - initializing all the parameters λ(y) = 0.5, and
    - for each parameter λ(y), iterating the iterative formula as long as the likelihood of W₂/y calculated by the formula:
      
      increases.
  - 10. A system according to claim 9, wherein the network of finite states is constructed with the imposition of two sets of constraints:
    - an acoustic set, limiting sequences of phonemes permitted to correspond with the phonetic transcriptions of the words, and
    - a linguistic set, associating the estimated probabilities with the pairs of words.
  - 11. A system according to claim 10, wherein said first set of constraints is imposed in a manner such as to make use of the acoustic similarity of the words and the set of words is organized in a tree.
  - 12. A system according to claim 10, wherein said second set of constraints is imposed in a manner such that, for each word in the lexicon, the set of successors actually observed in the learning text is organized in a tree.
  - 13. A system according to and claim 12, wherein said system carries out a factorization of the probabilities of the network of finite states by the application of a method of factorizing probabilities on a tree of the entire lexicon (AL) and on trees of successors (as(w)).
  - 14. A system according to claim 13, wherein, in order to construct the network representing the model of the language, said system carries out the steps of:
    - constructing the tree of the entire lexicon (AL),
    - constructing, for each word of the lexicon, the tree of successors appearing in the learning text,
    - inserting the probabilities provided by the language model by means of empty transitions,
    - transferring the probabilities into the trees,
    - factorizing the probabilities in the trees,
    - eliminating the superfluous empty transitions,
    - labelling the remaining empty transitions with a fictitious symbol,
    - labelling each limb with a string obtained by linking the phoneme or the fictitious symbol, the probability and the word, if present,
    - optimizing the network, and
    - reassigning to each limb the phoneme or symbol, the probability, and possibly the word, starting from the string obtained in the step of labelling the limbs.
  - 15. A system according to claim 14, wherein the step of factorizing the probabilities in the trees comprises a method constituted by the following steps:
    
    in which:
    - T is the tree to be factorized;
    - a, b, n, s are states of T;
    - r is the root of T;
    - F(n) is the set of successor states of the state n;
    - p(a,b) is the probability of the limb from a to b.
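
The factorization steps of claim 15 are not reproduced in this text, so the following Python sketch only illustrates one common way of factorizing word probabilities on a tree, reusing the notation r, F(n) and p(a,b) from the claim. It assumes that each state is assigned the largest word probability found among the leaves below it and that each limb receives the ratio between the values of its two end states. This max-based rule, the function name `factorize`, and the toy tree are assumptions for illustration and may differ from the procedure actually claimed.

```python
def factorize(children, leaf_prob, root):
    """Push word probabilities from the leaves toward the root of a tree.

    children:  dict mapping a state n to its successor states F(n)
    leaf_prob: dict mapping each leaf state to the probability of its word
    Returns (p, root_value), where p[(a, b)] is the probability of the limb
    from a to b. The product of root_value and the limb probabilities along
    the path from the root to a leaf equals that leaf's probability.
    """
    best = {}

    def visit(n):
        # Value of a state: largest leaf probability reachable from it.
        succ = children.get(n, [])
        best[n] = leaf_prob[n] if not succ else max(visit(s) for s in succ)
        return best[n]

    visit(root)
    p = {}
    stack = [root]
    while stack:
        a = stack.pop()
        for b in children.get(a, []):
            p[(a, b)] = best[b] / best[a]  # share carried by the limb a -> b
            stack.append(b)
    return p, best[root]

# Toy tree (hypothetical): r -> a, b and a -> c, d; leaves are b, c, d.
children = {"r": ["a", "b"], "a": ["c", "d"]}
leaf_prob = {"c": 0.5, "d": 0.25, "b": 0.125}
p, root_value = factorize(children, leaf_prob, "r")
```

With this rule, limb probabilities close to the root already reflect the best word reachable through them, which is the kind of early information a beam search can use for pruning. A sum over the leaves could be substituted for the maximum if normalized limb probabilities are preferred.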

16. A system for recognizing continuous speech configured so as to perform the following steps:
- acquisition of an acoustic signal comprising words spoken by a speaker in the form of numeric samples,
- transformation of the sampled speech into a sequence of acoustic feature vectors conveying spectral information,
- decoding of the feature sequence into a word sequence by employing a beam search based algorithm on a network of finite states that represents a linearly interpolated bigramme language model and embodies linguistic and lexical constraints,
  
  wherein the language model is estimated by means of a stacked interpolation algorithm and wherein the language model network of finite states is built by optimizing a tree-based network.
- - 17. A system according to claim 16, wherein both the estimation method and the network optimization algorithm can be extended to generic n-grammes, with n > 2.
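
The stacked interpolation algorithm named in claim 16, and outlined in claims 3 to 9, can be sketched in Python as follows. The iterative formula and the likelihood formula are not reproduced in this text, so the sketch assumes a standard EM-style update of λ(y) using leaving-one-out frequencies on W₁ and stops as soon as the log-likelihood of the bigrammes of W₂ starting with y no longer increases. The names `estimate_lambdas` and `stacked_lambdas`, the `w1_fraction` parameter (the claims give a proportion but not which set receives which share), the iteration cap, and the unigram-style prior are all assumptions.

```python
import math
import random
from collections import Counter

def estimate_lambdas(w1, w2, prior, max_iter=50):
    """Cross-validation estimate of lambda(y) from bigram lists w1 and w2."""
    c_yz = Counter(w1)
    c_y = Counter(y for y, _ in w1)

    def heldout_ll(y, lam):
        # Log-likelihood of the bigrams of w2 that start with y (W2/y).
        return sum(
            math.log((1 - lam) * c_yz[(y2, z)] / c_y[y] + lam * prior[z])
            for y2, z in w2 if y2 == y
        )

    lambdas = {}
    for y in set(c_y) | {y2 for y2, _ in w2}:
        if c_y[y] == 0:
            lambdas[y] = 1.0  # c(y) = 0: rely on the prior alone
            continue
        s_y = [z for yy, z in w1 if yy == y]  # occurrences starting with y
        lam, best = 0.5, heldout_ll(y, 0.5)
        for _ in range(max_iter):
            # Assumed EM-style update with leaving-one-out frequencies f*.
            new = 0.0
            for z in s_y:
                f_star = (c_yz[(y, z)] - 1) / (c_y[y] - 1) if c_y[y] > 1 else 0.0
                back = lam * prior[z]
                new += back / ((1 - lam) * f_star + back)
            new /= len(s_y)
            ll = heldout_ll(y, new)
            if ll <= best:  # stop when the held-out likelihood stops rising
                break
            lam, best = new, ll
        lambdas[y] = lam
    return lambdas

def stacked_lambdas(bigrams, m=5, w1_fraction=0.6, seed=0):
    """Average lambda vectors estimated on m random divisions of the sample."""
    rng = random.Random(seed)
    z_counts = Counter(z for _, z in bigrams)
    total = sum(z_counts.values())
    prior = {z: c / total for z, c in z_counts.items()}
    sums = Counter()
    for _ in range(m):
        shuffled = bigrams[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * w1_fraction)
        split = estimate_lambdas(shuffled[:cut], shuffled[cut:], prior)
        for y, lam in split.items():
            sums[y] += lam
    return {y: s / m for y, s in sums.items()}

# Toy usage (hypothetical text).
text = "the cat sat on the mat the cat ran to the mat and the dog sat on the cat".split()
bigrams = list(zip(text, text[1:]))
lams = stacked_lambdas(bigrams, m=3)
```

Averaging over several random divisions is meant to reduce the dependence of each λ(y) on one particular split; the final model would then combine the averaged λ(y) with relative frequencies computed on the whole sample W, as in the last step listed in claim 9.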

Current Assignee: Fondazione Bruno Kessler
Original Assignee: Istituto Trentino Di Cultura
Inventors: Cettolo, Mauro; Antoniol, Giuliano; Federico, Marcello; Brugnara, Fabio
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Collins, Alphonso
Application Number: US08/616,343
Time in Patent Office: 816 Days
Field of Search: 395/2.4, 395/2.45, 395/2.49, 395/2.51, 395/2.6, 395/2.64, 395/2.65, 395/2.67, 395/2.75
US Class Current: 704/255
CPC Class Codes: G10L 15/197 Probabilistic grammars, e.g...
