Method of estimating probabilities of occurrence of speech vocabulary elements

US 6,314,400 B1
Filed: 09/14/1999
Issued: 11/06/2001
Est. Priority Date: 09/16/1998
Status: Expired due to Fees

First Claim

1. A method of estimating probabilities of occurrence of speech vocabulary elements in a speech recognition system, wherein, in the estimation of a probability of occurrence of a speech vocabulary element, several M-gram probabilities of this element are raised to a higher power by means of an M-gram-specific optimized parameter value, and the powers thus obtained are multiplied by each other, in which the estimation of the probability of occurrence of a speech vocabulary element does not include the case where an M-gram probability with M>1 estimated by means of a first training vocabulary corpus for the speech vocabulary element is multiplied by a quotient raised to the power of an optimized parameter value, which optimized parameter value is determined by means of the GIS algorithm, and a unigram probability of the element estimated by means of a second training vocabulary corpus serves as a dividend of the quotient, and a unigram probability of the element estimated by means of the first training vocabulary corpus serves as a divisor of the quotient.

Abstract

The invention relates to a method of estimating probabilities of occurrence of speech vocabulary elements in a speech recognition system. By modification of the linguistic speech modeling, further alternatives for reducing the error rate and perplexity of a speech recognition system are proposed. The method according to the invention is characterized in that, in the estimation of a probability of occurrence of a speech vocabulary element, several M-gram probabilities of this element are raised to a higher power by means of an M-gram-specific optimized parameter value, and the powers thus obtained are multiplied by each other, in which the estimation of the probability of occurrence of a speech vocabulary element does not include the case where an M-gram probability with M>1 estimated by means of a first training vocabulary corpus for the speech vocabulary element is multiplied by a quotient raised to a higher power by means of an optimized parameter value, which optimized parameter value is determined by means of the GIS algorithm, and a unigram probability of the element estimated by means of a second training vocabulary corpus serves as a dividend of the quotient, and a unigram probability of the element estimated by means of the first training vocabulary corpus serves as a divisor of the quotient.
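
The combination described in the abstract can be illustrated with a minimal sketch: each component M-gram probability is raised to its own exponent, the powers are multiplied, and the product is renormalized over the vocabulary (the scaling factor 1/Z_λ(h) named in claim 3). The function name, the toy vocabulary, the probability tables, and the exponent values are all hypothetical and chosen only for illustration; they are not taken from the patent.

```python
import math

def log_linear_prob(word, history, models, lambdas, vocab):
    """Return (1/Z(h)) * prod_i p_i(word | history) ** lambda_i.

    models  : list of callables p_i(word, history) -> probability
    lambdas : one exponent per model (the "optimized parameter values")
    vocab   : iterable of all vocabulary elements, used for normalization
    """
    def unnormalized(w):
        return math.prod(p(w, history) ** lam for p, lam in zip(models, lambdas))

    z = sum(unnormalized(v) for v in vocab)  # scaling factor Z(h)
    return unnormalized(word) / z

# Hypothetical toy models over a two-word vocabulary.
vocab = ["a", "b"]
unigram = {"a": 0.6, "b": 0.4}
bigram = {("a", "a"): 0.8, ("a", "b"): 0.2, ("b", "a"): 0.3, ("b", "b"): 0.7}

models = [
    lambda w, h: unigram[w],            # M = 1
    lambda w, h: bigram[(h[-1], w)],    # M = 2, conditions on the previous word
]

# Illustrative exponents; in the patent these are optimized on held-out data.
print(log_linear_prob("a", ("a",), models, [0.7, 1.2], vocab))
```

Setting one exponent to 1 and the others to 0 recovers the corresponding component model, which is a convenient sanity check for an implementation of this kind.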

68 Citations

6 Claims

1. A method of estimating probabilities of occurrence of speech vocabulary elements in a speech recognition system, wherein, in the estimation of a probability of occurrence of a speech vocabulary element, several M-gram probabilities of this element are raised to a higher power by means of an M-gram-specific optimized parameter value, and the powers thus obtained are multiplied by each other, in which the estimation of the probability of occurrence of a speech vocabulary element does not include the case where an M-gram probability with M>1 estimated by means of a first training vocabulary corpus for the speech vocabulary element is multiplied by a quotient raised to the power of an optimized parameter value, which optimized parameter value is determined by means of the GIS algorithm, and a unigram probability of the element estimated by means of a second training vocabulary corpus serves as a dividend of the quotient, and a unigram probability of the element estimated by means of the first training vocabulary corpus serves as a divisor of the quotient.
  - 2. A method as claimed in claim 1, wherein a first training vocabulary corpus is used for estimating a first part of the M-gram probabilities, and a first part of a second training vocabulary corpus is used for estimating a second part of the M-gram probabilities, and a second part of the second training vocabulary corpus is used for determining the optimized parameter values assigned to the M-gram probabilities.
  - 3. A method as claimed in claim 2, wherein, for determining the optimized parameter values, the optimizing function
    $$F(\{\lambda_i\}) = \sum_{hw} f(hw)\,\log\left(\frac{1}{Z_\lambda(h)} \prod_i p_i(w \mid h)^{\lambda_i}\right)$$
    is minimized, wherein $\lambda_i$ represents the parameter values to be optimized, $hw$ represents M-grams for a vocabulary element $w$ with a history $h$ of previous vocabulary elements, $f(hw)$ represents the quotient with the number of counted M-grams occurring in a training phase of the second part of the second vocabulary as a dividend, and the number of vocabulary elements of the second vocabulary as a divisor, $1/Z_\lambda(h)$ represents a scaling factor, and $p_i$ represents the estimated probability of occurrence of the vocabulary element $w$, given the history $h$.
  - 4. A method as claimed in claim 1, wherein only M-gram probabilities with M<3 are used.
  - 5. A method as claimed in claim 4, wherein gap bigram probabilities are used.
  - 6. A speech recognition system using a speech vocabulary with vocabulary elements to which probabilities of occurrence are assigned, which are estimated by means of a method as claimed in claim 1.
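
The optimizing function of claim 3 can be evaluated with a short sketch. It only computes F for a given set of exponents and leaves the choice of optimizer open. Several things here are assumptions rather than content of the patent: the function names, the toy probability tables, the held-out sample, taking the divisor of f(hw) to be the number of held-out events, and reading the "gap bigram" of claim 5 as a bigram conditioned on the word two positions back.

```python
import math
from collections import Counter

def log_linear_prob(word, history, models, lambdas, vocab):
    """(1/Z(h)) * prod_i p_i(word | history) ** lambda_i, normalized over vocab."""
    def unnormalized(w):
        return math.prod(p(w, history) ** lam for p, lam in zip(models, lambdas))
    return unnormalized(word) / sum(unnormalized(v) for v in vocab)

def optimizing_function(heldout, models, lambdas, vocab):
    """F({lambda_i}) = sum over hw of f(hw) * log p_lambda(w | h).

    heldout is a list of (history, word) events; f(hw) is taken here as the
    relative frequency count(hw) / len(heldout), which is an assumption.
    """
    counts = Counter(heldout)
    n = len(heldout)
    return sum(
        (c / n) * math.log(log_linear_prob(w, h, models, lambdas, vocab))
        for (h, w), c in counts.items()
    )

# Hypothetical toy models with M < 3: unigram, bigram, and gap bigram.
vocab = ["a", "b"]
unigram = {"a": 0.6, "b": 0.4}
bigram = {("a", "a"): 0.8, ("a", "b"): 0.2, ("b", "a"): 0.3, ("b", "b"): 0.7}
gap_bigram = {("a", "a"): 0.5, ("a", "b"): 0.5, ("b", "a"): 0.9, ("b", "b"): 0.1}

models = [
    lambda w, h: unigram[w],
    lambda w, h: bigram[(h[-1], w)],      # previous word
    lambda w, h: gap_bigram[(h[-2], w)],  # word two positions back (assumed)
]

# Hypothetical held-out events: (two-word history, next word).
heldout = [
    (("a", "a"), "a"),
    (("a", "a"), "b"),
    (("b", "a"), "a"),
    (("a", "b"), "b"),
]

print(optimizing_function(heldout, models, [0.3, 0.9, 0.4], vocab))
```

Because every term is the logarithm of a normalized probability, F as written is never positive; a parameter search would compare this value across candidate exponent sets.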

Current Assignee
US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee
US Philips Corporation (Koninklijke Philips N.V.)
Inventors
Klakow, Dietrich
Primary Examiner(s)
Šmits, Tālivaldis Ivars
Assistant Examiner(s)
ABEBE, DANIEL DEMELASH

Application Number

US09/396,086
Time in Patent Office

784 Days
Field of Search

704/257, 704/231, 704/232, 704/240, 704/241, 704/243, 704/251, 704/256, 704/250, 704/255
US Class Current

704/257
CPC Class Codes

G10L 15/197 Probabilistic grammars, e.g...
