Optimized speech recognition system and method

US 5,054,074 A
Filed: 09/17/1990
Issued: 10/01/1991
Est. Priority Date: 03/02/1989
Status: Expired due to Fees

Abstract

A speech recognition system estimates a set of Poisson intensities for a spoken word, each intensity representing a respectively different word from a vocabulary of words. Each of the functions used to calculate these intensities has two variable parameter values. In a training mode, the system changes the values of the respective variable parameters to optimize the likelihood that the results predicted by the estimates correspond to the actual spoken words. These optimized parameter values are then used by the system, in an operational mode, to recognize spoken words.
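
For reference, the standard Poisson law gives the probability of exactly n occurrences of an event whose expected number of occurrences is λ. The symbols n and λ are illustrative choices, not notation taken from this listing, and the patent's own equations are not reproduced here, so this is only the textbook form:

$$P(n \mid \lambda) = \frac{\lambda^{n} e^{-\lambda}}{n!}, \qquad P(0 \mid \lambda) = e^{-\lambda}.$$

Under this reading, training would raise P(N | λ) for the word actually spoken and P(0 | λ) for every other word in the vocabulary.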

8 Claims

1. A method of modeling the sounds produced by speaking at least first and second portions of speech, said method comprising the steps of:
- uttering at least the first portion of speech N times in a time interval having a series of successive subintervals, where N is an integer greater than or equal to one;
  
  measuring the value of at least one feature of the utterance of the first portion of speech during each of the series of successive subintervals to produce a series of feature vector signals representing the feature values;
  
  estimating the expected number of occurrences of the first portion of speech in the time interval as a combination of the values for each subinterval of a first model function of the measured value of the feature of the utterance of the first portion of speech, said first model function having at least a first parameter having an initial value;
  
  estimating the expected number of occurrences of the second portion of speech in the time interval as a combination of the values for each subinterval of a second model function of the measured value of the feature of the utterance of the first portion of speech, said second model function having at least a second parameter having an initial value;
  
  estimating the probability of exactly N occurrences of the first portion of speech in the time interval given the estimated expected number of occurrences of the first portion of speech;
  
  estimating the probability of exactly zero occurrences of the second portion of speech in the time interval given the estimated expected number of occurrences of the second portion of speech;
  
  calculating revised values of the first and second parameters to improve the value of an objective function comprising a combination of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech;
  
  modeling the first portion of speech with the first model function with the revised value of the first parameter; and
  
  modeling the second portion of speech with the second model function with the revised value of the second parameter.
- 2. A method as claimed in claim 1, characterized in that:
    - each portion of speech is a word; and
      
      the revised values of the first and second parameters are calculated to substantially optimize the value of the objective function.
  - 3. A method as claimed in claim 2, characterized in that:
    - N is equal to one;
      
      the values of the first model function are combined by arithmetic averaging to estimate the expected number of occurrences of the first portion of speech; and
      
      the values of the second model function are combined by arithmetic averaging to estimate the expected number of occurrences of the second portion of speech.
  - 4. A method as claimed in claim 2, characterized in that the model function of a word W_i has the form

    $$m_i(t) = e^{-d_i^2(t) + \beta_i},$$

    where ##EQU7## q is the number of acoustic features of the utterance being measured, and α_j,i, β_i, and μ_j,i are parameters of the model functions.
- 5. A method as claimed in claim 4, characterized in that the probability of exactly n_i occurrences of a word W_i in a time interval T having subintervals Δt given the estimated expected number of occurrences of the word is estimated from a function of the form ##EQU8##
- 6. A method as claimed in claim 2, characterized in that the objective function comprises the product of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech.
- 7. A method as claimed in claim 2, characterized in that the objective function comprises the sum of the logarithms of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech.
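
The steps recited in claims 1, 3, 4 and 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the equation for d_i²(t) is not reproduced in this listing, so a weighted squared distance is assumed, and the Poisson probability is the textbook form rather than the patent's own expression. All function and variable names are hypothetical.

```python
import math

def model_value(x, alpha, mu, beta):
    """m(t) = exp(-d^2(t) + beta), the form given in claim 4.

    ASSUMPTION: d^2 is a weighted squared distance between the feature
    vector x and a per-word mean mu; the listing omits the real formula.
    """
    d2 = sum(a * (xj - m) ** 2 for a, xj, m in zip(alpha, x, mu))
    return math.exp(-d2 + beta)

def expected_count(frames, alpha, mu, beta):
    """Arithmetic average of the model function over subintervals (claim 3)."""
    return sum(model_value(x, alpha, mu, beta) for x in frames) / len(frames)

def poisson_log_prob(n, lam):
    """Log of the standard Poisson probability of exactly n occurrences."""
    if n == 0:
        return -lam  # avoids log(0) if lam underflows to zero
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def log_objective(frames, spoken, others, n=1):
    """Sum of log probabilities (claim 7): exactly n occurrences of the
    spoken word and exactly zero occurrences of every other word."""
    total = poisson_log_prob(n, expected_count(frames, **spoken))
    for params in others:
        total += poisson_log_prob(0, expected_count(frames, **params))
    return total

# Hypothetical two-feature example: two subintervals of one utterance.
frames = [(0.0, 1.0), (0.2, 0.9)]
spoken = dict(alpha=(1.0, 1.0), mu=(0.1, 1.0), beta=0.0)
other = dict(alpha=(1.0, 1.0), mu=(3.0, -2.0), beta=0.0)
print(log_objective(frames, spoken, [other]))
```

Exponentiating the result gives the product form of claim 6, so the two objectives rank parameter settings identically.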

8. A method of modeling the sounds produced by speaking at least a first portion of speech, said method comprising the steps of:
- uttering at least the first portion of speech N times in a time interval having a series of successive subintervals, where N is an integer greater than or equal to one;
  
  measuring the value of at least one feature of the utterance of the first portion of speech during each of the series of successive subintervals to produce a series of feature vector signals representing the feature values;
  
  estimating the expected number of occurrences of the first portion of speech in the time interval as a combination of the values for each subinterval of a first model function of the measured value of the feature of the utterance of the first portion of speech, said first model function having at least a first parameter having an initial value;
  
  estimating the probability of exactly N occurrences of the first portion of speech in the time interval given the estimated expected number of occurrences of the first portion of speech;
  
  calculating a revised value of the first parameter to improve the value of an objective function comprising at least the estimated probability of exactly N occurrences of the first portion of speech; and
  
  modeling the first portion of speech with the first model function with the revised value of the first parameter.
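
The parameter revision step of claim 8 can be illustrated with a simple numerical update. The listing does not say which optimization procedure is used, so the sketch below substitutes plain gradient ascent with finite-difference gradients and step halving. The one-feature model, the assumed squared-distance exponent, and every name in the code are hypothetical.

```python
import math

def log_prob_n(frames, mu, beta, n=1, alpha=1.0):
    """Log Poisson probability of exactly n occurrences of one word.

    ASSUMPTIONS: a single acoustic feature per subinterval, and an expected
    count equal to the average of exp(-alpha * (x - mu)**2 + beta).
    """
    lam = sum(math.exp(-alpha * (x - mu) ** 2 + beta) for x in frames) / len(frames)
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def revise(frames, mu, beta, step=0.5, eps=1e-6):
    """One gradient-ascent step on (mu, beta).

    Gradients are forward finite differences; the step is halved until the
    objective strictly improves, otherwise the old values are returned.
    """
    base = log_prob_n(frames, mu, beta)
    g_mu = (log_prob_n(frames, mu + eps, beta) - base) / eps
    g_beta = (log_prob_n(frames, mu, beta + eps) - base) / eps
    while step > 1e-12:
        new_mu, new_beta = mu + step * g_mu, beta + step * g_beta
        if log_prob_n(frames, new_mu, new_beta) > base:
            return new_mu, new_beta
        step /= 2
    return mu, beta  # no improving step found; keep the current values

# Hypothetical feature values measured over three subintervals.
frames = [0.9, 1.0, 1.1]
mu, beta = 0.0, 0.0  # initial parameter values
before = log_prob_n(frames, mu, beta)
mu, beta = revise(frames, mu, beta)
after = log_prob_n(frames, mu, beta)
print(before, after)
```

Repeating `revise` until the objective stops improving gives one plausible reading of "revised values"; any optimizer that improves the objective would fit the claim language equally well.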

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bakis, Raimo
Primary Examiner(s): KEMENY, EMANUEL
Application Number: US07/586,338
Time in Patent Office: 379 Days
Field of Search: 381/41-43, 364/513.5
US Class Current: 704/240
CPC Class Codes: G10L 15/14 using statistical models, e...
