Boundary estimation method of speech recognition and speech recognition apparatus

US 5,940,794 A
Filed: 07/15/1996
Issued: 08/17/1999
Est. Priority Date: 10/02/1992
Status: Expired due to Fees

Abstract

Disclosed are a boundary estimation method that can readily learn the probability that a boundary exists in speech, and a speech recognition apparatus that achieves high precision with less model calculation. In a learning mode, an estimator estimates the distributions of boundary samples and non-boundary samples. In an estimation mode, a likelihood calculator calculates the likelihood of a boundary from a boundary probability density and a non-boundary probability density. In the speech recognition apparatus, a feature extractor analyzes the input speech and converts it into a time series of feature parameters; a boundary detector uses the output of the feature extractor to detect areas of the input speech that correspond to phonetic boundaries; a model calculator prepares a plurality of phonetic model series corresponding to the feature parameters and restricts the times at which boundaries of the phonetic model series are formed to the phonetic boundary areas detected by the boundary detector; and a phonetic series transform unit selects a suitable phonetic model series for the input speech from the result of the model calculator.

18 Claims

1. A boundary estimation method of speech recognition comprising the steps of:
- (a) analyzing an input speech sample to extract a time window of speech parameters;
  
  (b) calculating a first probability that a phonetic boundary of the input speech exists at a center of the time window;
  
  (c) calculating a second probability that the phonetic boundary of the input speech does not exist at the center of the time window; and
  
  (d) calculating a value indicative of the likelihood that the phonetic boundary of the speech exists at the center of the time window on the basis of the first probability and the second probability.
- Dependent Claims (2, 3, 4, 5, 6)
- - 2. The boundary estimation method of claim 1, further comprising steps performed prior to step (a), of:
    - receiving, in a learning mode, a plurality of learning samples;
      
      classifying each of the plurality of learning samples as one of a boundary sample and a non-boundary sample; and
      
      applying a model of a probability distribution to the boundary samples and the non-boundary samples to estimate a parameter of the probability distribution.
  - 3. The boundary estimation method of claim 2, wherein the step (b) includes the step of calculating the first probability on the basis of the parameter of the probability distribution applied to the non-boundary samples, and the step (c) includes a step of calculating the second probability on the basis of the parameter of the probability distribution applied to the boundary samples.
  - 4. The boundary estimation method of claim 3, wherein the samples, the first probability and the second probability are defined as B_t, Pr₁ (B_t) and Pr₂ (B_t), respectively, and the step (d) includes a step of determining a degree of existence of a phonetic boundary of speech at the center of the time window to be equal to Pr₁ (B_t)/Pr₂ (B_t).
  - 5. The speech recognition method of claim 1, wherein step (b) includes determining a probability that a center of the time window corresponds to one of a predetermined plurality of phonetic boundaries.
  - 6. The speech recognition method of claim 5, further comprising the steps, performed prior to step (a), of:
    - receiving, in a learning mode, a plurality of learning samples;
      
      classifying each of the plurality of learning samples as one of a predetermined plurality of phonetic boundaries; and
      
      applying a model of a probability distribution to the plurality of learning samples to estimate a parameter of the probability distribution for each of the predetermined plurality of phonetic boundaries;
      
      and wherein the step of determining includes calculating the probability based upon the parameter of the probability distribution.
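
Claims 1 to 4 can be illustrated with a short sketch. It assumes that the speech parameters are per-frame feature vectors, that the time window B_t is the concatenation of frames t-k through t+k, and that the unspecified "model of a probability distribution" is a diagonal Gaussian; the names `extract_window`, `DiagonalGaussian`, `train`, and `boundary_log_ratio` are hypothetical. Following claim 16, the first probability Pr₁(B_t) comes from the model fitted to boundary samples and Pr₂(B_t) from the model fitted to non-boundary samples, and the claim 4 ratio Pr₁(B_t)/Pr₂(B_t) is computed as a log difference.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def extract_window(frames: Sequence[Frame], t: int, k: int) -> List[float]:
    """Step (a): concatenate frames t-k .. t+k into one vector B_t.

    Indices outside the utterance are clamped to the first/last frame
    (an assumption, so that every frame can serve as a window center).
    """
    n = len(frames)
    window: List[float] = []
    for i in range(t - k, t + k + 1):
        window.extend(frames[min(max(i, 0), n - 1)])
    return window


class DiagonalGaussian:
    """Assumed density model: an independent Gaussian per window dimension."""

    def __init__(self, samples: Sequence[Sequence[float]], floor: float = 1e-6):
        n, dim = len(samples), len(samples[0])
        self.mean = [sum(s[d] for s in samples) / n for d in range(dim)]
        # The variance floor keeps the density finite for constant dimensions.
        self.var = [
            max(sum((s[d] - self.mean[d]) ** 2 for s in samples) / n, floor)
            for d in range(dim)
        ]

    def log_pdf(self, x: Sequence[float]) -> float:
        return -0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, self.mean, self.var)
        )


def train(frames: Sequence[Frame], boundary_frames: Sequence[int],
          k: int) -> Tuple[DiagonalGaussian, DiagonalGaussian]:
    """Learning mode (claim 2): classify every window as a boundary or
    non-boundary sample and fit one density model to each class."""
    marked = set(boundary_frames)
    boundary, non_boundary = [], []
    for t in range(len(frames)):
        target = boundary if t in marked else non_boundary
        target.append(extract_window(frames, t, k))
    return DiagonalGaussian(boundary), DiagonalGaussian(non_boundary)


def boundary_log_ratio(frames: Sequence[Frame], t: int, k: int,
                       boundary_model: DiagonalGaussian,
                       non_boundary_model: DiagonalGaussian) -> float:
    """Steps (b) to (d): log(Pr1(B_t) / Pr2(B_t)), the claim 4 ratio in log form."""
    b_t = extract_window(frames, t, k)
    return boundary_model.log_pdf(b_t) - non_boundary_model.log_pdf(b_t)


# Usage: a toy one-dimensional parameter track with marked boundaries.
signal = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
frames = [[float(x)] for x in signal]
pr1_model, pr2_model = train(frames, boundary_frames=[3, 6, 9], k=1)
at_boundary = boundary_log_ratio(frames, 3, 1, pr1_model, pr2_model)
in_flat_region = boundary_log_ratio(frames, 1, 1, pr1_model, pr2_model)
print(at_boundary > 0 > in_flat_region)  # True
```

Working in the log domain turns the claim 4 division into a subtraction and avoids underflow for long windows. With a single Gaussian per class, rising and falling edges are pooled into one boundary model; modeling several boundary types separately, as in claims 5 and 6, is one way to address this.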

7. A speech recognition apparatus, comprising:
- feature extracting means for converting a sample of input speech to a series of feature parameters;
  
  boundary detecting means for detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters, the boundary detecting means including:
  
  an analyzer for extracting a time window from the series of speech parameters;
  
  a first calculator for calculating a first probability of existence of a boundary of the input speech at a center of the time window;
  
  a second calculator for calculating a second probability of nonexistence of the boundary of the input speech at the center of the time window; and
  
  a detector for detecting either the phonetic boundaries in the input speech or the areas near the phonetic boundaries by calculating a degree of existence of the boundaries of the speech at the center of the time window on the basis of a calculation including the first probability and the second probability;
  
  model arithmetic means for determining a phonetic series which matches the input speech by restricting times when boundaries of a plurality of phonetic models are formed based upon the phonetic boundary areas detected by the boundary detecting means, and for determining probabilities respectively corresponding to the boundary areas; and
  
  phonetic series transform means for selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities determined by the model arithmetic means.
- Dependent Claims (8, 9, 10, 11)
- - 8. The speech recognition apparatus of claim 7, wherein:
    - the boundary detecting means includes a likelihood calculator for simultaneously calculating a likelihood of one phonetic boundary area when detecting the one phonetic boundary area;
      
      the model arithmetic means includes a promoter for promoting the occurrence of a phonetic transition of the phonetic model series in the phonetic boundary areas when preparing the phonetic model series corresponding to the feature parameters; and
      
      means for proportioning a promotion rate of the promoter to the likelihood of the phonetic boundaries obtained by the boundary detecting means.
  - 9. The speech recognition apparatus of claim 7, wherein the model arithmetic means includes:
    - a promoter for promoting the occurrence of a phonetic transition of the phonetic model series in the phonetic boundary areas; and
      
      means for restricting the phonetic transition to the phonetic boundary areas by determining the probabilities respectively corresponding to the boundary areas detected by the boundary detecting means.
  - 10. The apparatus of claim 7, wherein the boundary detecting means includes means for detecting a plurality of phonetic boundaries and a plurality of areas near the phonetic boundaries.
  - 11. The speech recognition apparatus of claim 7, wherein the boundary detecting means includes:
    - an analyzer for extracting a time window from the series of speech parameters; and
      
      means for determining a probability that a center of the time window corresponds to one of a predetermined plurality of phonetic boundaries.
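
The model arithmetic means of claims 7 to 9 can be pictured as a Viterbi alignment in which a new phonetic model may only begin inside a detected boundary area. The sketch below is one possible reading, not the patented implementation: each phonetic model is reduced to a single state, emission scores come from a toy unit-variance Gaussian, and `constrained_viterbi`, `emit_matrix`, and the optional `promo`/`alpha` promotion term are hypothetical names. The phonetic series transform step is modeled as picking the candidate series with the highest constrained score.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

NEG_INF = float("-inf")


def constrained_viterbi(emit: Sequence[Sequence[float]], allowed: Set[int],
                        promo: Optional[Sequence[float]] = None,
                        alpha: float = 0.0) -> Tuple[float, List[int]]:
    """Align T frames to a left-to-right series of J phonetic models.

    emit[t][j]: log-likelihood of frame t under the j-th model of the series.
    allowed:    frames inside detected boundary areas; a new model may only
                start at one of these frames (a hard restriction).
    promo, alpha: optional promotion; entering a new model at frame t adds
                alpha * promo[t], where promo[t] is a boundary likelihood.
    Returns the best log score and the frames at which models 2..J start.
    """
    T, J = len(emit), len(emit[0])
    score = [[NEG_INF] * J for _ in range(T)]
    moved = [[False] * J for _ in range(T)]
    score[0][0] = emit[0][0]
    for t in range(1, T):
        for j in range(J):
            stay = score[t - 1][j]
            move = NEG_INF
            if j > 0 and t in allowed:
                move = score[t - 1][j - 1]
                if promo is not None:
                    move += alpha * promo[t]
            best = max(stay, move)
            moved[t][j] = move > stay
            if best > NEG_INF:
                score[t][j] = best + emit[t][j]
    if score[T - 1][J - 1] == NEG_INF:
        return NEG_INF, []  # no alignment respects the boundary areas
    starts, j = [], J - 1
    for t in range(T - 1, 0, -1):  # backtrace along the best path
        if moved[t][j]:
            starts.append(t)
            j -= 1
    return score[T - 1][J - 1], starts[::-1]


def emit_matrix(frames: Sequence[float], series: Sequence[str],
                means: Dict[str, float]) -> List[List[float]]:
    """Toy emission scores: unit-variance Gaussian per model, constants dropped."""
    return [[-0.5 * (x - means[m]) ** 2 for m in series] for x in frames]


# Usage: choose the candidate series with the best constrained score.
frames = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
means = {"a": 0.0, "b": 1.0}
areas = {3, 6}  # frames flagged by the boundary detector
candidates = [("a", "b", "a"), ("a", "b"), ("b", "a")]
results = {c: constrained_viterbi(emit_matrix(frames, c, means), areas)
           for c in candidates}
best = max(results, key=lambda c: results[c][0])
print(best, results[best][1])  # ('a', 'b', 'a') [3, 6]
```

Restricting transitions to the boundary areas shrinks the search: at most frames the only option is to stay in the current model, which is the "less model calculation" the abstract refers to. Setting `alpha > 0` with per-frame boundary likelihoods gives a soft preference in the spirit of claim 8 on top of the hard restriction.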

12. An apparatus for performing boundary estimation of speech, comprising:
- a feature extractor, having an input that receives an input speech series, and an output that provides a series of feature values that correspond to the input speech series;
  
  a first probability calculator, having an input that receives the feature values and an output that provides a first probability indicative of a likelihood that a speech boundary exists within the input speech series;
  
  a second probability calculator, having an input that receives the feature values and an output that provides a second probability indicative of a likelihood that a speech boundary does not exist within the input speech series; and
  
  a likelihood calculator, having a first input coupled to the first probability calculator, a second input coupled to the second probability calculator, and an output that provides a likelihood that a speech boundary exists within the input speech series based upon the first probability and the second probability.
- Dependent Claims (13, 14)
- - 13. The apparatus of claim 12, wherein the output of the likelihood calculator is equal to a value received at the first input divided by a value received at the second input.
  - 14. The apparatus of claim 12, further comprising:
    - a first parameter storage element, coupled to the first probability detector, that stores a first plurality of parameters of a probability density function relating to the likelihood that a speech boundary exists;
      
      a second parameter storage element, coupled to the second probability detector, that stores a second plurality of parameters of the probability density function;
      
      and wherein:
      
      the first probability calculator calculates the first probability based upon the feature values and the first plurality of parameters; and
      
      the second probability calculator calculates the second probability based upon the feature values and the second plurality of parameters.
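
A minimal object sketch of the apparatus of claims 12 to 14, assuming diagonal Gaussian densities whose means and variances are held in the two parameter storage elements. The feature extractor is left as an injected callable because the claims do not fix the feature type; `ParameterStore`, `ProbabilityCalculator`, and `BoundaryEstimationApparatus` are hypothetical names. The likelihood calculator divides its first input by its second, as claim 13 states.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ParameterStore:
    """Parameter storage element (claim 14): here, per-dimension mean and
    variance of an assumed diagonal Gaussian density."""
    mean: Sequence[float]
    var: Sequence[float]


class ProbabilityCalculator:
    """Probability calculator coupled to one parameter store (claims 12, 14)."""

    def __init__(self, store: ParameterStore) -> None:
        self.store = store

    def __call__(self, features: Sequence[float]) -> float:
        density = 1.0
        for x, m, v in zip(features, self.store.mean, self.store.var):
            density *= math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        return density


class BoundaryEstimationApparatus:
    """Feature extractor -> two probability calculators -> likelihood calculator."""

    def __init__(self, extractor: Callable[[Sequence[float]], List[float]],
                 boundary_store: ParameterStore,
                 non_boundary_store: ParameterStore) -> None:
        self.extractor = extractor
        self.first = ProbabilityCalculator(boundary_store)
        self.second = ProbabilityCalculator(non_boundary_store)

    def likelihood(self, speech: Sequence[float]) -> float:
        features = self.extractor(speech)
        first_input = self.first(features)
        second_input = self.second(features)
        # Claim 13: output = first input / second input. In linear form this
        # raises ZeroDivisionError if the second density underflows to 0.0.
        return first_input / second_input


# Usage with a pass-through extractor and one-dimensional stores.
apparatus = BoundaryEstimationApparatus(
    extractor=list,
    boundary_store=ParameterStore(mean=[1.0], var=[1.0]),
    non_boundary_store=ParameterStore(mean=[0.0], var=[1.0]),
)
print(apparatus.likelihood([1.0]))  # about 1.6487, i.e. exp(0.5)
```

Dividing linear densities mirrors claim 13 literally; for long feature vectors the densities can underflow, so computing log p₁ minus log p₂ is a common alternative.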

15. A speech recognition method, comprising the steps of:
- converting a sample of input speech to a series of feature parameters;
  
  detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters;
  
  determining a phonetic series which matches the input speech based upon the phonetic boundary areas detected by restricting times when boundaries of a plurality of phonetic models are formed and determining probabilities respectively corresponding to the phonetic boundary areas; and
  
  selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities corresponding to the phonetic boundary areas;
  
  wherein the step of detecting boundary areas includes the steps of:
  
  extracting a time window from the series of speech parameters;
  
  calculating a first probability that a phonetic boundary of the input speech exists at a center of the time window;
  
  calculating a second probability that the phonetic boundary of the input speech does not exist at the center of the time window;
  
  calculating a value indicative of the likelihood that the boundary of the speech exists at the center of the time window on the basis of the first probability and the second probability;
  
  detecting a phonetic boundary area based upon the value indicative of the likelihood.
- Dependent Claims (16)
- - 16. The speech recognition method of claim 15, further comprising the steps, performed prior to the step of extracting, of:
    - receiving, in a learning mode, a plurality of learning samples;
      
      classifying each of the plurality of learning samples as one of a boundary sample and a non-boundary sample; and
      
      applying a model of a probability distribution to the boundary samples to estimate a first parameter of the probability distribution; and
      
      applying the model of a probability distribution to the non-boundary samples to estimate a second parameter of the probability distribution;
      
      and wherein the step of calculating the first probability is performed based upon the first parameter, and the step of calculating the second probability is performed based upon the second parameter.
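
The final step of claim 15, detecting a phonetic boundary area from the likelihood value, could be done by thresholding per-frame scores and grouping adjacent frames. This sketch assumes the scores are log likelihood ratios (so a threshold of 0 corresponds to Pr₁/Pr₂ > 1) and that an optional `margin` widens each area; the threshold, the margin, and the names `detect_boundary_areas` and `area_frames` are assumptions, not values from the patent.

```python
from typing import List, Sequence, Set, Tuple


def detect_boundary_areas(log_ratios: Sequence[float], threshold: float = 0.0,
                          margin: int = 0) -> List[Tuple[int, int]]:
    """Turn per-frame boundary scores into boundary areas.

    A run of consecutive frames whose score exceeds `threshold` becomes one
    area, widened by `margin` frames on each side and clipped to the
    utterance. Areas that touch or overlap after widening are merged.
    Returns inclusive (first_frame, last_frame) pairs.
    """
    n = len(log_ratios)
    areas: List[Tuple[int, int]] = []
    t = 0
    while t < n:
        if log_ratios[t] > threshold:
            start = t
            while t + 1 < n and log_ratios[t + 1] > threshold:
                t += 1
            lo, hi = max(start - margin, 0), min(t + margin, n - 1)
            if areas and lo <= areas[-1][1] + 1:
                areas[-1] = (areas[-1][0], max(areas[-1][1], hi))
            else:
                areas.append((lo, hi))
        t += 1
    return areas


def area_frames(areas: Sequence[Tuple[int, int]]) -> Set[int]:
    """Expand areas into the set of frames where a model boundary may fall."""
    return {t for lo, hi in areas for t in range(lo, hi + 1)}


scores = [-2.0, -1.5, 0.8, 1.2, -0.3, -2.0, -1.0, 0.4, -1.1, -2.0]
print(detect_boundary_areas(scores))            # [(2, 3), (7, 7)]
print(detect_boundary_areas(scores, margin=1))  # [(1, 4), (6, 8)]
```

A wider margin trades a larger search (more frames where a phonetic transition is permitted) for robustness against boundary scores that peak a frame or two away from the true boundary.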

17. A speech recognition method, comprising the steps of:
- converting a sample of input speech to a series of feature parameters;
  
  detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters;
  
  determining a phonetic series which matches the input speech based upon the phonetic boundary areas detected by restricting times when boundaries of a plurality of phonetic models are formed and determining probabilities respectively corresponding to the phonetic boundary areas; and
  
  selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities corresponding to the phonetic boundary areas;
  
  wherein the step of detecting boundary areas includes the steps of:
  
  extracting a time window from the series of speech parameters;
  
  calculating a probability that a center of the time window corresponds to one of a predetermined plurality of phonetic boundaries.
- Dependent Claims (18)
- - 18. The speech recognition method of claim 17, further comprising the steps, performed prior to the step of extracting, of:
    - receiving, in a learning mode, a plurality of learning samples;
      
      classifying each of the plurality of learning samples as one of a predetermined plurality of phonetic boundaries; and
      
      applying a model of a probability distribution to the plurality of learning samples to estimate a parameter of the probability distribution for each of the predetermined plurality of phonetic boundaries;
      
      and wherein the step of calculating includes calculating the probability based upon the parameter of the probability distribution.
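
Claims 17 and 18 (and likewise claims 5, 6, and 11) replace the two-way boundary/non-boundary decision with a choice among a predetermined set of boundary classes. A hedged sketch, assuming one diagonal Gaussian per class, class priors estimated from label counts, and Bayes' rule to turn class densities into probabilities; the function names and the class labels "rise", "fall", and "none" (an extra non-boundary class) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, Tuple[float, List[float], List[float]]]


def fit_boundary_classes(samples: Sequence[Sequence[float]],
                         labels: Sequence[str], floor: float = 1e-6) -> Params:
    """Learning mode (claim 18): group labelled window vectors by boundary
    class and estimate a log prior plus a diagonal Gaussian for each class."""
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    params: Params = {}
    for label, xs in groups.items():
        n, dim = len(xs), len(xs[0])
        mean = [sum(x[d] for x in xs) / n for d in range(dim)]
        var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / n, floor)
               for d in range(dim)]
        params[label] = (math.log(n / len(samples)), mean, var)
    return params


def boundary_class_posteriors(x: Sequence[float], params: Params) -> Dict[str, float]:
    """Claim 17: probability that the window center belongs to each class."""
    log_scores = {
        label: log_prior - 0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var))
        for label, (log_prior, mean, var) in params.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy two-frame windows straddling a candidate boundary (frame before, frame after).
samples = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1],
           [0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]
labels = ["rise", "rise", "fall", "fall", "none", "none", "none", "none"]
params = fit_boundary_classes(samples, labels)
posteriors = boundary_class_posteriors([0.0, 1.0], params)
print(max(posteriors, key=posteriors.get))  # rise
```

Keeping a separate model per boundary class lets rising and falling transitions each have their own density instead of being averaged into one boundary distribution.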

Current Assignee: Mitsubishi Denki Kabushiki Kaisha (Mitsubishi Electric Corporation)
Original Assignee: Mitsubishi Denki Kabushiki Kaisha (Mitsubishi Electric Corporation)
Inventors: Abe, Yoshiharu
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Lerner, Martin
Application Number: US08/679,861
Time in Patent Office: 1,128 days
Field of Search: 395/2.62, 395/2.49, 395/2.6-2.66
US Class Current: 704/253
CPC Class Codes:
G10L 15/04 Segmentation; Word boundary...
G10L 15/142 Hidden Markov Models [HMMs]
