Boundary estimation method of speech recognition and speech recognition apparatus
First Claim
1. A boundary estimation method of speech recognition comprising the steps of:
(a) analyzing an input speech sample to extract a time window of speech parameters;
(b) calculating a first probability that a phonetic boundary of the input speech exists at a center of the time window;
(c) calculating a second probability that the phonetic boundary of the input speech does not exist at the center of the time window; and
(d) calculating a value indicative of the likelihood that the phonetic boundary of the speech exists at the center of the time window on the basis of the first probability and the second probability.
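Steps (a) through (d) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `extract_window` and `boundary_score`, the `half_width` parameter, the edge padding, and the combination rule p1 / (p1 + p2) are all hypothetical. The claim only requires that the final value be calculated on the basis of both probabilities.

```python
import numpy as np

def extract_window(features, center, half_width):
    """Step (a): stack the frames around `center` into one vector.

    `features` is a (num_frames, dim) array. Frames beyond either end are
    filled by repeating the first or last frame (an assumption of this sketch).
    """
    idx = np.clip(np.arange(center - half_width, center + half_width + 1),
                  0, len(features) - 1)
    return features[idx].ravel()

def boundary_score(window, p_boundary, p_non_boundary):
    """Steps (b)-(d): combine the two probabilities into a single value."""
    p1 = p_boundary(window)        # (b) a boundary exists at the window center
    p2 = p_non_boundary(window)    # (c) no boundary exists at the window center
    total = p1 + p2
    # (d) assumed combination rule; 0.5 is returned when both models give zero
    return p1 / total if total > 0 else 0.5
```

Here `p_boundary` and `p_non_boundary` stand for any two callables that map a window vector to a non-negative number, so the sketch does not commit to a particular probability model.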
Abstract
A boundary estimation method capable of readily learning the probability that a boundary exists in speech, and a speech recognition apparatus that achieves high precision with less model calculation, are provided. In a learning mode, an estimator estimates the distributions of boundary samples and non-boundary samples. In an estimation mode, a likelihood calculator calculates the likelihood of a boundary from a boundary probability density and a non-boundary probability density. In the speech recognition apparatus, a feature extractor analyzes the input speech and converts it into a time series of feature parameters. A boundary detector detects phonetic boundary equivalent areas in the input speech from the output of the feature extractor. A model calculator prepares a plurality of phonetic model series corresponding to the feature parameters and restricts the times at which boundaries of the phonetic model series are formed to the phonetic boundary equivalent areas detected by the boundary detector. A phonetic series transform selects a suitable phonetic model series corresponding to the input speech from the result of the model calculator.
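The learning and estimation modes described in the abstract could look roughly like the sketch below. The abstract does not state here which distribution family is used or how the two densities are combined, so the diagonal-covariance Gaussian, the variance floor, the log-likelihood ratio, and all names (`DiagonalGaussian`, `learn`, `boundary_log_likelihood_ratio`) are assumptions chosen for illustration.

```python
import numpy as np

class DiagonalGaussian:
    """Diagonal-covariance Gaussian density (an assumed model form)."""

    def fit(self, samples):
        samples = np.asarray(samples, dtype=float)
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0) + 1e-6   # small floor avoids division by zero
        return self

    def log_pdf(self, x):
        x = np.asarray(x, dtype=float)
        return -0.5 * np.sum(np.log(2 * np.pi * self.var)
                             + (x - self.mean) ** 2 / self.var, axis=-1)

def learn(boundary_windows, non_boundary_windows):
    """Learning mode: estimate one distribution per class of sample."""
    return (DiagonalGaussian().fit(boundary_windows),
            DiagonalGaussian().fit(non_boundary_windows))

def boundary_log_likelihood_ratio(window, boundary_model, non_boundary_model):
    """Estimation mode: positive values favour a boundary at the window center."""
    return boundary_model.log_pdf(window) - non_boundary_model.log_pdf(window)
```

Working in the log domain is a design choice of this sketch: it keeps the ratio of two very small densities numerically stable for high-dimensional windows.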
18 Claims
1. A boundary estimation method of speech recognition comprising the steps of:
(a) analyzing an input speech sample to extract a time window of speech parameters;
(b) calculating a first probability that a phonetic boundary of the input speech exists at a center of the time window;
(c) calculating a second probability that the phonetic boundary of the input speech does not exist at the center of the time window; and
(d) calculating a value indicative of the likelihood that the phonetic boundary of the speech exists at the center of the time window on the basis of the first probability and the second probability.
Dependent claims: 2, 3, 4, 5, 6
7. A speech recognition apparatus, comprising:
feature extracting means for converting a sample of input speech to a series of feature parameters;
boundary detecting means for detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters, the boundary detecting means including:
    an analyzer for extracting a time window from the series of speech parameters;
    a first calculator for calculating a first probability of existence of a boundary of the input speech at a center of the time window;
    a second calculator for calculating a second probability of nonexistence of the boundary of the input speech at the center of the time window; and
    a detector for detecting either the phonetic boundaries in the input speech or the areas near the phonetic boundaries by calculating a degree of existence of the boundaries of the speech at the center of the time window on the basis of a calculation including the first probability and the second probability;
model arithmetic means for determining a phonetic series which matches the input speech by restricting times when boundaries of a plurality of phonetic models are formed based upon the phonetic boundary areas detected by the boundary detecting means, and for determining probabilities respectively corresponding to the boundary areas; and
phonetic series transform means for selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities determined by the model arithmetic means.
Dependent claims: 8, 9, 10, 11
12. An apparatus for performing boundary estimation of speech, comprising:
a feature extractor, having an input that receives an input speech series, and an output that provides a series of feature values that correspond to the input speech series;
a first probability calculator, having an input that receives the feature values and an output that provides a first probability indicative of a likelihood that a speech boundary exists within the input speech series; and
a second probability calculator, having an input that receives the feature values and an output that provides a second probability indicative of a likelihood that a speech boundary does not exist within the input speech series; and
a likelihood calculator, having a first input coupled to the first probability calculator, a second input coupled to the second probability calculator, and an output that provides a likelihood that a speech boundary exists within the input speech series based upon the first probability and the second probability.
Dependent claims: 13, 14
15. A speech recognition method, comprising the steps of:
converting a sample of input speech to a series of feature parameters;
detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters;
determining a phonetic series which matches the input speech based upon the phonetic boundary areas detected by restricting times when boundaries of a plurality of phonetic models are formed and determining probabilities respectively corresponding to the phonetic boundary areas; and
selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities corresponding to the phonetic boundary areas;
wherein the step of detecting boundary areas includes the steps of:
    extracting a time window from the series of speech parameters;
    calculating a first probability that a phonetic boundary of the input speech exists at a center of the time window;
    calculating a second probability that the phonetic boundary of the input speech does not exist at the center of the time window;
    calculating a value indicative of the likelihood that the boundary of the speech exists at the center of the time window on the basis of the first probability and the second probability;
    detecting a phonetic boundary area based upon the value indicative of the likelihood.
Dependent claims: 16
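The idea of restricting the times at which boundaries of the phonetic models are formed can be illustrated with a small Viterbi-style search. This is a simplified sketch and not the claimed model calculation: the per-frame log-likelihood matrix `frame_ll`, the transition matrix `log_trans`, the function name `restricted_viterbi`, and the convention that a boundary frame `t` permits a phone change between frames `t - 1` and `t` are all assumptions.

```python
import numpy as np

def restricted_viterbi(frame_ll, log_trans, boundary_frames):
    """Best phone per frame, with phone changes allowed only at boundary frames.

    frame_ll[t, p]  : log-likelihood of frame t under phone model p (assumed given)
    log_trans[i, j] : log-probability of switching from phone i to phone j
    boundary_frames : frames at which a new phone is allowed to start
    """
    num_frames, num_phones = frame_ll.shape
    allowed = set(boundary_frames)
    score = frame_ll[0].copy()
    back = np.zeros((num_frames, num_phones), dtype=int)
    for t in range(1, num_frames):
        prev = np.arange(num_phones)     # default: stay in the same phone
        best = score
        if t in allowed:
            cand = score[:, None] + log_trans   # cand[i, j]: switch i -> j at frame t
            np.fill_diagonal(cand, score)       # staying in a phone carries no cost here
            prev = cand.argmax(axis=0)
            best = cand.max(axis=0)
        back[t] = prev
        score = best + frame_ll[t]
    # Trace the best path backwards from the final frame.
    path = [int(score.argmax())]
    for t in range(num_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Outside the allowed frames the search only extends the current phone, so the switching step over all phone pairs runs at the detected boundary frames rather than at every frame. That is one way to read the reduction in model calculation mentioned in the abstract.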
17. A speech recognition method, comprising the steps of:
converting a sample of input speech to a series of feature parameters;
detecting phonetic boundary areas in the sample of input speech based upon the series of feature parameters;
determining a phonetic series which matches the input speech based upon the phonetic boundary areas detected by restricting times when boundaries of a plurality of phonetic models are formed and determining probabilities respectively corresponding to the phonetic boundary areas; and
selecting a suitable phonetic model series corresponding to the input speech based upon the probabilities corresponding to the phonetic boundary areas;
wherein the step of detecting boundary areas includes the steps of:
    extracting a time window from the series of speech parameters;
    calculating a probability that a center of the time window corresponds to one of a predetermined plurality of phonetic boundaries.
Dependent claims: 18
Specification