Engine For Speech Recognition
First Claim
1. A computerized method for speech recognition in a computer system, the method comprising the steps of:
(a) storing a plurality of reference word segments, wherein said reference word segments when concatenated form a plurality of spoken words in a language;
wherein each of said reference word segments is a combination of at least two phonemes including at least one vowel sound in said language;
(b) inputting and digitizing a temporal speech signal, thereby producing a digitized temporal speech signal;
(c) transforming piecewise said digitized temporal speech signal into the frequency domain, thereby producing a time and frequency dependent transform function;
wherein the energy spectral density of said temporal speech signal is proportional to the absolute value squared of said transform function;
(d) cutting the energy spectral density into a plurality of input time segments of the energy spectral density;
wherein each of said input time segments includes at least two phonemes including at least one vowel sound of the temporal speech signal; and
(e) for each of said input time segments;
(i) extracting a fundamental frequency from the energy spectral density during the input time segment;
(ii) selecting a target segment from the reference word segments thereby inputting a target energy spectral density of said target segment;
(iii) performing a correlation between the energy spectral density during said time segment and said target energy spectral density of said target segment after calibrating said fundamental frequency to said target energy spectral density thereby improving said correlation.
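
Steps (c) and (e) of the claim can be illustrated with a minimal sketch. The claim does not say how the fundamental frequency is extracted or what "calibrating" involves, so the choices below are assumptions made only for illustration: the fundamental is taken as the strongest non-DC bin, calibration is a linear rescaling of the frequency axis, and the correlation is a Pearson coefficient on the time-averaged spectrum. All function names are hypothetical, and the input is a synthetic harmonic tone rather than speech.

```python
import cmath
import math


def energy_spectral_density(samples, frame_len):
    """Step (c): transform the signal piecewise and return |X(t, f)|^2 per frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of bin k; an FFT routine would normally replace this.
            x_k = sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, s in enumerate(frame))
            spectrum.append(abs(x_k) ** 2)
        frames.append(spectrum)
    return frames


def mean_spectrum(frames):
    """Simplification: average the energy spectral density over one segment."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def fundamental_bin(spectrum):
    """Step (e)(i). ASSUMPTION: the strongest non-DC bin is the fundamental."""
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])


def calibrate(spectrum, f0_bin, target_f0_bin):
    """ASSUMPTION: rescale the frequency axis so f0_bin lands on target_f0_bin."""
    warped = []
    for k in range(len(spectrum)):
        pos = k * f0_bin / target_f0_bin  # source position for output bin k
        lo = int(pos)
        if lo >= len(spectrum) - 1:
            warped.append(0.0)  # at or past the last measured bin: no energy
            continue
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return warped


def correlation(a, b):
    """Step (e)(iii): Pearson correlation coefficient of two spectra."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def harmonic_tone(f0, fs=8000, n=800):
    """Synthetic stand-in for a voiced segment: a fundamental plus two harmonics."""
    return [sum(a * math.sin(2 * math.pi * f0 * h * t / fs)
                for h, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
            for t in range(n)]


# Reference segment at 200 Hz, "spoken" segment with the same shape at 280 Hz.
reference = mean_spectrum(energy_spectral_density(harmonic_tone(200), 200))
spoken = mean_spectrum(energy_spectral_density(harmonic_tone(280), 200))
f0_ref, f0_in = fundamental_bin(reference), fundamental_bin(spoken)
raw = correlation(spoken, reference)
calibrated = correlation(calibrate(spoken, f0_in, f0_ref), reference)
```

In this toy setup the two tones share the same harmonic structure but differ in pitch, so `raw` is low while `calibrated` is close to 1, which is the effect the claim attributes to calibrating the fundamental frequency before correlating.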
Abstract
A computerized method for speech recognition in a computer system. Reference word segments are stored in memory. The reference word segments when concatenated form spoken words in a language. Each of the reference word segments is a combination of at least two phonemes, including a vowel sound in the language. A temporal speech signal is input and digitized to produce a digitized temporal speech signal. The digitized temporal speech signal is transformed piecewise into the frequency domain to produce a time and frequency dependent transform function. The energy spectral density of the temporal speech signal is proportional to the absolute value squared of the transform function. The energy spectral density is cut into input time segments of the energy spectral density. Each of the input time segments includes at least two phonemes including at least one vowel sound of the temporal speech signal. For each of the input time segments, (i) a fundamental frequency is extracted from the energy spectral density during the input time segment, and (ii) a target segment is selected from the reference segments and thereby a target energy spectral density of the target segment is input. A correlation between the energy spectral density during the time segment and the target energy spectral density of the target segment is performed after calibrating the fundamental frequency to the target energy spectral density, thereby improving the correlation.
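
The relation between the piecewise transform and the energy spectral density can be written out as follows. This is one common short-time Fourier formulation, offered as an illustration only; the window w, the frame length N, and the symbols x, X and S are assumptions, not notation taken from the patent.

```latex
% x[n]: digitized speech samples; w[n]: analysis window (assumed);
% N: frame length (assumed); t: frame start; f: frequency bin.
X(t, f) = \sum_{n=0}^{N-1} x[t + n]\, w[n]\, e^{-2\pi i f n / N},
\qquad
S(t, f) \propto \lvert X(t, f) \rvert^{2}
```

Here X(t, f) plays the role of the time and frequency dependent transform function, and S(t, f) is the energy spectral density that is subsequently cut into input time segments.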
22 Claims
1. A computerized method for speech recognition in a computer system, the method comprising the steps of:
(a) storing a plurality of reference word segments, wherein said reference word segments when concatenated form a plurality of spoken words in a language;
wherein each of said reference word segments is a combination of at least two phonemes including at least one vowel sound in said language;
(b) inputting and digitizing a temporal speech signal, thereby producing a digitized temporal speech signal;
(c) transforming piecewise said digitized temporal speech signal into the frequency domain, thereby producing a time and frequency dependent transform function;
wherein the energy spectral density of said temporal speech signal is proportional to the absolute value squared of said transform function;
(d) cutting the energy spectral density into a plurality of input time segments of the energy spectral density;
wherein each of said input time segments includes at least two phonemes including at least one vowel sound of the temporal speech signal; and
(e) for each of said input time segments;
(i) extracting a fundamental frequency from the energy spectral density during the input time segment;
(ii) selecting a target segment from the reference word segments thereby inputting a target energy spectral density of said target segment;
(iii) performing a correlation between the energy spectral density during said time segment and said target energy spectral density of said target segment after calibrating said fundamental frequency to said target energy spectral density thereby improving said correlation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21
12. A computerized method for speech recognition in a computer system, the method comprising the steps of:
(a) storing a plurality of reference word segments, wherein said reference word segments when concatenated form a plurality of spoken words in a language;
wherein each of said reference word segments is a combination of at least two phonemes including at least one vowel sound in said language;
(b) classifying said reference word segments into a plurality of classes;
(c) inputting and digitizing a temporal speech signal, thereby producing a digitized temporal speech signal;
(d) transforming piecewise said digitized temporal speech signal into the frequency domain, thereby producing a time and frequency dependent transform function;
wherein the energy spectral density of said temporal speech signal is proportional to the absolute value squared of said transform function;
(e) cutting the energy spectral density into a plurality of input time segments of the energy spectral density;
wherein each of said input time segments includes at least two phonemes including at least one vowel sound of the temporal speech signal;
(f) for each of said input time segments;
(i) selecting a target segment from the reference word segments thereby inputting a target energy spectral density of said target segment;
(ii) performing a correlation between the energy spectral density during said time segment and said target energy spectral density of said target segment;
(g) based on a correlation result of said correlation, second selecting a second target segment from at least one of said classes.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 22
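
Steps (b), (f) and (g) of claim 12 can be read as a two-stage search, sketched below. The claim does not state how the classes are formed or how the correlation result steers the second selection, so this sketch assumes a coarse-to-fine strategy: one representative per class is tried first, and the second target is then drawn from the best-scoring class. The segment names, class labels and six-bin spectra are made-up toy data, and `recognize_segment` is a hypothetical helper.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equally long spectra."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def recognize_segment(input_esd, references, classes):
    """ASSUMED coarse-to-fine search over classified reference segments."""
    # Steps (f)(i)-(ii): correlate against one target segment per class.
    first_scores = {
        label: correlation(input_esd, references[members[0]])
        for label, members in classes.items()
    }
    best_class = max(first_scores, key=first_scores.get)
    # Step (g): the second target is selected from the class the result points to.
    second_scores = {
        name: correlation(input_esd, references[name])
        for name in classes[best_class]
    }
    best_name = max(second_scores, key=second_scores.get)
    return best_class, best_name, second_scores[best_name]


# Toy energy spectra (illustrative values only), grouped by vowel.
references = {
    "ba": [9, 1, 4, 0, 0, 0],
    "da": [8, 2, 1, 3, 0, 0],
    "bo": [0, 0, 1, 9, 4, 0],
    "do": [0, 1, 0, 8, 2, 3],
}
classes = {"a": ["ba", "da"], "o": ["bo", "do"]}

result = recognize_segment([8, 2, 2, 3, 0, 1], references, classes)
```

With this data the first pass favours class "a", and the second pass settles on "da" within that class. Narrowing the candidates by class means only part of the reference set has to be correlated against each input segment.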
Specification