Speech recognition system using normalized voiced segment spectrogram analysis
First Claim
1. A data processing method for recognizing a sound record of a human utterance, comprising:
dividing the sound record into a sequence of one or more segments;
comparing a plurality of dictionary entries with the sound record, each dictionary entry being incrementally compared with a continuous stretch of segments of the sound record; and
wherein vocalized parts of the sound record are represented as a spectrogram, optimized for comparison with the dictionary entries using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof.
Abstract
Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented; for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and the symbolic and numeric data obtained for the other segments, both in the record and in the dictionary entries.
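The matching step described in the abstract, in which each dictionary entry is compared incrementally with every stretch of segments of about the same length, can be illustrated with a minimal sketch. It is not the patented method: the `Segment` type, the `segment_distance` measure, the `threshold` parameter, and the use of an exactly equal segment count (rather than "substantially the same length") are all assumptions made for illustration, and the spectrogram transforms named in the claims are not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment: a voiced flag plus a small numeric feature tuple."""
    voiced: bool
    features: Tuple[float, ...]


def segment_distance(a: Segment, b: Segment) -> float:
    """Assumed distance in [0, inf): 1.0 for mismatched kinds or shapes,
    otherwise the mean absolute difference of the features."""
    if a.voiced != b.voiced or len(a.features) != len(b.features):
        return 1.0
    if not a.features:
        return 0.0
    diffs = (abs(x - y) for x, y in zip(a.features, b.features))
    return sum(diffs) / len(a.features)


def find_entry(
    record: Sequence[Segment],
    entry: Sequence[Segment],
    threshold: float = 0.2,
) -> List[Tuple[int, float]]:
    """Slide `entry` over every continuous stretch of `record` with the same
    number of segments; return (start index, mean distance) for each match."""
    hits: List[Tuple[int, float]] = []
    n, m = len(record), len(entry)
    if m == 0:
        return hits
    budget = threshold * m  # total distance allowed for the whole stretch
    for start in range(n - m + 1):
        total = 0.0
        for offset in range(m):
            total += segment_distance(record[start + offset], entry[offset])
            if total > budget:
                break  # incremental comparison: give up on this stretch early
        else:
            hits.append((start, total / m))
    return hits


if __name__ == "__main__":
    record = [
        Segment(False, (0.0,)),
        Segment(True, (1.0, 2.0)),
        Segment(False, (0.5,)),
        Segment(True, (1.0, 2.0)),
    ]
    entry = [Segment(True, (1.0, 2.0)), Segment(False, (0.5,))]
    print(find_entry(record, entry))
```

Accumulating the distance segment by segment and stopping once the budget is exceeded is one simple way to read "incrementally compared"; a real system would also need to tolerate stretches whose length differs slightly from the entry's.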
27 Citations
41 Claims
1. A data processing method for recognizing a sound record of a human utterance, comprising:
dividing the sound record into a sequence of one or more segments;
comparing a plurality of dictionary entries with the sound record, each dictionary entry being incrementally compared with a continuous stretch of segments of the sound record; and
wherein vocalized parts of the sound record are represented as a spectrogram, optimized for comparison with the dictionary entries using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof.
Dependent claims: 2-20.
21. A data processing system for recognizing a sound record of a human utterance, comprising:
a segmentation engine for dividing the sound record into a sequence of one or more segments;
a comparison engine for comparing a plurality of dictionary entries with the sound record, each dictionary entry being incrementally compared with a continuous stretch of segments of the sound record; and
wherein vocalized parts of the sound record are represented as a spectrogram, optimized for comparison with the dictionary entries using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof.
Dependent claims: 22-40.
41. A computer program product comprising:
a computer usable medium; and
a data processing method stored on the medium for recognizing a sound record of a human utterance, comprising computer instructions for:
dividing the sound record into a sequence of one or more segments;
comparing a plurality of dictionary entries with the sound record, each dictionary entry being incrementally compared with a continuous stretch of segments of the sound record; and
wherein vocalized parts of the sound record are represented as a spectrogram, optimized for comparison with the dictionary entries using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof.
Specification