Word hypothesizer based on reliably detected phoneme similarity regions
First Claim
1. A word hypothesizer for processing an input speech utterance in a speech recognition system comprising:
a phoneme model database for storing phoneme model speech data corresponding to a plurality of phonemes;
a phoneme similarity module coupled to said phoneme model database and receptive of said input speech utterance for producing phoneme similarity data indicative of the correlation between said input speech utterance and said phoneme model speech data with respect to time;
a word prototype database for storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each target corresponding to a different phoneme, wherein each of said plurality of targets represents the occurrence of at least one phoneme similarity peak as compared with a predefined speech database;
a prototype comparator coupled to said word prototype database and to said phoneme similarity module for correlating said phoneme similarity data and said word prototype data to select at least one of said predetermined words as a word hypothesis for said input speech utterance.
Abstract
The word hypothesizer reduces the search space for more computationally expensive word recognizers. Each periodic interval of input speech is represented as a vector of phoneme similarity values from which the high similarity regions are selected and parameterized. The hypothesizer computes alignment parameters for each of a plurality of previously stored word prototypes, vis-a-vis the high similarity regions of the input speech utterance. Those word prototypes having the highest recognition scores are selected as word candidates for the fine match recognizer.
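The region-selection step described in the abstract can be illustrated with a minimal sketch. It assumes one similarity curve per phoneme (one value per frame) and a fixed threshold; the `Region` fields and the name `extract_regions` are hypothetical and chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    phoneme: str
    start: int         # first frame at or above the threshold
    end: int           # last frame at or above the threshold (inclusive)
    peak_frame: int    # frame with the highest similarity in the region
    peak_value: float  # similarity value at that frame


def _make_region(phoneme: str, similarity: List[float], start: int, end: int) -> Region:
    # Parameterize a run of frames by its boundaries and its peak.
    peak_frame = max(range(start, end + 1), key=lambda t: similarity[t])
    return Region(phoneme, start, end, peak_frame, similarity[peak_frame])


def extract_regions(phoneme: str, similarity: List[float], threshold: float) -> List[Region]:
    """Return the maximal runs of frames whose similarity is >= threshold."""
    regions: List[Region] = []
    start = None
    for t, value in enumerate(similarity):
        if value >= threshold and start is None:
            start = t  # a high-similarity run begins here
        elif value < threshold and start is not None:
            regions.append(_make_region(phoneme, similarity, start, t - 1))
            start = None
    if start is not None:  # the run extends to the last frame
        regions.append(_make_region(phoneme, similarity, start, len(similarity) - 1))
    return regions


# Hypothetical similarity curve for one phoneme over eight frames.
curve = [0.1, 0.2, 0.7, 0.9, 0.6, 0.1, 0.8, 0.3]
regions = extract_regions("ae", curve, threshold=0.5)
```

With this curve and threshold, two regions are found: frames 2 to 4 with a peak at frame 3, and the single frame 6. Reducing each region to a few numbers is what keeps the later prototype comparison cheap relative to a frame-by-frame match.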
23 Claims
2. A method for hypothesizing word candidates based on an input speech utterance for use in a speech recognition system comprising:
(a) providing a phoneme template representing a database of calibration speech;
(b) comparing said input speech utterance with said phoneme template to produce speaker phoneme similarity data as a function of time;
(c) processing said speaker phoneme similarity data to extract speech regions that exceed a predetermined similarity threshold, thereby defining extracted speaker features;
(d) storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each target corresponding to a different phoneme, wherein each of said plurality of targets represents the occurrence of at least one phoneme similarity peak as compared with a predefined speech database;
(e) aligning the extracted speaker features and word prototype data and selecting at least one word from said word prototype data which achieves a predetermined degree of correlation between the extracted speaker features and said word prototype data.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10
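Steps (d) and (e) of claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented scoring method: each extracted feature and each prototype target is reduced to a phoneme label and a peak position expressed as a fraction of the utterance length, and the "degree of correlation" is approximated as the fraction of targets matched within a position tolerance. The names `match_score` and `hypothesize`, the tolerance, and the example prototypes are all hypothetical.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, float]  # (phoneme, peak position as a fraction of the utterance)
Target = Tuple[str, float]   # (phoneme, expected peak position)


def match_score(features: List[Feature], targets: List[Target],
                tolerance: float = 0.15) -> float:
    """Fraction of targets that have a same-phoneme feature close enough in time."""
    if not targets:
        return 0.0
    hits = 0
    for phoneme, expected in targets:
        if any(p == phoneme and abs(pos - expected) <= tolerance
               for p, pos in features):
            hits += 1
    return hits / len(targets)


def hypothesize(features: List[Feature], prototypes: Dict[str, List[Target]],
                min_score: float = 0.6) -> List[Tuple[str, float]]:
    """Return (word, score) pairs that reach min_score, best score first."""
    scored = [(word, match_score(features, targets))
              for word, targets in prototypes.items()]
    kept = [(word, score) for word, score in scored if score >= min_score]
    return sorted(kept, key=lambda ws: ws[1], reverse=True)


# Hypothetical prototypes and extracted features.
prototypes = {
    "cat": [("k", 0.1), ("ae", 0.5), ("t", 0.9)],
    "cap": [("k", 0.1), ("ae", 0.5), ("p", 0.9)],
    "dog": [("d", 0.1), ("ao", 0.5), ("g", 0.9)],
}
features = [("k", 0.12), ("ae", 0.48), ("t", 0.88)]
candidates = hypothesize(features, prototypes)
```

Here "cat" matches all three targets, "cap" matches two of three (about 0.67), and "dog" matches none, so only "cat" and "cap" are passed on as hypotheses.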
11. A method for hypothesizing word candidates based on an input speech utterance for use in a speech recognition system comprising:
providing a phoneme template representing a database of calibration speech;
comparing said input speech utterance of said speaker with said phoneme template to produce speaker phoneme similarity data as a function of time;
processing said speaker phoneme similarity data to extract speaker features that exceed a predetermined similarity threshold;
storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each corresponding to a different phoneme, wherein each of said targets represents the occurrence of at least one phoneme similarity region as compared with a predefined speech database;
iteratively performing the steps;
(a) determining speech match characteristics;
(b) selecting said extracted speaker features which satisfy said speech match characteristics;
(c) aligning the selected speaker features and word prototype data and selecting at least one from said word prototype data which achieves a predetermined degree of correlation between said selected speaker features and said word prototype data; and
storing said selected word prototype data as hypothesized word candidates of said input speech utterance.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
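The iterative loop in claim 11 can be sketched as a multi-pass search. The claim does not say what the "speech match characteristics" are; this sketch assumes, purely for illustration, that each pass uses a minimum peak height that is relaxed from one pass to the next. The names `coverage` and `iterative_hypothesize`, the floor values, and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, float, float]  # (phoneme, position 0..1, peak height)
Target = Tuple[str, float]          # (phoneme, expected position 0..1)


def coverage(features: List[Feature], targets: List[Target],
             tolerance: float = 0.15) -> float:
    """Fraction of targets matched by a same-phoneme feature near the expected position."""
    if not targets:
        return 0.0
    hits = sum(
        any(p == phoneme and abs(pos - expected) <= tolerance
            for p, pos, _ in features)
        for phoneme, expected in targets
    )
    return hits / len(targets)


def iterative_hypothesize(features: List[Feature],
                          prototypes: Dict[str, List[Target]],
                          peak_floors: Sequence[float] = (0.8, 0.5),
                          min_score: float = 1.0) -> List[str]:
    candidates: List[str] = []
    for floor in peak_floors:                              # (a) match characteristic for this pass
        selected = [f for f in features if f[2] >= floor]  # (b) keep features that satisfy it
        for word, targets in prototypes.items():           # (c) align and select
            if word not in candidates and coverage(selected, targets) >= min_score:
                candidates.append(word)                    # store as a hypothesized candidate
    return candidates


# Hypothetical data: the final "t" peak is weaker than the other two.
features = [("k", 0.1, 0.9), ("ae", 0.5, 0.85), ("t", 0.9, 0.6)]
prototypes = {
    "cat": [("k", 0.1), ("ae", 0.5), ("t", 0.9)],
    "ka": [("k", 0.1), ("ae", 0.5)],
}
candidates = iterative_hypothesize(features, prototypes)
```

In this example the strict first pass sees only the two strongest peaks and admits "ka"; the relaxed second pass also sees the weaker "t" peak and admits "cat". Candidates are stored in the order in which they were found.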
21. A method for aligning a first speech utterance and second speech utterance to determine a degree of correlation between said first and second speech utterance comprising:
providing a phoneme template representing a database of calibration speech;
for said first speech utterance, comparing the first speech utterance with said phoneme template to produce first speech utterance similarity data as a function of time;
for said second speech utterance, comparing the second speech utterance with said phoneme template to produce a second speech utterance similarity data as a function of time;
aligning regions of the first speech utterance phoneme similarity data and the second speech utterance phoneme similarity data that achieve a predetermined degree of correlation; and
building an alignment structure that defines a set of aligned high similarity regions between said first speech utterance and said second speech utterance, and defines data indicative of degree of correlation for each of said aligned regions.
Dependent claims: 22, 23
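The alignment structure of claim 21 can be pictured with a minimal sketch. It assumes each utterance has already been reduced to a time-ordered list of high similarity regions, each given as a phoneme label and a peak position between 0 and 1. The greedy in-order pairing and the correlation formula below are illustrative assumptions, not the method disclosed in the specification; `AlignedPair` and `align` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Region = Tuple[str, float]  # (phoneme, peak position as a fraction of the utterance)


@dataclass
class AlignedPair:
    index_a: int        # index of the region in the first utterance
    index_b: int        # index of the region in the second utterance
    correlation: float  # 1.0 for identical positions, 0.0 at the tolerance limit


def align(a: List[Region], b: List[Region], tolerance: float = 0.2) -> List[AlignedPair]:
    """Greedily pair same-phoneme regions in time order (tolerance must be > 0)."""
    pairs: List[AlignedPair] = []
    next_b = 0  # regions of b before this index are already paired or skipped
    for i, (phoneme_a, pos_a) in enumerate(a):
        for j in range(next_b, len(b)):
            phoneme_b, pos_b = b[j]
            gap = abs(pos_a - pos_b)
            if phoneme_a == phoneme_b and gap <= tolerance:
                pairs.append(AlignedPair(i, j, 1.0 - gap / tolerance))
                next_b = j + 1  # preserve time order in later pairings
                break
    return pairs


# Hypothetical region lists for two utterances.
first = [("k", 0.1), ("ae", 0.5), ("t", 0.9)]
second = [("k", 0.1), ("s", 0.3), ("ae", 0.5), ("d", 0.9)]
structure = align(first, second)
```

For these inputs the structure pairs the two "k" regions and the two "ae" regions and leaves "t", "s", and "d" unaligned. The per-pair correlation values are the "data indicative of degree of correlation" that the claim attaches to each aligned region.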
Specification