Multistage word recognizer based on reliably detected phoneme similarity regions
First Claim
1. A word recognition processor for processing an input speech utterance in a speech recognition system, comprising:
a phoneme similarity module receptive of said input speech utterance for producing phone similarity data indicative of the correlation between said input speech utterance and predetermined phone model speech data;
a high similarity module coupled to said phoneme similarity module for identifying those regions of the phone similarity data that exceed a predetermined threshold;
a region count stage having a first word prototype database for storing similarity region count data for a plurality of predetermined words;
said region count stage coupled to said high similarity module and generating a first list of word candidates selected from said first word prototype database based on similarity regions;
a target congruence stage having a second word prototype database for storing word prototype data corresponding to a said plurality of predetermined words;
said target congruence stage being receptive of said first list of word candidates and being coupled to said high similarity module for generating a second list of at least one word candidate, selected from said first list based on similarity regions.
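The high similarity module recited above can be illustrated with a minimal sketch. It assumes the phone similarity data for one phoneme is a list of per-frame similarity values and that a region is a contiguous run of frames above the threshold; the function name, the tuple layout, and the example values are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

# (start_frame, end_frame_exclusive, peak_value); this layout is an assumption.
Region = Tuple[int, int, float]


def high_similarity_regions(similarity: List[float], threshold: float) -> List[Region]:
    """Return every contiguous run of frames whose similarity exceeds threshold."""
    regions: List[Region] = []
    start = None  # index of the first frame of the run currently being scanned
    for i, value in enumerate(similarity):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended on the previous frame; record it with its peak value.
            regions.append((start, i, max(similarity[start:i])))
            start = None
    if start is not None:
        # The utterance ended while still inside a high similarity run.
        regions.append((start, len(similarity), max(similarity[start:])))
    return regions


# Hypothetical similarity curve for one phoneme over eight frames.
example_regions = high_similarity_regions([10, 55, 80, 60, 20, 30, 70, 75], threshold=50)
```

In this sketch only the regions, not the full similarity curves, would be passed on to the later stages, which is the data reduction the claim language suggests.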
Abstract
The multistage word recognizer uses a word reference representation based on reliably detected peaks of phoneme similarity values. The word reference representation captures the basic features of the words by targets that describe the location and shape of stable peaks of phoneme similarity values. The first stage of the word hypothesizer represents each reference word with statistical information on the number of high similarity regions over a predefined number of time intervals. The second stage represents each word by a prototype that consists of a series of phoneme targets and global statistics, namely the average word duration and average match rate. These represent the degree of fit of the word prototype to its training data. Word recognition scores generated in the two stages are converted to dimensionless normalized values and combined by averaging for use in selecting the most probable word candidates.
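The final step of the abstract, converting the two stage scores to dimensionless values and averaging them, might look like the sketch below. The abstract does not say how the normalization is done; a z-score over the candidate list is used here purely as an assumption, and both score dictionaries are assumed to be oriented so that a higher value means a better match. All names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw scores to dimensionless z-scores over the candidate list (assumed scheme)."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All candidates scored the same, so none is preferred by this stage.
        return {word: 0.0 for word in scores}
    return {word: (score - mu) / sigma for word, score in scores.items()}


def combine(region_count_scores: Dict[str, float],
            target_congruence_scores: Dict[str, float]) -> Dict[str, float]:
    """Average the normalized scores of the two stages for words scored by both."""
    rc = normalize(region_count_scores)
    tc = normalize(target_congruence_scores)
    return {word: (rc[word] + tc[word]) / 2 for word in rc if word in tc}


# Hypothetical raw scores on different scales for two candidate words.
combined = combine({"a": 1.0, "b": 3.0}, {"a": 10.0, "b": 30.0})
best_word = max(combined, key=combined.get)
```

Normalizing before averaging keeps one stage from dominating merely because its raw scores span a larger numeric range.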
34 Claims
Dependent claims of claim 1: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method for processing an input speech utterance for word recognition, comprising:
representing the input speech utterance as a phone similarity data indicative of the correlation between the input speech utterance and predetermined phone model speech data;
selecting from said phone similarity data those regions of high similarity that exceed a predetermined threshold;
testing the high similarity regions against a first predetermined word prototype database using a region count procedure that selects first list of word candidates minimizing the region count distortion with respect to the input speech utterance;
testing the high similarity regions of words in said first list against a second predetermined word prototype database using a target congruence procedure that selects from the first list a second list of word candidates having high similarity regions substantially congruent with the input speech utterance.
Dependent claims of claim 16: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34
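The region count procedure of claim 16 can be sketched as follows. The sketch assumes each high similarity region is reduced to a phoneme label and a position expressed as a fraction of the utterance duration, that counts are taken over three equal time intervals, and that distortion is the squared difference between the observed counts and per-word prototype means. The interval count, the distortion measure, and all names are assumptions made for illustration; the claim does not specify them.

```python
from typing import Dict, List, Tuple

# (phoneme label, region position as a fraction of utterance duration in [0, 1]); assumed layout.
Region = Tuple[str, float]


def region_counts(regions: List[Region], phonemes: List[str], n_intervals: int = 3) -> List[int]:
    """Count high similarity regions per phoneme and per time interval."""
    index = {phoneme: k for k, phoneme in enumerate(phonemes)}
    counts = [0] * (len(phonemes) * n_intervals)
    for phoneme, position in regions:
        # Clamp so that a position of exactly 1.0 falls in the last interval.
        interval = min(int(position * n_intervals), n_intervals - 1)
        counts[index[phoneme] * n_intervals + interval] += 1
    return counts


def region_count_distortion(counts: List[int], prototype: List[float]) -> float:
    """Squared difference between observed counts and prototype counts (assumed measure)."""
    return sum((c - m) ** 2 for c, m in zip(counts, prototype))


def first_list(counts: List[int], prototypes: Dict[str, List[float]], n_best: int) -> List[str]:
    """Return the n_best words whose prototypes give the lowest distortion."""
    ranked = sorted(prototypes, key=lambda word: region_count_distortion(counts, prototypes[word]))
    return ranked[:n_best]


# Hypothetical two-phoneme inventory and three word prototypes.
observed = region_counts([("s", 0.1), ("ah", 0.5), ("s", 0.9)], phonemes=["ah", "s"])
candidates = first_list(
    observed,
    {"sas": [0, 1, 0, 1, 0, 1], "ass": [1, 0, 0, 0, 1, 1], "ah": [0, 1, 0, 0, 0, 0]},
    n_best=2,
)
```

Because this stage compares only short count vectors, it is a cheap way to prune the vocabulary before the target congruence procedure examines the surviving candidates in more detail.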
Specification