Speech recognition system employing multiple grammar networks
Abstract
The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.
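The alignment and dynamic-grammar steps described above can be sketched at the symbol level. This is a minimal illustration, not the patented implementation: it assumes the recognizer has already produced N-best letter strings, uses plain Levenshtein distance as a stand-in for the alignment procedure, and treats the dynamic grammar as a de-duplicated list of names. The function names, the name dictionary, and the sample hypotheses are all hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two letter strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def align_with_dictionary(hypotheses: List[str], names: List[str],
                          n_best: int) -> List[str]:
    """Return the n_best dictionary names closest to any hypothesis."""
    best: Dict[str, int] = {
        name: min(edit_distance(h, name) for h in hypotheses)
        for name in names
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(names, key=lambda n: (best[n], n))[:n_best]


def build_dynamic_grammar(first: List[str], second: List[str]) -> List[str]:
    """Merge two name lists, keeping order and dropping duplicates."""
    return list(dict.fromkeys(first + second))


# Hypothetical dictionary and N-best letter strings from two grammar networks.
names = ["JONES", "JOHNS", "JAMES", "STONE", "SMITH"]
first_hypotheses = ["JJONES", "JAONES"]   # assumed output of a letters-only network
second_hypotheses = ["JONES", "JOHNES"]   # assumed output of a filler-plus-letters network

first_list = align_with_dictionary(first_hypotheses, names, n_best=2)
second_list = align_with_dictionary(second_hypotheses, names, n_best=2)
dynamic_grammar = build_dynamic_grammar(first_list, second_list)
print(dynamic_grammar)
```

In the system described, the final pass would rerun recognition on the acoustic data constrained to the names in this small grammar; that step needs acoustic models and is not shown here.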
21 Claims
1. A single pass method of processing acoustic speech data for word recognition, comprising:
processing said acoustic speech data using a recognizer based on a first grammar model to segment said acoustic speech data in a first way and thereby extract a first plurality of recognition candidates;
processing said acoustic speech data using a recognizer based on a second grammar model different than said first grammar model to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of recognition candidates;
aligning said first plurality of recognition candidates with a dictionary of predetermined words to generate a first list of word candidates;
aligning said second plurality of recognition candidates with said dictionary of predetermined words to generate a second list of word candidates;
building a dynamic grammar model from said first and second lists of word candidates; and
processing said acoustic speech data using a recognizer based on said dynamic grammar model to extract the recognized word.
10. A single pass method of processing acoustic speech data for spelled name recognition, comprising:
processing said acoustic speech data using a recognizer based on a first grammar model to segment said acoustic speech data in a first way and thereby extract a first plurality of letter candidates;
processing said acoustic speech data using a recognizer based on a second grammar model to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of letter candidates;
aligning said first plurality of letter candidates with a dictionary of predetermined names to generate a first list of name candidates;
aligning said second plurality of recognition candidates with said dictionary of predetermined words to generate a second list of name candidates; and
processing said acoustic speech data using a recognizer based on said dynamic grammar model to extract the recognized name.
20. A single pass method of processing acoustic speech data for recognition, comprising:
processing said acoustic speech data using a recognizer based on a first grammar network to segment said acoustic speech data in a first way and thereby extract a first plurality of recognition candidates according to a first speech input criteria;
processing said acoustic speech data using a recognizer based on a second grammar network to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of recognition candidates according to a second speech input criteria;
transforming said first and second plurality of recognition candidates into transformed candidates based on at least one set of a priori constraints on said acoustic speech data; and
making a recognition decision based on said transformed candidates.
21. A single pass method of processing acoustic speech data for recognition, comprising:
separately processing said acoustic speech data using different first and second grammar networks that result in different segmentation of said acoustic speech data to extract speech that has utility from speech that does not;
generating a first plurality of recognition candidates using said first grammar network and a second plurality of recognition candidates using said second grammar network;
transforming said first and second plurality of recognition candidates based on at least one set of a priori constraints about the speech that has utility to generate transformed recognition candidates; and
making a recognition decision based on said transformed recognition candidates.
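As a toy illustration of the idea in claims 20 and 21, the sketch below shows how two grammar networks might segment the same utterance differently when the caller says the name before spelling it. Everything here is an assumption made for illustration: the utterance is represented as a list of already-labelled units rather than acoustic frames, the forced decoding of a non-letter unit is crudely modelled as taking its first character, the a priori constraint is a small hypothetical name dictionary, and Python's `difflib` similarity ratio stands in for a recognizer score.

```python
import difflib
from typing import List, Tuple

LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def force_to_letter(unit: str) -> str:
    """Crude stand-in for forcing a non-letter unit onto a letter model."""
    return unit if unit in LETTERS else unit[0].upper()


def segment_letters_only(units: List[str]) -> str:
    """Hypothetical first network: every unit must be decoded as a letter."""
    return "".join(force_to_letter(u) for u in units)


def segment_filler_then_letters(units: List[str]) -> str:
    """Hypothetical second network: a leading filler absorbs non-letter units."""
    start = 0
    while start < len(units) and units[start] not in LETTERS:
        start += 1
    return "".join(force_to_letter(u) for u in units[start:])


def transform(candidates: List[str], names: List[str]) -> List[Tuple[str, float]]:
    """Map each raw candidate to its closest dictionary name with a similarity score."""
    out = []
    for cand in candidates:
        match = difflib.get_close_matches(cand, names, n=1, cutoff=0.0)
        if match:
            score = difflib.SequenceMatcher(None, cand, match[0]).ratio()
            out.append((match[0], score))
    return out


def decide(transformed: List[Tuple[str, float]]) -> str:
    """Pick the transformed candidate with the highest score."""
    return max(transformed, key=lambda pair: pair[1])[0]


# Caller pronounces the name, then spells it (hypothetical input).
units = ["jones", "J", "O", "N", "E", "S"]
names = ["JONES", "JOHNS", "JAMES"]

raw = [segment_letters_only(units), segment_filler_then_letters(units)]
print(raw)                            # ['JJONES', 'JONES']
print(decide(transform(raw, names)))  # JONES
```

When the caller only spells the name, both toy networks produce the same letter string, so the decision is unaffected by which segmentation wins.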
Specification