System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy
Abstract
The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
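The constrain-then-fall-back behavior described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: plain text stands in for audio, string similarity from Python's `difflib` stands in for an acoustic and grammar score, and the names `SectionGrammar`, `recognize`, `general_recognizer`, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher


class SectionGrammar:
    """Dynamic set of values previously observed in one document section (hypothetical)."""

    def __init__(self, values=()):
        self.values = list(values)

    def add(self, value):
        # Incrementally extend the grammar with a newly observed section value.
        if value not in self.values:
            self.values.append(value)

    def best_match(self, heard):
        # Return (value, score) for the closest stored value, or (None, 0.0) if none.
        best, best_score = None, 0.0
        for value in self.values:
            score = SequenceMatcher(None, heard.lower(), value.lower()).ratio()
            if score > best_score:
                best, best_score = value, score
        return best, best_score


def recognize(heard, grammar, general_recognizer, threshold=0.8):
    """Try the section grammar first; fall back to a general recognizer on a poor match.

    `threshold` is an assumed cutoff, and `general_recognizer` is any callable
    standing in for recognition against the larger, unconstrained vocabulary.
    """
    value, score = grammar.best_match(heard)
    if value is not None and score >= threshold:
        return value, "section"
    return general_recognizer(heard), "general"
```

For example, with a grammar holding "no known drug allergies", the input "no known drug allergy" resolves to the stored section value, while an unrelated phrase such as "follow up in two weeks" is passed to the general recognizer.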
17 Claims
1. A method for use with an automatic speech recognition system configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, the method comprising acts of:
(A) recognizing first speech input submitted to the first document section using a first language model;
(B) detecting, based on content of second speech input, that the second speech input is submitted to the second document section; and
(C) in response to detecting that the second speech input is submitted to the second document section, recognizing the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
Dependent claims: 2, 3, 4, 5, 6, 7
8. At least one non-transitory computer-readable storage medium having instructions encoded thereon which, when executed in a system comprising at least one automatic speech recognition component configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, perform a method comprising acts of:
(A) recognizing first speech input submitted to the first document section using a first language model;
(B) detecting, based on content of second speech input, that the second speech input is submitted to the second document section; and
(C) in response to detecting that the second speech input is submitted to the second document section, recognizing the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for use with at least one automatic speech recognition component configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, the system comprising:
at least one processor programmed to:
recognize first speech input submitted to the first document section using a first language model;
detect, based on content of second speech input, that the second speech input is submitted to the second document section; and
in response to detecting that the second speech input is submitted to the second document section, recognize the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
Dependent claims: 16, 17
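As an informal illustration of acts (A) through (C) recited in the independent claims, the sketch below switches language models when the content of an utterance signals a different document section. It rests on several assumptions that are not in the source: each utterance arrives as a list of candidate transcriptions, a section is detected by hypothetical cue phrases, and a toy vocabulary-overlap score stands in for a real section-specific language model. The names `SectionModel`, `detect_section`, `recognize`, and `dictate` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionModel:
    name: str
    cue_phrases: tuple      # assumed content cues that signal this section
    vocabulary: frozenset   # toy stand-in for a section-specific language model

    def score(self, hypothesis):
        # Fraction of words in the hypothesis that the section vocabulary covers.
        words = hypothesis.lower().split()
        if not words:
            return 0.0
        return sum(w in self.vocabulary for w in words) / len(words)


def detect_section(hypotheses, sections, current):
    # Act (B): decide from the content of the input whether it targets another section.
    for section in sections:
        if section is current:
            continue
        if any(cue in h.lower() for h in hypotheses for cue in section.cue_phrases):
            return section
    return current


def recognize(hypotheses, model):
    # Acts (A) and (C): pick the candidate that the active section model prefers.
    return max(hypotheses, key=model.score)


def dictate(utterances, sections):
    current = sections[0]  # assume dictation starts in the first section
    results = []
    for hypotheses in utterances:
        current = detect_section(hypotheses, sections, current)
        results.append((current.name, recognize(hypotheses, current)))
    return results
```

With a "history" section and a "medications" section, an utterance whose candidates contain the cue "medications" would be rescored with the medications model rather than the history model. A production system would operate on audio and statistical models; the cue-phrase check here only makes the control flow of the claims concrete.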
Specification