System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy

US 8,301,448 B2
Filed: 03/29/2006
Issued: 10/30/2012
Est. Priority Date: 03/29/2006
Status: Active Grant

Abstract

The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
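
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `SectionValueStore`, `recognize`, the example key, and the `min_score` threshold are all hypothetical names, and string similarity from Python's `difflib` stands in for real acoustic decoding against a grammar.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class SectionValueStore:
    """Hypothetical store of values seen per (domain, document type, speaker, section)."""

    def __init__(self):
        self._values = defaultdict(set)

    def add(self, key, value):
        # Values can be loaded in advance or added one document at a time.
        self._values[key].add(value.lower())

    def values(self, key):
        return self._values.get(key, set())


def recognize(store, key, raw_hypothesis, min_score=0.8):
    """Return (text, source), where source is "section" or "general"."""
    best, best_score = None, 0.0
    for value in store.values(key):
        # Assumption: text similarity approximates a grammar match score.
        score = SequenceMatcher(None, raw_hypothesis.lower(), value).ratio()
        if score > best_score:
            best, best_score = value, score
    if best is not None and best_score >= min_score:
        return best, "section"
    # Failed or poor match: fall back to the unconstrained result.
    return raw_hypothesis, "general"


key = ("radiology", "chest x-ray report", "dr_smith", "allergies")
store = SectionValueStore()
store.add(key, "No known drug allergies")
print(recognize(store, key, "no known drug allergy"))
print(recognize(store, key, "patient reports chest pain"))
```

In this sketch the first call resolves to the stored section value, while the second falls through to the general path because nothing in the section's value set is a close match.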

16 Claims

1. A method for use with an automatic speech recognition system, the method comprising acts of:
- analyzing content of a body of speech submitted to a structured document to identify a first section of the structured document to which the body of speech is submitted;
  
  in response to identifying the first section, loading a grammar and/or language model for use in recognizing the speech in the body submitted to the first section; and
  
  performing speech recognition on the speech in the body using the grammar and/or language model.
  - 2. The method according to claim 1, wherein the body of speech is submitted by a first user, the loading comprises loading a language model, and the language model is trained using content selected from a group consisting of content from the first user previously submitted to other structured documents and content from one or more other users previously submitted to the first section.
  - 3. The method according to claim 2, wherein the language model comprises a smoothed section language model, and wherein the performing comprises conducting speech recognition with said smoothed section language model.
  - 4. The method according to claim 2, wherein the language model comprises an unsmoothed section language model, and wherein the performing comprises conducting speech recognition with said unsmoothed section language model.
  - 5. The method according to claim 4, wherein the performing generates a recognition output having an associated confidence level, and wherein the method further comprises an act of conducting a confidence level evaluation to determine whether the confidence level meets a predetermined threshold value.
  - 6. The method according to claim 5, further comprising, if the confidence level evaluation meets the predetermined threshold value, assembling the identified documents sections and determined automatic section headings into at least one document.
  - 7. The method according to claim 5, further comprising, if the confidence level evaluation does not meet the predetermined threshold value, loading a generic language model for use in recognizing the speech content submitted to the first section.
  - 8. The method according to claim 7 where the generic language model is derived from one or more of a factory, site or user specific language model.
  - 9. The method according to claim 8 further comprising an act of conducting speech recognition with said generic language model.
  - 10. The method according to claim 9 further comprising an act of comparing speech recognition results from the language model loaded based on the identifying and speech recognition results from the generic language model.
  - 11. The method according to claim 10 further comprising an act of selecting either the speech recognition results from the language model loaded based on the identifying or from the generic language model to assemble at least one finished document.
  - 12. The method according to claim 1 wherein the analyzing is further based on a section heading for the first section.
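
The confidence check and generic-model fallback recited in claims 5 through 11 might look something like the sketch below. Everything here is an assumption made for illustration: the `Result` type, the recognizer callables, the `0.7` threshold, and in particular the rule of picking the higher-confidence result, since the claims say only that the two results are compared and one is selected.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical recognizer interface: audio bytes in, (text, confidence) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


@dataclass
class Result:
    text: str
    confidence: float
    source: str  # "section" or "generic"


def recognize_with_fallback(
    audio: bytes,
    section_recognizer: Recognizer,
    generic_recognizer: Recognizer,
    threshold: float = 0.7,
) -> Result:
    # First pass: recognition with the section language model.
    text, conf = section_recognizer(audio)
    section_result = Result(text, conf, "section")
    if section_result.confidence >= threshold:
        return section_result

    # Threshold not met: run a second pass with a generic language model.
    text, conf = generic_recognizer(audio)
    generic_result = Result(text, conf, "generic")

    # Assumed selection rule: keep whichever result is more confident.
    # On a tie, max() returns the first argument, i.e. the section result.
    return max((section_result, generic_result), key=lambda r: r.confidence)


def low_confidence_section(audio: bytes) -> Tuple[str, float]:
    return "no known allergies", 0.4


def generic(audio: bytes) -> Tuple[str, float]:
    return "no new allergies", 0.6


print(recognize_with_fallback(b"", low_confidence_section, generic).source)
```

With the stub recognizers above, the section pass falls below the threshold and the generic result is selected.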

13. At least one computer-readable medium having instructions encoded thereon which, when executed in a system comprising at least one automatic speech recognition component, perform a method comprising acts of:
- analyzing content of a body of speech submitted to a structured document to identify a first section of the structured document to which the body of speech is submitted;
  
  in response to identifying the first section, loading a grammar and/or language model for use in recognizing the speech in the body submitted to the first section; and
  
  performing speech recognition on the speech in the body using the grammar and/or language model.
  - 14. The at least one computer-readable medium of claim 13, wherein the body of speech is submitted by a first user, the loading comprises loading a language model, and the language model is trained using content selected from a group consisting of content from the first user previously submitted to other structured documents and content from one or more users previously submitted to the first section.

15. A system for use with at least one automatic speech recognition component, the system comprising at least one processor programmed to:
- analyze content of a body of speech submitted to a structured document to identify a first section of the structured document to which the body of speech is submitted;
  
  in response to identifying the first section, load a grammar and/or language model for use in recognizing the speech in the body submitted to the first section; and
  
  perform speech recognition on the speech in the body using the grammar and/or language model.
  - 16. The system of claim 15, wherein the body of speech is submitted by a first user, the at least one processor is programmed to load a language model, and the language model is trained using content selected from a group consisting of content from the first user previously submitted to other structured documents and content from one or more users previously submitted to the first section.
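
To make the distinction between a smoothed and an unsmoothed section language model (claims 3 and 4) concrete, here is a toy unigram model trained on pooled prior content of the kind claims 2, 14, and 16 describe. The claims do not name a model order or a smoothing method; the unigram model, add-one smoothing, the function names, and the sample sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count word occurrences across all training texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def log_prob(counts, sentence, vocab_size=None):
    """Log-probability of a sentence under a unigram model.

    vocab_size=None gives the unsmoothed maximum-likelihood estimate;
    an integer vocab_size applies add-one smoothing over that many words.
    """
    total = sum(counts.values())
    lp = 0.0
    for word in sentence.lower().split():
        if vocab_size is None:
            if counts[word] == 0:
                # Unsmoothed: an unseen word makes the sentence impossible.
                return float("-inf")
            lp += math.log(counts[word] / total)
        else:
            lp += math.log((counts[word] + 1) / (total + vocab_size))
    return lp


# Hypothetical training content: the same user's text from other documents,
# pooled with other users' text previously submitted to this section.
user_history = ["lungs are clear"]
section_history = ["heart is regular"]
counts = train_unigram(user_history + section_history)

print(log_prob(counts, "lungs are wheezy"))                 # unsmoothed
print(log_prob(counts, "lungs are wheezy", vocab_size=10))  # smoothed
```

The unsmoothed model rejects any sentence containing an unseen word outright, which is one reason a confidence check and a fallback to a more general model pair naturally with it; the smoothed model assigns such sentences a low but finite score.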

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Carus, Alwin B.; Lapshina, Larissa; Vemula, Raghu
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US11/392,900
Publication Number: US 20070233488A1
Time in Patent Office: 2,407 Days
Field of Search: None
US Class Current: 704/257
CPC Class Codes: G10L 15/183 using context dependencies,...
