System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy

US 9,002,710 B2
Filed: 09/12/2012
Issued: 04/07/2015
Est. Priority Date: 03/29/2006
Status: Active Grant

Abstract

The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
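The fallback behaviour described in the abstract can be illustrated with a minimal sketch. It assumes recognition hypotheses arrive as plain text and uses the similarity ratio from Python's `difflib` as a stand-in for a recognizer's match score; the patent does not specify either. The names `SectionGrammar`, `add_values`, `best_match`, `recognize`, `general_recognizer`, and `min_score` are hypothetical.

```python
from difflib import SequenceMatcher


class SectionGrammar:
    """Hypothetical per-section store of values seen in earlier documents."""

    def __init__(self):
        self.values = {}  # section name -> set of known values

    def add_values(self, section, values):
        # Values may be loaded in advance or added incrementally over time.
        self.values.setdefault(section, set()).update(v.lower() for v in values)

    def best_match(self, section, hypothesis):
        # Return (value, score) for the closest known value, or (None, 0.0).
        best, best_score = None, 0.0
        for value in sorted(self.values.get(section, ())):
            score = SequenceMatcher(None, hypothesis.lower(), value).ratio()
            if score > best_score:
                best, best_score = value, score
        return best, best_score


def recognize(grammar, section, hypothesis, general_recognizer, min_score=0.8):
    # First try the values constrained to this section.
    value, score = grammar.best_match(section, hypothesis)
    if value is not None and score >= min_score:
        return value, "section"
    # Poor or no match: fall back to the larger, unconstrained recognizer
    # (here an assumed callable supplied by the caller).
    return general_recognizer(hypothesis), "general"
```

In this sketch a near-miss such as "penicilin" resolves to a stored "penicillin" for an allergies section, while an utterance unrelated to any stored value is passed to the general recognizer.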

34 Citations

17 Claims

1. A method for use with an automatic speech recognition system configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, the method comprising acts of:
- (A) recognizing first speech input submitted to the first document section using a first language model;
  
  (B) detecting, based on content of second speech input, that the second speech input is submitted to the second document section; and
  
  (C) in response to detecting that the second speech input is submitted to the second document section, recognizing the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
- - 2. The method of claim 1, wherein the act (C) comprises identifying the second language model for use in recognizing the second speech input from among a plurality of language models.
  - 3. The method of claim 1, wherein the act (C) comprises selecting the second language model for use in response to the detecting in the act (B) that the second speech input is submitted to the second document section.
  - 4. The method of claim 1, further comprising acts of:
    - (D) generating a confidence score indicating a level of confidence that a result of the recognizing in the act (C) matches the second speech input;
      
      (E) determining whether the confidence score satisfies a confidence threshold; and
      
      (F) if it is determined in the act (E) that the confidence score does not satisfy the confidence threshold, recognizing the second speech input using a third language model, the third language model being different than the first and second language models.
  - 5. The method of claim 4, wherein the third language model is a generic language model that is not specifically directed to the second document section.
  - 6. The method of claim 1, wherein the first and second language models are stochastic language models.
  - 7. The method of claim 1, wherein the structured document comprises a medical report.
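
The flow of acts (A) through (F) in claims 1, 4, and 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each "language model" is a stub callable that takes a text string in place of audio and returns a result with a confidence value, and section detection is reduced to matching a spoken cue at the start of the input. `SectionRouter`, `heading_cues`, `generic_model`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Assumption: a "language model" is any callable mapping input to
# (recognized text, confidence in [0, 1]).
Recognizer = Callable[[str], Tuple[str, float]]


@dataclass
class SectionRouter:
    section_models: Dict[str, Recognizer]  # section name -> section-specific model
    heading_cues: Dict[str, str]           # spoken cue -> section name
    generic_model: Recognizer              # fallback model (claims 4 and 5)
    threshold: float = 0.6                 # illustrative confidence threshold
    current_section: Optional[str] = None

    def detect_section(self, speech: str) -> Optional[str]:
        # Act (B): infer the target section from the content of the input.
        lowered = speech.lower()
        for cue, section in self.heading_cues.items():
            if lowered.startswith(cue):
                return section
        return None

    def recognize(self, speech: str) -> Tuple[str, str]:
        detected = self.detect_section(speech)
        if detected is not None:
            self.current_section = detected
        # Acts (A)/(C): use the model directed to the current section, if any.
        model = self.section_models.get(self.current_section, self.generic_model)
        text, confidence = model(speech)  # act (D): result plus confidence score
        if model is self.generic_model:
            return text, "generic"
        # Acts (E)/(F): below the threshold, re-recognize with the generic model.
        if confidence < self.threshold:
            text, _ = self.generic_model(speech)
            return text, "generic"
        return text, self.current_section
```

The router keeps the last detected section, so input without a cue continues to use that section's model until a new section is detected or the confidence check sends the input to the generic model.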

8. At least one non-transitory computer-readable storage medium having instructions encoded thereon which, when executed in a system comprising at least one automatic speech recognition component configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, perform a method comprising acts of:
- (A) recognizing first speech input submitted to the first document section using a first language model;
  
  (B) detecting, based on content of second speech input, that the second speech input is submitted to the second document section; and
  
  (C) in response to detecting that the second speech input is submitted to the second document section, recognizing the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
- - 9. The at least one non-transitory computer-readable storage medium of claim 8, wherein the act (C) comprises identifying the second language model for use in recognizing the second speech input from among a plurality of language models.
  - 10. The at least one non-transitory computer-readable storage medium of claim 8, wherein the act (C) comprises selecting the second language model for use in response to the detecting in the act (B) that the second speech input is submitted to the second document section.
  - 11. The at least one non-transitory computer-readable storage medium of claim 8, further comprising acts of:
    - (D) generating a confidence score indicating a level of confidence that a result of the recognizing in the act (C) matches the second speech input;
      
      (E) determining whether the confidence score satisfies a confidence threshold; and
      
      (F) if it is determined in the act (E) that the confidence score does not satisfy the confidence threshold, recognizing the second speech input using a third language model, the third language model being different than the first and second language models.
  - 12. The at least one non-transitory computer-readable storage medium of claim 11, wherein the third language model is a generic language model that is not specifically directed to the second document section.
  - 13. The at least one non-transitory computer-readable storage medium of claim 8, wherein the first and second language models are stochastic language models.
  - 14. The at least one non-transitory computer-readable storage medium of claim 8, wherein the structured document comprises a medical report.

15. A system for use with at least one automatic speech recognition component configured to recognize speech submitted to a structured document comprising a plurality of document sections, the plurality of document sections comprising a first document section and a second document section that is different than the first document section, the system comprising:
- at least one processor programmed to:
  
  recognize first speech input submitted to the first document section using a first language model;
  
  detect, based on content of second speech input, that the second speech input is submitted to the second document section; and
  
  in response to detecting that the second speech input is submitted to the second document section, recognize the second speech input using a second language model, different from the first language model, that is specifically directed to the second document section.
- - 16. The system of claim 15, wherein recognizing the second speech input comprises identifying the second language model for use in recognizing the second speech input from among a plurality of language models.
  - 17. The system of claim 15, wherein recognizing the second speech input comprises selecting the second language model for use in response to the detecting that the second speech input is submitted to the second document section.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Carus, Alwin B.; Lapshina, Larissa; Vemula, Raghu
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US13/611,351
Publication Number: US 20130006632A1
Time in Patent Office: 937 Days
Field of Search: 704/246, 704/247, 704/251, 704/252, 704/257
US Class Current: 704/257
CPC Class Codes: G10L 15/183 using context dependencies,...
