Incorporating an exogenous large-vocabulary model into rule-based speech recognition

US 10,311,878 B2
Filed: 02/07/2017
Issued: 06/04/2019
Est. Priority Date: 01/17/2014
Status: Active Grant

Abstract

Incorporation of an exogenous large-vocabulary model into rule-based speech recognition is provided. An audio stream is received by a local small-vocabulary rule-based speech recognition system (SVSRS), and is streamed to a large-vocabulary statistically-modeled speech recognition system (LVSRS). The SVSRS and LVSRS perform recognitions of the audio. If a portion of the audio is not recognized by the SVSRS, a rule is triggered that inserts a mark-up in the recognition result. The recognition result is sent to the LVSRS. If a mark-up is detected, recognition of a specified portion of the audio is performed. The LVSRS result is unified with the SVSRS result and sent as a hybrid response back to the SVSRS. If the hybrid-recognition rule is not triggered, an arbitration algorithm is evoked to determine whether the SVSRS or the LVSRS recognition has a lesser word error rate. The determined recognition is sent as a response to the SVSRS.
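
The two response paths in the abstract (hybrid unification when a mark-up is present, arbitration otherwise) can be sketched as below. This is a minimal illustration, not the patented implementation: the `MARKUP` token, the `Recognition` type, and the `respond` function are hypothetical names, and a confidence score is assumed as a stand-in for the word-error-rate comparison, since a true word error rate needs a reference transcript.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical placeholder token; the source does not specify the mark-up syntax.
MARKUP = "<unrecognized/>"


@dataclass
class Recognition:
    text: str
    confidence: float  # assumed proxy for estimated recognition quality


def respond(svsrs: Recognition,
            recognize_portion: Callable[[], Recognition],
            recognize_full: Callable[[], Recognition]) -> Recognition:
    """Return the response that is sent back to the rule-based recognizer."""
    if MARKUP in svsrs.text:
        # Hybrid path: recognize only the flagged portion, then unify.
        portion = recognize_portion()
        return Recognition(svsrs.text.replace(MARKUP, portion.text),
                           min(svsrs.confidence, portion.confidence))
    # Arbitration path: keep whichever full recognition looks more reliable.
    full = recognize_full()
    return full if full.confidence > svsrs.confidence else svsrs


hybrid = respond(Recognition(f"call {MARKUP}", 0.9),
                 recognize_portion=lambda: Recognition("Bob Smith", 0.8),
                 recognize_full=lambda: Recognition("call Bob Smith", 0.7))
print(hybrid.text)  # call Bob Smith
```

Passing the two statistical recognizers in as callables keeps the sketch independent of any particular speech engine; only the branch on the mark-up is taken from the abstract.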

19 Claims

1. A method for providing incorporation of an exogenous large-vocabulary model into rule-based speech recognition, comprising:
- receiving a first recognition result from a rule-based speech recognition system, the first recognition result including a mark-up that specifies a portion of a received audio stream was not recognized by the rule-based speech recognition system;
  
  performing a statistical model-based recognition of the marked-up portion of the audio stream to create a second recognition result;
  
  combining the second recognition result with the first recognition result to create a combined recognition result; and
  
  sending the combined recognition result to the rule-based speech recognition system.
  - 2. The method of claim 1, wherein the second recognition result is a statistical model-based recognition result.
  - 3. The method of claim 1, wherein combining the second recognition result with the first recognition result comprises replacing the mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system with the second recognition result.
  - 4. The method of claim 1, further comprising performing a statistical model-based recognition of the audio stream prior to receiving the first recognition result.
  - 5. The method of claim 4, wherein if the first recognition result comprises a mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system:
    - cancelling performing the statistical model-based recognition of the audio stream; and
      
      performing a statistical model-based recognition of the specified portion of the audio stream.
  - 6. The method of claim 1, further comprising:
    - if the first recognition result does not comprise a mark-up specifying that the portion of the audio stream is not recognized by a rule-based speech recognition system;
      
      performing a statistical model-based recognition of the audio stream;
      
      analyzing a result of the statistical model-based recognition of the audio stream and the first recognition result;
      
      determining whether the result of the statistical model-based recognition of the audio stream or whether the first recognition result has a better recognition quality; and
      
      sending the recognition result with the better recognition quality to the rule-based speech recognition system.
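
Claims 1 and 3 rely on a mark-up that identifies which portion of the audio stream was not recognized and that is later replaced by the second recognition result. The claims do not define the mark-up syntax, so the sketch below assumes a hypothetical tag carrying start and end times in seconds; `find_span`, `slice_audio`, and `replace_markup` are illustrative names only.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical mark-up syntax carrying the unrecognized span in seconds.
MARKUP_RE = re.compile(r'<unrecognized start="([\d.]+)" end="([\d.]+)"/>')


def find_span(result: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds of the first marked-up portion, if any."""
    match = MARKUP_RE.search(result)
    return (float(match.group(1)), float(match.group(2))) if match else None


def slice_audio(samples: Sequence[int], span: Tuple[float, float],
                rate: int) -> Sequence[int]:
    """Cut the marked-up portion out of the audio samples."""
    start, end = span
    return samples[round(start * rate):round(end * rate)]


def replace_markup(result: str, second_result: str) -> str:
    """Swap the first mark-up for the second recognition result."""
    # A function replacement avoids backslash escapes being interpreted.
    return MARKUP_RE.sub(lambda _match: second_result, result, count=1)


first = 'call <unrecognized start="0.5" end="1.5"/> at home'
audio = list(range(32))  # toy stream: 2 seconds at 16 samples per second
span = find_span(first)
portion = slice_audio(audio, span, rate=16)
combined = replace_markup(first, "Bob Smith")
print(combined)  # call Bob Smith at home
```

In a real system the sliced samples would be handed to the statistical recognizer; here the second result is supplied directly to keep the example self-contained.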

7. A system for providing speech recognition, comprising:
- one or more processors; and
  
  a memory coupled to the one or more processors, the one or more processors operable to:
  
  receive a first recognition result for a received audio stream, the first recognition result being received from a rule-based speech recognition system;
  
  determine if the first recognition result comprises a mark-up that indicates a portion of the audio stream was not recognized by the rule-based speech recognition system;
  
  when it is determined the first recognition result comprises the mark-up, performing a statistical model-based recognition of the marked-up portion of the audio stream to create a second recognition result;
  
  combine the second recognition result with the first recognition result to create a combined recognition result; and
  
  send the combined recognition result to the rule-based speech recognition system.
  - 8. The system of claim 7, wherein the second recognition is a statistical model-based recognition result.
  - 9. The system of claim 7, wherein the one or more processors are further operable to replace the mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system with the second recognition result.
  - 10. The system of claim 7, wherein the one or more processors are further operable to perform a statistical model-based recognition of the audio stream prior to receiving the first recognition result.
  - 11. The system of claim 10, wherein if the first recognition result comprises a mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system, the one or more processors are further operable to:
    - cancel performing the statistical model-based recognition of the audio stream; and
      
      perform a statistical model-based recognition of the specified portion of the audio stream.
  - 12. The system of claim 7, further comprising:
    - if the first recognition result does not comprise a mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system, the one or more processors are further operable to:
      
      perform a statistical model-based recognition of the audio stream;
      
      analyze a result of the statistical model-based recognition of the audio stream and the first recognition result;
      
      determine whether the result of the statistical model-based recognition of the audio stream or whether the first recognition result has a better recognition quality; and
      
      send the recognition result with the better recognition quality to the rule-based speech recognition system.
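
Claims 10 and 11 describe starting a statistical recognition of the whole audio stream before the rule-based result arrives, then cancelling it in favor of recognizing only the specified portion when a mark-up is present. One way to picture this is with `asyncio` tasks. The sketch below is an assumption-laden illustration: `recognize` is a stand-in that sleeps and upper-cases a string instead of decoding audio, and `MARKUP` and `handle` are hypothetical names.

```python
import asyncio
from typing import Tuple

MARKUP = "<unrecognized/>"  # hypothetical placeholder token


async def recognize(audio: str, delay: float) -> str:
    """Stand-in for a statistical recognizer; sleeps to simulate work."""
    await asyncio.sleep(delay)
    return audio.upper()


async def handle(audio: str, portion: str, first_result: str) -> Tuple[str, bool]:
    """Return (recognized text, whether the full-stream pass was cancelled)."""
    # Start recognizing the whole stream before the rule-based result arrives.
    full = asyncio.create_task(recognize(audio, delay=0.2))
    await asyncio.sleep(0.01)  # simulated wait for the first recognition result
    if MARKUP in first_result:
        # Mark-up present: abandon the full pass, recognize only the portion.
        full.cancel()
        try:
            await full
        except asyncio.CancelledError:
            pass
        text = await recognize(portion, delay=0.01)
    else:
        text = await full
    return text, full.cancelled()


print(asyncio.run(handle("call bob smith", "bob smith", f"call {MARKUP}")))
# ('BOB SMITH', True)
```

Without a mark-up, the same function lets the full-stream task run to completion, which corresponds to the input for the quality comparison in claim 12.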

13. A computer-readable storage device encoding computer executable instructions that, when executed by a processing unit, perform a method, comprising:
- receiving a first recognition result of an audio stream from a rule-based speech recognition system, the first recognition result containing a mark-up that indicates a portion of the audio stream was not recognized by the rule-based speech recognition system;
  
  performing a statistical model-based recognition of the marked-up portion of the audio stream to create a second recognition result;
  
  combining the second recognition result with the first recognition result to form a combined recognition result; and
  
  sending the combined recognition result to the rule-based speech recognition system.
  - 14. The computer-readable storage device of claim 13, wherein the second recognition result is a statistical model-based recognition result.
  - 15. The computer-readable storage device of claim 13, wherein combining the second recognition result with the first recognition result comprises replacing the mark-up portion of the audio stream with the second recognition result.
  - 16. The computer-readable storage device of claim 13, further comprising performing a statistical model-based recognition of the audio stream prior to receiving the first recognition result.
  - 17. The computer-readable storage device of claim 16, wherein if the first recognition result comprises a mark-up specifying that the portion of the audio stream is not recognized by the rule-based speech recognition system:
    - cancelling performing the statistical model-based recognition of the audio stream; and
      
      performing a statistical model-based recognition of the specified portion of the audio stream.
  - 18. The computer-readable storage device of claim 14, further comprising instructions for:
    - performing a statistical model-based recognition of the audio stream;
      
      analyzing a result of the statistical model-based recognition of the audio stream and the first recognition result;
      
      determining whether the result of the statistical model-based recognition of the audio stream or whether the first recognition result has a better recognition quality; and
      
      sending the recognition result with the better recognition quality to the rule-based speech recognition system when it is determined that the first recognition result does not comprise a mark-up specifying that the portion of the audio stream is not recognized by a rule-based speech recognition system.
  - 19. The computer-readable storage device of claim 13, further comprising instructions for performing a task based on the combined recognition result.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Wilson, Travis; Quazi, Salman; Vicondoa, John; Fatehpuria, Pradip
Primary Examiner(s): Singh, Satwant K
Application Number: US15/426,640
Publication Number: US 20170162204A1
Time in Patent Office: 847 Days
Field of Search: None

CPC Class Codes

G10L 15/06   Creation of reference templ...
G10L 15/18   using natural language mode...
G10L 15/193   Formal grammars, e.g. finit...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
