Incorporating an Exogenous Large-Vocabulary Model into Rule-Based Speech Recognition

US 20150206528A1
Filed: 01/17/2014
Published: 07/23/2015
Est. Priority Date: 01/17/2014
Status: Active Grant

Abstract

Incorporation of an exogenous large-vocabulary model into rule-based speech recognition is provided. An audio stream is received by a local small-vocabulary rule-based speech recognition system (SVSRS), and is streamed to a large-vocabulary statistically-modeled speech recognition system (LVSRS). The SVSRS and LVSRS perform recognitions of the audio. If a portion of the audio is not recognized by the SVSRS, a rule is triggered that inserts a mark-up in the recognition result. The recognition result is sent to the LVSRS. If a mark-up is detected, recognition of a specified portion of the audio is performed. The LVSRS result is unified with the SVSRS result and sent as a hybrid response back to the SVSRS. If the hybrid-recognition rule is not triggered, an arbitration algorithm is invoked to determine whether the SVSRS or the LVSRS recognition has a lesser word error rate. The determined recognition is sent as a response to the SVSRS.
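
The arbitration step in the abstract compares word error rates. As a rough illustration only, the sketch below computes word error rate as word-level edit distance divided by the reference length, then picks the hypothesis with the lower rate. The function names and the presence of a reference transcript are assumptions made for the example; a deployed system has no reference at run time and would presumably rely on an estimate of recognition quality instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def arbitrate(reference: str, svsrs_text: str, lvsrs_text: str) -> str:
    """Return the hypothesis with the lower WER.

    Ties go to the SVSRS result, which is an arbitrary choice for this sketch.
    """
    if word_error_rate(reference, svsrs_text) <= word_error_rate(reference, lvsrs_text):
        return svsrs_text
    return lvsrs_text
```

For instance, against the reference "call bob", the hypothesis "call rob" has one substitution in two words, so its word error rate is 0.5.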

20 Claims

1. A method for providing speech recognition, the method comprising:
- receiving an audio stream;
  
  performing a rule-based speech recognition of the audio stream;
  
  if a portion of the audio stream is recognized, inserting a recognition result of the recognized portion of the audio stream in a first recognition result;
  
  if a portion of the audio stream is not recognized, inserting a mark-up in the first recognition result specifying the portion of the audio stream that is not recognized; and
  
  sending the first recognition result to a large vocabulary speech recognition system for applying a statistical-based recognition of the portion of the audio stream that is not recognized.
  - 2. The method of claim 1, further comprising sending the audio stream and metadata to the large vocabulary speech recognition system.
  - 3. The method of claim 1, further comprising receiving a response, the response comprising the rule-matched portion of the audio stream and a second recognition result.
  - 4. The method of claim 3, wherein the second recognition result comprises a large vocabulary speech recognition system recognition result.
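
The rule-based side described in claims 1 to 4 can be sketched roughly as follows. This is a toy model, not the patented implementation: the `Segment` class stands in for a slice of audio, `SMALL_VOCABULARY` stands in for the rule set, and the `<unrecognized .../>` mark-up syntax is a hypothetical format chosen for the example. Recognized words are written into the first recognition result, and each run of unrecognized audio is collapsed into one mark-up that records its time span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a slice of audio and the word spoken in it."""
    start_ms: int
    end_ms: int
    word: str


# Hypothetical small vocabulary standing in for the rule-based grammar.
SMALL_VOCABULARY = {"call", "text", "play", "stop", "mom", "home"}


def _markup(start_ms: int, end_ms: int) -> str:
    # Hypothetical mark-up format; the source does not specify one here.
    return f'<unrecognized start_ms="{start_ms}" end_ms="{end_ms}"/>'


def rule_based_recognize(segments):
    """Build the first recognition result from a sequence of segments."""
    parts = []
    gap_start = gap_end = None  # bounds of the current unrecognized run
    for seg in segments:
        if seg.word in SMALL_VOCABULARY:
            if gap_start is not None:
                # Close the pending unrecognized run before the known word.
                parts.append(_markup(gap_start, gap_end))
                gap_start = None
            parts.append(seg.word)
        else:
            if gap_start is None:
                gap_start = seg.start_ms
            gap_end = seg.end_ms
    if gap_start is not None:
        parts.append(_markup(gap_start, gap_end))
    return " ".join(parts)


if __name__ == "__main__":
    stream = [Segment(0, 400, "call"),
              Segment(400, 900, "zbigniew"),
              Segment(900, 1500, "brzezinski")]
    print(rule_based_recognize(stream))
```

Because every separate run of unrecognized audio yields its own mark-up, the same function also produces results that carry several mark-ups. The resulting string is what would be sent on to the large vocabulary system.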

5. A system for providing incorporation of an exogenous large-vocabulary model into rule-based speech recognition, the system comprising:
- one or more processors; and
  
  a memory coupled to the one or more processors, the one or more processors operable to:
  
  receive an audio stream;
  
  perform a rule-based speech recognition of the audio stream;
  
  if a portion of the audio stream is matched with a rule, insert a recognition result of the rule-matched portion of the audio stream in a first recognition result;
  
  if a portion of the audio stream is not matched with a rule, insert a mark-up in the first recognition result specifying the portion of the audio stream that is not matched with a rule; and
  
  send the first recognition result to a large vocabulary speech recognition system for applying a statistical-based recognition of the portion of the audio stream that is not recognized.
  - 6. The system of claim 5, wherein the one or more processors are further operable to send the audio stream and metadata to the large vocabulary speech recognition system.
  - 7. The system of claim 5, wherein the one or more processors are further operable to receive a response, the response comprising the rule-matched portion of the audio stream and a second recognition result, wherein the second recognition result comprises a large vocabulary speech recognition system recognition result.
  - 8. The system of claim 5, wherein the first recognition result comprises a plurality of mark-ups specifying a plurality of portions of the audio stream that are not matched with a rule.

9. A method for providing incorporation of an exogenous large-vocabulary model into rule-based speech recognition, the method comprising:
- receiving an audio stream;
  
  receiving a first recognition result;
  
  determining if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system;
  
  if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system, performing a statistical model-based recognition of the specified portion of the audio stream;
  
  combining a second recognition result with the first recognition result; and
  
  sending a combined recognition result to the rule-based speech recognition system.
  - 10. The method of claim 9, wherein the second recognition is a statistical model-based recognition result.
  - 11. The method of claim 9, wherein combining a second recognition result with the first recognition result comprises replacing the mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system with the second recognition result.
  - 12. The method of claim 9, wherein prior to receiving a first recognition result, performing a statistical model-based recognition of the audio stream.
  - 13. The method of claim 12, wherein if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system:
    - cancelling performing the statistical model-based recognition of the audio stream; and
      
      performing a statistical model-based recognition of the specified portion of the audio stream.
  - 14. The method of claim 9, further comprising:
    - if the first recognition result does not comprise a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system:
      
      performing a statistical model-based recognition of the audio stream;
      
      analyzing a result of the statistical model-based recognition of the audio stream and the first recognition result;
      
      determining whether the result of the statistical model-based recognition of the audio stream or whether the first recognition result has a better recognition quality; and
      
      sending the recognition result with the better recognition quality to the rule-based speech recognition system.
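
The large-vocabulary side described in claims 9 to 14 can be sketched in a similar spirit. The code below is a minimal illustration under stated assumptions: the mark-up syntax matched by `MARKUP` is hypothetical, `recognize_span` and `recognize_full` are placeholder callables standing in for a statistical recognizer, and a numeric confidence score is used as a stand-in for "recognition quality", which the claims do not define. The early start and cancellation of full recognition in claims 12 and 13 are not modeled.

```python
import re

# Hypothetical mark-up format: one tag per unrecognized span, with time bounds.
MARKUP = re.compile(r'<unrecognized start_ms="(\d+)" end_ms="(\d+)"/>')


def handle_first_result(first_result, recognize_span, recognize_full,
                        svsrs_confidence=0.0):
    """Combine or arbitrate, depending on whether a mark-up is present.

    recognize_span(start_ms, end_ms) -> str        (placeholder recognizer)
    recognize_full() -> (text, confidence)         (placeholder recognizer)
    """
    if MARKUP.search(first_result):
        # Hybrid path: recognize only the flagged spans and substitute each
        # mark-up with the statistical result for that span.
        return MARKUP.sub(
            lambda m: recognize_span(int(m.group(1)), int(m.group(2))),
            first_result,
        )
    # Arbitration path: recognize the whole stream and keep the result with
    # the better quality score (here, an assumed confidence value).
    lvsrs_text, lvsrs_confidence = recognize_full()
    if lvsrs_confidence > svsrs_confidence:
        return lvsrs_text
    return first_result


if __name__ == "__main__":
    fake_span = lambda start, end: "zbigniew brzezinski"
    fake_full = lambda: ("call zbigniew brzezinski", 0.8)
    first = 'call <unrecognized start_ms="400" end_ms="1500"/>'
    print(handle_first_result(first, fake_span, fake_full))
```

Substituting the mark-up in place keeps the rule-matched words exactly as the rule-based system produced them, so only the unrecognized span depends on the statistical model.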

15. A system for providing speech recognition, the system comprising:
- one or more processors; and
  
  a memory coupled to the one or more processors, the one or more processors operable to:
  
  receive an audio stream;
  
  receive a first recognition result;
  
  determine if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system;
  
  if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system, perform a statistical model-based recognition of the specified portion of the audio stream;
  
  combine a second recognition result with the first recognition result; and
  
  send a combined recognition result to the rule-based speech recognition system.
  - 16. The system of claim 15, wherein the second recognition is a statistical model-based recognition result.
  - 17. The system of claim 15, wherein in combining a second recognition result with the first recognition result, the one or more processors are further operable to replace the mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system with the second recognition result.
  - 18. The system of claim 15, wherein prior to receiving a first recognition result, the one or more processors are further operable to perform a statistical model-based recognition of the audio stream.
  - 19. The system of claim 18, wherein if the first recognition result comprises a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system, the one or more processors are further operable to:
    - cancel performing the statistical model-based recognition of the audio stream; and
      
      perform a statistical model-based recognition of the specified portion of the audio stream.
  - 20. The system of claim 15, further comprising:
    - if the first recognition result does not comprise a mark-up specifying a portion of the audio stream is not recognized by a rule-based speech recognition system, the one or more processors are further operable to:
      
      perform a statistical model-based recognition of the audio stream;
      
      analyze a result of the statistical model-based recognition of the audio stream and the first recognition result;
      
      determine whether the result of the statistical model-based recognition of the audio stream or whether the first recognition result has a better recognition quality; and
      
      send the recognition result with the better recognition quality to the rule-based speech recognition system.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Wilson, Travis; Fatehpuria, Pradip; Quazi, Salman; Vicondoa, John

Granted Patent

US 9,601,108 B2
US Class Current

1/1
CPC Class Codes

G10L 15/06   Creation of reference templ...

G10L 15/18   using natural language mode...

G10L 15/193   Formal grammars, e.g. finit...

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...
