Hybrid Speech Recognition

US 20100057450A1
Filed: 08/30/2009
Published: 03/04/2010
Est. Priority Date: 08/29/2008
Status: Active Grant


Abstract

A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.

21 Claims

1. A computer-implemented method performed by a client device, the method comprising:
- (A) receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  (B) providing the audio signal to a first automatic speech recognition engine in the client device;
  
  (C) receiving first speech recognition results from the first automatic speech recognition engine;
  
  (D) determining whether a second automatic speech recognition engine, in a server device, is accessible to the client device;
  
  (E) if the second automatic speech recognition engine is determined not to be accessible to the client device, then providing the first speech recognition results to the requester in response to the request.
- Dependent claims (2, 3):
  - 2. The method of claim 1, wherein the requester comprises a machine coupled to the client device.
  - 3. The method of claim 1, wherein the requester comprises software executing on the client device.
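
Steps (B) through (E) of claim 1 can be read as a simple fallback rule. The following is a minimal sketch of that reading, not the patented implementation; the `Engine` type, the `server_accessible` callback, and the function name are hypothetical and do not appear in the claims.

```python
from typing import Callable, Optional

# Hypothetical type: an engine maps raw audio bytes to a transcript string.
Engine = Callable[[bytes], str]


def recognize_with_fallback(
    audio: bytes,
    client_engine: Engine,
    server_accessible: Callable[[], bool],
) -> Optional[str]:
    """Illustrates steps (B)-(E) of claim 1 for a request (A) carrying `audio`."""
    # (B), (C): run the client-side engine and collect its results.
    first_results = client_engine(audio)
    # (D): check whether the server-side engine can be reached.
    if not server_accessible():
        # (E): server not accessible, so answer with the client-side results.
        return first_results
    # Claim 1 does not say what happens when the server is accessible;
    # returning None here only marks that case as out of scope (an assumption).
    return None
```

How accessibility is determined (for example, a network probe) is left open by the claim, which is why the sketch takes it as a callback.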

4. An apparatus comprising:
- means for receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  means for providing the audio signal to a first automatic speech recognition engine in the client device;
  
  means for receiving first speech recognition results from the first automatic speech recognition engine;
  
  means for determining whether a second automatic speech recognition engine, in a server device, is accessible to the client device; and
  
  means for providing the first speech recognition results to the requester in response to the request if the second automatic speech recognition engine is determined not to be accessible to the client device.

5. A computer-implemented method performed by a client device, the method comprising:
- (A) receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  (B) providing the audio signal to a first automatic speech recognition engine in a server device;
  
  (C) receiving first speech recognition results from the first automatic speech recognition engine;
  
  (D) determining whether a second automatic speech recognition engine, in the client device, is accessible to the client device;
  
  (E) if the second automatic speech recognition engine is determined not to be accessible to the client device, then providing the first speech recognition results to the requester in response to the request.

6. An apparatus comprising:
- means for receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  means for providing the audio signal to a first automatic speech recognition engine in a server device;
  
  means for receiving first speech recognition results from the first automatic speech recognition engine;
  
  means for determining whether a second automatic speech recognition engine, in the client device, is accessible to the client device;
  
  means for providing the first speech recognition results to the requester in response to the request if the second automatic speech recognition engine is determined not to be accessible to the client device.

7. A computer-implemented method performed by a client device, the method comprising:
- (A) receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  (B) providing the audio signal to a first automatic speech recognition engine in the client device;
  
  (C) receiving first speech recognition results from the first automatic speech recognition engine at a first time;
  
  (D) providing the audio signal to a second automatic speech recognition engine in a server device;
  
  (E) determining whether second speech recognition results have been received by the client device from the second automatic speech recognition engine within a predetermined time period after the first time;
  
  (F) if the second speech recognition results have been received by the client device within the predetermined time period, then providing the second speech recognition results to the requester in response to the request; and
  
  (G) if the second speech recognition results have not been received by the client device within the predetermined time period, then providing the first speech recognition results to the requester in response to the request.
- Dependent claims (8):
  - 8. The method of claim 7, wherein (E) comprises selecting the predetermined time period based on a type of the second speech recognition results.
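
The timing rule in claim 7 can be sketched with a thread pool: wait a bounded time for the server after the client results arrive, then fall back. This is an illustrative sketch under assumptions, not the patented implementation; the function names, the `wait_seconds` parameter, and the `slow_server` stand-in are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Hypothetical type: an engine maps raw audio bytes to a transcript string.
Engine = Callable[[bytes], str]


def recognize_with_timeout(
    audio: bytes,
    client_engine: Engine,
    server_engine: Engine,
    wait_seconds: float,
) -> str:
    """Illustrates steps (B)-(G) of claim 7."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # (D): send the audio to the server engine in the background.
        # It is dispatched before the client engine runs so both work
        # concurrently; the claim does not fix this ordering.
        server_future = pool.submit(server_engine, audio)
        # (B), (C): the client results arrive here; this is the "first time".
        first_results = client_engine(audio)
        try:
            # (E), (F): wait up to the predetermined period for the server.
            return server_future.result(timeout=wait_seconds)
        except FutureTimeout:
            # (G): server too slow, so answer with the client-side results.
            return first_results
    finally:
        # Do not block the caller on a server call that is still running.
        pool.shutdown(wait=False)


def slow_server(audio: bytes) -> str:
    """Stand-in for a server engine that takes 0.3 s to answer."""
    time.sleep(0.3)
    return "server text"
```

Under claim 8, `wait_seconds` would be chosen according to the type of results expected from the server; the sketch simply takes it as an argument. Errors raised by the server call are not handled here.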

9. An apparatus comprising:
- means for receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  means for providing the audio signal to a first automatic speech recognition engine in the client device;
  
  means for receiving first speech recognition results from the first automatic speech recognition engine at a first time;
  
  means for providing the audio signal to a second automatic speech recognition engine in a server device;
  
  means for determining whether second speech recognition results have been received by the client device from the second automatic speech recognition engine within a predetermined time period after the first time;
  
  means for providing the second speech recognition results to the requester in response to the request if the second speech recognition results have been received by the client device within the predetermined time period; and
  
  means for providing the first speech recognition results to the requester in response to the request if the second speech recognition results have not been received by the client device within the predetermined time period.

10. A computer-implemented method performed by a client device, the method comprising:
- (A) receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  (B) providing the audio signal to a first automatic speech recognition engine in the client device;
  
  (C) providing the audio signal to a second automatic speech recognition engine in a server device;
  
  (D) receiving first speech recognition results from the first automatic speech recognition engine;
  
  (E) determining whether a confidence measure associated with the first speech recognition results exceeds a predetermined threshold; and
  
  (F) if the confidence measure exceeds the predetermined threshold, then providing the first speech recognition results to the requester in response to the request.
- Dependent claims (11):
  - 11. The method of claim 10, further comprising:
    - (G) before (F), receiving second speech recognition results from the second automatic speech recognition engine; and
      
      wherein (F) comprises providing the first speech recognition results but not the second speech recognition results to the requester.
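
Claims 10 and 11 describe a confidence gate: if the client-side results are confident enough, they are returned regardless of what the server produces. A minimal sketch of that gate follows, assuming a hypothetical `Result` type with a numeric confidence score and a hypothetical `send_to_server` callback; none of these names come from the claims.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def recognize_with_confidence_gate(
    audio: bytes,
    client_engine: Callable[[bytes], Result],
    send_to_server: Callable[[bytes], None],
    threshold: float,
) -> Optional[Result]:
    """Illustrates steps (B)-(F) of claim 10."""
    # (C): the audio goes to the server engine as well.
    send_to_server(audio)
    # (B), (D): run the client-side engine and collect its results.
    first = client_engine(audio)
    # (E): compare the confidence measure with the predetermined threshold.
    if first.confidence > threshold:
        # (F): confident enough, so answer with the client-side results.
        # Under claim 11 this holds even if server results already arrived.
        return first
    # Claim 10 does not cover the low-confidence case; None marks it as
    # out of scope (an assumption of this sketch).
    return None
```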

12. An apparatus comprising:
- means for receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  means for providing the audio signal to a first automatic speech recognition engine in the client device;
  
  means for providing the audio signal to a second automatic speech recognition engine in a server device;
  
  means for receiving first speech recognition results from the first automatic speech recognition engine;
  
  means for determining whether a confidence measure associated with the first speech recognition results exceeds a predetermined threshold; and
  
  means for providing the first speech recognition results to the requester in response to the request if the confidence measure exceeds the predetermined threshold.

13. A computer-implemented method performed by a client device, the method comprising:
- (A) receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  (B) providing the audio signal to a first automatic speech recognition engine in the client device;
  
  (C) receiving first speech recognition results from the first automatic speech recognition engine;
  
  (D) providing the audio signal to a second automatic speech recognition engine in a server device;
  
  (E) receiving second speech recognition results from the second automatic speech recognition engine;
  
  (F) producing hybrid speech recognition results based on the first speech recognition results and the second speech recognition results; and
  
  (G) providing the hybrid speech recognition results to the requester in response to the request.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The method of claim 13, wherein (F) comprises combining the first speech recognition results and the second speech recognition results using Recognizer Output Voting Error Reduction.
  - 15. The method of claim 13, wherein the client device is configured to treat one of the first and second automatic speech recognition engines as a preferred speech recognition engine, and:
    - wherein (C) comprises receiving the first speech recognition results at an arbitration engine in the client device at a first time;
      
      wherein (E) comprises receiving the second speech recognition results at the arbitration engine in the client device at a second time that is later than the first time; and
      
      wherein (F) comprises:
      
      (F)(1) including the first recognition results in the hybrid speech recognition results; and
      
      (F)(2) including the second speech recognition results in the hybrid speech recognition results only if the second automatic speech recognition engine is the preferred speech recognition engine.
  - 16. The method of claim 13, wherein the client device is configured to treat one of the first and second automatic speech recognition engines as a preferred speech recognition engine, and:
    - wherein (E) comprises receiving the second speech recognition results at an arbitration engine in the client device at a first time;
      
      wherein (C) comprises receiving the first speech recognition results at the arbitration engine in the client device at a second time that is later than the first time; and
      
      wherein (F) comprises:
      
      (F)(1) including the second recognition results in the hybrid speech recognition results; and
      
      (F)(2) including the first speech recognition results in the hybrid speech recognition results only if the first automatic speech recognition engine is the preferred speech recognition engine.
  - 17. The method of claim 13:
    - wherein (C) comprises receiving the first speech recognition results at an arbitration engine in the client device at a first time;
      
      wherein (E) comprises receiving the second speech recognition results at the arbitration engine in the client device at a second time that is later than the first time; and
      
      wherein (F) comprises:
      
      (F)(1) including the first recognition results in the hybrid speech recognition results;
      
      (F)(2) identifying words in the second speech recognition results that do not overlap in time with any words in the first speech recognition results; and
      
      (F)(3) including only non-overlapping words from the second speech recognition results in the hybrid speech recognition results.
  - 18. The method of claim 13:
    - wherein (E) comprises receiving the second speech recognition results at an arbitration engine in the client device at a first time;
      
      wherein (C) comprises receiving the first speech recognition results at the arbitration engine in the client device at a second time that is later than the first time; and
      
      wherein (F) comprises:
      
      (F)(1) including the second recognition results in the hybrid speech recognition results;
      
      (F)(2) identifying words in the first speech recognition results that do not overlap in time with any words in the second speech recognition results; and
      
      (F)(3) including only non-overlapping words from the first speech recognition results in the hybrid speech recognition results.
  - 19. The method of claim 13, wherein (F) comprises:
    - (F)(1) including the first recognition results in the hybrid speech recognition results; and
      
      (F)(2) replacing the first recognition results with the second recognition results in the hybrid speech recognition results.
  - 20. The method of claim 13, wherein (F) comprises:
    - (F)(1) including the second recognition results in the hybrid speech recognition results; and
      
      (F)(2) replacing the second recognition results with the first recognition results in the hybrid speech recognition results.
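
Of the combination rules in the dependent claims, the one in claim 17 (keep the earlier-arriving results and add only the later words that do not overlap them in time) lends itself to a short example. The sketch below is one possible reading; the `Word` type with start and end times, the half-open interval test, and the final sort by start time are assumptions not stated in the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the audio (assumed unit)
    end: float


def overlaps(a: Word, b: Word) -> bool:
    """True if the two words share any stretch of time (half-open intervals)."""
    return a.start < b.end and b.start < a.end


def merge_non_overlapping(first: List[Word], second: List[Word]) -> List[Word]:
    """Illustrates steps (F)(1)-(F)(3) of claim 17."""
    # (F)(1): every word of the earlier-arriving results is kept.
    hybrid = list(first)
    # (F)(2): find later-arriving words that overlap none of the earlier ones.
    extras = [w for w in second if not any(overlaps(w, f) for f in first)]
    # (F)(3): only those non-overlapping words are added.
    hybrid.extend(extras)
    # Ordering by start time is an assumption made for readable output.
    return sorted(hybrid, key=lambda w: w.start)
```

Claim 18 is the mirror case with the roles of the two result sets swapped, so the same function applies with its arguments exchanged.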

21. An apparatus comprising:
- means for receiving a request from a requester to apply automatic speech recognition to an audio signal;
  
  means for providing the audio signal to a first automatic speech recognition engine in the client device;
  
  means for receiving first speech recognition results from the first automatic speech recognition engine;
  
  means for providing the audio signal to a second automatic speech recognition engine in a server device;
  
  means for receiving second speech recognition results from the second automatic speech recognition engine;
  
  means for producing hybrid speech recognition results based on the first speech recognition results and the second speech recognition results; and
  
  means for providing the hybrid speech recognition results to the requester in response to the request.

Current Assignee: Solventum Intellectual Properties Company (Solventum Corp.)
Original Assignee: Multimodal Technologies Incorporated (3M Company)
Inventors: Koll, Detlef
Granted Patent: US 7,933,777 B2
US Class Current: 704/231
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 15/32 Multiple recognisers used i...
