Distributed speech recognition using one way communication

US 8,249,878 B2
Filed: 08/02/2011
Issued: 08/21/2012
Est. Priority Date: 08/29/2008
Status: Active Grant


12 Assignments

0 Petitions

Abstract

A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes a first portion of the speech stream and, if a predetermined criterion is satisfied by the speech recognition result, waits until the speech recognizer has been reconfigured before recognizing a second portion of the speech stream. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition.
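
The server-side behavior summarized in the abstract can be sketched in Python. The server recognizes a portion, checks a criterion supplied by the control stream, blocks until the engine is reconfigured, and then continues. This is a minimal, non-authoritative sketch under stated assumptions. `ControlMessage`, `FakeEngine`, and `serve` are hypothetical names. Speech portions are plain strings standing in for audio. The "predetermined criterion" is assumed to be an exact match on the recognized text. None of these details come from the text above.

```python
import queue
import threading
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ControlMessage:
    """Hypothetical control-stream message; not a format defined by the patent."""
    new_state: Optional[str] = None  # configuration state to switch the engine into
    wait_on: Optional[str] = None    # result text that should pause recognition


class FakeEngine:
    """Stand-in for an ASR engine; speech portions are plain text here, not audio."""

    def __init__(self, state: str) -> None:
        self.state = state

    def recognize(self, portion: str) -> str:
        return portion  # a real engine would decode audio using self.state


def serve(portions: List[str], control: queue.Queue,
          engine: FakeEngine) -> List[Tuple[str, str]]:
    """Recognize portions in order, pausing whenever the current criterion is met."""
    results: List[Tuple[str, str]] = []
    criterion = control.get().wait_on  # assume the first control message sets the criterion
    for portion in portions:
        text = engine.recognize(portion)
        results.append((engine.state, text))
        if criterion is not None and text == criterion:
            # Block on the control stream until a message reconfigures the engine.
            while True:
                msg = control.get()
                if msg.wait_on is not None:
                    criterion = msg.wait_on
                if msg.new_state is not None:
                    engine.state = msg.new_state
                    break
    return results


if __name__ == "__main__":
    control: queue.Queue = queue.Queue()
    control.put(ControlMessage(wait_on="dictate"))

    def client() -> None:
        time.sleep(0.1)  # the reconfiguration arrives later on the control stream
        control.put(ControlMessage(new_state="dictation"))

    threading.Thread(target=client).start()
    print(serve(["open", "dictate", "hello", "world"], control, FakeEngine("command")))
    # [('command', 'open'), ('command', 'dictate'), ('dictation', 'hello'), ('dictation', 'world')]
```

The server reads the control stream only at startup and while it is waiting. Because of this, the result does not depend on when the client's reconfiguration message arrives.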

39 Citations

18 Claims

1. A method performed by at least one computer processor executing computer program instructions stored on a non-transitory computer-readable medium, the method comprising:

(A) at a speech recognition server:

(A)(1) receiving a speech stream and a control stream from a client;

(A)(2) using an automatic speech recognition engine in a first configuration state to recognize a first portion of the speech stream and thereby to produce a first speech recognition result;

(B) at the speech recognition server, if the first speech recognition result satisfies a first predetermined criterion specified by the control stream, then waiting until the speech recognition engine has been reconfigured before continuing to (C); and

(C) at the speech recognition server, using the automatic speech recognition engine in a second configuration state to recognize a second portion of the speech stream and thereby to produce a second speech recognition result.

2. The method of claim 1, wherein (C) further comprises executing a control message in the control stream to reconfigure the speech recognition engine after (B), the control message identifying the second configuration state.

3. The method of claim 2, further comprising:

(D) at the client, before (A), transmitting the speech stream and the control stream to the speech recognition server.

4. The method of claim 3, further comprising:

(E) at the client, transmitting a speech recognition result request to the speech recognition server; and

(F) at the speech recognition server:

(F)(1) determining whether any speech recognition results are available;

(F)(2) if no speech recognition results are available, returning to (F)(1);

(F)(3) otherwise, transmitting at least one of the first and second speech recognition results to the client.

5. The method of claim 4, wherein the speech recognition server performs (B) in parallel with (F).

6. The method of claim 4, wherein (D) comprises transmitting the speech stream and the control stream using a Hypertext Transfer Protocol (HTTP), and wherein (E) comprises transmitting the speech recognition result request using HTTP.

7. The method of claim 4, wherein (D) comprises transmitting the speech stream and the control stream using a Hypertext Transfer Protocol over Secure Sockets Layer (HTTPS), and wherein (E) comprises transmitting the speech recognition result request using HTTPS.

8. The method of claim 3, wherein (D) comprises:

(D)(1) transmitting a first control message in the control stream to the speech recognition server;

(D)(2) detecting a failure of the transmission of the first portion; and

(D)(3) in response to detection of the failure:

(D)(3)(a) creating a second control message specifying a combination of a first state change represented by the first control message and a second state change; and

(D)(3)(b) transmitting the second control message in the control stream to the speech recognition server.

9. The method of claim 2, wherein (C) comprises waiting until the automatic speech recognition engine is in a predetermined configuration state before continuing to (D).
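
Claims 4 and 5 describe a server that answers a result request only once results exist, while recognition continues in parallel. The following is a minimal sketch that assumes a hypothetical in-process `ResultStore` rather than any real network endpoint. `threading.Condition.wait_for` re-checks the availability predicate each time it is woken, which plays the role of "returning to (F)(1)" without busy-waiting. The `timeout` parameter is an added assumption that keeps a request from blocking forever.

```python
import threading
import time
from typing import List, Optional


class ResultStore:
    """Hypothetical server-side buffer shared by the recognizer and result requests."""

    def __init__(self) -> None:
        self._results: List[str] = []
        self._cond = threading.Condition()

    def publish(self, result: str) -> None:
        """Called by the recognition thread each time it produces a result."""
        with self._cond:
            self._results.append(result)
            self._cond.notify_all()

    def handle_request(self, timeout: Optional[float] = None) -> List[str]:
        """Answer one client result request: check (F)(1), wait (F)(2), return (F)(3)."""
        with self._cond:
            # wait_for re-evaluates the predicate on every wake-up, i.e. "return to (F)(1)".
            if not self._cond.wait_for(lambda: len(self._results) > 0, timeout=timeout):
                return []  # timed out with nothing available (an added assumption)
            ready, self._results = self._results, []
            return ready


if __name__ == "__main__":
    store = ResultStore()

    def recognizer() -> None:
        # Recognition runs in parallel with result requests, as in claim 5.
        for text in ["first result", "second result"]:
            time.sleep(0.05)  # simulated decoding time
            store.publish(text)

    worker = threading.Thread(target=recognizer)
    worker.start()
    collected: List[str] = []
    while len(collected) < 2:
        collected += store.handle_request(timeout=1.0)
    worker.join()
    print(collected)  # ['first result', 'second result']
```

This sketch uses a condition variable instead of a literal polling loop. The claim only requires that the availability check repeat until a result exists.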

10. A non-transitory computer-readable medium comprising computer program instructions stored on the computer-readable medium, wherein the computer program instructions are executable by at least one computer processor to perform a method comprising:

(A) at a speech recognition server:

(A)(1) receiving a speech stream and a control stream from a client;

(A)(2) using an automatic speech recognition engine in a first configuration state to recognize a first portion of the speech stream and thereby to produce a first speech recognition result;

(B) at the speech recognition server, if the first speech recognition result satisfies a first predetermined criterion specified by the control stream, then waiting until the speech recognition engine has been reconfigured before continuing to (C); and

(C) at the speech recognition server, using the automatic speech recognition engine in a second configuration state to recognize a second portion of the speech stream and thereby to produce a second speech recognition result.

11. The computer-readable medium of claim 10, wherein (C) further comprises executing a control message in the control stream to reconfigure the speech recognition engine after (B), the control message identifying the second configuration state.

12. The computer-readable medium of claim 11, wherein the method further comprises:

(D) at the client, before (A), transmitting the speech stream and the control stream to the speech recognition server.

13. The computer-readable medium of claim 12, wherein the method further comprises:

(E) at the client, transmitting a speech recognition result request to the speech recognition server; and

(F) at the speech recognition server:

(F)(1) determining whether any speech recognition results are available;

(F)(2) if no speech recognition results are available, returning to (F)(1);

(F)(3) otherwise, transmitting at least one of the first and second speech recognition results to the client.

14. The computer-readable medium of claim 13, wherein the speech recognition server performs (B) in parallel with (F).

15. The computer-readable medium of claim 13, wherein (D) comprises transmitting the speech stream and the control stream using a Hypertext Transfer Protocol (HTTP), and wherein (E) comprises transmitting the speech recognition result request using HTTP.

16. The computer-readable medium of claim 13, wherein (D) comprises transmitting the speech stream and the control stream using a Hypertext Transfer Protocol over Secure Sockets Layer (HTTPS), and wherein (E) comprises transmitting the speech recognition result request using HTTPS.

17. The computer-readable medium of claim 12, wherein (D) comprises:

(D)(1) transmitting a first control message in the control stream to the speech recognition server;

(D)(2) detecting a failure of the transmission of the first portion; and

(D)(3) in response to detection of the failure:

(D)(3)(a) creating a second control message specifying a combination of a first state change represented by the first control message and a second state change; and

(D)(3)(b) transmitting the second control message in the control stream to the speech recognition server.

18. The computer-readable medium of claim 11, wherein (C) comprises waiting until the automatic speech recognition engine is in a predetermined configuration state before continuing to (D).
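
Claims 8 and 17 describe folding a failed control message into the next one, so the client does not retransmit both. The sketch below rests on several assumptions. State changes are dictionaries of hypothetical engine settings. A later change overrides an earlier one for the same setting. The transport is a plain callable that reports failure by returning `False`. `combine` and `ControlSender` are illustrative names only.

```python
from typing import Callable, Dict, List, Optional

StateChange = Dict[str, str]  # hypothetical: engine setting name -> new value


def combine(first: StateChange, second: StateChange) -> StateChange:
    """Merge two state changes; settings in the later change take precedence."""
    merged = dict(first)
    merged.update(second)
    return merged


class ControlSender:
    """Client side of the control stream that coalesces a failed message into the next."""

    def __init__(self, transport: Callable[[StateChange], bool]) -> None:
        self._transport = transport  # returns False when a transmission fails
        self._unsent: Optional[StateChange] = None

    def send(self, change: StateChange) -> bool:
        # After a failure, send one combined message rather than replaying both.
        message = combine(self._unsent, change) if self._unsent else change
        if self._transport(message):
            self._unsent = None
            return True
        self._unsent = message  # remember it so the next send can absorb it
        return False


if __name__ == "__main__":
    outcomes = iter([False, True])  # first transmission fails, second succeeds
    delivered: List[StateChange] = []

    def flaky_transport(message: StateChange) -> bool:
        ok = next(outcomes)
        if ok:
            delivered.append(message)
        return ok

    sender = ControlSender(flaky_transport)
    sender.send({"grammar": "commands"})
    sender.send({"language_model": "dictation"})
    print(delivered)  # [{'grammar': 'commands', 'language_model': 'dictation'}]
```

In this sketch, combining happens only when the client sends its next change. A timer-driven retry would be a separate design decision.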

Current Assignee: Solventum Intellectual Properties Company (Solventum Corp.)
Original Assignee: Multimodal Technologies Incorporated (3M Company)
Inventors: Carraux, Eric; Koll, Detlef
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US13/196,188
Publication Number: US 20110288857A1
Time in Patent Office: 385 Days
Field of Search: None
US Class Current: 704/270.1
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
