Method and system for network-based speech recognition

US 20030046065A1
Filed: 07/19/2002
Published: 03/06/2003
Est. Priority Date: 10/04/1999
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Methods and systems for handling speech recognition processing effectively in real time via the internet, so that users do not experience noticeable delays between the start of an exercise and the receipt of responsive feedback. A user uses a client to access the internet and a server that supports speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximately real time. The server evaluates the user speech in the context of the current speech recognition exercise and provides responsive feedback to the client, again in approximately real time, with minimal latency. Upon receiving the responsive feedback from the server, the client displays it, or otherwise provides it, to the user.

20 Claims

1. A method for supporting speech recognition on a user processing device, comprising:
- receiving a stream of audio speech data;
- storing a portion of the stream of audio speech data into a buffer of a linked list of buffers as it is received;
- at a time t₁, wherein t₁ is prior to the time when the entirety of the stream of audio speech data is received, encoding a buffer of audio speech data into a smaller file representation;
- at a time t₂, wherein t₂ is prior to the time when the entirety of the stream of audio speech data is received, formatting a portion of the smaller file representation into a packet for transmitting over the internet; and
- at a time t₃, wherein t₃ is prior to the time when the entirety of the stream of audio speech data is received, transmitting the packet over the internet.
Dependent claims (2, 3, 4, 5, 6):
- 2. The method of claim 1, wherein the stream of audio speech data is transmitted to a server for processing a language learning exercise.
- 3. The method of claim 2, wherein the stream of audio speech data is received by a client, and a packet of encoded audio speech data is transmitted by the client to the server remotely located from the client.
- 4. The method of claim 1, further comprising:
  - writing a portion of the audio stream data contained in a buffer in the linked list of buffers to a second buffer in a second linked list of buffers; and
  - freeing the buffer in the linked list of buffers to receive another portion of the stream of audio speech data.
- 5. The method of claim 1, further comprising, at a time t₄, receiving one or more packets of a text response, decoding a packet of text response after it is received, and displaying the text response.
- 6. The method of claim 1, further comprising:
  - transmitting a URL comprising a grammar context number which is indicative of a speech recognition exercise that the stream of audio speech data is for; and
  - establishing an internet connection prior to time t₃.
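
The client-side steps of claim 1 can be illustrated with a minimal Python sketch. It assumes audio arrives as a sequence of byte chunks, uses `zlib` purely as a stand-in for a real speech codec, and prefixes each packet with a hypothetical 4-byte sequence number; `AudioBuffer`, `encode`, and `stream_audio` are illustrative names, not taken from the patent. What it shows is the timing: each buffer is encoded (t₁), formatted into a packet (t₂), and handed to `send` (t₃) as soon as it fills, while later audio is still arriving.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class AudioBuffer:
    """One node in a singly linked list of fixed-capacity audio buffers."""
    data: bytearray = field(default_factory=bytearray)
    next: Optional["AudioBuffer"] = None


def encode(raw: bytes) -> bytes:
    # Stand-in for a speech codec (assumption): zlib is lossless and generic.
    return zlib.compress(raw)


def stream_audio(chunks: Iterable[bytes], buffer_size: int,
                 send: Callable[[bytes], None]) -> int:
    """Buffer incoming audio and ship each full buffer before the stream ends.

    Returns the number of packets sent.
    """
    tail = AudioBuffer()
    seq = 0

    def flush(buf: AudioBuffer) -> None:
        nonlocal seq
        encoded = encode(bytes(buf.data))          # t1: encode one buffer
        packet = seq.to_bytes(4, "big") + encoded  # t2: format a packet (hypothetical header)
        send(packet)                               # t3: transmit it
        seq += 1

    for chunk in chunks:
        view = memoryview(chunk)
        while view:
            room = buffer_size - len(tail.data)
            tail.data += view[:room]
            view = view[room:]
            if len(tail.data) == buffer_size:
                flush(tail)
                # Link a fresh buffer; only the tail is kept, so sent nodes
                # become unreachable and are reclaimed by the garbage collector.
                tail.next = AudioBuffer()
                tail = tail.next
    if tail.data:  # final, partially filled buffer once the stream ends
        flush(tail)
    return seq
```

In practice `send` would wrap a write to a connection opened before t₃ (claim 6). Claim 4 describes a more explicit scheme than garbage collection: copying buffer contents into a second linked list and freeing the original buffer for reuse.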

7. A method for supporting speech recognition on a server, comprising:
- receiving a URL from a client remote from the server, the URL comprising a grammar context number;
- receiving one or more input packets of encoded audio speech data from the client;
- decoding each of the one or more input packets of encoded audio speech data into a portion of raw speech data upon receipt of the respective input packet;
- storing each portion of raw speech data into a buffer of a linked list of buffers;
- indicating a grammar associated with the grammar context number to a speech recognition engine;
- providing each buffer containing a portion of raw speech data to the speech recognition engine as the speech recognition engine is ready to accept it; and
- receiving a response from the speech recognition engine, wherein the response is based on an evaluation of the raw speech data in relation to the grammar.
Dependent claims (8, 9, 10, 11, 12):
- 8. The method of claim 7, further comprising:
  - packaging a text response based on the evaluation of the raw speech data in relation to the grammar into one or more transmission packets to be transmitted over the internet; and
  - transmitting the one or more transmission packets to the client.
- 9. The method of claim 8, wherein the one or more input packets of encoded audio speech data received from the client are transmitted via an internet connection, and the one or more transmission packets are transmitted to the client via the same internet connection.
- 10. The method of claim 7, wherein the one or more input packets of encoded audio speech data are received from the client for a language learning exercise.
- 11. The method of claim 7, wherein a first buffer containing a first portion of raw speech data is provided to the speech recognition engine before all of the input packets of encoded audio speech data are received from the client.
- 12. The method of claim 7, further comprising:
  - identifying a speech file stored on the server based on receiving a second URL from the client;
  - encoding the speech file into a smaller file representation;
  - packaging the encoded speech file into one or more transmission packets; and
  - transmitting each of the one or more transmission packets to the client.
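
The server-side flow of claims 7, 8, and 11 can be sketched in the same spirit. Everything named here is hypothetical: the URL query parameter `grammar`, the `GRAMMARS` table, the `example.com` URL in the usage, and `ToyEngine`, which treats decoded "audio" as UTF-8 text and does exact phrase matching so the example runs without a real recognizer. Incoming packets are assumed to carry a 4-byte sequence number followed by zlib-compressed data and to arrive in order, as they would over a single TCP connection.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical grammars keyed by grammar context number.
GRAMMARS: Dict[int, List[str]] = {7: ["good morning", "good evening"]}


@dataclass
class RawBuffer:
    """Node in a singly linked list of decoded (raw) speech buffers."""
    data: bytes
    next: Optional["RawBuffer"] = None


class ToyEngine:
    """Placeholder recognizer: treats raw bytes as UTF-8 text and matches exactly."""

    def __init__(self) -> None:
        self.grammar: List[str] = []
        self.received = bytearray()

    def set_grammar(self, phrases: List[str]) -> None:
        self.grammar = phrases

    def ready(self) -> bool:
        return True  # a real engine might apply back-pressure here

    def accept(self, raw: bytes) -> None:
        self.received += raw

    def result(self) -> str:
        text = self.received.decode("utf-8", errors="replace")
        return text if text in self.grammar else "no match"


class ServerSession:
    def __init__(self, url: str, engine: ToyEngine) -> None:
        query = parse_qs(urlparse(url).query)
        self.grammar_context = int(query["grammar"][0])  # hypothetical parameter name
        self.engine = engine
        self.engine.set_grammar(GRAMMARS[self.grammar_context])
        self.head: Optional[RawBuffer] = None
        self.tail: Optional[RawBuffer] = None

    def on_packet(self, packet: bytes) -> None:
        """Decode each packet on arrival and append it to the linked list."""
        node = RawBuffer(zlib.decompress(packet[4:]))  # skip the 4-byte sequence number
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self._feed()

    def _feed(self) -> None:
        # Hand buffers to the engine whenever it is ready, before the utterance ends.
        while self.head is not None and self.engine.ready():
            self.engine.accept(self.head.data)
            self.head = self.head.next
        if self.head is None:
            self.tail = None

    def respond(self, max_payload: int = 512) -> List[bytes]:
        """Package the text response into transmission-sized pieces."""
        text = self.engine.result().encode("utf-8")
        return [text[i:i + max_payload] for i in range(0, len(text), max_payload)]
```

Feeding the engine inside `on_packet`, rather than after the last packet, is what lets the first buffer reach the recognizer before all input has arrived (claim 11).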

13. A method for supporting speech recognition on a server, comprising:
- receiving a first URL from a first client remote from the server, the first URL comprising a first grammar context;
- associating a first set of a plurality of buffers with the first client;
- generating a first instance of a speech recognition engine for the first client;
- indicating a first grammar associated with the first grammar context to the first instance of the speech recognition engine;
- receiving a second URL from a second client remote from the server, the second URL comprising a second grammar context;
- associating a second set of a plurality of buffers with the second client;
- generating a second instance of the speech recognition engine for the second client;
- indicating a second grammar associated with the second grammar context to the second instance of the speech recognition engine;
- receiving a packet of encoded audio speech data from the first client;
- decoding the packet of encoded audio speech data from the first client into a first client portion of raw data;
- storing the first client portion of raw data into a buffer of the first set of a plurality of buffers;
- providing the buffer containing the first client portion of raw data to the first instance of a speech recognition engine for processing with the first grammar;
- receiving a packet of encoded audio speech data from the second client;
- decoding the packet of encoded audio speech data from the second client into a second client portion of raw data;
- storing the second client portion of raw data into a buffer of the second set of a plurality of buffers; and
- providing the buffer containing the second client portion of raw data to the second instance of the speech recognition engine for processing with the second grammar.
Dependent claims (14, 15, 16, 17):
- 14. The method of claim 13, further comprising:
  - receiving a first response from the first instance of the speech recognition engine, wherein the first response is based on an evaluation of the raw data provided to the first instance of the speech recognition engine in relation to the first grammar;
  - packaging a first text response based on the evaluation of the raw data provided to the first instance of the speech recognition engine in relation to the first grammar into one or more first client transmission packets;
  - transmitting the one or more first client transmission packets to the first client over the internet;
  - receiving a second response from the second instance of the speech recognition engine, wherein the second response is based on an evaluation of the raw data provided to the second instance of the speech recognition engine in relation to the second grammar;
  - packaging a second text response based on the evaluation of the raw data provided to the second instance of the speech recognition engine in relation to the second grammar into one or more second client transmission packets; and
  - transmitting the one or more second client transmission packets to the second client over the internet.
- 15. The method of claim 14, wherein the packet of encoded audio speech data from the first client is received via a first TCP/IP connection and the one or more first client transmission packets are transmitted to the first client via the same first TCP/IP connection, and wherein the packet of encoded audio speech data from the second client is received via a second TCP/IP connection and the one or more second client transmission packets are transmitted to the second client via the same second TCP/IP connection.
- 16. The method of claim 15, further comprising:
  - releasing the first TCP/IP connection following the transmission of the last packet of the one or more first client transmission packets to the first client;
  - terminating the first instance of the speech recognition engine some time after receiving the first response from the first instance of the speech recognition engine;
  - releasing the second TCP/IP connection following the transmission of the last packet of the one or more second client transmission packets to the second client; and
  - terminating the second instance of the speech recognition engine some time after receiving the second response from the second instance of the speech recognition engine.
- 17. The method of claim 13, wherein the packet of encoded audio speech data from the first client is for a first language learning exercise, and the packet of encoded audio speech data from the second client is for a second language learning exercise.
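
Claims 13 to 16 add per-client isolation: each client gets its own set of buffers and its own engine instance bound to its grammar, and both are torn down once the response has been produced. A minimal sketch follows, under stated assumptions: `RecognitionServer`, `ToyEngine`, the string client IDs, and the `GRAMMARS` table are hypothetical, packets are plain zlib data, and the toy engine treats decoded bytes as text so the example runs without a recognizer.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List

# Hypothetical grammars keyed by grammar context.
GRAMMARS: Dict[int, List[str]] = {1: ["yes", "no"], 2: ["thank you", "you're welcome"]}


class ToyEngine:
    """Placeholder recognizer instance bound to one grammar."""

    def __init__(self, grammar: List[str]) -> None:
        self.grammar = grammar
        self.audio = bytearray()

    def accept(self, raw: bytes) -> None:
        self.audio += raw

    def result(self) -> str:
        text = self.audio.decode("utf-8", errors="replace")
        return text if text in self.grammar else "no match"


class RecognitionServer:
    def __init__(self) -> None:
        self.buffers: Dict[str, Deque[bytes]] = {}
        self.engines: Dict[str, ToyEngine] = {}

    def open_session(self, client_id: str, grammar_context: int) -> None:
        """Allocate a buffer set and a fresh engine instance for this client."""
        self.buffers[client_id] = deque()
        self.engines[client_id] = ToyEngine(GRAMMARS[grammar_context])

    def on_packet(self, client_id: str, packet: bytes) -> None:
        queue = self.buffers[client_id]
        queue.append(zlib.decompress(packet))  # decode into this client's buffers only
        engine = self.engines[client_id]
        while queue:
            engine.accept(queue.popleft())  # feed this client's own engine instance

    def close_session(self, client_id: str) -> str:
        """Collect the response, then release the client's buffers and engine."""
        response = self.engines[client_id].result()
        del self.buffers[client_id]
        del self.engines[client_id]
        return response
```

Because every lookup is keyed by client, packets from different clients can interleave freely without mixing audio or grammars. In a deployment each session would also own its TCP connection (claim 15), released after the last response packet is sent (claim 16).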

18. A system supporting speech recognition, comprising:
- a plurality of clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer of received audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
- a server, said server comprising the capability to receive packets of encoded audio speech from said plurality of clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers, evaluate the resultant raw speech received from each of said plurality of clients, and transmit a feedback response to each of said plurality of clients from which said server received packets of encoded audio speech.
Dependent claims (19, 20):
- 19. The system of claim 18, wherein a feedback response transmitted from said server to a client comprises a text response.
- 20. The system of claim 18, wherein said server further comprises server associated memory; and the capability to identify a speech file stored in said server associated memory, encode the identified speech file into a smaller file representation, package the encoded speech file into one or more transmission packets, and transmit each of the one or more transmission packets to a client.
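
Claims 12 and 20 have the server encode a stored speech file and split it into transmission packets for the client. The patent text does not specify a packet format, so the framing below is an assumption: a fixed 6-byte header carrying a sequence number, the total packet count, and the payload length, which lets the client reassemble the file even if it handles packets out of order. `zlib` again stands in for a speech codec, and `HEADER`, `packetize`, `reassemble`, and `serve_speech_file` are illustrative names.

```python
import struct
import zlib
from typing import Dict, Iterable, List, Optional

# Hypothetical header: sequence number, total packets, payload length (network byte order).
HEADER = struct.Struct("!HHH")


def packetize(payload: bytes, max_payload: int) -> List[bytes]:
    """Split a payload into framed transmission packets (at most 65535 of them)."""
    chunks = [payload[i:i + max_payload] for i in range(0, len(payload), max_payload)] or [b""]
    total = len(chunks)
    return [HEADER.pack(seq, total, len(c)) + c for seq, c in enumerate(chunks)]


def reassemble(packets: Iterable[bytes]) -> bytes:
    """Client side: rebuild the payload by sequence number, in any arrival order."""
    parts: Dict[int, bytes] = {}
    total: Optional[int] = None
    for p in packets:
        seq, total, length = HEADER.unpack_from(p)
        parts[seq] = p[HEADER.size:HEADER.size + length]
    if total is None or len(parts) != total:
        raise ValueError("missing packets")
    return b"".join(parts[i] for i in range(total))


def serve_speech_file(audio: bytes, max_payload: int = 1024) -> List[bytes]:
    """Encode a stored speech file (zlib as a codec stand-in) and packetize it."""
    return packetize(zlib.compress(audio), max_payload)
```

Over a single TCP connection the packets already arrive in order, so the sequence number mainly guards against loss or reordering at higher layers; the same helper could frame a text response (claim 19).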

Current Assignee
Pearson Education Incorporated (Pearson plc)
Original Assignee
Global English Company
Inventors
Jochumson, Christopher S.

Granted Patent

US 6,865,536 B2
US Class Current

704/211
CPC Class Codes

G09B 19/04   Speaking with audible prese...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/30   Distributed recognition, e....

G10L 2015/228   of application context

Y10S 707/99933   Query processing, i.e. sear...

Method and system for network-based speech recognition

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

Method and system for network-based speech recognition

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links