Speech recognition and speaker verification using distributed speech processing

US 7,571,100 B2
Filed: 12/03/2002
Issued: 08/04/2009
Est. Priority Date: 12/03/2002
Status: Active Grant

Abstract

Processing a speech utterance by communicating between a local computer and a remote computer using a hyper text communication session. The local computer sends a recording of a speech utterance to the remote computer in the session, and receives a result from the remote computer, the result based on a processing of the recording at the remote computer.
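As a rough illustration of the exchange the abstract describes, the sketch below wraps a recorded utterance in an HTTP POST request and, optionally, sends it and reads back the result. The endpoint URL, the `audio/wav` content type, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import urllib.request

# Hypothetical endpoint of the remote speech application (not from the patent).
RECOGNIZER_URL = "http://speech.example.com/apps/recognize"


def build_utterance_request(recording: bytes,
                            url: str = RECOGNIZER_URL) -> urllib.request.Request:
    """Wrap a recorded utterance in an HTTP POST request (not yet sent)."""
    if not recording:
        raise ValueError("recording is empty")
    return urllib.request.Request(
        url,
        data=recording,
        headers={"Content-Type": "audio/wav"},  # assumed audio format
        method="POST",
    )


def send_utterance(recording: bytes,
                   url: str = RECOGNIZER_URL,
                   timeout: float = 10.0) -> bytes:
    """Send the recording and return the raw result body from the remote computer."""
    request = build_utterance_request(recording, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Only `send_utterance` touches the network; `build_utterance_request` can be inspected on its own to see what would be transmitted.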

47 Claims

1. A method for processing a speech utterance in which a local computer accesses instructions from computer storage and executes the instructions to perform steps of:
- recording a speech utterance from a user using the local computer;
  
  communicating between the local computer and a remote computer using a hyper text communication session, including:
  
  sending the recording of the speech utterance from the local computer to the remote computer in the session; and
  
  receiving a result from the remote computer, the result based on a processing of the recording at the remote computer including analyzing the speech utterance in the recording using a speech recognition application at the remote computer;
  
  wherein the computer-implemented method further comprises using the local computer to receive a script that includes a universal resource locator of an application program that is run by the remote computer to process the recording, the script includes an instruction that instructs the local computer to perform a task based on the result received from the remote computer.
- Dependent claims:
  - 2. The computer-implemented method of claim 1 in which the result characterizes a similarity of voice characteristics.
  - 3. The computer-implemented method of claim 1 in which the result indicates that a speaker who made the speech utterance is a known person.
  - 4. The computer-implemented method of claim 1 in which the result indicates a match of stored voice characteristics of a speaker.
  - 5. The computer-implemented method of claim 1, further comprising prompting the user.
  - 6. The computer-implemented method of claim 5, further comprising using the local computer to receive a script that includes an instruction that instructs the local computer to prompt the user.
  - 7. The computer-implemented method of claim 1 in which the script includes extensible markup language tags.
  - 8. The computer-implemented method of claim 7 in which the script includes voice extensible markup language tags.
  - 9. The computer-implemented method of claim 1, further comprising processing the recording to determine a similarity of voices.
  - 10. The computer-implemented method of claim 1, further comprising processing the recording to indicate a likelihood that the speaker is a known person.
  - 11. The computer-implemented method of claim 1, further comprising processing the recording to indicate a match of stored voice characteristics.
  - 12. The computer-implemented method of claim 11, further comprising using the local computer to prompt a user to make the speech utterance.
  - 13. The computer-implemented method of claim 1 in which the hyper text communication session follows a hyper text transfer protocol.
  - 14. The computer-implemented method of claim 13 in which the hyper text transfer protocol is an HTTP protocol defined by World Wide Web Consortium.
  - 15. The computer-implemented method of claim 1 in which the local computer is a web browser and the remote computer is a web server.
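Claim 1 and claims 7 and 8 describe a script that carries the URL of the remote application together with an instruction telling the local computer what to do with the result. A minimal sketch of how a client might read such a script follows. The tag and attribute names (`script`, `recognize`, `on`, `src`, `result`, `task`) are invented for this example and are not VoiceXML; the task names and the fallback value are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical script format, invented for illustration (not VoiceXML).
EXAMPLE_SCRIPT = """
<script>
  <recognize src="http://speech.example.com/apps/recognize"/>
  <on result="yes" task="confirm_order"/>
  <on result="no" task="cancel_order"/>
</script>
"""


def parse_script(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the application URL and a result-to-task mapping from the script."""
    root = ET.fromstring(text.strip())
    recognize = root.find("recognize")
    if recognize is None or recognize.get("src") is None:
        raise ValueError("script does not name an application URL")
    tasks = {el.get("result"): el.get("task") for el in root.findall("on")}
    return recognize.get("src"), tasks


def task_for(tasks: Dict[str, str], result: str, default: str = "reprompt") -> str:
    """Pick the task the script assigns to a result, or fall back to a default."""
    return tasks.get(result, default)
```

With this layout the client first posts the recording to the URL returned by `parse_script`, then passes the received result to `task_for` to decide its next step.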

16. A computer-implemented method in which a computer accesses instructions from computer storage to execute a web browser process comprising steps of:
- receiving a dialog file at the web browser;
  
  controlling a speech dialog using the received dialog file;
  
  receiving a speech utterance from a user as part of the speech dialog;
  
  encoding the speech utterance to generate an encoded speech utterance;
  
  sending a request from the web browser to a web server according to Hypertext Transfer Protocol, the request containing the encoded speech utterance; and
  
  receiving a response from the web server, the response containing a result based on a processing of the encoded speech utterance including analyzing the encoded speech utterance using a speech recognition application at the web server, wherein the computer-implemented method further comprises receiving a script at the web browser that includes a universal resource locator associated with the speech recognition application, the script includes an instruction that instructs the web browser to perform a task based on the result received from the web server.
- Dependent claims:
  - 17. The computer-implemented method of claim 16 in which the request contains an identifier to an application used to process the speech utterance.
  - 18. The computer-implemented method of claim 16 in which receiving the dialog file comprises receiving the dialog file from the web server.
  - 19. The computer-implemented method of claim 16 in which the dialog file comprises a VoiceXML document.
  - 20. The computer-implemented method of claim 16 in which the encoded speech utterance comprises an MIME-encoded message.
  - 21. The computer-implemented method of claim 16 in which the response is sent from the web server to the web browser according to Hypertext Transfer Protocol.
  - 22. The computer-implemented method of claim 16 in which the application comprises a speech recognizer.
  - 23. The computer-implemented method of claim 16 wherein the application is a speech recognizer and the result characterizes a similarity of voice characteristics.

24. A computer-implemented method in which a server accesses instructions from computer storage and executes the instructions to perform steps of:
- sending a dialog file from the server to a client, the dialog file containing statements for processing by the client to control a speech dialog;
  
  receiving at the server a request from the client in response to the client processing one of the statements, the request containing an encoded speech utterance and being sent from the client to the server according to Hypertext Transfer Protocol;
  
  processing the encoded speech utterance by using the server including analyzing the encoded speech utterance using a speech recognition application; and
  
  sending a response from the server to the client, the response containing a result based on the processing of the encoded speech utterance;
  
  wherein the computer-implemented method further comprises sending a script to the client that includes a universal resource locator associated with the speech recognition application, the script includes an instruction that instructs the client to perform a task based on the result received from the server.
- Dependent claims:
  - 25. The computer-implemented method of claim 24 in which the dialog file comprises a VoiceXML document.
  - 26. The computer-implemented method of claim 24 in which the encoded speech utterance comprises an MIME-encoded message.
  - 27. The computer-implemented method of claim 24 in which the response is sent from the server to the client according to the Hypertext Transfer Protocol.
  - 28. The computer-implemented method of claim 24 in which processing the encoded speech utterance includes applying a speech recognition application to process the encoded speech.
  - 29. The computer-implemented method of claim 24 in which processing the encoded speech utterance includes applying a speaker verification application to process the encoded speech.
  - 30. The computer-implemented method of claim 24 in which the result characterizes a similarity of voice characteristics.
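On the server side, claims 24 to 30 describe receiving a request that carries an encoded utterance, processing it with a speech recognition or speaker verification application, and returning a result. One way this dispatch could look is sketched below. The request paths, the result fields, and both application functions are placeholders assumed for illustration; they perform no real recognition or verification.

```python
from typing import Callable, Dict, Tuple

Application = Callable[[bytes], dict]


def recognize_stub(utterance: bytes) -> dict:
    # Placeholder: a real recognizer would decode the audio here.
    return {"type": "recognition", "text": None, "bytes_received": len(utterance)}


def verify_stub(utterance: bytes) -> dict:
    # Placeholder: a real verifier would compare against stored voice characteristics.
    return {"type": "verification", "score": None, "bytes_received": len(utterance)}


# Hypothetical mapping from request path to the application that handles it.
APPLICATIONS: Dict[str, Application] = {
    "/apps/recognize": recognize_stub,
    "/apps/verify": verify_stub,
}


def handle_request(path: str, body: bytes) -> Tuple[int, dict]:
    """Route an encoded utterance to an application; return (HTTP status, result)."""
    application = APPLICATIONS.get(path)
    if application is None:
        return 404, {"error": f"no application at {path}"}
    if not body:
        return 400, {"error": "empty utterance"}
    return 200, application(body)
```

Keeping the routing in a plain function makes it easy to attach to any HTTP server framework; the status codes follow ordinary HTTP conventions rather than anything stated in the claims.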

31. A method comprising:
- receiving a speech utterance from a user at a speech browser;
  
  encoding the speech utterance to generate an encoded speech utterance at the speech browser;
  
  sending a request from the speech browser through a network to a server in a hyper text communication session, the request containing the encoded speech utterance and an identifier to a speech recognition application at the server used to process the encoded speech utterance by performing speech recognition on the speech utterance and obtaining recognition results based on the speech recognition; and
  
  receiving at the speech browser a response from the server that contains the recognition result based on the processing of the encoded speech utterance at the server.
- Dependent claims:
  - 32. The method of claim 31 in which the request comprises an HTTP POST request.
  - 33. The method of claim 31 in which the hyper text communication session comprises a series of related HTTP requests and responses.
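Claims 31 and 32 describe a request that contains both the encoded utterance and an identifier of the speech recognition application, sent as an HTTP POST; other claims mention MIME encoding. The sketch below shows one possible encoding, a `multipart/form-data` body. The choice of `multipart/form-data`, the field names `application` and `utterance`, and the `audio/wav` type are assumptions for illustration and are not taken from the claims.

```python
import uuid
from typing import Optional, Tuple

CRLF = b"\r\n"


def encode_request_body(app_id: str,
                        utterance: bytes,
                        boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a two-part MIME message."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        # Part 1: identifier of the application that should process the audio.
        delimiter,
        b'Content-Disposition: form-data; name="application"',
        b"",
        app_id.encode("utf-8"),
        # Part 2: the encoded speech utterance itself.
        delimiter,
        b'Content-Disposition: form-data; name="utterance"; filename="utterance.wav"',
        b"Content-Type: audio/wav",
        b"",
        utterance,
        # Closing delimiter, followed by a final CRLF.
        delimiter + b"--",
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", CRLF.join(lines)
```

The returned pair can be used directly as the `Content-Type` header and body of a POST request. A fixed `boundary` is useful in tests; in normal use the random default avoids collisions with the audio bytes.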

34. An apparatus comprising:
- means for receiving a speech utterance from a user and converting the speech utterance into a recording at a local computer;
  
  means for communicating between the local computer and a remote computer using a hyper text communication session;
  
  means for sending the recording of the speech utterance from the local computer to the remote computer in the session;
  
  means for receiving at the local computer a result from the remote computer, the result based on a processing of the recording at the remote computer, wherein the processing of the recording at the remote computer includes analyzing the speech utterance using a speech recognition application at the remote computer; and
  
  means for using the local computer to receive a script that includes a universal resource locator of the speech recognition application that is run by the remote computer to process the recording, the script includes an instruction that instructs the local computer to perform a task based on the result received from the remote computer.
- Dependent claims:
  - 35. The apparatus of claim 34 wherein the means for sending the recording comprises means for converting the recording to a Multipurpose Internet Mail Extension (MIME)-encoded message.
  - 36. The apparatus of claim 35 wherein the means for sending the recording comprises means for sending the MIME-encoded message using a Hypertext Transfer Protocol (HTTP) POST command.
  - 37. The apparatus of claim 34, further comprising means for receiving a script that includes an instruction that instructs the apparatus to prompt a user.
  - 38. The apparatus of claim 37 wherein the script includes extensible markup language tags.
  - 39. The apparatus of claim 38, further comprising means for interpreting the extensible markup language tags.

40. Computer-readable media comprising software for causing a computer system to perform functions comprising:
- recording a speech utterance from a user using a local computer;
  
  communicating between the local computer and a remote computer using a hyper text communication session, including sending the recording of a speech utterance from the local computer to the remote computer in the session;
  
  receiving a result from the remote computer, the result based on a processing of the recording at the remote computer, wherein the processing includes analyzing the speech utterance in the recording using a speech recognition application at the remote computer; and
  
  using the local computer to receive a script that includes a universal resource locator associated with the speech recognition application that is run by the remote computer to process the recording, the script includes an instruction that instructs the local computer to perform a task based on the result received from the remote computer.

41. Computer-readable media comprising software for causing a computer system to perform functions comprising:
- receiving a dialog file at a web browser;
  
  controlling a speech dialog using the received dialog file;
  
  receiving a speech utterance from a user as part of the speech dialog;
  
  encoding the speech utterance to generate an encoded speech utterance;
  
  sending a request from the web browser to a web server, the request containing the encoded speech utterance;
  
  receiving a response from the web server, the response containing a result based on a processing of the encoded speech utterance including analyzing the encoded speech utterance using a speech recognition application at the web server; and
  
  using the web browser to receive a script that includes a universal resource locator associated with the speech recognition application, the script includes an instruction that instructs the web browser to perform a task based on the result received from the web server.

42. Computer-readable media comprising software for causing a computer system to perform functions comprising:
- sending a dialog file from a server to a client, the dialog file containing statements for processing by the client to control a speech dialog;
  
  receiving at a server a request from the client in response to the client processing one of the statements, the request containing an encoded speech utterance;
  
  processing the encoded speech utterance by using the server including analyzing the encoded speech utterance using a speech recognition application at the server;
  
  sending a response from the server to the client, the response containing a result based on processing of the encoded speech utterance; and
  
  sending a script from the server to the client, the script includes a universal resource locator associated with the speech recognition application and an instruction that instructs the client to perform a task based on the result received from the server.

43. Computer-readable media comprising software for causing a computer system to perform functions comprising:
- receiving a speech utterance from a user at a speech browser;
  
  encoding the speech utterance to generate an encoded speech utterance at the speech browser;
  
  sending a request from the speech browser through a network to a server in a hyper text communication session, the request containing the encoded speech utterance and an identifier to an application at the server used to process the speech utterance by performing speech recognition on the speech utterance and obtaining recognition results based on the speech recognition; and
  
  receiving a response at the speech browser from the server that contains the recognition result based on the processing of the encoded speech utterance.

44. An apparatus comprising:
- an input port to receive a speech utterance from a user as part of a speech dialog; and
  
  a web browser to receive a dialog file and control the speech dialog using the received dialog file, the web browser being configured to encode the speech utterance to generate an encoded speech utterance, to send a request containing the encoded speech utterance to a web server, and to receive a response from the web server, where the response contains a speech recognition result based on a speech recognition processing of the encoded speech utterance at the web server;
  
  wherein the web browser receives a script that includes a universal resource locator associated with the web server, the script includes an instruction that instructs the web browser to perform a task based on the speech recognition result received from the web server.

45. A server computer comprising:
- a storage to store a dialog file containing statements for processing by a client to control a speech dialog;
  
  an input/output port to send the dialog file to the client and to receive a request using a hyper text communication session from the client in response to the client processing one of the statements, the request containing an encoded speech utterance; and
  
  a speech recognition application to process the encoded speech utterance and to send a response containing a result based on the speech recognition processing of the encoded speech utterance to the client;
  
  wherein the server computer sends a script to the client, the script includes a universal resource locator associated with the server computer and an instruction that instructs the client to perform a task based on the result.

46. A voice-enabled device comprising:
- an input/output interface to receive a speech utterance from a user;
  
  a voice-enabled application at a speech browser configured to encode the speech utterance to generate an encoded speech utterance and send a request from the speech browser through a network to a server in a hyper text communication session, the request containing the encoded speech utterance and an identifier to a speech recognition application at the server used to process the speech utterance, the voice-enabled application further configured to receive a response from the server that contains a speech recognition result based on a processing of the encoded speech utterance at the server and to perform a function at the speech browser based on the speech recognition result;
  
  wherein the voice-enabled application receives a script that includes a universal resource locator associated with the speech recognition application, the script includes an instruction that instructs the voice-enabled application to perform a task based on the speech recognition result received from the server.

47. A telephone call center comprising:
- a call manager to receive a speech utterance of a user transmitted through a telephone network, the call manager configured to determine a telephone number dialed by the user to connect the user to the telephone call center, the call manager further configured to determine a universal resource locator (URL) based on the telephone number; and
  
  a client computer to run a speech browser application that performs the functions of:
  
  retrieving a script based on the URL provided by the call manager,
  
  encoding the speech utterance into an encoded speech utterance,
  
  sending a request through a network to a server in a hyper text communication session, the request containing the encoded speech utterance and an identifier to a speech recognition application at the server used to process the speech utterance;
  
  receiving a response from the server that contains a recognition result based on a speech processing of the encoded speech utterance; and
  
  using the client computer to receive a script that includes a universal resource locator associated with the speech recognition application, the script includes an instruction that instructs the client computer to perform a task based on the recognition result.
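Claim 47 has the call manager determine a URL from the telephone number the user dialed, which the speech browser then uses to retrieve its script. A minimal sketch of that lookup is shown below. The table contents, the URLs, and the digit-only normalization are assumptions made for illustration; the claim does not say how the mapping is stored.

```python
from typing import Dict, Optional

# Hypothetical table mapping dialed numbers to script URLs (not from the patent).
SCRIPT_URLS: Dict[str, str] = {
    "18005550100": "http://apps.example.com/banking/start.vxml",
    "18005550101": "http://apps.example.com/travel/start.vxml",
}


def normalize_number(dialed: str) -> str:
    """Keep only the digits so differently formatted numbers compare equal."""
    return "".join(ch for ch in dialed if ch.isdigit())


def script_url_for(dialed: str,
                   table: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the script URL for a dialed number, or None if it is unknown."""
    lookup = SCRIPT_URLS if table is None else table
    return lookup.get(normalize_number(dialed))
```

Returning `None` for an unknown number leaves it to the caller to decide whether to reject the call or fall back to a default script.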

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Inventors: Corriveau, Francois; Lenir, Philip; Hunt, Andrew
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US10/309,794
Publication Number: US 20040107107A1
Time in Patent Office: 2,436 Days
Field of Search: None
US Class Current: 704/270.1
CPC Class Codes:
G06F 21/32   using biometric data, e.g. ...
G10L 15/30   Distributed recognition, e....
G10L 17/22   Interactive procedures; Man...