Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources

US 6,801,604 B2
Filed: 06/25/2002
Issued: 10/05/2004
Est. Priority Date: 06/25/2001
Status: Expired due to Term

Abstract

Systems and methods for conversational computing and, in particular, for building distributed conversational applications using a Web services-based model in which speech engines (e.g., speech recognition) and audio I/O systems are programmable services that an application can program asynchronously using a standard, extensible SERCP (speech engine remote control protocol). This provides scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring, and controlling speech resources such as speech engines, speech pre/post-processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

22 Claims

1. A distributed speech processing system, comprising:
- a conversational application and a task manager that abstracts from the conversational application, the discovery and remote control of audio I/O and speech engine services;
  
  an audio I/O processing service, which is programmable by control messages generated by the task manager on behalf of the conversational application to provide audio I/O services for the conversational application; and
  
  a speech engine service, which is programmable by control messages generated by the task manager on behalf of the conversational application to provide speech processing services for the conversational application.
  - 2. The system of claim 1, wherein the audio I/O processing service and speech engine service comprise Web services.
  - 3. The system of claim 1, wherein the control messages are encoded using XML (eXtensible Markup Language) and wherein the control messages are exchanged using SOAP (Simple Object Access Protocol).
  - 4. The system of claim 1, wherein each service comprises interfaces that are described using WSDL (Web Services Description Language).
  - 5. The system of claim 4, wherein WSFL (web services flow language) or an extension thereof is used to dynamically configure the processing flow of the system.
  - 6. The system of claim 1, wherein the speech engine service provides one of automatic speech processing (ASR) services, text-to-speech (TTS) synthesis services, natural language understanding (NLU) services, and a combination thereof.
  - 7. The system of claim 1, wherein the audio I/O processing service provides speech encoding/decoding services, audio recording services, audio playback services, and a combination thereof.
  - 8. The system of claim 1, further comprising a load manager that dynamically allocates and assigns the services for the conversational application, based on control messages generated by the task manager on behalf of the conversational application.
  - 9. The system of claim 1, wherein the services are programmed to negotiate uplink and downlink audio codecs for generating RTP-based audio streams.
  - 10. The system of claim 1, wherein the speech engine services are dynamically allocated to the conversational application on one of a call, session, utterance and persistent basis.
  - 11. The system of claim 1, wherein the services are discoverable using UDDI (Universal Description, Discovery and Integration) or an extension thereof.
  - 12. The system of claim 1, wherein services provided by the speech engine service and audio I/O processing service are defined as a collection of ports.
  - 13. The system of claim 12, wherein types of ports comprise audio in, audio out, control in, and control out.
  - 14. The system of claim 1, wherein the audio I/O service comprises a gateway that connects audio streams from a network to the speech processing services.
  - 15. The system of claim 14, wherein the network comprises a PSTN (public switched telephone network).
  - 16. The system of claim 14, wherein the network comprises a VoIP (voice over IP) network.
  - 17. The system of claim 14, wherein the network comprises a wireless network.
  - 18. The system of claim 1, wherein the distributed speech processing system comprises an interactive voice response (IVR) system, and wherein the system further comprises a telephony gateway, wherein the telephony gateway is abstracted from the conversational application and wherein the telephony gateway receives and processes an incoming call to assign the call to a conversational application.
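
As an illustration of claims 1 and 3, the following minimal sketch shows how a task manager might encode a control message in XML, wrap it in a SOAP envelope, and how a service might parse it. The claims do not define a message schema, so the element and attribute names (`control`, `service`, `action`, `param`) and the function names are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_control_message(service: str, action: str, params: dict) -> bytes:
    """Encode a hypothetical control message as XML inside a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # "control", "service", "action", and "param" are illustrative names,
    # not taken from the patent or from any published SERCP schema.
    control = ET.SubElement(body, "control", {"service": service, "action": action})
    for name, value in params.items():
        param = ET.SubElement(control, "param", {"name": name})
        param.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_control_message(raw: bytes) -> tuple:
    """Recover (service, action, params) on the receiving service side."""
    root = ET.fromstring(raw)
    control = root.find(f"{{{SOAP_NS}}}Body/control")
    params = {p.get("name"): p.text for p in control.findall("param")}
    return control.get("service"), control.get("action"), params


if __name__ == "__main__":
    raw = build_control_message("asr", "load_grammar", {"grammar": "digits"})
    print(parse_control_message(raw))
```

In this sketch all parameter values travel as text, so a receiving service would need to convert them to the types it expects.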

19. A speech processing web service, comprising:
- a listener for receiving and parsing control messages that are used for programming the speech processing web service, wherein the control messages are encoded using XML (eXtensible Markup Language) and exchanged using SOAP (Simple Object Access Protocol);
  
  a business interface layer for exposing speech processing services offered by the web service, wherein the services are described and accessed using WSDL (web services description language); and
  
  a business logic layer for providing speech processing services, the speech processing services comprising one of automatic speech recognition, speech synthesis, natural language understanding, acoustic feature extraction, audio encoding/decoding, audio recording, audio playback, and any combination thereof.
  - 20. The speech processing web service of claim 19, wherein a service of the speech processing web service is dynamically allocated and assigned to a conversational application and programmed by the conversational application.
  - 21. The speech processing web service of claim 19, wherein the web service is advertised via UDDI.
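
The three layers recited in claim 19 can be pictured with a small in-process sketch. This is an assumption-laden simplification: the class names, the `request` element, and its `operation` attribute are hypothetical, the message is plain XML rather than a full SOAP envelope, a dictionary of operations stands in for a WSDL description, and the "engines" return canned strings.

```python
import xml.etree.ElementTree as ET


class BusinessLogic:
    """Business logic layer: stand-ins for real speech engines."""

    def recognize(self, audio_ref: str) -> str:
        return f"transcript-of:{audio_ref}"

    def synthesize(self, text: str) -> str:
        return f"audio-for:{text}"


class BusinessInterface:
    """Business interface layer: exposes named operations.

    A real service would publish these through a WSDL document; a dict of
    callables plays that role in this sketch.
    """

    def __init__(self, logic: BusinessLogic) -> None:
        self._operations = {
            "recognize": logic.recognize,
            "synthesize": logic.synthesize,
        }

    def describe(self) -> list:
        return sorted(self._operations)

    def invoke(self, operation: str, argument: str) -> str:
        if operation not in self._operations:
            raise ValueError(f"unknown operation: {operation}")
        return self._operations[operation](argument)


class Listener:
    """Listener: receives and parses control messages, then dispatches them."""

    def __init__(self, interface: BusinessInterface) -> None:
        self._interface = interface

    def handle(self, raw_xml: str) -> str:
        request = ET.fromstring(raw_xml)
        return self._interface.invoke(request.get("operation"), request.text or "")


if __name__ == "__main__":
    service = Listener(BusinessInterface(BusinessLogic()))
    print(service.handle('<request operation="recognize">utt-1</request>'))
```

Keeping the listener unaware of the engine implementations means the same parsing code could front any combination of the services listed in the claim.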

22. A method for providing distributed speech processing, comprising the steps of:
- receiving an incoming call by a client application;
  
  assigning the call to an application having a task manager that is abstracted from the application for discovering and controlling speech processing services including audio I/O and speech engine services;
  
  the task manager generating a control message to a router/load manager for requesting a speech processing service on behalf of the application to service the incoming call;
  
  the router/load manager dynamically allocating a speech processing service to the application and providing an address of the allocated speech processing service to the task manager;
  
  the task manager generating a control message for dynamically programming the allocated speech service based on requirements of the application; and
  
  the application processing the incoming call using the programmed speech service.
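
The steps of claim 22 can be traced in a short sketch: the task manager asks a router/load manager for a service, receives an address, programs the allocated service, and the application then uses it for the call. All class names, method names, and the example address are hypothetical, and in-process method calls stand in for the control messages and network connections a real deployment would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeechService:
    address: str
    kind: str                      # e.g. "asr" or "tts"
    busy: bool = False
    config: Dict[str, str] = field(default_factory=dict)

    def program(self, settings: Dict[str, str]) -> None:
        """Apply application-specific settings to this service."""
        self.config.update(settings)

    def process(self, call_id: str) -> str:
        return f"{call_id} handled by {self.kind} at {self.address}"


class RouterLoadManager:
    """Allocates a free service of the requested kind and returns its address."""

    def __init__(self, services: List[SpeechService]) -> None:
        self._services = {s.address: s for s in services}

    def allocate(self, kind: str) -> str:
        for service in self._services.values():
            if service.kind == kind and not service.busy:
                service.busy = True
                return service.address
        raise RuntimeError(f"no free {kind} service")

    def connect(self, address: str) -> SpeechService:
        # Stand-in for opening a network connection to the given address.
        return self._services[address]


class TaskManager:
    """Discovers and programs services on behalf of the application."""

    def __init__(self, router: RouterLoadManager) -> None:
        self._router = router

    def acquire(self, kind: str, settings: Dict[str, str]) -> SpeechService:
        address = self._router.allocate(kind)      # request and allocation
        service = self._router.connect(address)    # address returned to task manager
        service.program(settings)                  # dynamic programming step
        return service


def handle_incoming_call(call_id: str, task_manager: TaskManager) -> str:
    """The application's view: it only talks to its task manager."""
    service = task_manager.acquire("asr", {"grammar": "digits"})
    return service.process(call_id)


if __name__ == "__main__":
    router = RouterLoadManager([SpeechService("10.0.0.5:9000", "asr")])
    print(handle_incoming_call("call-1", TaskManager(router)))
```

The application code never references the router or a service address directly, which mirrors the abstraction the claim assigns to the task manager.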

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Lubensky, David M.; Sakrajda, Andrzej; Maes, Stephane H.
Primary Examiner(s): FOSTER, ROLAND G
Application Number: US10/183,125
Publication Number: US 20030088421A1
Time in Patent Office: 833 Days
Field of Search: 379/88-1-, 379/88.16, 379/88.17, 379/88-3-, 704/270-275, 717/114, 717/116, 709/228-231, 709/201-203, 709/249, 709/250
US Class Current: 379/88.17
CPC Class Codes:
G10L 15/30   Distributed recognition, e....
H04M 2201/40   using speech recognition
H04M 3/42178   by downloading data to subs...
H04M 3/4938   comprising a voice browser ...
H04M 7/006   Networks other than PSTN/IS...
