Filter architecture for rapid enablement of voice access to data repositories
Abstract
A combination of a series of filters and a specification language is used to rapidly enable voice access to an external data repository. The filters include a data-to-voice filter operating on data values that flow from the data repository to a communication system, a voice-to-data filter capturing voice inputs and storing data related to such inputs in the data repository, an utterance filter normalizing spoken data inputs in a particular format, a validation filter checking values returned by the speech recognizer, and a data description filter creating spoken formats of data labels or descriptions. Also included is a pair of dictionaries: a name pronunciation dictionary for ensuring correct pronunciation of words, and a name grammar dictionary providing variations associated with a name.
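As a rough illustration of the repository-to-caller direction described in the abstract, the following Python sketch chains a data description filter, a data-to-voice filter, and a pronunciation dictionary. It is not taken from the specification: the function names, the `LABELS` and `PRONUNCIATIONS` tables, and the ISO date handling are all assumptions made for the example.

```python
# Hypothetical lookup tables; a real deployment would derive labels from the
# repository's metadata and maintain the pronunciation entries separately.
LABELS = {"CUST_NM": "customer name", "ORD_DT": "order date"}
PRONUNCIATIONS = {"Nguyen": "win", "Siobhan": "shiv-awn"}

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def describe_label(column: str) -> str:
    """Data description filter (assumed form): spoken version of a column label."""
    return LABELS.get(column, column.replace("_", " ").lower())


def data_to_voice(value: str) -> str:
    """Data-to-voice filter (assumed form): make an ISO date speakable.

    Values that do not look like YYYY-MM-DD are passed through unchanged.
    """
    parts = value.split("-")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        year, month, day = (int(p) for p in parts)
        if 1 <= month <= 12:
            return f"{MONTHS[month - 1]} {day}, {year}"
    return value


def apply_pronunciations(text: str) -> str:
    """Pronunciation dictionary: swap listed words for a phonetic respelling."""
    return " ".join(PRONUNCIATIONS.get(word, word) for word in text.split())


def speak_record(record: dict) -> str:
    """Build the text that would be handed to a TTS engine for one record."""
    phrases = [
        f"{describe_label(column)}: {apply_pronunciations(data_to_voice(value))}"
        for column, value in record.items()
    ]
    return ". ".join(phrases) + "."


if __name__ == "__main__":
    print(speak_record({"CUST_NM": "Siobhan Nguyen", "ORD_DT": "2004-03-15"}))
```

Keeping each filter as a small function with a string in and a string out is one way to make them individually customizable, which is the property the abstract emphasizes.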
27 Citations
18 Claims
1. A communication architecture for rapid enablement of bi-directional communication between a communication device and a data repository operative with a browser, said browser interacting with an automatic speech recognizer (ASR) and a text-to-speech (TTS) engine, said architecture comprising:
a voice application server operatively linked with said data repository and said browser, said voice application server comprising:
a. one or more filters for:
converting said voice inputs or validated voice inputs into an audio format for storage in said data repository, converting data in said audio format to said text inputs, and creating a spoken format from data in said data repository for rendering in said communication device,
b. one or more dictionaries for correct pronunciation of said text inputs and identifying variations in said voice inputs, and
c. one or more core voice modules providing core application functionality based upon interacting with said filters, dictionaries, and said browser to provide for rapid bi-directional communication to said data repository, said interaction between said core voice module(s) and said browser based upon a markup based specification language.
(Dependent claims: 2, 3)
4. A communication architecture for rapid enablement of bi-directional communication between a communication device and a data repository, said architecture comprising:
i. a browser interacting with:
a. an automatic speech recognizer (ASR) working in conjunction with a validation filter, said ASR receiving voice inputs from said communication device and validating said voice inputs using said validation filter, and
b. a text-to-speech (TTS) engine working in conjunction with an utterance filter, said TTS engine converting text inputs to voice outputs for rendering in said browser and said utterance filter normalizing said text inputs in a format compatible for rendering in said communication device; and
ii. a voice application server operatively linked with said data repository and said browser, said voice application server comprising:
a. one or more filters for:
converting said voice inputs or validated voice inputs into an audio format for storage in said data repository, converting data in said audio format to said text inputs, and creating a spoken format from data in said data repository for rendering in said communication device,
b. one or more dictionaries for correct pronunciation of said text inputs and identifying variations in said voice inputs, and
c. one or more core voice modules providing core application functionality based upon interacting with said filters, dictionaries, and said browser to provide for rapid bi-directional communication to said data repository, said interaction between said core voice module(s) and said browser based upon a markup based specification language.
(Dependent claims: 5, 6)
7. A set of customizable filters and dictionaries for rapid enablement of bi-directional communication between a voice application server and a data repository, said voice application server interacting with a communication device via a browser, said filters and dictionaries interacting with a core voice module, said browser implementing an automatic speech recognition (ASR) engine and a text to speech (TTS) engine, said filters and dictionaries comprising:
i. a validation filter working in conjunction with said ASR, said ASR receiving voice inputs from said communication device and validating said voice inputs using said validation filter;
ii. an utterance filter working in conjunction with said TTS engine, said TTS engine converting text inputs to voice outputs for rendering in said browser and said utterance filter normalizing said text inputs in a format compatible for rendering in said communication device;
iii. a voice-to-data (V2D) filter for converting said voice inputs or validated voice inputs into an audio format for storage in said data repository;
iv. a data-to-voice (D2V) filter converting data in said audio format to said text inputs;
v. a data description filter creating a spoken format for data labels obtained from metadata in said data repository for rendering in said communication device;
vi. a pronunciation dictionary for correct pronunciation of said text inputs; and
vii. a name grammar and synonym dictionary for identifying variations in said voice inputs.
(Dependent claims: 8, 9, 10)
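To make the recognizer-side elements of this claim more concrete, the sketch below pairs a validation filter with a name grammar and synonym dictionary. It is only an illustration under stated assumptions: `RecognitionResult`, its `confidence` score, the `NAME_GRAMMAR` table, and the 0.6 threshold are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name grammar: canonical name -> spoken variations a caller might use.
NAME_GRAMMAR = {
    "Robert Smith": ["robert smith", "bob smith", "rob smith"],
    "Katherine Lee": ["katherine lee", "kate lee", "kathy lee"],
}


@dataclass
class RecognitionResult:
    """Assumed shape of what a speech recognizer hands back."""
    text: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def resolve_name(heard: str) -> Optional[str]:
    """Name grammar lookup: map a spoken variation to its canonical name."""
    key = " ".join(heard.lower().split())  # lowercase and collapse whitespace
    for canonical, variations in NAME_GRAMMAR.items():
        if key in variations:
            return canonical
    return None


def validate(result: RecognitionResult, min_confidence: float = 0.6) -> Optional[str]:
    """Validation filter (assumed form).

    Rejects low-confidence results, then accepts only names the grammar knows.
    Returns the canonical name, or None when the input should be re-prompted.
    """
    if result.confidence < min_confidence:
        return None
    return resolve_name(result.text)


if __name__ == "__main__":
    print(validate(RecognitionResult("Bob  Smith", 0.9)))   # known variation
    print(validate(RecognitionResult("bob smith", 0.3)))    # too uncertain
    print(validate(RecognitionResult("alice jones", 0.95))) # not in the grammar
```

Returning `None` instead of raising keeps the re-prompt decision with the caller, which seems a natural fit for a dialog driven by a markup-based specification language.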
11. A method for customizing a set of filters and dictionaries for rapid enablement of bi-directional communication between a voice application server and a data repository, said voice application server interacting with a communication device via a browser, said filters and dictionaries interacting with a core voice module, said browser implementing an automatic speech recognition (ASR) engine and a text to speech (TTS) engine, said method comprising the steps of:
i. customizing a validation filter working in conjunction with said ASR, said ASR receiving voice inputs from said communication device and validating said voice inputs using said validation filter;
ii. customizing an utterance filter working in conjunction with said TTS engine, said TTS engine converting text inputs to voice outputs for rendering in said browser and said utterance filter normalizing said text inputs in a format compatible for rendering in said communication device;
iii. customizing a voice-to-data (V2D) filter converting said voice inputs or validated voice inputs into an audio format for storage in said data repository;
iv. customizing a data-to-voice (D2V) filter converting data in said audio format to said text inputs;
v. customizing a data description filter creating a spoken format for data labels obtained from metadata in said data repository for rendering in said communication device;
vi. customizing a pronunciation dictionary correcting pronunciation of said text inputs; and
vii. customizing a name grammar and synonym dictionary identifying variations in said voice inputs.
(Dependent claims: 12, 13, 14)
15. A method for providing rapid access to one or more data repositories based upon the interaction of one or more core voice modules with one or more filters and dictionaries, said data repositories, filters, and dictionaries distributed over a network, said method comprising:
in a transmission mode:
receiving a voice input via an external browser;
validating said voice input using said one or more filters;
identifying variations in said validated voice inputs via said one or more dictionaries;
converting said validated voice inputs and said variations in voice inputs to an audio format suitable for storage in said one or more data repositories, said conversion done via said one or more filters;
transmitting said converted validated voice inputs and variations in voice inputs in said audio format to said one or more data repositories;
in a receiving mode:
receiving, from said one or more data repositories, voice inputs in said audio format and converting said voice inputs to text inputs via said one or more filters;
correcting pronunciation of said text inputs via said one or more dictionaries;
normalizing said corrected text inputs for rendering in said external browser via said one or more filters, and
converting said normalized text inputs to speech inputs and transmitting said speech inputs to said external browser via said one or more filters.
(Dependent claims: 16, 17)
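The two modes recited in this method can be sketched as a pair of functions over a shared repository. Everything named here is an assumption made for illustration: the in-memory `Repository`, the `VARIANTS`, `PRONUNCIATIONS`, and `ABBREVIATIONS` tables, and the use of UTF-8 bytes as a stand-in for the stored format. The recognizer and TTS engine are outside the sketch, so the voice input arrives as recognized text and the output is the text that would be handed to a TTS engine.

```python
# Hypothetical dictionaries used by the two modes.
VARIANTS = {
    "worcester": "Worcester MA",
    "worcester mass": "Worcester MA",
    "wuster": "Worcester MA",
}
PRONUNCIATIONS = {"Worcester": "wuster"}
ABBREVIATIONS = {"MA": "Massachusetts"}


class Repository:
    """In-memory stand-in for a data repository."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, key: str, payload: bytes) -> None:
        self._rows[key] = payload

    def get(self, key: str) -> bytes:
        return self._rows[key]


def transmit(repo: Repository, key: str, heard: str) -> bool:
    """Transmission mode: validate, resolve variations, convert, store."""
    cleaned = " ".join(heard.lower().split())
    if not cleaned:                        # validation: reject empty input
        return False
    canonical = VARIANTS.get(cleaned)      # dictionary: identify variations
    if canonical is None:
        return False
    repo.put(key, canonical.encode("utf-8"))  # convert to the storage format
    return True


def receive(repo: Repository, key: str) -> str:
    """Receiving mode: fetch, convert to text, fix pronunciation, normalize."""
    text = repo.get(key).decode("utf-8")   # convert stored data to text
    corrected = [PRONUNCIATIONS.get(word, word) for word in text.split()]
    normalized = [ABBREVIATIONS.get(word, word) for word in corrected]
    return " ".join(normalized)            # text that would go to the TTS engine


if __name__ == "__main__":
    repo = Repository()
    if transmit(repo, "ship_city", "  Worcester   Mass "):
        print(receive(repo, "ship_city"))
```

The order inside `receive` follows the claim: pronunciation is corrected before the text is normalized for rendering.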
18. An article of manufacture comprising computer usable medium embodied therein for customizing a set of filters and dictionaries for rapid enablement of bi-directional communication between a voice application server and a data repository, said voice application server interacting with a communication device via a browser, said filters and dictionaries interacting with a core voice module, said browser interacting with an automatic speech recognition (ASR) engine and a text to speech (TTS) engine, said medium comprising:
i. computer readable program code implementing an interface customizing a validation filter working in conjunction with said ASR, said ASR receiving voice inputs from said communication device and validating said voice inputs using said validation filter;
ii. computer readable program code implementing an interface customizing an utterance filter working in conjunction with said TTS engine, said TTS engine converting text inputs to voice outputs for rendering in said browser and said utterance filter normalizing said text inputs in a format compatible for rendering in said communication device;
iii. computer readable program code implementing an interface customizing a voice-to-data (V2D) filter converting said voice inputs or validated voice inputs into an audio format for storage in said data repository;
iv. computer readable program code implementing an interface customizing a data-to-voice (D2V) filter converting data in said audio format to said text inputs;
v. computer readable program code implementing an interface customizing a data description filter creating a spoken format from data in said data repository for rendering in said communication device;
vi. computer readable program code implementing an interface customizing a pronunciation dictionary correcting pronunciation of said text inputs; and
vii. computer readable program code implementing an interface customizing a name grammar and synonym dictionary identifying variations in said voice inputs.
Specification