Distributed Internet Based Speech Recognition System With Natural Language Support

US 20060200353A1
Filed: 05/22/2006
Published: 09/07/2006
Est. Priority Date: 11/12/1999
Status: Active Grant

First Claim

1. A speech-enabled internet based computing system comprising:

a speech recognition engine configured to generate a recognized speech query from an utterance;

said speech recognition engine being further configured to distribute speech processing operations between a portable client device and a server device on a device-by-device basis, such that a plurality of portable client devices having differing computing capabilities can be supported;

wherein individual ones of said plurality of portable client devices can be configured to perform at least part of said speech processing operations to generate said recognized speech query;

a web page having a list of items, at least some of said list of items being selectable through a browser on said portable client device based on said recognized speech query.

2 Assignments

0 Petitions

Abstract

A speech-enabled internet based computing system includes a configurable speech recognition engine used for interacting with content on a web accessible page. The speech recognition engine is distributed across a client and server architecture, and is adaptive so that speech processing operations can be allocated as needed between the two. This allows for support for client devices having differing computing capabilities. Natural language operations can also be supported as desired. A user can thus interact with a web page and select items of interest using speech as a mode of input. Dynamic grammars can assist in the recognition operations to improve speed and comprehension.
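
The allocation described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the stage names, the `compute_score` scale, and the rule that natural language processing stays on the server (suggested by claims 12 and 23, but not the only possible arrangement). It is not the patented implementation.

```python
from dataclasses import dataclass

# Hypothetical stage names; the abstract does not enumerate the operations.
CLIENT_ELIGIBLE = ["audio_capture", "feature_extraction", "decoding"]
SERVER_ONLY = ["natural_language"]  # assumption: NL processing stays on the server


@dataclass
class ClientProfile:
    name: str
    compute_score: int  # assumed scale: 0 (thin client) to 2 (capable client)


def allocate_stages(profile: ClientProfile) -> dict:
    """Split the speech pipeline between client and server for one device."""
    # Every client captures audio; each extra capability point moves
    # one more stage from the server onto the client.
    n_client = max(1, min(profile.compute_score + 1, len(CLIENT_ELIGIBLE)))
    return {
        "client": CLIENT_ELIGIBLE[:n_client],
        "server": CLIENT_ELIGIBLE[n_client:] + SERVER_ONLY,
    }
```

Calling `allocate_stages(ClientProfile("pda", 0))` leaves only audio capture on the client, while a device with score 2 would also run feature extraction and decoding locally.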

172 Citations

32 Claims

1. A speech-enabled internet based computing system comprising:
- a speech recognition engine configured to generate a recognized speech query from an utterance;
  
  said speech recognition engine being further configured to distribute speech processing operations between a portable client device and a server device on a device-by-device basis, such that a plurality of portable client devices having differing computing capabilities can be supported;
  
  wherein individual ones of said plurality of portable client devices can be configured to perform at least part of said speech processing operations to generate said recognized speech query;
  
  a web page having a list of items, at least some of said list of items being selectable through a browser on said portable client device based on said recognized speech query.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The speech-enabled internet based computing system of claim 1, wherein said website is adapted so that the user can navigate and locate information of interest using said speech query.
  - 3. The speech-enabled internet based computing system of claim 1, wherein said list of items include products and/or services.
  - 4. The speech-enabled internet based computing system of claim 1, wherein said web page is implemented at least in part in HTML or as a Java applet.
  - 5. The speech-enabled internet based computing system of claim 1, wherein said speech-enabled internet based computing system is further adapted to respond to a speech query concerning said list of items by returning a text or speech articulated response.
  - 6. The speech-enabled internet based computing system of claim 1, wherein said speech-enabled internet based computing system is further adapted to interact on a real-time basis in response to one or more continuous speech queries.
  - 7. The speech-enabled internet based computing system of claim 1, wherein the speech-enabled internet based computing system also controls an interactive character agent presented to the user for assisting in handling said speech query.
  - 8. The speech-enabled internet based computing system of claim 7, wherein said interactive character agent is adapted to have configurable perception parameters based on characteristics of a portable device.
  - 9. The speech-enabled internet based computing system of claim 1 wherein respective speech processing operations to be performed by the client platform and the server computing system are specified by an initialization routine.
  - 10. The speech-enabled internet based computing system of claim 1 wherein the server device transfers speech related data for the web page using a hypertext transfer protocol (HTTP) and a format which includes an embedded NULL character.
  - 11. The speech-enabled internet based computing system of claim 1, wherein the speech recognition engine stores calibration configuration data pertaining to calibrating speech and silence components of a speech utterance for each portable device.
  - 12. The speech-enabled internet based computing system of claim 1 wherein the server device is further configured to perform a natural language processing operation on said recognized speech query to recognize a meaning of a sentence of words contained therein.
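
Claim 10 recites transferring speech related data over HTTP in a format that includes an embedded NULL character, but does not say what role that character plays. The sketch below shows one possible reading, in which each field of the payload is terminated by a NUL byte. The field layout and the `pack_fields`/`unpack_fields` helpers are hypothetical and are not taken from the patent.

```python
def pack_fields(fields: list[str]) -> bytes:
    """Encode text fields as a byte payload, each field ending in a NUL byte."""
    for field in fields:
        if "\x00" in field:
            # A NUL inside a field would be indistinguishable from a terminator.
            raise ValueError("fields must not contain NUL characters")
    if not fields:
        return b""
    return b"\x00".join(f.encode("utf-8") for f in fields) + b"\x00"


def unpack_fields(payload: bytes) -> list[str]:
    """Inverse of pack_fields: split on NUL and drop the empty tail."""
    if not payload:
        return []
    parts = payload.split(b"\x00")
    return [p.decode("utf-8") for p in parts[:-1]]
```

Such a payload could be carried as the body of an HTTP response, since HTTP message bodies are arbitrary bytes.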

13. A system for enabling a browser program to interact with a website using speech utterances, the system comprising:
- a speech recognition engine configured to generate a recognized speech query from an utterance;
  
  said speech recognition engine being further configurable such that speech processing operations can be distributed between a client device and a server device as required to achieve real-time recognition of a speech query; and
  
  a natural language engine configured to determine a meaning of said recognized speech query and provide a first response thereto;
  
  a web page routine for presenting one or more web pages to the browser program, wherein data content for said one or more web pages is controlled by said recognized speech query and/or said first response of said natural language engine;
  
  wherein said recognized speech queries can be presented to both said natural language engine as well as to a text based query database for identifying a meaning of said recognized speech query, such that a second response can be provided by said database for at least some recognized speech queries prior to said first response of said natural language engine.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
  - 14. The system of claim 13, wherein said speech query is recognized by forming a concatenation of words and/or phrases derived from said speech query and using said concatenation as a search query for a database.
  - 15. The system of claim 13 wherein said speech recognition engine is also configured to dynamically change a speech recognition grammar based on input provided by a user to selections available within said web page.
  - 16. The system of claim 13 wherein multiple speech grammars are available and selectable within the web page, and such that speech input provided by the user for an item within the web page using a first grammar dynamically controls which one of a plurality of second grammars is loaded for speech recognition of subsequent speech input by the user.
  - 17. The system of claim 13 further including an electronic conversational agent adapted to interact with a user and mimic behavior of a human agent through a native language interactive real-time dialog session with the user.
  - 18. The system of claim 17, wherein said electronic conversational agent is configured to articulate suggestions to the user for appropriate speech queries.
  - 19. The system of claim 17, wherein said electronic conversational agent is adapted to have configurable perception parameters which are adjusted and tailored to said content pertaining to said list of items.
  - 20. The system of claim 17, wherein said server device causes said interactive character agent to respond in real-time whenever the user provides selected speech input.
  - 21. The system of claim 13, wherein the user can speak a help command while interacting with any web page maintained by the server device to cause an interactive character agent to appear.
  - 22. The system of claim 17, wherein the server device transfers speech related data for the web page using a hypertext transfer protocol (HTTP) and using a format which includes a predetermined NULL character.
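
Claim 13 presents the recognized query to both a text based query database and a natural language engine, with the database able to respond first. A minimal sequential sketch of that ordering follows. The dictionary lookup, the whitespace and case normalization, and the `nl_engine` callable are all illustrative assumptions; a real system might run the two paths concurrently.

```python
from typing import Callable


def answer(query: str,
           query_db: dict[str, str],
           nl_engine: Callable[[str], str]) -> list[tuple[str, str]]:
    """Return responses in the order they become available."""
    responses = []
    # Assumed normalization: lowercase and collapse whitespace before lookup.
    key = " ".join(query.lower().split())
    if key in query_db:
        # The cheap text lookup yields the "second response" of the claim first.
        responses.append(("database", query_db[key]))
    # The natural language engine's "first response" follows.
    responses.append(("natural_language", nl_engine(query)))
    return responses
```

With `query_db = {"store hours": "Open 9 to 5."}`, the query `"Store  Hours"` would produce the database response ahead of the natural language one, while an unknown query would produce only the latter.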

23. A World-Wide Web (WWW) accessible natural language computing system comprising:
- a speech recognition engine configured to distribute speech processing operations between a client device and a server device for processing an utterance for a speech based query;
  
  said server device being configured to generate a recognized speech query using a grammar accessible to said server device;
  
  a natural language routine executing on the server device and configured to process said recognized speech query to generate a natural language result in real-time;
  
  a WWW page coupled to the server device and having a list of items, at least some of said list of items being selectable by a user based on said natural language result;
  
  a database coupled to the server device for storing predefined answers which correspond to content for said list of items on said WWW accessible page;
  
  wherein a grammar used to recognize said speech based query can be varied between utterances and loaded dynamically as needed to recognize utterances associated with said content.
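
The dynamically loaded grammar of claim 23 (and the first-grammar/second-grammar arrangement of claim 16) can be sketched as a small state machine. Grammars are modeled here as plain sets of single words, which is a strong simplification. The `GrammarSwitcher` class and its vocabulary are hypothetical.

```python
class GrammarSwitcher:
    """Recognize against a first grammar, then load a second one based on the match."""

    def __init__(self, first_grammar, second_grammars):
        self.first = set(first_grammar)
        self.second = second_grammars  # maps a first-grammar word to its second grammar
        self.active = self.first

    def recognize(self, utterance):
        word = utterance.strip().lower()
        if word not in self.active:
            return None  # not covered by the currently loaded grammar
        if self.active is self.first:
            # The user's first selection controls which second grammar is loaded.
            self.active = set(self.second[word])
        return word
```

For example, with a first grammar of `{"books", "music"}`, saying "books" would load the book-specific grammar, so that "novel" is then recognized while "laptop" is not.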

24. A method of interacting with a web-connected server using a browser program, the method comprising the steps of:
- providing a distributed speech recognition engine configured to generate recognized speech queries from an utterance;
  
  wherein said distributed speech recognition engine can be further configured to permit partial or full recognition of said utterance at a client device and/or a server device;
  
  presenting one or more web pages to the browser program, such that data content for said one or more web pages transmitted to the browser program is controlled by said recognized speech query.
- View Dependent Claims (25, 26, 27, 28, 29, 30, 31, 32)
  - 25. The method of claim 24 further including a step:
    - performing a natural language processing operation to compare a limited set of phrases extracted from said recognized speech query with a separate set of phrases extracted from predefined valid queries from users.
  - 26. The method of claim 24 further including a step:
    - providing an interactive electronic character who provides suggestions for queries which the user can articulate.
  - 27. The method of claim 26, further including a step:
    - configuring said interactive character agent to engage in a dialog of successive questions and answers with the user during an interactive session.
  - 28. The method of claim 24, further including a step:
    - presenting an interactive character agent to the user in real-time in response to a spoken help command presented while interacting with any web page maintained by the web-connected server.
  - 29. The method of claim 24, further including a step:
    - configuring said web page as a single page to a browser to allow a user to ask questions concerning any item identified in said database within said single page.
  - 30. The method of claim 25, further including a step:
    - forming a concatenation of words and/or phrases derived from said speech query and using said concatenation as a search query for a database.
  - 31. The method of claim 24, wherein the server device transfers speech related data for the web page using a hypertext transfer protocol (HTTP).
  - 32. The method of claim 24 further including a step:
    - dynamically changing a speech recognition grammar based on input provided by a user to selections available within said web page.
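
Claims 14 and 30 describe forming a concatenation of words and/or phrases derived from the speech query and using it as a database search query. The sketch below is one simple way to do that, assuming a small hypothetical stop-word list and space-joined tokens. The claims do not specify how words are selected or joined.

```python
import re

# Hypothetical stop-word list; the claims do not define which words are dropped.
STOP_WORDS = {"what", "is", "the", "of", "a", "an"}


def build_search_query(recognized: str) -> str:
    """Concatenate the content words of a recognized speech query."""
    tokens = re.findall(r"[a-z0-9']+", recognized.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)
```

For the recognized query "What is the price of the red bicycle?", this yields the search string `price red bicycle`.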

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Phoenix Solutions Incorporated (CDC Corporation)
Inventors
Bennett, Ian M.

Granted Patent

US 7,203,646 B2
US Class Current

704/270.100
CPC Class Codes

G06F 16/951   Indexing; Web crawling tech...

G06F 40/289   Phrasal analysis, e.g. fini...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/30   Distributed recognition, e....

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...
