Method and system for gathering information by voice input

US 20020062216A1
Filed: 11/15/2001
Published: 05/23/2002
Est. Priority Date: 11/23/2000
Status: Active Grant

Abstract

The present invention allows users to navigate in a Web application or Web pages using a combination of point-and-click and voice-input. At each point of the dialog, the user can use the standard point-and-click interface to perform context-dependent actions, or alternatively, use speech input to navigate and operate in the global application context. The voice input uses a voice navigation component which builds an interface to the installed recognition and synthesis engines. The point-and-click and the voice navigation components can be loaded automatically with the initial Web page of a Web application. Grammars for recognizing vocabulary related to that Web application will be provided with the voice navigation component. The present invention combines the advantages of a context-dependent point-and-click user interface with those of a context-independent speech-input interface. Accordingly, a multi-modal interface can be provided to a Web browser.
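The contrast the abstract draws between context-dependent point-and-click input and context-independent voice input can be illustrated with a minimal sketch. Everything named here (`Page`, `MultiModalNavigator`, the `global_grammar` mapping of spoken phrases to target pages) is a hypothetical stand-in chosen for illustration, not the patent's implementation, and spoken input is modeled as already-recognized text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Page:
    """Hypothetical Web page; `links` maps a visible link label to a target page."""
    name: str
    links: Dict[str, str] = field(default_factory=dict)


@dataclass
class MultiModalNavigator:
    """Illustrative only: one navigation state driven by two input modes."""
    pages: Dict[str, Page]
    current: str
    # Assumed application-wide grammar: spoken phrase -> target page name.
    global_grammar: Dict[str, str] = field(default_factory=dict)

    def click(self, label: str) -> str:
        """Context-dependent: only links present on the current page work."""
        links = self.pages[self.current].links
        if label not in links:
            raise KeyError(f"no link {label!r} on page {self.current!r}")
        self.current = links[label]
        return self.current

    def speak(self, phrase: str) -> str:
        """Context-independent: any phrase in the application grammar works
        from any page; unrecognized phrases leave the state unchanged."""
        target = self.global_grammar.get(phrase.strip().lower())
        if target is not None:
            self.current = target
        return self.current
```

In this sketch a page such as `transfer` may be unreachable by clicking from `home` yet reachable by saying "go to transfer", which is the combination of modes the abstract describes.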

29 Claims

1. A client system for gathering information via a network by voice input comprising:
- a speech recognition engine installed on said client system;
  
  a communication component installed on said client system configured to establish communications with a communication component on a server system which provides access to information stored on said server; and
  
  a voice navigation component configured to provide information-dependent grammars from said server to said speech recognition engine via said communication component based on initial information loaded from said server to said client and configured to process results of said speech recognition system.
  - 2. The system according to claim 1, wherein said speech recognition engine further includes a speech synthesis engine.
  - 3. The system according to claim 1, wherein said communication component on said client system and said voice navigation component form an integral component.
  - 4. The system according to claim 1, wherein said communication component on said client system is a browser.
  - 5. The system according to claim 1, wherein said voice navigation component is configured to locate, select, and initialize a speech recognition engine and a speech synthesis engine, and to enable and disable information-dependent grammars, and to process recognition results from said speech recognition engine.
  - 6. The system according to claim 1, wherein said network is an Intranet or an Internet.

7. A client-server system comprising:
- a client having a speech recognition engine and a speech synthesis engine, a client communication component configured to establish communications with a server, and a voice navigation component configured to provide information-dependent grammars from said server to said speech recognition engine via said client communication component based on initial information loaded from said server to said client and further configured to process results of said speech recognition engine; and
  
  a server having a server communication component configured to establish communication with a client, a voice navigation component configured to provide information-dependent grammars from said server to said speech recognition engine based on said initial information and further configured to process said results of said speech recognition engine, wherein said voice navigation component is available for download to and execution on said client, and said information-dependent grammars are available for download to and execution on said client.

8. A method for gathering information via a network by voice input comprising:
- loading an initial information from a server in a client using a communication component;
  
  automatically loading an information-dependent grammar in said client by using access information contained in said initial information and automatically providing said information-dependent grammar to a speech recognition engine disposed in said client for recognizing spoken words defined by said information-dependent grammar;
  
  sending results of said speech recognition engine to a voice navigation component; and
  
  processing results of said speech recognition engine in said voice navigation component.
  - 9. The method according to claim 8, wherein said information-dependent grammar defines possible input values of Web related Web pages, Web pages belonging to a Web application, or a related Web application.
  - 10. The method according to claim 8, wherein said initial information is a Web page made available by said server.
  - 11. The method according to claim 10, wherein said initial Web page contains a reference to said voice navigation component stored on said server.
  - 12. The method according to claim 11, wherein each initial Web page contains a reference to a point-and-click component stored on said server.
  - 13. The method according to claim 12, further comprising:
    - automatically identifying reference information in said initial Web page for accessing said voice navigation component and said point-and-click component, and automatically loading said voice navigation component and said point-and-click component from said server to said client using said reference information.
  - 14. The method according to claim 13, further comprising:
    - automatically associating said identified reference information with information-dependent grammars in said initiating Web page;
      
      automatically loading said identified information-dependent grammar in said client; and
      
      providing said speech recognition engine with access to said information-dependent grammar via said voice navigation component.
  - 15. The method according to claim 12, wherein said voice navigation component and said point-and-click component have a common user-interface including user selectable options.
  - 16. The method according to claim 15, wherein said voice navigation component user interface includes options for selecting information-dependent grammars stored on said server.
  - 17. The method according to claim 8, wherein said voice navigation component is configured to process a spoken response, a change of browser content, and an HTTP-request to load a new application, applet, or Web page.
  - 18. The method according to claim 8, wherein said voice navigation component is configured to redraw a content frame, to retrieve information from a server, and to initiate a server-based transaction from said speech recognition and synthesis engine.

19. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- loading an initial information from a server in a client using a communication component;
  
  automatically loading an information-dependent grammar in said client by using access information contained in said initial information and automatically providing said information-dependent grammar to a speech recognition engine disposed in said client for recognizing spoken words defined by said information-dependent grammar;
  
  sending results of said speech recognition engine to a voice navigation component; and
  
  processing results of said speech recognition engine in said voice navigation component.
  - 20. The machine-readable storage according to claim 19, wherein said information-dependent grammar defines possible input values of Web related Web pages, Web pages belonging to a Web application, or related Web applications.
  - 21. The machine-readable storage according to claim 19, wherein said initial information is a Web page made available by said server.
  - 22. The machine-readable storage according to claim 21, wherein said initial Web page contains a reference to said voice navigation component stored on said server.
  - 23. The machine-readable storage according to claim 22, wherein each initial Web page contains a reference to a point-and-click component stored on said server.
  - 24. The machine-readable storage according to claim 23, further comprising:
    - automatically identifying reference information in said initial Web page for accessing said voice navigation component and said point-and-click component and automatically loading said voice navigation component and said point-and-click component from said server to said client using said reference information.
  - 25. The machine-readable storage according to claim 24, further comprising:
    - automatically associating said identified reference information to information-dependent grammars in said initiating Web page;
      
      automatically loading said identified information-dependent grammar in said client; and
      
      providing said speech recognition engine with access to said information-dependent grammar via said voice navigation component.
  - 26. The machine-readable storage according to claim 23, wherein said voice navigation component and said point-and-click component have a common user-interface including user selectable options.
  - 27. The machine-readable storage according to claim 26, wherein said voice navigation component user interface includes options for selecting information-dependent grammars stored on said server.
  - 28. The machine-readable storage according to claim 23, wherein said voice navigation component is configured to process a spoken response, a change of browser content, and an HTTP-request to load a new application, applet, or Web page.
  - 29. The machine-readable storage according to claim 23, wherein said voice navigation component is configured to redraw a content frame, to retrieve information from a server, and to initiate a server-based transaction from said speech recognition and synthesis engine.
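
The four steps recited in independent claims 8 and 19 can be walked through with a toy example. This is a sketch under stated assumptions, not the claimed implementation: the in-memory `SERVER` dictionary, the `<meta name="grammar">` tag used as "access information", the one-phrase-per-line grammar format, and the text-matching `ToyRecognizer` are all hypothetical, since the claims do not specify any of these formats.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Hypothetical server content: URL -> document body (assumed layout).
SERVER: Dict[str, str] = {
    "/index.html": '<html><meta name="grammar" content="/grammars/bank.txt">'
                   "<body>Bank</body></html>",
    "/grammars/bank.txt": "show balance\ntransfer money\nlog out\n",
}


def load(url: str) -> str:
    """Stand-in for the communication component fetching from the server."""
    return SERVER[url]


def find_grammar_url(page: str) -> Optional[str]:
    """Extract the assumed grammar reference from the initial page."""
    match = re.search(r'<meta name="grammar" content="([^"]+)">', page)
    return match.group(1) if match else None


class ToyRecognizer:
    """Stand-in engine: 'recognizes' text only if it is in the active grammar."""

    def __init__(self) -> None:
        self.phrases: set = set()

    def set_grammar(self, phrases: Iterable[str]) -> None:
        self.phrases = {p.strip().lower() for p in phrases if p.strip()}

    def recognize(self, utterance: str) -> Optional[str]:
        text = utterance.strip().lower()
        return text if text in self.phrases else None


class VoiceNavigator:
    """Stand-in voice navigation component: maps results to actions."""

    def __init__(self, handlers: Dict[str, Callable[[], str]]) -> None:
        self.handlers = handlers

    def process(self, result: Optional[str]) -> str:
        handler = self.handlers.get(result) if result is not None else None
        return handler() if handler else "not recognized"


def gather(url: str, utterance: str,
           recognizer: ToyRecognizer, navigator: VoiceNavigator) -> str:
    page = load(url)                              # load initial information
    grammar_url = find_grammar_url(page)          # access information in page
    if grammar_url is None:
        raise ValueError("page carries no grammar reference")
    recognizer.set_grammar(load(grammar_url).splitlines())  # provide grammar
    result = recognizer.recognize(utterance)      # recognition result
    return navigator.process(result)              # send to and process in navigator
```

Because the grammar is fetched from a reference inside the page, a different initial page would activate a different vocabulary without any change to the recognizer or navigator in this sketch.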

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Schaeck, Thomas; Guenther, Carsten; Haenel, Walter

Granted Patent: US 7,146,323 B2
US Class Current: 704/270.1
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 2015/223   Execution procedure of a sp...

H04M 2201/40   using speech recognition sp...

H04M 3/4938   comprising a voice browser ...
