Method and apparatus for voice-enabling an application
First Claim
1. A method of voice-enabling an application comprising a visual browser that lacks the ability to process audible input and/or output, the method comprising:
dynamically identifying, via the application comprising the visual browser and based on a current state of the application, one or more commands and/or controls that are used to interact with the visual browser via graphic input and that are not responsive to speech;
generating, via the application comprising the visual browser, at least one markup language fragment specifying a voice grammar corresponding to the identified one or more commands and/or controls that are not responsive to speech;
instantiating, by the application and through a voice library of voice markup language functions, an interpreter by calling at least one function in the voice library via a library application programming interface (API) through which the visual browser and the voice library can communicate;
providing the at least one markup language fragment from the application that instantiated the interpreter to the interpreter for use in recognizing speech;
receiving, via the application, a speech input from a user;
receiving, via the application from the interpreter, an event specifying at least one of the identified one or more commands and/or controls, generated as a result of matching, by the interpreter, the speech input with the voice grammar specified by the at least one markup language fragment to resolve the speech input to the at least one of the identified one or more commands and/or controls so that the event includes at least one attribute specifying a semantic interpretation of the speech input thus rendering the at least one of the identified one or more commands and/or controls responsive to speech; and
interpreting the event via the application.
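The claimed flow can be sketched in code. This is a minimal, hypothetical illustration only: every name below (`identify_controls`, `Interpreter`, `instantiate_interpreter`, the dictionary shapes) is an assumption for demonstration, not an identifier from the patent, and the grammar matching is deliberately simplified to substring lookup.

```python
# Hypothetical sketch of the claimed method; names and data shapes are
# illustrative assumptions, not taken from the patent.

def identify_controls(page_state):
    """Identify commands/controls in the current application state that are
    reachable only by graphic input (i.e., not responsive to speech)."""
    return [c["name"] for c in page_state["controls"]
            if not c.get("speech_enabled")]

def generate_grammar_fragment(commands):
    """Generate a markup language fragment (SRGS-style XML) specifying a
    voice grammar whose alternatives are the identified commands."""
    items = "".join(f"<item>{c}</item>" for c in commands)
    return (f'<grammar root="cmd"><rule id="cmd">'
            f'<one-of>{items}</one-of></rule></grammar>')

class Interpreter:
    """Stand-in for the interpreter the application obtains through the
    voice library's API."""
    def __init__(self):
        self.grammar = None

    def load_fragment(self, fragment):
        # The application provides the markup language fragment for use
        # in recognizing speech.
        self.grammar = fragment

    def match(self, speech_input):
        # Resolve the speech input against the loaded grammar; on a match,
        # return an event whose attribute carries the semantic interpretation.
        if self.grammar and f"<item>{speech_input}</item>" in self.grammar:
            return {"type": "command", "interpretation": speech_input}
        return None

def instantiate_interpreter():
    """Stand-in for calling a function in the voice library via its API."""
    return Interpreter()

# Walk the claimed steps end to end.
state = {"controls": [{"name": "back"},
                      {"name": "reload", "speech_enabled": True}]}
commands = identify_controls(state)        # only "back" lacks speech support
interp = instantiate_interpreter()
interp.load_fragment(generate_grammar_fragment(commands))
event = interp.match("back")               # event received from interpreter
```

Here the application ends by interpreting `event`, e.g. dispatching the `"back"` command to the visual browser, which is what renders a graphics-only control responsive to speech.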
Abstract
A method of voice-enabling an application for command and control and content navigation can include the application dynamically generating a markup language fragment specifying a command and control and content navigation grammar for the application, instantiating an interpreter from a voice library, and providing the markup language fragment to the interpreter. The method also can include the interpreter processing a speech input using the command and control and content navigation grammar specified by the markup language fragment and providing an event to the application indicating an instruction representative of the speech input.
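A "markup language fragment specifying a command and control and content navigation grammar" can be pictured as an SRGS-style XML fragment. The rule id and the command phrases below are illustrative assumptions, not content from the patent:

```xml
<!-- Hypothetical SRGS-style grammar fragment; phrases are illustrative. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         root="command" xml:lang="en-US">
  <rule id="command">
    <one-of>
      <item>go back</item>
      <item>reload page</item>
      <item>scroll down</item>
    </one-of>
  </rule>
</grammar>
```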
14 Claims
1. A method of voice-enabling an application comprising a visual browser that lacks the ability to process audible input and/or output, the method comprising:
dynamically identifying, via the application comprising the visual browser and based on a current state of the application, one or more commands and/or controls that are used to interact with the visual browser via graphic input and that are not responsive to speech;
generating, via the application comprising the visual browser, at least one markup language fragment specifying a voice grammar corresponding to the identified one or more commands and/or controls that are not responsive to speech;
instantiating, by the application and through a voice library of voice markup language functions, an interpreter by calling at least one function in the voice library via a library application programming interface (API) through which the visual browser and the voice library can communicate;
providing the at least one markup language fragment from the application that instantiated the interpreter to the interpreter for use in recognizing speech;
receiving, via the application, a speech input from a user;
receiving, via the application from the interpreter, an event specifying at least one of the identified one or more commands and/or controls, generated as a result of matching, by the interpreter, the speech input with the voice grammar specified by the at least one markup language fragment to resolve the speech input to the at least one of the identified one or more commands and/or controls so that the event includes at least one attribute specifying a semantic interpretation of the speech input thus rendering the at least one of the identified one or more commands and/or controls responsive to speech; and
interpreting the event via the application.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A non-transitory machine readable storage having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform a method of voice-enabling an application comprising a visual browser that lacks the ability to process audible input and/or output, the method comprising steps of:
dynamically identifying, via the application comprising the visual browser and based on a current state of the application, one or more commands and/or controls that are used to interact with the visual browser via graphic input and that are not responsive to speech;
generating, via the application comprising the visual browser, at least one markup language fragment specifying a voice grammar corresponding to the identified one or more commands and/or controls that are not responsive to speech;
instantiating, by the application and through a voice library of voice markup language functions, an interpreter, by calling at least one function in the voice library via a library application programming interface (API) through which the visual browser and the voice library can communicate;
providing the at least one markup language fragment from the application that instantiated the interpreter to the interpreter for use in recognizing speech;
receiving, via the application, a speech input from a user;
receiving, via the application from the interpreter, an event specifying the at least one of the identified one or more commands and/or controls generated as a result of matching the speech input with the voice grammar specified by the at least one markup language fragment to resolve the speech input to the at least one of the identified one or more commands and/or controls so that the event includes at least one attribute specifying a semantic interpretation of the speech input thus rendering the at least one of the identified one or more commands and/or controls responsive to speech; and
interpreting the event via the application.
- View Dependent Claims (9, 10, 11, 12, 13, 14)