Voice recognition of commands extracted from user interface screen devices

US 9,858,039 B2
Filed: 01/28/2014
Issued: 01/02/2018
Est. Priority Date: 01/28/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A method, system, and computer program product for human interface design. Embodiments proceed upon receiving a markup language description of user interface pages (e.g., HTML pages), then, without modifying the user interface page, parsing the markup language description to identify user interface objects configured to perform an operation responsive to a keyboard or mouse or pointing device. One or more mapping techniques serve to relate the parsed-out operation(s) to one or more voice commands. In some embodiments, the parser recognizes interface objects in forms such as a button, a textbox, a checkbox, or an option menu, and the voice commands correspond to an aspect that is displayed when rendering the interface object (e.g., a button label, a menu option, etc.). After receiving a user utterance, the utterance is converted into a text representation which in turn is mapped to voice commands that were parsed from the user interface page.
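The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `CommandExtractor`, the function `build_voice_command_mapping`, the operation labels (`click`, `select`, `toggle`, `focus`), and the sample page are all assumptions made for illustration. The sketch reads an HTML string with Python's standard `html.parser`, leaves the markup untouched, and stores each displayed label in a dictionary (Python's hash map) keyed by the spoken phrase.

```python
from html.parser import HTMLParser


class CommandExtractor(HTMLParser):
    """Collects spoken labels for buttons, text boxes, checkboxes, and menu options."""

    def __init__(self):
        super().__init__()
        self.mapping = {}      # phrase -> list of (operation, element id)
        self._pending = None   # (operation, element id) waiting for its visible text

    def _add(self, phrase, operation, element_id):
        # Normalize case and whitespace so lookups are insensitive to both.
        key = " ".join(phrase.lower().split())
        if key:
            self.mapping.setdefault(key, []).append((operation, element_id))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        element_id = a.get("id")
        if tag == "button":
            self._pending = ("click", element_id)
        elif tag == "option":
            self._pending = ("select", element_id)
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            label = a.get("aria-label") or a.get("name") or ""
            if kind in ("button", "submit"):
                self._add(a.get("value") or "", "click", element_id)
            elif kind == "checkbox":
                self._add(label, "toggle", element_id)
            elif kind == "text":
                self._add(label, "focus", element_id)

    def handle_data(self, data):
        # The first non-blank text inside a button or option is its label.
        if self._pending and data.strip():
            self._add(data, *self._pending)
            self._pending = None

    def handle_endtag(self, tag):
        if tag in ("button", "option"):
            self._pending = None


def build_voice_command_mapping(markup):
    """Parse the markup (read-only) and return the phrase-to-operation hash map."""
    extractor = CommandExtractor()
    extractor.feed(markup)
    extractor.close()
    return extractor.mapping


# Hypothetical page used only to exercise the sketch.
PAGE = """
<form>
  <input type="text" id="q" name="search">
  <input type="checkbox" id="c1" name="remember me">
  <button id="b1">Save</button>
  <button id="b2">Save</button>
  <select id="s1"><option id="o1">Red</option><option id="o2">Blue</option></select>
</form>
"""

mapping = build_voice_command_mapping(PAGE)
```

In this sample the phrase "save" maps to two buttons, which is the kind of situation in which more than one voice command matches a single utterance.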

27 Citations

20 Claims

1. A method comprising:
- using a computing system having at least one processor to perform a process, the process comprising:
  - creating a voice command mapping in response to loading a user interface page at the computing system by:
    - identifying a markup language description of the user interface page loaded at the computing system, the identification of the markup language description occurring after loading the user interface page at the computing system; and
    - generating the voice command mapping for the user interface page loaded at the computing system, wherein the voice command mapping maps a recognized word or phrase associated with at least one operation to one or more voice commands by parsing the markup language description identified from the user interface page to identify at least one user interface object specified by the markup language description configured to perform at least one operation responsive to a keyboard, mouse, or pointing device, the parsing of the markup language description being performed after loading the user interface page at the computing system, and the parsing does not create a modified version of the user interface page, wherein the voice command mapping uses a hash map data structure to store a relationship between at least one respective word or phrase to the at least one operation;
  - processing an utterance in response to receiving the utterance at the computing system, the computing system displaying the user interface page, by:
    - converting the utterance into a text representation of the utterance;
    - determining a plurality of matches between the text representation of the utterance and multiple matching voice commands based on the voice command mapping for the user interface page loaded at the computing system; and
    - performing a confirmation of a single matching voice command from among the plurality of matches.
- 2. The method of claim 1, wherein a user interface object of the at least one user interface object is at least one of, a button, a textbox, or a checkbox.
- 3. The method of claim 1, wherein a user interface object of the at least one user interface object comprises an option menu.
- 4. The method of claim 1, wherein the markup language description comprises HTML.
- 5. The method of claim 1, wherein the hash map data structure is built after a determination that the page permits voice navigation.
- 6. The method of claim 1, wherein the at least one operation comprises two or more operations.
- 7. The method of claim 6, wherein the confirmation of a single matching voice command from among the plurality of matches comprises disambiguation between the two or more operations.
- 8. The method of claim 1, wherein the voice command mapping uses a phonetic dictionary.
- 9. The method of claim 8, wherein the phonetic dictionary comprises words or phrases sorted in decreasing order of frequency of use.

10. A computer program product embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor causes the processor to execute a process, the process comprising:
- creating a voice command mapping in response to loading a user interface page at a computing system by:
  - identifying a markup language description of the user interface page loaded at the computing system, the identification of the markup language description occurring after loading the user interface page at the computing system; and
  - generating the voice command mapping for the user interface page loaded at the computing system, wherein the voice command mapping maps a recognized word or phrase associated with at least one operation to one or more voice commands by parsing the markup language description identified from the user interface page to identify at least one user interface object specified by the markup language description configured to perform at least one operation responsive to a keyboard, mouse, or pointing device, the parsing of the markup language description being performed after loading the user interface page at the computing system, and the parsing does not create a modified version of the user interface page, wherein the voice command mapping uses a hash map data structure to store a relationship between at least one respective word or phrase to the at least one operation;
- processing an utterance in response to receiving the utterance at the computing system, the computing system displaying the user interface page, by:
  - converting the utterance into a text representation of the utterance;
  - determining a plurality of matches between the text representation of the utterance and multiple matching voice commands based on the voice command mapping for the user interface page loaded at the computing system; and
  - performing a confirmation of a single matching voice command from among the plurality of matches.
- 11. The computer program product of claim 10, wherein a user interface object of the at least one user interface object is at least one of, a button, a textbox, or a checkbox.
- 12. The computer program product of claim 10, wherein a user interface object of the at least one user interface object comprises an option menu.
- 13. The computer program product of claim 10, wherein the markup language description comprises HTML.
- 14. The computer program product of claim 10, wherein the hash map data structure is built after a determination that the page permits voice navigation.
- 15. The computer program product of claim 10, wherein the at least one operation comprises two or more operations.
- 16. The computer program product of claim 15, wherein the confirmation of a single matching voice command from among the plurality of matches comprises disambiguation between the two or more operations.
- 17. The computer program product of claim 10, wherein the voice command mapping uses a phonetic dictionary.

18. A computer system comprising:
- a parser module for processing data, wherein the parser module is stored in memory, the parser module to identify a markup language description of a user interface page loaded at the computer system, identification of the markup language description occurring after loading the user interface page at the computer system, and the parser module to generate a voice command mapping for the user interface page loaded at the computer system in response to loading the user interface page at the computer system, wherein the voice command mapping maps a recognized word or phrase associated with at least one operation to one or more voice commands by parsing the markup language description identified from the user interface page to identify at least one user interface object specified by the markup language description configured to perform at least one operation responsive to a keyboard, mouse, or pointing device, the parsing of the markup language description being performed after loading the user interface page at the computer system, and the parsing does not create a modified version of the user interface page, wherein the voice command mapping uses a hash map data structure to store a relationship between at least one respective word or phrase to the at least one operation;
- a receiving module for processing data, wherein the receiving module is stored in memory, the receiving module to process an utterance in response to receiving an utterance at the computer system displaying the user interface page and to convert the utterance into a text representation of the utterance wherein the utterance is used to determine a plurality of matches between the text representation of the utterance and multiple matching voice commands based on the voice command mapping for the user interface page loaded at the computer system; and
- a confirmation module for processing data, wherein the receiving module is stored in memory, the confirmation module to perform a confirmation of a single matching voice command from among the plurality of matches.
- 19. The computer system of claim 18, wherein a user interface object of the at least one user interface object is at least one of, a button, a textbox, or a checkbox.
- 20. The computer system of claim 18, wherein the markup language description comprises HTML.
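
The utterance-processing steps recited in the independent claims (text conversion, finding a plurality of matches, then confirming one) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: speech-to-text is assumed to have already produced a string, the names `find_matches` and `confirm` are hypothetical, the sample mapping is invented, and the `choose` callback stands in for whatever confirmation prompt a real system would show the user.

```python
def find_matches(utterance_text, mapping):
    """Return every (phrase, operation, element_id) stored under the spoken phrase."""
    # Normalize the recognized text the same way the mapping keys are assumed to be.
    key = " ".join(utterance_text.lower().split())
    return [(key, operation, element_id) for operation, element_id in mapping.get(key, [])]


def confirm(matches, choose):
    """Narrow the matches to a single command.

    `choose` is a hypothetical callback that receives the list of matches
    and returns the index of the one the user confirmed.
    """
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return matches[choose(matches)]


# Hypothetical hash map: phrase -> list of (operation, element id).
mapping = {
    "save": [("click", "b1"), ("click", "b2")],
    "cancel": [("click", "b3")],
}

matches = find_matches("  Save ", mapping)
selected = confirm(matches, choose=lambda candidates: 1)
unknown = confirm(find_matches("delete", mapping), choose=lambda candidates: 0)
```

Here "save" yields two matches, so the confirmation callback decides between them, while an unrecognized phrase yields no command at all.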

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Kumar, Saurabh; Kowdeed, Srinivasa Rao; Kuppusamy, Kavin Kumar
Primary Examiner(s): Pham, Linh K
Application Number: US14/166,806
Publication Number: US 20150212791A1
Time in Patent Office: 1,435 Days

CPC Class Codes

G06F 3/167   Audio in a user interface, ...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 2015/223   Execution procedure of a sp...
G10L 2015/228   of application context
