Learning intended user actions
First Claim
1. A method, comprising:
receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances;
parsing, by a hardware-based recognizer, sample utterances and the user utterances into verb parts and noun parts;
recognizing, by a hardware-based recognizer, the user utterances and the associated user gestures based on the sample utterances and descriptions of associated supporting gestures for the sample utterances, said recognizing step comprising sequentially comparing each of the verb parts and each of the noun parts from the user utterances both individually and as pairs to the verb parts and the noun parts of the sample utterances;
tracking words and word pairs used in conjunction with one or more recognized gestures, and determining a frequency of accepted and rejected system actions based on the tracking; and
selectively performing a given one of the user commands responsive to a recognition result.
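The comparison step of the claim can be illustrated with a minimal, hypothetical sketch (not the patented implementation): each utterance is split into a verb part and a noun part, and the parts are compared to a set of sample utterances first as a pair and then individually. The `SAMPLE_UTTERANCES` table, the naive first-word parse, and the action names are all assumptions made for illustration.

```python
# Hypothetical sketch of the claimed comparison step: parse an utterance
# into a verb part and a noun part, then compare the parts to sample
# utterances both as a pair and individually.

SAMPLE_UTTERANCES = {
    ("open", "door"): "open_door",       # assumed sample utterances
    ("close", "window"): "close_window",
}

def parse(utterance):
    """Naive parse: treat the first word as the verb part, the rest as the noun part."""
    words = utterance.lower().split()
    return words[0], " ".join(words[1:])

def recognize(utterance):
    """Sequentially compare verb and noun parts, as a pair and then individually."""
    verb, noun = parse(utterance)
    # An exact verb+noun pair match is the strongest recognition result.
    if (verb, noun) in SAMPLE_UTTERANCES:
        return SAMPLE_UTTERANCES[(verb, noun)], "pair"
    # Otherwise fall back to matching the verb part or noun part alone.
    for (sv, sn), action in SAMPLE_UTTERANCES.items():
        if verb == sv or noun == sn:
            return action, "partial"
    return None, "none"

print(recognize("open door"))    # exact pair match
print(recognize("open window"))  # partial match on the verb part
```

In a real system the parse would come from a speech recognizer and part-of-speech tagger, and the match would also be conditioned on the associated gesture; the sketch shows only the sequential pair-then-individual comparison order recited in the claim.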
Abstract
A method and system are provided. The method includes receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances. The method further includes parsing, by a hardware-based recognizer, sample utterances and the user utterances into verb parts and noun parts. The method also includes recognizing, by a hardware-based recognizer, the user utterances and the associated user gestures based on the sample utterances and descriptions of associated supporting gestures for the sample utterances. The recognizing step includes comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances. The method additionally includes selectively performing a given one of the user commands responsive to a recognition result.
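The tracking step described alongside the method (counting how often word pairs used with a recognized gesture lead to accepted or rejected system actions) can be sketched as follows. This is an illustrative assumption, not the patented implementation; the function names and the `Counter`-based tallies are invented for the example.

```python
# Hypothetical sketch: track word pairs used with a recognized gesture and
# the frequency with which the resulting system actions were accepted or
# rejected by the user.
from collections import Counter

accepted = Counter()
rejected = Counter()

def record(verb, noun, gesture, was_accepted):
    """Tally one observation of a (verb, noun, gesture) triple and its outcome."""
    key = (verb, noun, gesture)
    (accepted if was_accepted else rejected)[key] += 1

def acceptance_rate(verb, noun, gesture):
    """Fraction of system actions for this triple that the user accepted."""
    key = (verb, noun, gesture)
    total = accepted[key] + rejected[key]
    return accepted[key] / total if total else 0.0

record("open", "door", "point", True)
record("open", "door", "point", True)
record("open", "door", "point", False)
print(acceptance_rate("open", "door", "point"))
```

A frequency like this could feed the "selectively performing" step: actions with a low acceptance rate might require confirmation rather than immediate execution.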
17 Claims
Claim 1 (reproduced above) is independent; claims 2–17 are dependent claims.
Specification