LEARNING INTENDED USER ACTIONS
Abstract
A method and system are provided. The method includes receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances. The method further includes parsing, by a hardware-based recognizer, sample utterances and the user utterances into verb parts and noun parts. The method also includes recognizing, by a hardware-based recognizer, the user utterances and the associated user gestures based on the sample utterances and descriptions of associated supporting gestures for the sample utterances. The recognizing step includes comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances. The method additionally includes selectively performing a given one of the user commands responsive to a recognition result.
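The parsing-and-comparison flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the naive first-word verb split, the scoring scheme, and all names (`Utterance`, `parse`, `recognize`) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    verb: str
    noun: str
    gesture: str  # description of the associated supporting gesture

def parse(text: str):
    """Naive verb/noun split: treat the first word as the verb part."""
    verb, _, noun = text.partition(" ")
    return verb, noun

def recognize(user_text: str, user_gesture: str, samples: list) :
    """Compare verb and noun parts individually and as pairs against samples."""
    verb, noun = parse(user_text)
    best, best_score = None, 0
    for s in samples:
        score = int(verb == s.verb) + int(noun == s.noun)      # individual matches
        score += 2 * int((verb, noun) == (s.verb, s.noun))     # pair match
        score += int(user_gesture == s.gesture)                # supporting gesture
        if score > best_score:
            best, best_score = s, score
    return best  # None means no sample matched at all

samples = [Utterance("open", "the door", "point"),
           Utterance("close", "the window", "wave")]
match = recognize("open the door", "point", samples)
print(match.verb, match.noun)  # -> open the door
```

The returned sample (or `None`) stands in for the "recognition result" that drives the selective execution step.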
20 Claims
1. A system, comprising:
a microphone and camera for receiving user utterances indicative of user commands and associated user gestures for the user utterances;
a hardware-based recognizer for parsing sample utterances and the user utterances into verb parts and noun parts, and recognizing the user utterances and the associated user gestures by comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances; and
a user command selective execution device for selectively performing a given one of the user commands responsive to a recognition result.
(Dependent claims 2-8 not shown.)
9. A computer program product for recognizing intended user actions, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising:
receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances;
parsing sample utterances and the user utterances into verb parts and noun parts, and recognizing the user utterances and the associated user gestures by comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances; and
selectively performing a given one of the user commands responsive to a recognition result.
(Dependent claims 10-13 and 15-18 not shown.)
14. The computer program product of claim 22, wherein the method further comprises generating respective error values for at least one of the noun, the verb, the gesture, and a combination thereof including at least the gesture, responsive to at least one of a number of user accepted examples and a number of user rejected examples involving the gesture and at least one of the noun and the verb for a particular one of the user commands.
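One plausible reading of the error-value generation in claim 14 is a per-feature rejection rate derived from user feedback counts. The formula, the zero-feedback convention, and the feature names below are assumptions for illustration, not taken from the specification.

```python
def error_value(accepted: int, rejected: int) -> float:
    """Fraction of rejected examples among all feedback for a feature.

    Returns 0.0 when no feedback exists yet (hypothetical convention).
    """
    total = accepted + rejected
    return rejected / total if total else 0.0

# Hypothetical (accepted, rejected) counts per feature for one user command,
# covering noun, verb, gesture, and a combination including the gesture.
feedback = {"noun": (8, 2), "verb": (9, 1),
            "gesture": (6, 4), "gesture+noun": (7, 3)}
errors = {name: error_value(a, r) for name, (a, r) in feedback.items()}
print(errors["gesture"])  # -> 0.4
```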
19. A system, comprising:
a processor operatively coupled to a computer-readable storage medium, the processor being configured for:
receiving user utterances indicative of user commands and associated user gestures for the user utterances;
parsing sample utterances and the user utterances into verb parts and noun parts, and recognizing the associated user gestures based on descriptions of associated supporting gestures for the sample utterances by sequentially comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances; and
selectively performing a given one of the user commands responsive to a recognition result.
(Dependent claim 20 not shown.)
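Claim 19's "sequentially comparing ... individually and as pairs" might be realized as an ordered cascade that checks the verb part, then the noun part, then the (verb, noun) pair, stopping at the first mismatch. The ordering and early-exit behavior are assumptions; the patent does not fix them here.

```python
def sequential_match(verb: str, noun: str,
                     sample_verb: str, sample_noun: str) -> bool:
    """Compare verb, then noun, then the pair, returning False at the
    first failed comparison (assumed cascade order)."""
    for check in (verb == sample_verb,
                  noun == sample_noun,
                  (verb, noun) == (sample_verb, sample_noun)):
        if not check:
            return False
    return True

print(sequential_match("play", "music", "play", "music"))  # -> True
print(sequential_match("play", "music", "play", "video"))  # -> False
```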
Specification