Personalized gesture recognition for user interaction with assistant systems
Abstract
In one embodiment, a method includes accessing a plurality of input tuples associated with a first user from a data store, wherein each input tuple comprises a gesture-input and a corresponding speech-input, determining a plurality of intents corresponding to the plurality of speech-inputs, respectively, by a natural-language understanding (NLU) module, generating a plurality of feature representations for the plurality of gesture-inputs based on one or more machine-learning models, determining a plurality of gesture identifiers for the plurality of gesture-inputs, respectively, based on their respective feature representations, associating the plurality of intents with the plurality of gesture identifiers, respectively, and training a personalized gesture-classification model for the first user based on the plurality of feature representations of their respective gesture-inputs and the associations between the plurality of intents and their respective gesture identifiers.
20 Claims
1. A method comprising, by one or more computing systems:
accessing, from a data store, a plurality of input tuples associated with a first user, wherein each input tuple comprises a gesture-input and a corresponding speech-input;
determining, by a natural-language understanding (NLU) module, a plurality of intents corresponding to the plurality of speech-inputs, respectively;
generating, for the plurality of gesture-inputs, a plurality of feature representations based on one or more machine-learning models;
determining a plurality of gesture identifiers for the plurality of gesture-inputs, respectively, based on their respective feature representations;
associating the plurality of intents with the plurality of gesture identifiers, respectively; and
training, for the first user, a personalized gesture-classification model based on the plurality of feature representations of their respective gesture-inputs and the associations between the plurality of intents and their respective gesture identifiers.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
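The claimed method can be read as a simple training pipeline: pair each gesture with the intent the NLU module extracts from its accompanying speech, featurize the gesture, map features to a gesture identifier, and learn a per-user mapping from identifiers to intents. The following is a minimal illustrative sketch only; every function here (`nlu_intent`, `extract_features`, `gesture_id`) is a hypothetical stand-in, not an implementation disclosed by the patent, and real systems would use learned models in place of these heuristics.

```python
# Hypothetical sketch of the claimed pipeline. All names and heuristics
# below are illustrative assumptions, not taken from the specification.
from collections import defaultdict


def nlu_intent(speech: str) -> str:
    # Stand-in for the NLU module: keyword lookup instead of a real model.
    keywords = {"play": "PLAY_MUSIC", "call": "CALL_CONTACT", "stop": "STOP"}
    for word, intent in keywords.items():
        if word in speech.lower():
            return intent
    return "UNKNOWN"


def extract_features(gesture: list) -> tuple:
    # Stand-in feature representation: summary statistics of the raw
    # gesture signal in place of a learned embedding model.
    mean = sum(gesture) / len(gesture)
    return (round(mean, 3), round(max(gesture) - min(gesture), 3))


def gesture_id(features: tuple) -> str:
    # Stand-in gesture identifier: coarse bucketing of the feature vector.
    return f"g:{int(features[0] * 10)}:{int(features[1] * 10)}"


def train_personalized_model(input_tuples):
    """Associate each gesture identifier with the intent of its paired
    speech-input; the resulting mapping acts as the per-user classifier."""
    votes = defaultdict(lambda: defaultdict(int))
    for gesture, speech in input_tuples:
        intent = nlu_intent(speech)
        gid = gesture_id(extract_features(gesture))
        votes[gid][intent] += 1
    # Collapse vote counts to the majority intent per gesture identifier.
    return {gid: max(counts, key=counts.get) for gid, counts in votes.items()}


# Example (gesture-input, speech-input) tuples for one user.
tuples = [
    ([0.1, 0.9, 0.5], "play some jazz"),
    ([0.0, 0.0, 0.05], "stop the music"),
]
model = train_personalized_model(tuples)
```

The key design point the claim captures is that the speech-input supervises the gesture-input: no manual gesture labels are needed, because the NLU intent of the co-occurring utterance serves as the training label for the gesture's feature representation.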
13. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
access, from a data store, a plurality of input tuples associated with a first user, wherein each input tuple comprises a gesture-input and a corresponding speech-input;
determine, by a natural-language understanding (NLU) module, a plurality of intents corresponding to the plurality of speech-inputs, respectively;
generate, for the plurality of gesture-inputs, a plurality of feature representations based on one or more machine-learning models;
determine a plurality of gesture identifiers for the plurality of gesture-inputs, respectively, based on their respective feature representations;
associate the plurality of intents with the plurality of gesture identifiers, respectively; and
train, for the first user, a personalized gesture-classification model based on the plurality of feature representations of their respective gesture-inputs and the associations between the plurality of intents and their respective gesture identifiers.
- Dependent claims: 14, 15, 16, 17, 18, 19
20. A system comprising:
one or more processors; and
a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to:
access, from a data store, a plurality of input tuples associated with a first user, wherein each input tuple comprises a gesture-input and a corresponding speech-input;
determine, by a natural-language understanding (NLU) module, a plurality of intents corresponding to the plurality of speech-inputs, respectively;
generate, for the plurality of gesture-inputs, a plurality of feature representations based on one or more machine-learning models;
determine a plurality of gesture identifiers for the plurality of gesture-inputs, respectively, based on their respective feature representations;
associate the plurality of intents with the plurality of gesture identifiers, respectively; and
train, for the first user, a personalized gesture-classification model based on the plurality of feature representations of their respective gesture-inputs and the associations between the plurality of intents and their respective gesture identifiers.
Specification