DYNAMIC HYBRID MODELS FOR MULTIMODAL ANALYSIS

US 20160071024A1
Filed: 02/25/2015
Published: 03/10/2016
Est. Priority Date: 02/25/2014
Status: Active Grant

Abstract

Technologies for analyzing temporal components of multimodal data to detect short-term multimodal events, determine relationships between short-term multimodal events, and recognize long-term multimodal events, using a deep learning architecture, are disclosed.

20 Claims

1. A multimodal data analyzer comprising instructions embodied in one or more non-transitory machine accessible storage media, the multimodal data analyzer configured to cause a computing system comprising one or more computing devices to:
- access a set of time-varying instances of multimodal data having at least two different modalities, each instance of the multimodal data having a temporal component; and
  
  algorithmically learn a feature representation of the temporal component of the multimodal data using a deep learning architecture.
- Dependent claims (2-11):
  - 2. The multimodal data analyzer of claim 1, configured to classify the set of multimodal data by applying a temporal discriminative model to the feature representation of the temporal component of the multimodal data.
  - 3. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify short-term temporal features in the multimodal data.
  - 4. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an intra-utterance dynamic feature of the recorded speech.
  - 5. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify a long-term temporal feature in the multimodal data.
  - 6. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an inter-utterance dynamic feature in the recorded speech.
  - 7. The multimodal data analyzer of claim 1, wherein the multimodal data comprises audio and video, and the multimodal data analyzer is configured to (i) identify short-term dynamic features in the audio and video data and (ii) infer a long-term dynamic feature based on a combination of temporally-spaced audio and video short-term dynamic features.
  - 8. The multimodal data analyzer of claim 1, wherein the temporal deep learning architecture comprises a hybrid model having a generative component and a discriminative component, and wherein the multimodal data analyzer uses output of the generative component as input to the discriminative component.
  - 9. The multimodal data analyzer of claim 1, wherein the multimodal data analyzer is configured to identify at least two different temporally-spaced events in the multimodal data and infer a correlation between the at least two different temporally-spaced multimodal events.
  - 10. The multimodal data analyzer of claim 1, configured to algorithmically learn the feature representation of the temporal component of the multimodal data using an unsupervised machine learning technique.
  - 11. The multimodal data analyzer of claim 1, configured to algorithmically infer missing data both within a modality and across modalities.
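
Illustrative sketch (not part of the claims): claim 8 describes a hybrid model in which the output of a generative component is used as input to a discriminative component. The Python below shows only that data flow on toy data. Everything in it is an assumption made for illustration: the diagonal Gaussian stands in for the generative component, the logistic regression stands in for the discriminative component, and the names `windows`, `GaussianGenerative`, and `LogisticDiscriminative` are hypothetical. The listing does not say which models the patent actually uses.

```python
import math

def windows(frames, width=2):
    """Stack each frame with its predecessors so a window carries short-term dynamics."""
    return [sum(frames[t - width + 1:t + 1], []) for t in range(width - 1, len(frames))]

class GaussianGenerative:
    """Stand-in generative component: a diagonal Gaussian fitted without labels."""
    def fit(self, rows):
        n = len(rows)
        self.mean = [sum(col) / n for col in zip(*rows)]
        # Fall back to 1.0 when a dimension has zero variance.
        self.std = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
                    for col, m in zip(zip(*rows), self.mean)]
        return self

    def features(self, frames, width=2):
        """Sequence-level representation: the mean standardized window."""
        z = [[(x - m) / s for x, m, s in zip(row, self.mean, self.std)]
             for row in windows(frames, width)]
        return [sum(col) / len(z) for col in zip(*z)]

class LogisticDiscriminative:
    """Stand-in discriminative component: logistic regression trained by SGD."""
    def fit(self, X, y, epochs=200, lr=0.5):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, target in zip(X, y):
                err = target - self.prob(x)
                self.w = [w + lr * err * xi for w, xi in zip(self.w, x)]
                self.b += lr * err
        return self

    def prob(self, x):
        score = sum(w * xi for w, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, x):
        return int(self.prob(x) >= 0.5)

# Toy multimodal sequences: each frame is [audio_value, video_value] (hypothetical data).
rising = [[float(t), float(t)] for t in range(4)]
falling = rising[::-1]

# 1) The generative stand-in is fitted on unlabeled windows from both sequences.
gen = GaussianGenerative().fit(windows(rising) + windows(falling))
# 2) Its output becomes the input of the discriminative stand-in.
clf = LogisticDiscriminative().fit([gen.features(rising), gen.features(falling)], [1, 0])
```

The generative step never sees labels, which loosely mirrors the unsupervised learning of claim 10; only the second stage is trained against class targets.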

12. A method for classifying multimodal data, the multimodal data comprising data having at least two different modalities, the method comprising, with a computing system comprising one or more computing devices:
- accessing a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component; and
  
  algorithmically classifying the set of time-varying instances of multimodal data using a discriminative temporal model, the discriminative temporal model trained using a feature representation generated by a deep temporal generative model based on the temporal component of the multimodal data.
- Dependent claims (13-15):
  - 13. The method of claim 12, comprising identifying, within each modality of the multimodal data, a plurality of short-term features having different time scales.
  - 14. The method of claim 13, comprising, for each modality within the multimodal data, inferring a long-term dynamic feature based on the short-term dynamic features identified within the modality.
  - 15. The method of claim 13, comprising fusing short-term features across the different modalities of the multimodal data, and inferring a long-term dynamic feature based on the short-term features fused across the different modalities of the multimodal data.
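
Illustrative sketch (not part of the claims): claims 13 and 15 describe identifying short-term features at several time scales within each modality, fusing them across modalities, and inferring a long-term feature from the fused result. The Python below is one simple way to picture those three steps. The choice of "average change over the last w steps" as the short-term feature, concatenation as the fusion rule, and a time average as the long-term feature are all assumptions, as are the function names. The sketch also assumes both modalities are sampled at the same rate and are time-aligned.

```python
def short_term_features(signal, widths):
    """Per time step, the average change over each window width (one value per time scale)."""
    longest = max(widths)
    return [[(signal[t] - signal[t - w]) / w for w in widths]
            for t in range(longest, len(signal))]

def fuse(*per_modality):
    """Concatenate the short-term features of all modalities at each aligned time step."""
    return [sum(step, []) for step in zip(*per_modality)]

def long_term_feature(fused):
    """Summarize the fused short-term features over the whole sequence (time average)."""
    return [sum(col) / len(fused) for col in zip(*fused)]

# Hypothetical toy signals: audio rises steadily, video stays flat.
audio = [0, 1, 2, 3, 4, 5]
video = [5, 5, 5, 5, 5, 5]
scales = (1, 2)  # two time scales, in samples

fused = fuse(short_term_features(audio, scales), short_term_features(video, scales))
print(long_term_feature(fused))  # [1.0, 1.0, 0.0, 0.0]
```

Skipping `fuse` and calling `long_term_feature` on a single modality's short-term features gives the per-modality variant described in claim 14.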

16. A system for algorithmically recognizing a multimodal event in data, the system comprising:
- a data access module to access a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component;
  
  a classifier module to classify different instances in the set of time-varying instances of multimodal data as indicative of different short-term events; and
  
  an event recognizer module to (i) recognize a longer-term multimodal event based on a plurality of multimodal short-term events identified by the classifier module and (ii) generate a semantic label for the recognized multimodal event.
- Dependent claims (17-20):
  - 17. The system of claim 16, wherein the classifier module is to apply a deep temporal generative model to the temporal component of the audio-visual data.
  - 18. The system of claim 17, wherein the event recognizer module is to use a discriminative temporal model to recognize the longer-term multimodal event.
  - 19. The system of claim 18, wherein the system is to train the discriminative temporal model using a feature representation generated by the deep temporal generative model.
  - 20. The system of claim 16, wherein the event recognizer module is to recognize the longer-term multimodal event by correlating a plurality of different short-term multimodal events having different time scales.
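
Illustrative sketch (not part of the claims): claim 16 names three modules, a data access module, a classifier module that labels instances as short-term events, and an event recognizer module that derives a longer-term event with a semantic label. The Python below mirrors only that module layout. The threshold rules, the field names `audio_energy` and `motion`, and every event label are hypothetical placeholders; a system following claims 17 to 19 would use learned temporal models in place of these hand-written rules.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Instance:
    t: float             # timestamp in seconds (the temporal component)
    audio_energy: float  # hypothetical audio measurement in [0, 1]
    motion: float        # hypothetical video measurement in [0, 1]

def access_data(rows):
    """Data access module: wrap raw (t, audio, motion) tuples and order them by time."""
    return sorted((Instance(*row) for row in rows), key=lambda inst: inst.t)

def classify_short_term(inst, audio_thr=0.5, motion_thr=0.5):
    """Classifier module: map one instance to a short-term event label."""
    loud, moving = inst.audio_energy > audio_thr, inst.motion > motion_thr
    if loud and moving:
        return "loud_and_moving"
    if loud:
        return "loud"
    if moving:
        return "moving"
    return "quiet"

def recognize_event(instances, min_fraction=0.5):
    """Event recognizer module: derive a semantic label from many short-term events."""
    if not instances:
        return "no data"
    counts = Counter(classify_short_term(inst) for inst in instances)
    if counts["loud_and_moving"] / len(instances) >= min_fraction:
        return "animated interaction"
    if counts["quiet"] / len(instances) >= min_fraction:
        return "idle period"
    return "mixed activity"

rows = [(2.0, 0.9, 0.8), (0.0, 0.7, 0.9), (1.0, 0.2, 0.1)]
print(recognize_event(access_data(rows)))  # animated interaction
```

Keeping the three stages as separate functions makes it easy to replace the rule-based classifier with a trained model without touching the data access or event recognition code.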

Current Assignee: SRI International, Inc.
Original Assignee: SRI International, Inc.
Inventors: Siddiquie, Behjat; Divakaran, Ajay; Richey, Colleen; Khan, Saad; Sawhney, Harpreet S.; Amer, Mohamed R.

Granted Patent: US 9,875,445 B2
US Class Current: 1/1
CPC Class Codes:
G06F 18/29   Graphical models, e.g. Baye...
G06N 20/00   Machine learning
G06N 7/01   Probabilistic graphical mod...
