Association of visual labels and event context in image data
First Claim
1. A method, comprising:
generating a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions;
generating a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features;
constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and
matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features;
wherein the generating, constructing and matching steps are performed via one or more processing devices.
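Purely as an illustration of the four claimed steps (and not the patent's actual implementation), the pipeline of building two taxonomies, linking them in a relatedness network, and matching text to audio-visual features for annotation can be sketched in Python. Every name, the token-frequency taxonomy, and the prefix relatedness heuristic below are invented for the sketch.

```python
# Illustrative sketch only; all function names and heuristics are
# hypothetical and are not taken from the patent.
from collections import Counter

def build_taxonomy(items):
    """Flat taxonomy of 'contextual dimensions': token -> frequency
    across the corpus of descriptions or feature labels."""
    taxonomy = Counter()
    for item in items:
        taxonomy.update(item.lower().split())
    return taxonomy

def relatedness(term_a, term_b):
    """Toy relatedness measure: exact match scores 1.0,
    a shared 4-character prefix scores 0.5."""
    if term_a == term_b:
        return 1.0
    if term_a[:4] == term_b[:4]:
        return 0.5
    return 0.0

def build_similarity_structure(text_taxonomy, av_taxonomy):
    """Concept relationship network linking the two taxonomies:
    edges (text_term, av_term) -> relatedness score."""
    return {
        (t, v): relatedness(t, v)
        for t in text_taxonomy
        for v in av_taxonomy
        if relatedness(t, v) > 0.0
    }

def annotate(descriptions, av_feature_labels, threshold=0.5):
    """Match textual descriptions to audio-visual feature labels;
    matched descriptions serve as annotations."""
    text_tax = build_taxonomy(descriptions)
    av_tax = build_taxonomy(av_feature_labels)
    network = build_similarity_structure(text_tax, av_tax)
    annotations = {}
    for label in av_feature_labels:
        for desc in descriptions:
            score = sum(
                network.get((t, v), 0.0)
                for t in desc.lower().split()
                for v in label.lower().split()
            )
            if score >= threshold:
                annotations.setdefault(label, []).append(desc)
    return annotations

descriptions = ["goal scored in the final minute", "crowd cheering"]
labels = ["goal celebration clip", "cheering crowd audio"]
print(annotate(descriptions, labels))
```

In this toy run, "goal scored in the final minute" annotates the "goal celebration clip" label and "crowd cheering" annotates "cheering crowd audio"; a real system would replace the prefix heuristic with a learned or ontology-based relatedness measure.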
Abstract
A first set of contextual dimensions is generated from one or more textual descriptions associated with a given event, which includes one or more examples. A second set of contextual dimensions is generated from one or more visual features associated with the given event, which includes one or more visual example recordings. A similarity structure is constructed from the first set of contextual dimensions and the second set of contextual dimensions. One or more of the textual descriptions is matched with one or more of the visual features based on the similarity structure.
29 Citations
21 Claims
1. A method, comprising:
generating a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions;
generating a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features;
constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and
matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features;
wherein the generating, constructing and matching steps are performed via one or more processing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A computer program product comprising a processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by one or more processing devices implement steps of:
generating a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions;
generating a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features;
constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and
matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features.
12. An apparatus, comprising:
a memory; and
a processor operatively coupled to the memory and configured to:
generate a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions;
generate a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features;
construct a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and
match one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
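The construct and match operations recited in the apparatus claim could equally be realized over vector-valued contextual dimensions, with relatedness measured by cosine similarity. The sketch below is only an illustration under that assumption; the taxonomy names and 3-d vectors are invented toy data, not taken from the patent.

```python
# Hypothetical vector-space variant of the similarity structure; all
# names and numbers are illustrative, not from the patent.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy contextual dimensions: each taxonomy element is a 3-d vector.
text_taxonomy = {"wedding": [1.0, 0.2, 0.0], "speech": [0.1, 1.0, 0.3]}
av_taxonomy = {"bride_visual": [0.9, 0.1, 0.1], "applause_audio": [0.0, 0.8, 0.6]}

# Similarity structure: relatedness between every cross-taxonomy pair.
network = {
    (t, v): cosine(tv, vv)
    for t, tv in text_taxonomy.items()
    for v, vv in av_taxonomy.items()
}

# Matching: annotate each audio-visual element with its closest text concept.
for v in av_taxonomy:
    best = max(text_taxonomy, key=lambda t: network[(t, v)])
    print(f"{v} -> annotated with '{best}'")
# bride_visual -> annotated with 'wedding'
# applause_audio -> annotated with 'speech'
```

The fully enumerated `network` dict plays the role of the claimed visual and textual concept relationship network; at realistic taxonomy sizes a sparse or approximate-nearest-neighbor structure would replace it.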
Specification