Structured knowledge modeling and extraction from images

US 11,514,244 B2
Filed: 12/22/2015
Issued: 11/29/2022
Est. Priority Date: 11/11/2015
Status: Active Grant

Assignments: 2
Petitions: 0

Abstract

Techniques and systems are described to model and extract knowledge from images. A digital medium environment is configured to learn and use a model to compute a descriptive summarization of an input image automatically and without user intervention. Training data is obtained to train a model using machine learning in order to generate a structured image representation that serves as the descriptive summarization of an input image. The images and associated text are processed to extract structured semantic knowledge from the text, which is then associated with the images. The structured semantic knowledge is processed along with corresponding images to train a model using machine learning such that the model describes a relationship between text features within the structured semantic knowledge. Once the model is learned, the model is usable to process input images to generate a structured image representation of the image.
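The abstract's first step turns captions into structured semantic knowledge and associates it with images. The toy sketch below extracts knowledge in the <subject, predicate, object> form recited in claims 5, 13 and 19 and keys it by image identifier. The fixed predicate and article vocabularies, the function names, and the adjacent-word rule are all assumptions for illustration. The patent describes natural language processing of the associated text, and a real parser would do this far more robustly than a word-window heuristic.

```python
import re
from typing import Dict, List, Tuple

# <subject, predicate, object>, the tuple form recited in claims 5, 13 and 19.
Triple = Tuple[str, str, str]

# Hypothetical, deliberately tiny vocabularies; a real extractor would use a parser.
PREDICATES = {"riding", "holding", "wearing", "on", "near"}
ARTICLES = {"a", "an", "the"}


def extract_triples(caption: str) -> List[Triple]:
    """Toy extractor: emit (left word, predicate, right word) around each known predicate."""
    tokens = [t for t in re.findall(r"[a-z]+", caption.lower()) if t not in ARTICLES]
    triples: List[Triple] = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in PREDICATES:
            triples.append((tokens[i - 1], tokens[i], tokens[i + 1]))
    return triples


def build_knowledge(captions: Dict[str, str]) -> Dict[str, List[Triple]]:
    """Associate each image id with the triples extracted from its caption."""
    return {image_id: extract_triples(text) for image_id, text in captions.items()}


if __name__ == "__main__":
    knowledge = build_knowledge({"img_001": "A man riding a horse on the beach"})
    print(knowledge)
    # {'img_001': [('man', 'riding', 'horse'), ('horse', 'on', 'beach')]}
```

Because the rule only looks one word to each side, multi-word subjects or predicates ("riding on") are not handled. That gap is exactly what the natural language processing step in the claims is meant to cover.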

19 Claims

1. A method implemented by at least one computing device, the method comprising:
   - obtaining, by the at least one computing device, training data including images and associated text;
   - extracting, by the at least one computing device, a plurality of text features resulting from natural language processing of the associated text of the training data, the plurality of text features corresponding to a subject and an object, respectively, within a respective said image of the training data;
   - generating, by the at least one computing device, a plurality of bounding boxes in the respective said image for at least one said text feature;
   - localizing, by the at least one computing device, the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object;
   - adding, by the at least one computing device, an additional area from the respective said image to the combination of the first said bounding box and the second said bounding box; and
   - training, by the at least one computing device, a model using data that includes the at least one said text feature as localized to the combination of the first and second said bounding boxes having the additional area as part of machine learning.

2. The method as described in claim 1, further comprising generating a descriptive summarization of the object of an input image using the model.
3. The method as described in claim 1, wherein the localizing is performed responsive to determining respective distance between the first and second said bounding boxes is within a threshold distance.
4. The method as described in claim 1, wherein the associated text is a caption or metadata of the respective said image.
5. The method as described in claim 1, wherein the plurality of text features are in a form of <subject, predicate, object>.
6. The method as described in claim 1, further comprising: removing at least one of the plurality of the text features from use as part of the training.
7. The method as described in claim 1, further comprising: identifying confidence in the extracting.
8. The method as described in claim 1, wherein the training includes adapting the plurality of text features or the image features one to another, within a vector space.
9. The method as described in claim 1, wherein the model explicitly correlates the image features of an input image with the plurality of text features such that at least one of the image features is explicitly correlated with a first one of the plurality of text features but not a second one of the plurality of text features.
10. The method as described in claim 1, wherein the plurality of text features are explicitly correlated to the image features.
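Claims 8 and 9 recite adapting text and image features to one another within a vector space, so that an image feature correlates with one text feature but not another. One way to picture this is a linear map from image features into the text-feature space, learned by gradient descent, followed by a cosine-similarity test. This is an assumption: the claims do not specify the architecture, loss, or similarity measure. `fit_projection`, `correlated_text_features`, the threshold, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def project(w: Matrix, x: Vector) -> Vector:
    """Map an image feature vector into the text-feature space: z = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def fit_projection(image_feats: List[Vector], text_feats: List[Vector],
                   lr: float = 0.1, epochs: int = 500) -> Matrix:
    """Learn W by stochastic gradient descent on 0.5 * ||W x - t||^2 per pair."""
    d_img, d_txt = len(image_feats[0]), len(text_feats[0])
    w = [[0.0] * d_img for _ in range(d_txt)]
    for _ in range(epochs):
        for x, t in zip(image_feats, text_feats):
            pred = project(w, x)
            for i in range(d_txt):
                err = pred[i] - t[i]
                for j in range(d_img):
                    w[i][j] -= lr * err * x[j]  # d/dW_ij of the loss is err_i * x_j
    return w


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def correlated_text_features(w: Matrix, image_feat: Vector,
                             text_vocab: Dict[str, Vector],
                             threshold: float = 0.5) -> List[str]:
    """Text features whose similarity to the projected image feature meets the threshold."""
    z = project(w, image_feat)
    return sorted(name for name, t in text_vocab.items() if cosine(z, t) >= threshold)


if __name__ == "__main__":
    vocab = {"<man, riding, horse>": [1.0, 0.0], "<dog, on, sofa>": [0.0, 1.0]}
    w = fit_projection([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], list(vocab.values()))
    print(correlated_text_features(w, [0.9, 0.1, 0.0], vocab))
    # ['<man, riding, horse>']
```

With this toy data the image vector `[0.9, 0.1, 0.0]` passes the threshold only for `<man, riding, horse>`. This mirrors the "first one ... but not a second one" language of claim 9.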

11. A system implemented by at least one computing device comprising:
   - an extractor module to extract a plurality of text features from text associated with images in training data using natural language processing;
   - a grounding and localization module to:
     - generate bounding boxes in a respective said image for at least one text feature of the plurality of text features;
     - determine the bounding boxes include multiple occurrences of a subject or an object of the at least one said text feature;
     - identify relative positional information from the text associated with the at least one said text feature; and
     - ground the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object based on the relative positional information, the at least one said text feature is grounded to the combination of the first said bounding box for the subject and the second said bounding box for the object as a smallest rectangular area in the respective said image that includes the first said bounding box and the second said bounding box; and
   - a model training module to train a model using the training data having the grounded at least one said text feature as part of machine learning.

12. The system as described in claim 11, wherein the associated text is unstructured.
13. The system as described in claim 11, wherein the plurality of text features are in a form of a <subject, predicate, object> tuple.
14. The system as described in claim 11, wherein the extractor module is configured to localize at least part of the plurality of text features as corresponding to respective portions within respective said images and as not corresponding to other portions within respective said images.
15. The system as described in claim 11, further comprising a module configured to generate a caption for an input image based on the plurality of text features.
16. The system as described in claim 11, further comprising a use module configured to deduce, based on the plurality of text features, scene properties of an input image using the model.
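Claim 11 covers captions whose subject or object appears more than once in the image. It uses relative positional information from the text to pick the right pair of boxes, then grounds the text feature to the smallest rectangle containing both. The sketch below is a minimal illustration under several assumptions: axis-aligned `(x_min, y_min, x_max, y_max)` pixel boxes, a hypothetical four-entry phrase table, and a nearest-centre tie-break that the claim does not specify.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels; y grows downward, as in most image libraries.
Box = Tuple[int, int, int, int]


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


# Hypothetical phrase-to-test table for relative positional information in the caption.
RELATIONS: Dict[str, Callable[[Box, Box], bool]] = {
    "left of": lambda s, o: center(s)[0] < center(o)[0],
    "right of": lambda s, o: center(s)[0] > center(o)[0],
    "above": lambda s, o: center(s)[1] < center(o)[1],
    "below": lambda s, o: center(s)[1] > center(o)[1],
}


def enclosing_box(a: Box, b: Box) -> Box:
    """Smallest axis-aligned rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def ground(subject_boxes: List[Box], object_boxes: List[Box], relation: str) -> Optional[Box]:
    """Choose the subject/object pair consistent with `relation`, then return their enclosing box."""
    test = RELATIONS[relation]
    pairs = [(s, o) for s in subject_boxes for o in object_boxes if test(s, o)]
    if not pairs:
        return None

    def centre_distance(pair: Tuple[Box, Box]) -> float:
        (sx, sy), (ox, oy) = center(pair[0]), center(pair[1])
        return math.hypot(sx - ox, sy - oy)

    # Assumed tie-break when several pairs satisfy the relation: nearest centres.
    s, o = min(pairs, key=centre_distance)
    return enclosing_box(s, o)


if __name__ == "__main__":
    men = [(0, 0, 10, 20), (50, 0, 60, 20)]  # two occurrences of the subject "man"
    horse = [(20, 5, 40, 25)]                 # one occurrence of the object "horse"
    print(ground(men, horse, "left of"))      # (0, 0, 40, 25)
    print(ground(men, horse, "right of"))     # (20, 0, 60, 25)
```

Here the phrase "left of" in a caption such as "the man to the left of the horse" is what selects the first of the two detected men.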

17. A method implemented by at least one computing device, the method comprising:
   - obtaining, by the at least one computing device, training data including images and associated text;
   - extracting, by the at least one computing device, a plurality of text features using natural language processing from the associated text;
   - generating, by the at least one computing device, bounding boxes in a respective said image for at least one text feature of the plurality of text features;
   - determining, by the at least one computing device, the bounding boxes include multiple occurrences of a subject or an object of the at least one said text feature;
   - identifying, by the at least one computing device, relative positional information from the text associated with the at least one said text feature;
   - grounding, by the at least one computing device, the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object based on the relative positional information;
   - adding, by the at least one computing device, an additional area from the respective said image to the combination of the first said bounding box and the second said bounding box; and
   - training, by the at least one computing device, a model using the combination of the first and second said bounding boxes having the additional area for the at least one said text feature.

18. The method as described in claim 17, wherein the combination of the first and second said bounding boxes is a smallest rectangular area in the respective said image that includes the first and second said bounding boxes and the adding includes adding the additional area from the respective said image that is not included in the smallest rectangular area.
19. The method as described in claim 17, wherein the plurality of text features are in a form of a <subject, predicate, object> tuple.
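Claims 1, 17 and 18 combine the subject and object boxes into the smallest enclosing rectangle and then add further image area outside it. Claim 3 additionally gates the localization on the boxes lying within a threshold distance. The sketch below assumes three things the claims leave open: a fixed pixel margin for the additional area, clipping to the image border, and the Euclidean gap between box edges as the distance. The names `localize`, `margin`, and `max_gap` are hypothetical.

```python
import math
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def union_box(a: Box, b: Box) -> Box:
    """Smallest rectangle containing both boxes (the combination in claim 18)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0.0 when they touch or overlap."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def add_context(b: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels per side, clipped to the image."""
    return (max(0, b[0] - margin), max(0, b[1] - margin),
            min(width, b[2] + margin), min(height, b[3] + margin))


def localize(subj: Box, obj: Box, width: int, height: int,
             margin: int = 10, max_gap: float = 50.0) -> Optional[Box]:
    """Return the padded subject+object region, or None if the boxes are too far apart."""
    if box_gap(subj, obj) > max_gap:
        return None
    return add_context(union_box(subj, obj), margin, width, height)


if __name__ == "__main__":
    subj, obj = (0, 0, 10, 20), (30, 5, 50, 25)
    region = localize(subj, obj, width=100, height=100)
    print(region)                                      # (0, 0, 60, 35)
    print(area(region) - area(union_box(subj, obj)))   # 850 pixels of added context
    print(localize(subj, (90, 90, 99, 99), 100, 100))  # None: gap exceeds max_gap
```

Growing the rectangle outward guarantees that every added pixel lies outside the smallest rectangular area, which matches the wording of claim 18. A margin proportional to the box size would be an equally plausible reading of "additional area".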

Current Assignee: Adobe Inc.
Original Assignee: Adobe Inc.
Inventors: Cohen, Scott D.; Chang, Walter Wei-Tuh; Price, Brian L.; Elhoseiny, Mohamed Hamdy Mahmoud Abdelbaky
Primary Examiner(s): Huntley, Michael J
Assistant Examiner(s): Alabi, Oluwatosin O
Application Number: US14/978,350
Publication Number: US 20170132526A1
Time in Patent Office: 2,534 Days
CPC Class Codes:
- G06F 40/30: Semantic analysis
- G06N 20/00: Machine learning
- G06N 3/045: Combinations of networks
- G06N 5/022: Knowledge engineering; Know...
