Structured knowledge modeling and extraction from images
Abstract
Techniques and systems are described to model and extract knowledge from images. A digital medium environment is configured to learn and use a model to compute a descriptive summarization of an input image automatically and without user intervention. Training data is obtained to train a model using machine learning in order to generate a structured image representation that serves as the descriptive summarization of an input image. The images and associated text are processed to extract structured semantic knowledge from the text, which is then associated with the images. The structured semantic knowledge is processed along with corresponding images to train a model using machine learning such that the model describes a relationship between text features within the structured semantic knowledge. Once the model is learned, the model is usable to process input images to generate a structured image representation of the image.
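As a rough illustration of what "structured semantic knowledge" extracted from associated text might look like, the sketch below turns a caption into a subject, relation, and object record. It is a toy stand-in, not the natural language processing the patent relies on: the `TextFeature` type, the `RELATIONS` phrase list, and the function names are all hypothetical, and a real system would presumably use a parser rather than string matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical relation phrases used only for this illustration.
RELATIONS = ("is holding", "is riding", "is sitting on")
ARTICLES = ("a", "an", "the")


@dataclass(frozen=True)
class TextFeature:
    """One structured record: a subject, a relation, and an object."""
    subject: str
    relation: str
    obj: str


def strip_article(phrase: str) -> str:
    """Drop a leading article so 'a man' becomes 'man'."""
    words = phrase.split()
    if words and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


def extract_text_feature(caption: str) -> Optional[TextFeature]:
    """Split a caption around the first known relation phrase, if any."""
    text = caption.lower().strip().rstrip(".")
    for relation in RELATIONS:
        marker = f" {relation} "
        if marker in text:
            left, right = text.split(marker, 1)
            return TextFeature(strip_article(left), relation, strip_article(right))
    return None  # no known relation phrase in this caption


feature = extract_text_feature("A man is riding a horse.")
```

Each record would then be stored alongside its image so that later steps can localize the subject and object within that image.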
19 Claims
1. A method implemented by at least one computing device, the method comprising:
- obtaining, by the at least one computing device, training data including images and associated text;
- extracting, by the at least one computing device, a plurality of text features resulting from natural language processing of the associated text of the training data, the plurality of text features corresponding to a subject and an object, respectively, within a respective said image of the training data;
- generating, by the at least one computing device, a plurality of bounding boxes in the respective said image for at least one said text feature;
- localizing, by the at least one computing device, the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object;
- adding, by the at least one computing device, an additional area from the respective said image to the combination of the first said bounding box and the second said bounding box; and
- training, by the at least one computing device, a model using data that includes the at least one said text feature as localized to the combination of the first and second said bounding boxes having the additional area as part of machine learning.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
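The localizing and adding steps of claim 1 can be pictured with a small geometric sketch. It assumes the "combination" of the two boxes is the rectangle enclosing both, and that the "additional area" is a fixed pixel margin clipped to the image bounds. Neither assumption comes from claim 1 itself, and the `Box` type, function names, and `margin` parameter are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates, with (x0, y0) the top-left corner."""
    x0: int
    y0: int
    x1: int
    y1: int


def union_box(a: Box, b: Box) -> Box:
    """Rectangle that encloses both boxes (assumed meaning of 'combination')."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def add_margin(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clipped to the image."""
    return Box(
        max(0, box.x0 - margin),
        max(0, box.y0 - margin),
        min(width, box.x1 + margin),
        min(height, box.y1 + margin),
    )


def localize(subject_box: Box, object_box: Box, margin: int,
             width: int, height: int) -> Box:
    """Combine the subject and object boxes, then add surrounding area."""
    return add_margin(union_box(subject_box, object_box), margin, width, height)


subject_box = Box(10, 20, 50, 80)     # e.g. the subject of the text feature
object_box = Box(40, 60, 120, 100)    # e.g. the object of the text feature
region = localize(subject_box, object_box, margin=15, width=128, height=128)
```

The resulting region, paired with the text feature, is the kind of localized example the training step would consume.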
11. A system implemented by at least one computing device comprising:
- an extractor module to extract a plurality of text features from text associated with images in training data using natural language processing;
- a grounding and localization module to:
  - generate bounding boxes in a respective said image for at least one text feature of the plurality of text features;
  - determine the bounding boxes include multiple occurrences of a subject or an object of the at least one said text feature;
  - identify relative positional information from the text associated with the at least one said text feature; and
  - ground the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object based on the relative positional information, the at least one said text feature is grounded to the combination of the first said bounding box for the subject and the second said bounding box for the object as a smallest rectangular area in the respective said image that includes the first said bounding box and the second said bounding box; and
- a model training module to train a model using the training data having the grounded at least one said text feature as part of machine learning.
Dependent claims: 12, 13, 14, 15, 16
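The grounding module's handling of multiple occurrences can be sketched as follows. The example assumes the relative positional information has already been reduced to one of four cues ("left of", "right of", "above", "below") and that box centers are compared to test a cue. Picking the candidate pair with the smallest enclosing rectangle when several pairs satisfy the cue is an added tie-break, not something the claim states. All names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), y grows downward


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def satisfies(cue: str, subj: Box, obj: Box) -> bool:
    """Check whether the subject lies in the stated direction from the object."""
    (sx, sy), (ox, oy) = center(subj), center(obj)
    if cue == "left of":
        return sx < ox
    if cue == "right of":
        return sx > ox
    if cue == "above":
        return sy < oy
    if cue == "below":
        return sy > oy
    raise ValueError(f"unsupported cue: {cue}")


def enclosing(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def ground(subjects: List[Box], objects: List[Box],
           cue: str) -> Optional[Tuple[Box, Box, Box]]:
    """Pick a subject/object pair consistent with the cue, plus its enclosing box."""
    pairs = [(s, o) for s, o in product(subjects, objects) if satisfies(cue, s, o)]
    if not pairs:
        return None
    # Assumed tie-break: prefer the tightest enclosing rectangle.
    subj, obj = min(pairs, key=lambda p: area(enclosing(*p)))
    return subj, obj, enclosing(subj, obj)


# Two detected occurrences of the subject and one object.
subjects = [(10, 10, 30, 50), (200, 10, 220, 50)]
objects = [(100, 20, 140, 60)]
result = ground(subjects, objects, "left of")
```

With the cue "left of", the first subject candidate is selected; with "right of", the second one is.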
17. A method implemented by at least one computing device, the method comprising:
- obtaining, by the at least one computing device, training data including images and associated text;
- extracting, by the at least one computing device, a plurality of text features using natural language processing from the associated text;
- generating, by the at least one computing device, bounding boxes in a respective said image for at least one text feature of the plurality of text features;
- determining, by the at least one computing device, the bounding boxes include multiple occurrences of a subject or an object of the at least one said text feature;
- identifying, by the at least one computing device, relative positional information from the text associated with the at least one said text feature;
- grounding, by the at least one computing device, the at least one said text feature to a combination of a first said bounding box for the subject and a second said bounding box for the object based on the relative positional information;
- adding, by the at least one computing device, an additional area from the respective said image to the combination of the first said bounding box and the second said bounding box; and
- training, by the at least one computing device, a model using the combination of the first and second said bounding boxes having the additional area for the at least one said text feature.
Dependent claims: 18, 19
Specification