Visual language modeling for image classification
Abstract
Systems and methods for visual language modeling for image classification are described. In one aspect, the systems and methods model training images corresponding to multiple image categories as matrices of visual words. Visual language models are generated from the matrices. Given an image, for example one provided by a user or retrieved from the Web, the systems and methods determine the image category corresponding to that image. This categorization is accomplished by maximizing the posterior probability of the visual words associated with the given image over the visual language models. The image category, or a result corresponding to the image category, is presented to the user.
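The posterior maximization described in the abstract can be written as a standard Bayes decision rule. The notation here is an assumption made for illustration and is not taken from the patent text: D stands for the visual words of the given image, and C_k for the k-th image category with its associated visual language model.

```latex
C^{*} \;=\; \arg\max_{C_k} P(C_k \mid D)
      \;=\; \arg\max_{C_k} \frac{P(D \mid C_k)\, P(C_k)}{P(D)}
      \;=\; \arg\max_{C_k} P(D \mid C_k)\, P(C_k)
```

The last step drops P(D) because it does not depend on the category. If the category priors P(C_k) are additionally assumed equal, the rule reduces to picking the model under which the visual words are most likely.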
17 Claims
1. A method at least partially implemented by a computing device, the method comprising:
modeling images representing multiple image categories as respective matrices of visual words;
generating visual language models from the respective matrices of visual words, the generating comprising:
    correlating visual words in the matrices of visual words according to a visual word grammar indicating conditional distribution of the visual words; and
    for each category of the multiple image categories, building respective visual language models based on the conditional distribution of the visual words;
estimating an image category for an image in view of the visual language models; and
presenting the image category or a result based on the image category to a user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
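The model-building step of claim 1 could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a bigram "grammar" in which each visual word depends only on its left neighbor, uses add-one smoothing, and all names (`train_bigram_vlm`, `train_models`, the toy categories) are hypothetical.

```python
from collections import Counter


def train_bigram_vlm(word_matrices, vocab_size):
    """Estimate P(word | left neighbor) for one image category.

    word_matrices: list of 2-D lists of integer visual-word ids.
    Returns a function prob(left, word) with add-one smoothing.
    The left-neighbor dependency is an assumption of this sketch.
    """
    pair_counts = Counter()
    left_counts = Counter()
    for matrix in word_matrices:
        for row in matrix:
            # Count every horizontally adjacent pair of visual words.
            for left, word in zip(row, row[1:]):
                pair_counts[(left, word)] += 1
                left_counts[left] += 1

    def prob(left, word):
        return (pair_counts[(left, word)] + 1) / (left_counts[left] + vocab_size)

    return prob


def train_models(training_set, vocab_size):
    """Build one model per category; training_set maps category -> matrices."""
    return {
        category: train_bigram_vlm(matrices, vocab_size)
        for category, matrices in training_set.items()
    }


# Toy data: one 2x4 matrix of visual words per category.
training_set = {
    "stripes": [[[0, 1, 0, 1], [0, 1, 0, 1]]],
    "flat": [[[2, 2, 2, 2], [2, 2, 2, 2]]],
}
models = train_models(training_set, vocab_size=3)
```

With smoothing, each conditional distribution sums to one over the vocabulary, and word pairs never seen in training still receive a small non-zero probability.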
9. A computer storage device storing computer executable instructions, which, when executed by the computer, cause the computer to perform operations, the operations comprising:
building visual language models from matrices of visual words generated from a set of training images, the visual language models being based on a visual word grammar, the training images corresponding to one or more predetermined image classifications, the building comprising:
    for each training image:
        dividing the training image into multiple image patches, each image patch being a group of pixels;
        for each image patch of the image patches:
            extracting features to describe one or more properties of the patch;
            representing at least a subset of the features as one or more multidimensional vectors;
            transforming, in view of a visual word grammar, the one or more multidimensional vectors into a respective hash code, the respective hash code being a visual word of the visual words;
creating a visual document from an image for image categorization;
determining an image category for the image based on characteristics of the visual document in view of the visual language models, and the image category corresponding to a classification of the one or more predetermined image classifications; and
presenting the image category or a result based on the image category to a user.
Dependent claims: 10, 11, 12, 13, 14
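The patch-to-visual-word pipeline recited in claim 9 might look something like the sketch below. The claim does not fix a feature set or hash function here, so the choices are assumptions: the feature vector is simply the raw pixel values of the patch, and the hash sets one bit per pixel that is brighter than the patch mean. All function names are hypothetical.

```python
def split_into_patches(image, size):
    """Divide a 2-D grayscale image (list of rows) into size x size patches.

    Returns a matrix of patches, each flattened to a vector of pixel values.
    Edge pixels that do not fill a whole patch are dropped for simplicity.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def patch_to_visual_word(patch):
    """Hash a patch vector into an integer code (hypothetical scheme).

    Each pixel contributes one bit: 1 if it is brighter than the patch mean.
    """
    mean = sum(patch) / len(patch)
    code = 0
    for value in patch:
        code = (code << 1) | (1 if value > mean else 0)
    return code


def image_to_visual_document(image, size=2):
    """Turn an image into a matrix of visual words, one word per patch."""
    return [
        [patch_to_visual_word(patch) for patch in patch_row]
        for patch_row in split_into_patches(image, size)
    ]


image = [
    [0, 9, 5, 5],
    [0, 9, 5, 5],
    [9, 9, 0, 0],
    [0, 0, 9, 9],
]
document = image_to_visual_document(image)
print(document)  # [[5, 0], [12, 3]]
```

Because the code depends only on which pixels exceed the patch mean, patches with the same light/dark layout map to the same visual word even when their overall brightness differs.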
15. A computing device comprising:
a processor; and
a memory coupled to the processor, the memory including computer-program instructions encoded thereon, the computer-program instructions, when executed by the processor, for performing operations comprising:
    loading a set of training images associated with corresponding image categories;
    for each training image of the training images:
        (a) dividing the training image into a respective set of image patches;
        (b) generating a visual word for each image patch to form a respective visual document for the training image;
    for each category of the one or more image categories, generating visual language model(s);
    estimating, using the visual language model(s), an image category for a given image comprising:
        generating a visual document comprising respective visual words from the given image to determine a conditional distribution of the visual words over respective ones of these visual language model(s), a visual language model associated with a largest conditional distribution of the visual words indicating the image category; and
    presenting the image category or a result corresponding to the image category to a user.
Dependent claims: 16, 17
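The estimation step of claim 15, in which the model giving the largest conditional probability indicates the category, can be illustrated with a small scoring routine. This is a sketch under stated assumptions: each model is a callable returning P(word | left neighbor), category priors are taken as equal, and the probability tables are toy values invented for the example (they are not normalized distributions). The helper names are hypothetical.

```python
import math


def log_likelihood(document, prob):
    """Sum log P(word | left neighbor) over every horizontal word pair."""
    total = 0.0
    for row in document:
        for left, word in zip(row, row[1:]):
            total += math.log(prob(left, word))
    return total


def classify(document, models):
    """Return the category whose model scores the document highest."""
    return max(models, key=lambda cat: log_likelihood(document, models[cat]))


def table_model(table, floor=0.01):
    """Wrap a {(left, word): probability} table; unseen pairs get a floor."""
    return lambda left, word: table.get((left, word), floor)


# Toy models with made-up probabilities, one per category.
models = {
    "stripes": table_model({(0, 1): 0.9, (1, 0): 0.9}),
    "flat": table_model({(2, 2): 0.9}),
}

print(classify([[0, 1, 0], [1, 0, 1]], models))  # stripes
```

Working in log space keeps the product of many small probabilities from underflowing, and the floor value plays the role that smoothing would play in a trained model.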
Specification