Method and apparatus for indexing and retrieving images using visual keywords
Abstract
A method, an apparatus and a computer program product for indexing and retrieving image data using visual keywords (108) are disclosed. Visual keywords (108) are prototypical visual tokens (104) extracted from samples of visual documents (100) in a visual-content domain via supervised and/or unsupervised learning processes. An image or a video-shot key frame is described and indexed by a signature (112) that registers the spatial distribution of the visual keywords (108) present in its visual content. Visual documents (100) are retrieved for a sample query (120) by measuring the similarity between the signature (112) of the query (120) and those of the visual documents (100) in the database. The signatures (112) of visual documents (100) are generated based on spatial distributions of the visual keywords (108). Singular-value decomposition (114) is applied to the signatures (112) to obtain a coded description (116).
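The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that each signature has already been flattened into a plain list of numbers and that cosine similarity is the comparison measure; the abstract does not name a specific measure, so that choice, along with the function names `cosine_similarity` and `retrieve` and the `top_n` parameter, is hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero signature is treated as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_signature, database, top_n=3):
    """Rank stored documents by similarity to the query signature.

    `database` maps a document id to its signature (assumed layout).
    Returns the ids of the `top_n` most similar documents, best first.
    """
    scored = [
        (cosine_similarity(query_signature, signature), doc_id)
        for doc_id, signature in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]
```

Any other similarity or distance measure could be substituted for `cosine_similarity` without changing the ranking logic in `retrieve`.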
78 Claims
1. A method of indexing and retrieving a visual document using visual keywords, said method including the steps of:
providing a plurality of visual keywords derived using a learning technique from a plurality of visual tokens extracted across a predetermined number of visual elements;
each said visual token being a coherent unit of the visual document, and representing a visual content domain,
comparing a plurality of visual tokens of another visual document with said visual keywords, a comparison result being represented by a three-dimensional map of detected locations of said visual keywords; and
determining a spatial distribution of visual keywords dependent upon said comparison result to provide a visual-content signature for said visual document.
Dependent claims: 2 to 26.
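The comparing and determining steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: tokens are assumed to lie on a regular grid of feature vectors, a token is assumed to "detect" the keyword nearest to it in Euclidean distance, and the spatial distribution is assumed to be pooled over a square tessellation. The names `detection_map`, `spatial_signature` and `blocks` are hypothetical.

```python
from math import dist


def detection_map(tokens, keywords):
    """Build a rows x cols x K map of detected keyword locations.

    `tokens` is a grid (list of rows) of feature vectors and `keywords`
    is a list of K prototype vectors. Each cell holds a one-hot list
    marking the nearest keyword (assumed matching rule).
    """
    result = []
    for row in tokens:
        out_row = []
        for token in row:
            distances = [dist(token, keyword) for keyword in keywords]
            best = distances.index(min(distances))
            out_row.append(
                [1.0 if k == best else 0.0 for k in range(len(keywords))]
            )
        result.append(out_row)
    return result


def spatial_signature(fmap, blocks=2):
    """Pool the map over a blocks x blocks tessellation.

    Returns the per-block keyword frequencies, concatenated block by
    block. Assumes the grid has at least `blocks` rows and columns.
    """
    rows, cols, k = len(fmap), len(fmap[0]), len(fmap[0][0])
    signature = []
    for bi in range(blocks):
        for bj in range(blocks):
            r0, r1 = bi * rows // blocks, (bi + 1) * rows // blocks
            c0, c1 = bj * cols // blocks, (bj + 1) * cols // blocks
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for kw in range(k):
                signature.append(sum(cell[kw] for cell in cells) / len(cells))
    return signature
```

The third dimension of the map indexes the keywords, so pooling along the two spatial axes yields one keyword-frequency vector per block; concatenating those vectors keeps coarse layout information that a single global histogram would discard.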
27. An apparatus for indexing and retrieving a visual document using visual keywords, said apparatus including:
means for providing a plurality of visual keywords derived using a learning technique from a plurality of visual tokens extracted across a predetermined number of visual documents;
each said visual token being a coherent unit of the visual document, and representing a visual content domain,
means for comparing a plurality of visual tokens of another visual document with said visual keywords, a comparison result being represented by a three-dimensional map of detected locations of said visual keywords; and
means for determining a spatial distribution of visual keywords dependent upon said comparison result to provide a visual-content signature for said visual document.
Dependent claims: 28 to 52.
53. A computer program product having a computer readable medium having a computer program recorded thereon for indexing and retrieving a visual document using visual keywords, said computer program product including:
means for providing a plurality of visual keywords derived using a learning technique from a plurality of visual tokens extracted across a predetermined number of visual documents;
each said visual token being a coherent unit of the visual document, and representing a visual content domain,
means for comparing a plurality of visual tokens of another visual document with said visual keywords, a comparison result being represented by a three-dimensional map of detected locations of said visual keywords; and
means for determining a spatial distribution of visual keywords dependent upon said comparison result to provide a visual-content signature for said visual document.
Dependent claims: 54 to 78.
Specification