Automatic determination of whether a document includes an image gallery
Abstract
Image galleries are automatically located within documents, such as web pages. Documents that are determined to contain image galleries may be treated differently when storing the document for later retrieval by an image search engine. In one implementation, the image galleries are automatically located within a document by calculating position information indicating relative positions of images in the document. The document may be determined to contain an image gallery when the position information indicates that the images in the document are generally evenly distributed.
31 Citations
29 Claims
1. A computer-implemented method comprising:
identifying, by one or more processors, a document stored on a server in a network;
calculating, by one or more processors, position information indicating relative positions of images in the document, the position information providing an indication of horizontal and vertical distances between the images in the document, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed on a display;
constructing, by one or more processors, a histogram based on the horizontal and vertical distances between the images in the document;
calculating, by one or more processors, a likelihood that the document includes an image gallery based on a peak value in the histogram;
indicating, by one or more processors, that the document contains an image gallery or does not contain an image gallery based on the calculated likelihood; and
in response to an indication that the document contains an image gallery, updating, by one or more processors, an index, stored in a memory, that references the document or a score of the document.
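The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say how distances are binned, how the peak value maps to a likelihood, or what threshold applies, so everything below is an assumption made for illustration: offsets are taken between every pair of images, bucketed into `bin_size`-pixel bins, and the likelihood is the peak bin count divided by the number of pairs. The names `spacing_histogram`, `gallery_likelihood`, `update_index`, and the `threshold` value are hypothetical.

```python
from collections import Counter
from itertools import combinations


def spacing_histogram(positions, bin_size=10):
    """Count (horizontal, vertical) offsets between every pair of images.

    positions: list of (x, y) pixel coordinates of rendered images.
    Offsets are bucketed into bin_size-pixel bins (an assumed choice).
    """
    hist = Counter()
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        dx = abs(x2 - x1) // bin_size
        dy = abs(y2 - y1) // bin_size
        hist[(dx, dy)] += 1
    return hist


def gallery_likelihood(positions, bin_size=10):
    """Peak bin count divided by the number of image pairs.

    Returns 0.0 when there are fewer than two images.
    """
    hist = spacing_histogram(positions, bin_size)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return max(hist.values()) / total


def update_index(index, url, positions, threshold=0.15):
    """Record the gallery decision for a document in a dict-based index.

    The threshold is a made-up value, not one taken from the patent.
    """
    likelihood = gallery_likelihood(positions)
    is_gallery = likelihood >= threshold
    index[url] = {"has_gallery": is_gallery, "likelihood": likelihood}
    return is_gallery
```

In this sketch, evenly spaced thumbnails pile many pairs into the same few offset bins, so the peak is high relative to the number of pairs; irregularly placed images spread their offsets across many bins and the peak stays low.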
8. A device comprising:
means for identifying a document stored on a server in a network;
means for calculating position information indicating relative positions of images in the document, the position information providing an indication of spacing of the images in the document, the means for calculating including:
means for constructing a histogram based on horizontal and vertical distances between the images in the document, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed, and
means for calculating a likelihood that the document includes an image gallery based on a peak value in the histogram;
means for determining that the document contains an image gallery when the calculated position information indicates that the images are uniformly arranged relative to one another within the document;
means for determining that the document does not contain an image gallery when the calculated position information indicates that the images are not uniformly arranged relative to one another in the document; and
means for storing an indication that the document contains the image gallery when the document is determined to contain the image gallery.
11. An image search engine implemented in a computer device, comprising:
a search component to return images relevant to search queries based on a comparison of a search query to a document index; and
an image indexing component to store the document index based on text associated with images in documents, the image indexing component annotating the document index to indicate documents in the document index that include an image gallery, where the image indexing component is to automatically determine that a document includes an image gallery by analyzing the relative positions of images in the document to determine whether the images in the document are evenly spaced in the document with respect to one another, where the image indexing component is to automatically determine whether a document includes an image gallery by:
constructing a two-dimensional histogram of occurrences of images based on horizontal and vertical image spacings from one another, the horizontal and vertical image spacings corresponding to spatial distances between the images when the document is rendered and displayed; and
calculating a likelihood that the document includes an image gallery based on a peak value in the histogram.
12. An image search engine implemented in a computer device, comprising:
a search component to return images relevant to search queries based on a comparison of the search query to a document index; and
an image indexing component to store the document index based on text associated with images in documents, the image indexing component annotating the document index to indicate documents in the document index that include an image gallery, the image indexing component to automatically determine that a document includes an image gallery based on relative positions of images within the document, where the image indexing component determines the relative position of images in the document by:
locating tokens in the document, the located tokens corresponding to the images in the document;
generating a token table in which the located tokens are assigned to cells in the token table such that tokens in the token table are arranged to have a logical spatial layout in the token table corresponding to a spatial layout of the images in the document; and
constructing a two-dimensional histogram based on horizontal and vertical distances between the located tokens in the document, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed.
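The token-table steps of claim 12 might look roughly like the following Python sketch. It assumes, purely for illustration, that the images are laid out with an HTML `<table>`, so that each `<img>` token can be assigned to a (row, column) cell by following the `<tr>` and `<td>` tags; the claim itself does not limit how tokens are located or how cells are assigned. The names `TokenTableBuilder`, `build_token_table`, and `cell_offset_histogram` are hypothetical, and nested tables or several images in one cell are not handled.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations


class TokenTableBuilder(HTMLParser):
    """Assign each <img> token to a (row, col) cell using <tr>/<td> structure."""

    def __init__(self):
        super().__init__()
        self.row = -1
        self.col = -1
        self.table = {}  # (row, col) -> image src

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new table row starts; the column counter restarts.
            self.row += 1
            self.col = -1
        elif tag in ("td", "th"):
            self.col += 1
        elif tag == "img" and self.row >= 0 and self.col >= 0:
            # A later image in the same cell overwrites an earlier one.
            self.table[(self.row, self.col)] = dict(attrs).get("src", "")


def build_token_table(html):
    """Return a dict mapping (row, col) cells to image sources."""
    builder = TokenTableBuilder()
    builder.feed(html)
    return builder.table


def cell_offset_histogram(table):
    """Two-dimensional histogram keyed by (horizontal, vertical) cell offsets."""
    hist = Counter()
    for (r1, c1), (r2, c2) in combinations(sorted(table), 2):
        hist[(abs(c2 - c1), abs(r2 - r1))] += 1
    return hist
```

Cell offsets stand in here for the rendered distances between images; a fuller treatment would need actual layout information, which this sketch does not attempt to compute.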
14. A computer-implemented method comprising:
calculating, by one or more processors, position information indicating relative positions of images in documents that are stored on at least one server in a network, the position information providing an indication of spacing of the images in the documents, the calculating including:
constructing, by one or more processors, histograms based on horizontal and vertical distances between the images in the documents, the horizontal and vertical distances corresponding to spatial distances between the images when the documents are rendered and displayed on a display;
calculating, by one or more processors, a likelihood that each document, of the documents, includes an image gallery based on a peak value in the histogram;
determining, by one or more processors, that a first document of the documents contains an image gallery when the calculated position information indicates that the images are uniformly arranged relative to one another within the first document;
determining, by one or more processors, that the first document does not contain an image gallery when the calculated position information indicates that the images are not uniformly arranged relative to one another within the first document;
generating, by one or more processors, a document index based on text associated with the images in the first document, the document index being structured such that images that are arranged in image galleries are ranked differently than images that are not arranged in the image galleries;
storing the generated document index in a memory; and
returning, by one or more processors, images relevant to a search query based on a comparison of the search query to the document index.
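The indexing and retrieval steps at the end of claim 14 can be sketched as a small inverted index in Python. The claim says only that gallery images are "ranked differently"; it does not say whether they are boosted or demoted, or by how much. The sketch therefore exposes that choice as a hypothetical `gallery_weight` parameter, and the function names and the simple term-count scoring are likewise assumptions rather than anything stated in the claims.

```python
from collections import defaultdict


def build_index(images):
    """Build a term -> [(url, in_gallery)] inverted index.

    images: iterable of dicts with 'url', 'text', and 'in_gallery' keys,
    where 'text' is the text associated with the image.
    """
    index = defaultdict(list)
    for image in images:
        for term in set(image["text"].lower().split()):
            index[term].append((image["url"], image["in_gallery"]))
    return index


def search(index, query, gallery_weight=0.5):
    """Return image URLs ordered by score, highest first.

    Each matching query term adds 1.0 for a non-gallery image and
    gallery_weight for a gallery image (an assumed weighting scheme).
    Ties are broken alphabetically by URL.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        for url, in_gallery in index.get(term, []):
            scores[url] += gallery_weight if in_gallery else 1.0
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Setting `gallery_weight` below 1.0 demotes gallery images and setting it above 1.0 promotes them, so the same structure covers either reading of "ranked differently".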
15. A computer-implemented method comprising:
crawling, by one or more processors, a network to identify documents stored on a server;
determining, by one or more processors, that a document, that is identified, contains an image gallery based on a spatial layout of images in the document and links corresponding to images in the document or sizes of images in the document, where determining that the document contains an image gallery includes:
constructing, by one or more processors, a histogram based on horizontal and vertical distances between images in the document, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed, and
calculating, by one or more processors, a likelihood that the document includes an image gallery based on a peak value in the histogram; and
updating, by one or more processors, an index, in a memory and that includes the document, to indicate that the document contains an image gallery.
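Claim 15 adds links and image sizes as signals alongside the spatial layout. The following Python sketch shows one way such signals might be combined; the claim does not describe how they are measured or weighed, so the measures used here (the share of images with the most common size, and the share of images wrapped in a link), the thresholds, and the names `size_uniformity`, `linked_fraction`, and `contains_gallery` are all assumptions. The `layout_likelihood` argument is taken as given, for example from a histogram peak.

```python
from collections import Counter


def size_uniformity(images):
    """Fraction of images that share the most common (width, height)."""
    if not images:
        return 0.0
    counts = Counter(image["size"] for image in images)
    return counts.most_common(1)[0][1] / len(images)


def linked_fraction(images):
    """Fraction of images that have an associated link."""
    if not images:
        return 0.0
    return sum(1 for image in images if image["href"]) / len(images)


def contains_gallery(layout_likelihood, images,
                     layout_threshold=0.15, signal_threshold=0.8):
    """Combine the layout likelihood with the link or size signal.

    images: list of dicts with 'size' ((width, height)) and 'href'
    (link target or None). Both thresholds are made-up values.
    """
    if layout_likelihood < layout_threshold:
        return False
    return (size_uniformity(images) >= signal_threshold
            or linked_fraction(images) >= signal_threshold)
```

The intuition in this sketch is that gallery thumbnails tend to share one size and link to larger versions, whereas incidental page images (logos, banners, icons) vary in size and are linked less consistently.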
22. A computer-implemented method comprising:
identifying, by one or more processors, a document stored on a server in a network;
locating, by one or more processors, tokens in the document, the tokens representing images in the document;
generating, by one or more processors, a token table based on a spatial layout of the located tokens;
calculating, by one or more processors, position information indicating relative positions of the images in the document based on the token table, the calculating including:
constructing, by one or more processors, and using the token table, a histogram based on the horizontal and vertical distances between the tokens in the token table, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed on a display;
calculating, by one or more processors, a likelihood that the document includes an image gallery based on a peak value in the histogram;
determining, by one or more processors, that the document contains an image gallery when the position information indicates that the images in the document are evenly spaced in the document with respect to one another;
determining, by one or more processors, that the document does not contain an image gallery when the position information indicates that the images in the document are not evenly spaced in the document with respect to one another; and
updating, by one or more processors, an index in a memory based on whether the document is determined to contain an image gallery.
25. A computer-readable memory device including computer-executable instructions, the computer-readable memory device comprising:
one or more instructions to identify a document stored on a server in a network;
one or more instructions to calculate position information indicating relative positions of images in the document, the position information providing an indication of spacing of the images in the document, where the one or more instructions to calculate position information includes:
one or more instructions to construct a histogram based on horizontal and vertical distances between the images in the document, the horizontal and vertical distances corresponding to spatial distances between the images when the document is rendered and displayed on a display, and
one or more instructions to calculate, by the processor, a likelihood that the document includes an image gallery based on a peak value in the histogram;
one or more instructions to determine that the document contains an image gallery when the calculated position information indicates that the images are uniformly arranged relative to one another within the document;
one or more instructions to determine that the document does not contain an image gallery when the calculated position information indicates that the images are not uniformly arranged relative to one another within the document; and
one or more instructions to, in response to a determination that the document contains an image gallery, update an index that references the document or a score of the document.
Specification