SYSTEMS AND METHODS FOR BUILDING A DOCUMENT INDEX
Abstract
Systems and methods for building a document or vertical index are provided in which a document comprising code for a web page on the Internet is obtained. A static graphic representation of the web page is rendered, thereby building a word map that has, for each respective word in a plurality of words, the areas in the representation occupied by the word. The word map, which has (i) an instance of a word, (ii) x- and y-coordinates of where the word appears in the representation, and (iii) a size of the area in the representation occupied by the word, is stored. A document or vertical index including the document is built such that the x- and y-coordinates of the word in the representation, or the size of the area in the representation occupied by the word, is used as a feature of the document in the document or vertical index.
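As an illustration only, the word map described in the abstract can be sketched as a list of records, each holding a word instance, its x- and y-coordinates, and the size of the area it occupies. The sketch below is not the patented implementation: it assumes a toy fixed-width layout in place of a real page renderer, and the names WordBox, build_word_map, char_width, and line_height are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordBox:
    """One word instance in the rendered page (hypothetical record layout)."""
    word: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int

    @property
    def area(self) -> int:
        # Size of the area occupied by this word instance.
        return self.width * self.height


def build_word_map(lines: List[str], char_width: int = 8,
                   line_height: int = 16) -> List[WordBox]:
    """Toy stand-in for rendering: every text line is one row of
    fixed-width glyphs. A real system would take these boxes from
    an actual layout engine instead (assumption)."""
    word_map = []
    for row, line in enumerate(lines):
        col = 0
        for word in line.split(" "):
            if word:
                word_map.append(WordBox(word,
                                        col * char_width,
                                        row * line_height,
                                        len(word) * char_width,
                                        line_height))
            # Advance past the word and the single space that followed it.
            col += len(word) + 1
    return word_map
```

Each record carries the three items the abstract lists, so a stored word map is enough to recover where a word sat on the page and how much space it took.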
27 Claims
1. A method for building a document index or a vertical index, the method comprising:
(A) obtaining a first document, wherein the first document comprises code for a web page that corresponds to the first document;
(B) rendering a static graphic representation of the web page corresponding to the first document, wherein the rendering comprises generating a word map for the static graphic representation that comprises, for each respective word in a plurality of words in the first document, each area in the static graphic representation that is occupied by the respective word;
(C) storing the word map for the web page, wherein the word map comprises (i) an instance of a first word, (ii) an x-coordinate and a y-coordinate that represents where the instance of the first word appears in the static graphic representation of the web page, and (iii) a size of the area in the static graphic representation of the web page occupied by the instance of the first word; and
(D) building the document index or the vertical index comprising a plurality of documents, the plurality of documents comprising the first document, wherein the x-coordinate and the y-coordinate that represents where the instance of the first word that appears in the static graphic representation of the web page or the size of the area in the static graphic representation of the web page occupied by the instance of the first word is used as a feature of the first document that is indexed in the document index or the vertical index.
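Step (D) of claim 1 can be pictured, under stated assumptions, as an inverted index whose postings keep the position and area of each word instance as features. The sketch below is a minimal illustration, not the claimed implementation: the function names build_index and lookup, the tuple layout of the input, and the choice to rank documents by largest rendered area are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One stored word-map entry: (word, x, y, width, height). Hypothetical layout.
Box = Tuple[str, int, int, int, int]


def build_index(word_maps: Dict[str, List[Box]]) -> Dict[str, List[dict]]:
    """Build an inverted index over several documents, keeping the
    x/y position and occupied area of each word instance as features."""
    index = defaultdict(list)
    for doc_id, boxes in word_maps.items():
        for word, x, y, width, height in boxes:
            index[word.lower()].append(
                {"doc": doc_id, "x": x, "y": y, "area": width * height})
    return dict(index)


def lookup(index: Dict[str, List[dict]], word: str) -> List[str]:
    """Return document ids containing the word, ordered by the largest
    area any instance occupies (an illustrative ranking choice only)."""
    best: Dict[str, int] = {}
    for posting in index.get(word.lower(), []):
        doc = posting["doc"]
        best[doc] = max(best.get(doc, 0), posting["area"])
    # Larger area first; ties broken by document id for determinism.
    return sorted(best, key=lambda d: (-best[d], d))
```

With this layout, a page that renders a query word in a large headline would sort ahead of a page that mentions it only in small body text; the claim itself does not prescribe any particular ranking.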
7. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(A) instructions for obtaining a first document, wherein the first document comprises code for a web page that corresponds to the first document;
(B) instructions for rendering a static graphic representation of the web page corresponding to the first document, wherein the rendering comprises generating a word map for the static graphic representation that comprises, for each respective word in a plurality of words in the first document, each area in the static graphic representation that is occupied by the respective word;
(C) instructions for storing the word map for the web page, wherein the word map comprises (i) an instance of a first word, (ii) an x-coordinate and a y-coordinate that represents where the instance of the first word appears in the static graphic representation of the web page, and (iii) a size of the area in the static graphic representation of the web page occupied by the instance of the first word; and
(D) instructions for building a document index or a vertical index of a plurality of documents, the plurality of documents comprising the first document, wherein the x-coordinate and the y-coordinate that represents where the instance of the first word that appears in the static graphic representation of the web page or the size of the area in the static graphic representation of the web page occupied by the instance of the first word is used as a feature of the first document that is indexed in the document index or the vertical index.
13. A computer, comprising:
a main memory;
a processor; and
one or more programs, stored in the main memory and executed by the processor, the one or more programs collectively including instructions for:
(A) obtaining a first document, wherein the first document comprises code for a web page that corresponds to the first document;
(B) rendering a static graphic representation of the web page corresponding to the first document, wherein the rendering comprises generating a word map for the static graphic representation that comprises, for each respective word in a plurality of words in the first document, each area in the static graphic representation that is occupied by the respective word;
(C) storing the word map for the web page, wherein the word map comprises (i) an instance of a first word, (ii) an x-coordinate and a y-coordinate that represents where the instance of the first word appears in the static graphic representation of the web page, and (iii) a size of the area in the static graphic representation of the web page occupied by the instance of the first word; and
(D) building a document index or a vertical index of a plurality of documents, the plurality of documents comprising the first document, wherein the x-coordinate and the y-coordinate that represents where the instance of the first word that appears in the static graphic representation of the web page or the size of the area in the static graphic representation of the web page occupied by the instance of the first word is used as a feature of the first document that is indexed in the document index or the vertical index.
Specification