Ranking search engine results
First Claim
1. A computer-implemented method for randomly walking through a hyper-text-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the method comprising:
- a) selecting a host;
b) selecting at random a document associated with the host;
c) retrieving the selected document;
d) randomly choosing whether to select a random new document;
e) responsive to choosing to select the random new document;
e.1) selecting at random a new host from among the previously selected hosts;
e.2) selecting at random a new document associated with the new host; and
e.3) retrieving the selected new document;
f) responsive to choosing not to select the random new document;
f.1) selecting at random a link in the retrieved document; and
f.2) retrieving a document referenced by the selected link; and
g) repeating d), and then conditionally repeating e) or f) depending upon the choosing made in d).
4 Assignments
0 Petitions
Accused Products
Abstract
A method, system, and computer program product for determining relative quality of search engine indexes and search results include performing a two-level random walk through a hypertext-linked document set. Search engine index quality is measured based on the number of encountered documents that are indexed by the search engine index. Search result quality is measured based on the number and quality of documents that link to the result document.
-
Citations
55 Claims
-
1. A computer-implemented method for randomly walking through a hyper-text-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the method comprising:
-
a) selecting a host; b) selecting at random a document associated with the host; c) retrieving the selected document; d) randomly choosing whether to select a random new document; e) responsive to choosing to select the random new document; e.1) selecting at random a new host from among the previously selected hosts; e.2) selecting at random a new document associated with the new host; and e.3) retrieving the selected new document; f) responsive to choosing not to select the random new document; f.1) selecting at random a link in the retrieved document; and f.2) retrieving a document referenced by the selected link; and g) repeating d), and then conditionally repeating e) or f) depending upon the choosing made in d). - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-implemented method for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the method comprising:
-
a) initializing a host set; b) initializing a document set for each host in the host set; c) selecting at random a host from the host set; d) selecting at random a document from the document set of the selected host; and e) responsive to the selected document containing at least one link; e.1) selecting at random a link from the selected document; e.2) selecting a document corresponding to the selected link; e.3) selecting a host corresponding to the selected document; e.4) adding the selected host to the host set; e.5) adding the selected document to the document set of the selected host; and e.6) repeating e.1) through e.5) until all links have been traversed. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A computer-implemented method for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the method comprising:
-
a) initializing a host set; b) initializing a document set for each host in the host set; c) selecting at random a host from the host set; d) selecting at random a document from the document set of the selected host; e) randomly choosing whether to select a random new document; and f) responsive to choosing not to select a random new document and further responsive to the selected document containing at least one link; f.1) selecting at random a link from the selected document; f.2) selecting a document corresponding to the selected link; f.3) selecting a host corresponding to the selected document; f.4) adding the selected host to the host set; f.5) adding the selected document to the document set of the selected host; and f.6) repeating f.1) through f.5) until all links have been traversed. - View Dependent Claims (12)
-
-
13. A computer-implemented method for measuring relative quality of a search engine index, comprising:
-
a) performing a two-level random walk among documents within a document set, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, and wherein performing the two-level random walk comprises; a.1) selecting a host; a.2) selecting at random a document associated with the host; a.3) retrieving the selected document; a.4) selecting at random a link in the retrieved document; a.5) retrieving a document referenced by the selected link; and a.6) repeating a.4) and a.5) until all links have been traversed; b) for each document encountered in the random walk, determining whether the document is indexed by the search engine index; and c) aggregating the results of b). - View Dependent Claims (14, 15)
-
-
16. A computer-implemented method for measuring relative quality of a search engine index, comprising:
-
a) performing a two-level random walk among documents within a document set, by; a.1) selecting a host; a.2) selecting at random a document associated with the host; a.3) retrieving the selected document; a.4) randomly choosing whether to select a random new document; a.4.1) responsive to choosing to select the random new document; a.4.1.1) selecting at random a new host from among the previously selected hosts; a.4.1.2) selecting at random a new document associated with the host; and a.4.1.3) retrieving the selected new document; a.4.2) responsive to choosing not to select the random new document; a.4.2.1) selecting at random a link in the retrieved document; and a.4.2.2) retrieving a document referenced by the selected link; and a.5) repeating a.4), and then conditionally repeating a.4.1) through a.4.1.3) or a.4.2) through a.4.2.2) depending upon the choosing made in a.4); b) for each document encountered in the random walk, determining whether the document is indexed by the search engine index; and c) aggregating the results of b). - View Dependent Claims (17)
-
-
18. A computer-implemented method for measuring relative quality of a search engine index, comprising:
-
a) performing a two-level random walk among documents within a document set, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, and wherein performing the two-level random walk comprises; a.1) initializing a host set; a.2) initializing a document set for each host in the host set; a.3) selecting at random a host from the host set; a.4) selecting at random a document from the document set of the selected host; a.5) adding a host that is referenced by a selected link to the host set; a.6) adding a document referenced by the selected link to the document set of the selected host; a.7) responsive to the selected document containing at least one link; a.7.1) selecting at random a link from the selected document; a.7.2) selecting a document corresponding to the selected link; a.7.3) selecting a host corresponding to the selected document; and a.7.4) repeating a.5) through a.8) until all links have been traversed; and a.8) responsive to the selected document not containing at least one link, repeating a.3) through a.6), and further conditionally repeating a.7) or a.8), until all documents have been traversed; b) for each document encountered in the random walk, determining whether the document is indexed by the search engine index; and c) aggregating the results of b). - View Dependent Claims (19)
-
-
20. A computer-implemented method for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, the method comprising:
-
a) performing a two-level random walk among documents within a document set; and b) determining a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document, the determining of the quality metric comprising determining a value for; - View Dependent Claims (21, 22)
-
-
23. A computer-implemented method for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, wherein each document is associated with a host, the method comprising:
-
a) performing a two-level random walk among documents within a document set by; a.1) selecting a host; a.2) selecting at random a document associated with the host; a.3) retrieving the selected document; a.4) randomly choosing whether to select a random new document; a.5) responsive to choosing to select the random new document; a.5.1) selecting at random a host from among the previously selected hosts; a.5.2) selecting at random a document associated with the host; and a.5.3) retrieving the selected document; a.6) responsive to choosing not to select the random new document; a.6.1) selecting at random a link in the retrieved document; and a.6.2) retrieving a document referenced by the selected link; and a.7) repeating a.4), and then conditionally repeating a.5) through a.5.3) or a.6) through a.6.2) depending upon the choosing made in a.4); and b) determining a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document. - View Dependent Claims (24)
-
-
25. A computer-implemented method for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, wherein each document is associated with a host, the method comprising:
-
a) performing a two-level random walk among documents within a document set, by; a.1) initializing a host set; a.2) initializing a document set for each host in the host set; a.3) selecting at random a host from the host set; a.4) randomly choosing whether to select a random new host; a.5) responsive to choosing to select the random new host; a.5.1) selecting at random a new host from among the previously selected hosts; a.6) responsive to choosing not to select the random new host; a.6.1) selecting at random a document from the document set of the selected host; and a.6.2) responsive to the selected document containing at least one link; a.6.2.1) selecting at random a link from the selected document; a.6.2.2) selecting a document corresponding to the selected link; a.6.2.3) selecting a host corresponding to the selected document; and a.6.2.4) adding the selected host to the host set; a.6.2.5) adding the selected document to the document set of the selected host; a.6.2.6) repeating a.6.2.1) through a.6.2.5) until all links have been traversed; and a.7) repeating a.3) through a.4), and then conditionally repeating a.5) through a.5.1) or a.6) through a.6.2.6); and b) determining a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document. - View Dependent Claims (26)
-
-
27. A computer-implemented method for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the method comprising:
-
a) selecting a host; b) selecting at random a document associated with the host; c) retrieving the selected document; d) randomly choosing whether to select a random new host; e) responsive to choosing to select the random new host; e.1) selecting at random a new host from among the previously selected hosts; and e.2) repeating b) through d), and then conditionally e) through e.2) or f) through f.3 until all documents have been traversed; and f) responsive to choosing not to select the random new host; f.1) selecting at random a link in the retrieved document; f.2) retrieving a document referenced by the selected link; and f.3) repeating d), and then conditionally e) through e.2) or f) through f.3) until all links have been traversed.
-
-
28. A computer-implemented method for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, the method comprising:
-
a) performing a two-level random walk among documents within a document set, by; a.1) initializing a host set; a.2) initializing a document set for each host in the host set; a.3) selecting at random a host from the host set; a.4) randomly choosing whether to select a random new host; a.5) responsive to choosing to select a random new host; a.5.1) selecting at random a new host from among the previously selected hosts; a.6) responsive to choosing not to select the random new host; a.6.1) selecting at random a document from the document set of the selected host; and a.6.2) responsive to the selected document containing at least one link; a.6.2.1) selecting at random a link from the selected document; a.6.2.2) selecting a document corresponding to the selected link; a.6.2.3) selecting a host corresponding to the selected document; and a.6.2.4) adding the selected host to the host set; a.6.2.5) adding the selected document to the document set of the selected host; a.6.2.6) repeating a.6.2.1) through a.6.2.5) until all links have been traversed; and a.7) repeating a.3) through a.4), and then conditionally repeating a.5) through a.5.1) or a.6) through a.6.2.6); and b) determining a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document; c) determining a quality metric for at least one additional target document; and d) ranking the quality metric of the first document with respect to the quality metrics of the additional target documents. - View Dependent Claims (29)
-
-
30. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a sub-set of the documents contain a plurality of links to other documents, each document being associated with a host, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to select a host; b) computer-readable program code devices configured to cause a computer to select at random a document associated with the host; c) computer-readable program code devices configured to cause a computer to retrieve the selected document; d) computer-readable program code devices configured to cause a computer to randomly choose whether to select a random new document; e) computer-readable program code devices configured to cause a computer to, responsive to choosing to select the random new document; e.1) select at random a new host from among the previously selected hosts; and e.2) select at random a new document associated with the host; and e.3) retrieve the selected new document; f) computer-readable program code devices configured to cause a computer to, responsive to choosing not to select the random new document; f.1) select at random a link in the retrieved document; and f.2) retrieve a document referenced by the selected link; and g) computer-readable program code devices configured to cause a computer to repeat the operations of d) and then conditionally repeat the operations of e) or f) depending on the choice made in d). - View Dependent Claims (31, 32)
-
-
33. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to initialize a host set; b) computer-readable program code devices configured to cause a computer to initialize a document set for each host in the host set; c) computer-readable program code devices configured to cause a computer to select at random a host from the host set; d) computer-readable program code devices configured to cause a computer to select at random a document from the document set of the selected host; and e) computer-readable program code devices configured to cause a computer to, responsive to the selected document containing at least one link; e.1) select at random a link from the selected document; e.2) select a document corresponding to the selected link; e.3) select a host corresponding to the selected document; and e.4) add the selected host to the host set; e.5) add the selected document to the document set of the selected host; and e.6) repeat the operations of e.1) through e.5) until all links have been traversed. - View Dependent Claims (34, 35, 36, 37)
-
-
38. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a sub-set of the documents contain a plurality of links to other documents, each document being associated with a host, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to initialize a host set; b) computer-readable program code devices configured to cause a computer to initialize a document set for each host in the host set; c) computer-readable program code devices configured to cause a computer to select at random a host from the host set; d) computer-readable program code devices configured to cause a computer to select at random a document from the document set of the selected host; e) computer-readable program code devices configured to cause a computer to randomly choose whether to select a random new document; and f) computer-readable program code devices configured to cause a computer to, responsive to choosing not to select a random new document, and further responsive to the selected document containing at least one link; f.1) select at random a link from the selected document; f.2) select a document corresponding to the selected link; f.3) select a host corresponding to the selected document; and f.4) add the selected host to the host set; f.5) add the selected document to the document set of the selected host; and f.6) repeat the operations of f.1 through f.5) until all links have been traversed. - View Dependent Claims (39)
-
-
40. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a search engine index, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, and wherein the computer-readable program code devices configured to cause a computer to perform a two-level random walk comprise; a.1) computer-readable program code devices configured to cause a computer to select a host; a.2) computer-readable program code devices configured to cause a computer to select at random a document associated with the host; a.3) computer-readable program code devices configured to cause a computer to retrieve the selected document; a.4) computer-readable program code devices configured to cause a computer to select at random a link in the retrieved document; a.5) computer-readable program code devices configured to cause a computer to retrieve a document referenced by the selected link; and a.6) computer-readable program code devices configured to cause a computer to repeat the operations of a.4) and a.5) until all links have been traversed; b) computer-readable program code devices configured to cause a computer to, for each document encountered in the random walk, determine whether the document is indexed by the search engine index; and c) computer-readable program code devices configured to cause a computer to aggregate the results of the operations of b). - View Dependent Claims (41, 42)
-
-
43. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a search engine index, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, and wherein the computer-readable program code devices configured to cause a computer to perform a two-level random walk comprise; a.1) computer-readable program code devices configured to cause a computer to initialize a host set; a.2) computer-readable program code devices configured to cause a computer to initialize a document set for each host in the host set; a.3) computer-readable program code devices configured to cause a computer to select at random a host from the host set; a.4) computer-readable program code devices configured to cause a computer to select at random a link from a document in the document set of the selected host; a.5) computer-readable program code devices configured to cause a computer to add a host referenced by the link to the host set; a.6) computer-readable program code devices configured to cause a computer to add a document referenced by the link to the document set of the selected host; a.7) computer-readable program code devices configured to cause a computer to, responsive to the selected document containing at least one link; a.7.1) select at random a link from the selected document; a.7.2) select a document corresponding to the selected link; a.7.3) select a host corresponding to the selected document; and a.7.4) repeat the operations of a.5) through a.8) until all links have been traversed; and a.8) computer-readable program code devices configured to cause a computer to, responsive to the selected document not containing at least one link, repeat the operations of a.3) through a.6), and further conditionally repeating a.7) or a.8), until all documents have been traversed; b) computer-readable program code devices configured to cause a computer to, for each document encountered in the random walk, determine whether the document is indexed by the search engine index; and c) computer-readable program code devices configured to cause a computer to aggregate the results of the operations of b). - View Dependent Claims (44)
-
-
45. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, the computer program product comprising:
-
computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set; and computer-readable program code devices configured to cause a computer to determine a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document and to further determine a value for; - View Dependent Claims (46, 47)
-
-
48. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, and wherein each document is associated with a host, the computer program product comprising:
-
computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set, by; a.1) selecting a host; a.2) selecting at random a document associated with the host; a.3) retrieving the selected document; a.4) randomly choosing whether to select a random new document; a.5) responsive to choosing to select the random new document; a.5.1) selecting at random a host from among the previously selected hosts; and a.5.2) selecting at random a document associated with the host; and a.5.3) retrieving the selected document; a.6) responsive to choosing not to select the random new document; a.6.1) selecting at random a link in the retrieved document; and a.6.2) retrieving a document referenced by the selected link; and a.7) repeating the operations of a.4), and then conditionally repeating the operations of a.5) through a.5.3) or a.6) through a.6.2) depending upon the choosing made in a.4); and computer-readable program code devices configured to cause a computer to determine a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document. - View Dependent Claims (49)
-
-
50. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, wherein each document is associated with a host, the computer program product comprising:
-
computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set, by; a.1) initializing a host set; a.2) initializing a document set for each host in the host set; a.3) selecting at random a host from the host set; a.4) randomly choosing whether to select a random new host; a.5) responsive to choosing to select the random new host; a.5.1) selecting at random a host from among the previously selected hosts; a.6) responsive to choosing not to select the random new host; a.6.1) selecting at random a document from the document set of the selected host; a.6.2) adding the selected host to the host set; a.6.3) adding the selected document to the document set of the selected host; a.6.4) responsive to the selected document containing at least one link; a.6.4.1) selecting at random a link from the selected document; a.6.4.2) selecting a document corresponding to the selected link; a.6.4.3) selecting a host corresponding to the selected document; and a.6.4.4) repeating the operations of a.6.2) through a.6.4.3) until all links have been traversed; and a.7) repeating the operations of a.3) through a.4), and then conditionally repeating a.5) through a.5.1) or a.6) through a.6.4.4; and computer-readable program code devices configured to cause a computer to determine a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document. - View Dependent Claims (51)
-
-
52. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to select a host; b) computer-readable program code devices configured to cause a computer to select at random a document associated with the host; c) computer-readable program code devices configured to cause a computer to retrieve the selected document; d) computer-readable program code devices configured to cause a computer to randomly choose whether to select a random new host; e) computer-readable program code devices configured to cause a computer to, responsive to choosing to select the random new host; e.1) select at random a new host from among the previously selected hosts; and e.2) repeat the operations of b) through d) and then conditionally e) through e.2) or f) through f.3) until all documents have been traversed; and f) computer-readable program code devices configured to cause a computer to, responsive to choosing not to select the random new host; f.1) select at random a link in the retrieved document; f.2) retrieve a document referenced by the selected link; and f.3) repeat the operations of d), and then conditionally e) through e.2) or f) through f.3) until all links have been traversed.
-
-
53. A computer program product comprising a computer-readable medium having computer-readable code embodied therein for measuring relative quality of a target document in a document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, the computer program product comprising:
-
a) computer-readable program code devices configured to cause a computer to perform a two-level random walk among documents within a document set by; a.1) initializing a host set; a.2) initializing a document set for each host in the host set; a.3) selecting at random a host from the host set; a.4) randomly choosing whether to select a random new host; a.5) responsive to choosing to select a random new host; a.5.1) selecting at random a new host from among the previously selected hosts; a.6) responsive to choosing not to select the random new host; a.6.1) selecting at random a link from a document in the document set of the selected host; a.6.2) adding the host referenced by the link to the host set; a.6.3) adding the document referenced by the link to the document set of the selected host; a.6.4) responsive to the selected document containing at least one link; a.6.4.1) selecting at random a link from the selected document; a.6.4.2) selecting a document corresponding to the selected link; a.6.4.3) selecting a host corresponding to the selected document; a.6.4.4) repeating the operations of a.6.2) through a.6.4.3) until all links have been traversed; and a.7) responsive to the selected document not containing at least one link, repeating the operations of a.3) through a.4), and then conditionally repeating a.5) through a.5.1) or a.6) through a.6.4.4); b) computer-readable program code devices configured to cause a computer to determine a quality metric responsive to the number of documents encountered during the two-level random walk that link to the target document; c) computer-readable program code devices configured to cause a computer to determine a quality metric for at least one additional target document; and d) computer-readable program code devices configured to cause a computer to rank the quality metric of the first document with respect to the quality metrics of the additional target documents. - View Dependent Claims (54)
-
-
55. A system for randomly walking through a hypertext-linked document set comprising a plurality of documents, wherein at least a subset of the documents contain a plurality of links to other documents, each document being associated with a host, the system comprising:
-
a) a host selector; b) a random document selector, coupled to the host selector, for selecting at random a document associated with the host; c) a document retriever, coupled to the random document selector, for retrieving the selected document; and d) a link selector, coupled to the document retriever; wherein, responsive to the host selector randomly choosing to select a random host; the host selector selects at random a host from among the previously selected hosts; the random document selector selects at random a document associated with the host; and the document retriever retrieves the selected document; and wherein, responsive to the host selector randomly choosing not to select a random host; the link selector selects at random a link in the retrieved document; and the document retriever retrieves a document referenced by the selected link; and wherein the link selector, the random document selector, and the document retriever repeat their respective operations until all links have been traversed.
-
Specification