Hypertext document retrieval system and method
First Claim
1. A method of indexing documents, the method comprising:
obtaining a list of hyperlinks pointing to each document, wherein each hyperlink includes one or more terms;
indexing each document with the terms in the hyperlinks pointing to that document, wherein a number of hyperlinks, each containing a particular term, may point to a document; and
indexing the number of hyperlinks containing the particular term pointing to the document with that document.
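The indexing steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes hyperlinks arrive as (target document, anchor text) pairs and that terms are lowercase, whitespace-separated words. The names `tokenize`, `build_anchor_index`, and the sample links are hypothetical.

```python
from collections import defaultdict


def tokenize(anchor_text):
    # Assumed tokenizer: lowercase the anchor text and split on whitespace.
    return anchor_text.lower().split()


def build_anchor_index(hyperlinks):
    """Map term -> target document -> number of hyperlinks containing the term.

    `hyperlinks` is an iterable of (target_document, anchor_text) pairs.
    """
    index = defaultdict(lambda: defaultdict(int))
    for target, anchor_text in hyperlinks:
        # set() makes each hyperlink count at most once per term, so the
        # stored value is a count of hyperlinks, not of term occurrences.
        for term in set(tokenize(anchor_text)):
            index[term][target] += 1
    return index


# Hypothetical sample data: two links point to doc_a, one to doc_b.
links = [
    ("doc_a", "cheap flights"),
    ("doc_a", "Flights to Paris"),
    ("doc_b", "Paris hotels"),
]
index = build_anchor_index(links)
```

In this sketch `index["flights"]["doc_a"]` holds the number of hyperlinks containing "flights" that point to `doc_a`, which corresponds to the count that the final step of the claim indexes with the document.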
Abstract
A search engine for retrieving documents pertinent to a query indexes documents in accordance with hyperlinks pointing to those documents. The indexer traverses the hypertext database and finds hypertext information including the address of the document the hyperlinks point to and the anchor text of each hyperlink. The information is stored in an inverted index file, which may also be used to calculate document link vectors for each hyperlink pointing to a particular document. When a query is entered, the search engine finds all document vectors for documents having the query terms in their anchor text. A query vector is also calculated, and the dot product of the query vector and each document link vector is calculated. The dot products relating to a particular document are summed to determine the relevance ranking for each document.
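The ranking described in the abstract can be sketched as follows. It is an illustration under stated assumptions, not the patented method. It assumes each hyperlink's link vector and the query vector are raw term-count vectors, since the abstract does not specify a weighting here. For brevity it scans the hyperlinks directly instead of looking them up in an inverted index file. The names `term_vector`, `dot`, `rank`, and the sample links are hypothetical.

```python
from collections import Counter, defaultdict


def term_vector(text):
    # Assumed weighting: raw term counts over lowercase, whitespace-split words.
    return Counter(text.lower().split())


def dot(u, v):
    # Dot product of two sparse vectors; a Counter returns 0 for missing terms.
    return sum(weight * v[term] for term, weight in u.items())


def rank(query, hyperlinks):
    """Sum per-hyperlink dot products into one relevance score per document."""
    query_vec = term_vector(query)
    scores = defaultdict(int)
    for target, anchor_text in hyperlinks:
        link_score = dot(query_vec, term_vector(anchor_text))
        if link_score:  # skip hyperlinks sharing no term with the query
            scores[target] += link_score
    # Highest summed score first; ties broken by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data.
links = [
    ("doc_a", "cheap flights"),
    ("doc_a", "flights to paris"),
    ("doc_b", "paris hotels"),
    ("doc_c", "garden tools"),
]
ranking = rank("paris flights", links)
# ranking == [("doc_a", 3), ("doc_b", 1)]
```

Here `doc_a` scores 3 because its two hyperlinks contribute dot products of 1 and 2, while `doc_c` is omitted because none of its anchor text matches the query.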
25 Claims
1. A method of indexing documents, the method comprising:
obtaining a list of hyperlinks pointing to each document, wherein each hyperlink includes one or more terms;
indexing each document with the terms in the hyperlinks pointing to that document, wherein a number of hyperlinks, each containing a particular term, may point to a document; and
indexing the number of hyperlinks containing the particular term pointing to the document with that document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of ranking documents based on the document's relevance to a query, wherein the query comprises at least one term, and wherein hyperlinks contain terms and point to corresponding documents, the method comprising:
comparing the words in the query to the words in a hyperlink to obtain a relevance ranking for each hyperlink; and
summing the relevance rankings for each hyperlink pointing to a particular document to obtain a summed relevance score for that document.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
Specification