System and method for incorporating anchor text into ranking search results
First Claim
1. A computer-implemented method for presenting a ranking of search results, comprising:
- providing an index to a plurality of documents including;
a main index associating with each of the documents a frequency of one or more terms being included in each of the documents;
an anchor text index associating with each of the documents an anchor text frequency of the one or more terms being included in anchor text in a source document referencing each of the documents;
receiving a query including at least one query term;
applying the query to the index to yield results of the query identifying one or more of the documents that include the at least one query term;
applying a scoring function to generate a score for each of the one or more documents included in the results of the query, wherein the scoring function (score) includes one of;
where;
wtf is a weighted term frequency applying a weight to a frequency with which a given query term is included in the document;
wtfAnchor is a weighted term frequency applying a weight to a frequency with which the given query term is included in anchor text referencing the document;
k1 is a constant;
b is a constant;
wdl is a weighted document length applying a weight to a length of the document being scored;
avwdl is an average weighted document length of all documents being scored;
N is the number of documents on the network; and
n is the number of documents including at least one appearance of a given query term; and
generating an output of the ranked results of the query to be displayed to a user.
Abstract
Search results of a search query on a network are ranked according to a scoring function that incorporates anchor text as a term. The scoring function is adjusted so that the ranking of a document targeted by anchor text reflects the use of terms in that anchor text. Initially, the properties associated with the anchor text are collected during a crawl of the network. A separate index is generated that includes an inverted list of the documents and the terms in the anchor text. The index is then consulted in response to a query to calculate a document's score. The score is then used to rank the documents and produce the query results.
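The indexing step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `pages` structure, the `tokenize` helper, and the function name `build_indexes` are all hypothetical, and raw counts stand in for whatever term weighting a real crawler would apply.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_indexes(pages):
    """Build a main index and a separate anchor text index.

    pages maps doc_id -> {"body": str, "links": [(target_id, anchor_text), ...]}
    (an assumed shape for crawl output, chosen for this sketch).
    """
    # term -> Counter of {doc_id: occurrences of the term in the document body}
    main_index = defaultdict(Counter)
    # term -> Counter of {target doc_id: occurrences of the term in anchor
    # text, found in other documents, that references the target}
    anchor_index = defaultdict(Counter)

    for doc_id, page in pages.items():
        for term in tokenize(page["body"]):
            main_index[term][doc_id] += 1
        for target_id, anchor_text in page["links"]:
            # Anchor text is credited to the document it points at,
            # not to the source document that contains the link.
            for term in tokenize(anchor_text):
                anchor_index[term][target_id] += 1
    return main_index, anchor_index


pages = {
    "a": {"body": "search engines rank pages", "links": [("b", "ranking guide")]},
    "b": {"body": "a guide to scoring", "links": []},
    "c": {"body": "see the guide", "links": [("b", "guide")]},
}
main_index, anchor_index = build_indexes(pages)
```

Keeping the anchor text counts in their own inverted list means a query term can match a target document through its inbound links even when the term never appears in the document body.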
210 Citations
18 Claims
8. The computer-implemented method of claim 1, wherein a strength of the length normalization provided by BAnchor is adjusted by choosing a different constant value associated with BAnchor.
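To illustrate how the choice of constant changes the strength of length normalization, the sketch below evaluates the normalization component that claim 14 gives as B = ((1 − b) + b*wdl/avwdl). The function name `length_norm` and the sample lengths are assumptions made for this example only.

```python
def length_norm(b, wdl, avwdl):
    # Normalization component in the form given in claim 14:
    # B = (1 - b) + b * wdl / avwdl
    return (1.0 - b) + b * wdl / avwdl


# A document five times longer than average (illustrative numbers).
wdl, avwdl = 500.0, 100.0

no_normalization = length_norm(0.0, wdl, avwdl)    # b = 0: length is ignored
half_normalization = length_norm(0.5, wdl, avwdl)  # intermediate strength
full_normalization = length_norm(1.0, wdl, avwdl)  # b = 1: B equals wdl / avwdl
```

With b = 0 the component is always 1, so term frequencies are left untouched; as b approaches 1, frequencies in longer-than-average text are divided by an increasingly large factor.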
-
9. A computer-readable storage medium storing instructions executable on a computing system, comprising instructions to:
-
evaluate contents of each of a plurality of documents on a network, including;
recording a frequency of terms included within the document;
making an entry in an anchor text table for each anchor text entry referencing another document;
compile an index, including;
generating a main index that associates with each of the documents a frequency with which the at least one term is included in the document;
generating an anchor text index that associates with each of the documents a frequency of terms listed in anchor text entries in the anchor text table referencing the document;
receive a query including at least one query term;
apply the query to the index to yield results of the query identifying one or more of the documents that include the at least one query term;
apply a scoring function to generate a score for each of the one or more documents included in the results of the query, wherein the scoring function (score) includes one of;
where;
wtf is a weighted term frequency applying a weight to a frequency with which a given query term is included in the document;
wtfAnchor is a weighted term frequency applying a weight to a frequency with which the given query term is included in anchor text referencing the document;
k1 is a constant;
b is a constant;
wdl is a weighted document length applying a weight to a length of the document being scored;
avwdl is an average weighted document length of all documents being scored;
N is the number of documents on the network; and
n is the number of documents including at least one appearance of a given query term; and
generate an output of the ranked results of the query to be displayed to a user.
-
-
12. The computer-readable storage medium of claim 9, wherein a strength of the length normalization provided by BAnchor is adjusted by choosing a different constant value associated with BAnchor.
-
13. The computer-readable storage medium of claim 9, further comprising causing an output of the ranked results of the query to be presented to a user.
-
14. A search engine system, comprising:
-
a processor;
an index for a plurality of documents, including;
a main index associating with each of the documents a frequency of one or more terms being included in each of the documents;
an anchor text index associating with each of the documents an anchor text frequency of the one or more terms being included in anchor text in a source document referencing each of the documents;
a ranking system, including;
a query interface configured to receive a query including at least one query term and apply the query to the index to identify one or more of the documents that include the at least one query term;
a scoring function to generate a score for each of the one or more documents included in the results of the query, wherein the scoring function (score) includes;
$$\mathrm{Score} = \sum \frac{\left(\frac{wtf}{B} + \frac{wtf_{Anchor}}{B_{Anchor}}\right)(k_1 + 1)}{k_1 + \frac{wtf}{B} + \frac{wtf_{Anchor}}{B_{Anchor}}} \cdot \log\left(\frac{N}{n}\right)$$
where;
wtf is a weighted term frequency applying a weight to a frequency with which a given query term is included in the document;
wtfAnchor is a weighted term frequency applying a weight to a frequency with which the given query term is included in anchor text referencing the document;
k1 is a constant;
wdl is a weighted document length applying a weight to a length of the document being scored;
avwdl is an average weighted document length of all documents being scored;
B is a document length normalization component defined as B = ((1 − b) + b*wdl/avwdl) where b is a constant;
BAnchor is an anchor text normalization component defined as B = ((1 − b) + b*wdl/avwdl) where b is a constant;
N is the number of documents on the network; and
n is the number of documents including at least one appearance of a given query term;
a ranking system configured to rank the results of the query based on the score generated for each of the documents included in the results of the query.
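The scoring function recited in claim 14 can be sketched in a few lines of Python. This sketch rests on stated assumptions: it reads the flattened formula as a fraction whose numerator is (wtf/B + wtfAnchor/BAnchor)(k1 + 1) and whose denominator is k1 + wtf/B + wtfAnchor/BAnchor, multiplied by log(N/n) and summed over query terms; it takes log to be the natural logarithm; and the function names, argument layout, and sample numbers are hypothetical.

```python
import math


def term_score(wtf, wtf_anchor, B, B_anchor, k1, N, n):
    # Combined, length-normalized frequency from the body and from anchor text.
    tf = wtf / B + wtf_anchor / B_anchor
    # Saturating term-frequency factor times the log(N / n) rarity factor
    # (natural log assumed here).
    return tf * (k1 + 1) / (k1 + tf) * math.log(N / n)


def score(per_term_stats, k1, N):
    # per_term_stats: one (wtf, wtf_anchor, B, B_anchor, n) tuple per query
    # term, an assumed layout for this sketch.
    return sum(
        term_score(wtf, wtf_anchor, B, B_anchor, k1, N, n)
        for (wtf, wtf_anchor, B, B_anchor, n) in per_term_stats
    )


# Illustrative values only: the same document scored with and without
# anchor text matching the single query term.
with_anchor = score([(2.0, 1.0, 1.0, 1.0, 10)], k1=1.0, N=100)
without_anchor = score([(2.0, 0.0, 1.0, 1.0, 10)], k1=1.0, N=100)
```

Because the anchor text frequency is added inside the saturating fraction rather than as a separate term, a document with heavy anchor text still approaches the same upper bound of (k1 + 1)·log(N/n) per query term.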
-
Specification