Anchor tag indexing in a web crawler system
First Claim
1. A system comprising:
- at least one processor;
- an index for searching documents, the index including terms associated with documents; and
- memory storing instructions that, when executed by the at least one processor, perform operations including:
  - obtaining, via a web crawler, a source document,
  - identifying, in the source document, annotation text, the annotation text being text within a predetermined distance of an outbound link to a target document and the annotation text including at least one term,
  - storing in the index an association between the term and the source document,
  - storing in the index, responsive to identifying the annotation text, an association between the term and the target document,
  - identifying, responsive to receiving a query that includes the term, the source document and the target document as associated with the term in the index,
  - responsive to identifying the associations, including the source document and the target document in a list of documents responsive to the query, and
  - returning the list of documents responsive to the query as a search result for the query.
2 Assignments
0 Petitions
Abstract
Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
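The sorted anchor map described in the abstract can be illustrated with a minimal Python sketch. It assumes the link log is a list of (source ID, target ID) integer pairs. The function names `build_sorted_anchor_map` and `sources_by_target` are hypothetical and do not come from the patent.

```python
from itertools import groupby
from operator import itemgetter


def build_sorted_anchor_map(link_log):
    """Invert (source, target) pairs into (target, source) pairs ordered by target ID."""
    inverted = [(target, source) for source, target in link_log]
    # Tuples sort by their first element (the target ID), then by source ID.
    return sorted(inverted)


def sources_by_target(anchor_map):
    """Group a sorted anchor map so each target lists the sources linking to it."""
    # groupby only merges adjacent keys, so it relies on the map being sorted.
    return {
        target: [source for _, source in group]
        for target, group in groupby(anchor_map, key=itemgetter(0))
    }


# Hypothetical link log: each entry is (source document ID, target document ID).
link_log = [(3, 7), (1, 7), (3, 2), (5, 2)]
anchor_map = build_sorted_anchor_map(link_log)
print(anchor_map)
print(sources_by_target(anchor_map))
```

Ordering by target identifier places all inbound links for one target next to each other, so a single sequential pass over the map can collect them.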
31 Citations
20 Claims
1. A system comprising: (full text under First Claim above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method comprising:
- obtaining, via a web crawler, a source document;
- identifying, in the source document, annotation text, the annotation text being text within a predetermined distance of an outbound link to a target document and the annotation text including at least one term;
- storing in an index an association between the term and the source document;
- storing in the index, responsive to identifying the annotation text, an association between the term and the target document;
- identifying, responsive to receiving a query that includes the term, the source document and the target document as associated with the term in the index;
- responsive to identifying the associations, including the source document and the target document in a list of documents responsive to the query; and
- returning the list of documents responsive to the query as a search result for the query.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
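The steps of the method claim can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. It assumes a crawled document is already tokenized, that each outbound link is recorded at a single token position, and that the "predetermined distance" is measured in tokens. The names `Link`, `Document`, `AnnotationIndex`, and `max_distance` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    target_url: str
    position: int  # token index of the link in the source document (assumption)


@dataclass
class Document:
    url: str
    tokens: List[str]
    links: List[Link]


class AnnotationIndex:
    def __init__(self, max_distance=3):
        self.max_distance = max_distance  # the "predetermined distance", in tokens
        self.postings = defaultdict(set)  # term -> set of document URLs

    def add_document(self, doc):
        """Associate each annotation term with both the source and the target."""
        for link in doc.links:
            lo = max(0, link.position - self.max_distance)
            hi = link.position + self.max_distance + 1
            for term in doc.tokens[lo:hi]:
                term = term.lower()
                self.postings[term].add(doc.url)          # term -> source document
                self.postings[term].add(link.target_url)  # term -> target document

    def search(self, query):
        """Return every document associated with any query term (assumed OR semantics)."""
        results = set()
        for term in query.lower().split():
            results |= self.postings.get(term, set())
        return sorted(results)


source = Document(
    url="http://example.com/reviews",
    tokens="our full review of the best hiking boots is here".split(),
    links=[Link(target_url="http://example.org/boots", position=7)],
)
index = AnnotationIndex(max_distance=2)
index.add_document(source)
print(index.search("hiking"))
```

In this sketch the query "hiking" returns both the source and the target URL, because "hiking" lies within two tokens of the link. A term outside the window, such as "review", is not indexed here. A fuller index would presumably also cover the rest of the source text.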
Specification