Assigning document identification tags
Abstract
Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
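The approximate ordering described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that lower tags go to more important documents, that importance is a score between 0 and 1, and that three tiers exist with the thresholds and tag ranges shown; none of these specifics come from the patent text, and the names `tier_for` and `assign_tag` are hypothetical.

```python
# Hypothetical tier layout: tier 0 holds the most important documents.
TIER_MIN_SCORE = [0.8, 0.4, 0.0]                      # assumed thresholds
TIER_RANGES = [(0, 100), (100, 1000), (1000, 10000)]  # assumed [start, end) tag ranges

# Next unused tag in each tier.
next_free = [start for start, _ in TIER_RANGES]


def tier_for(score):
    """Return the first tier whose minimum score the document meets."""
    for tier, min_score in enumerate(TIER_MIN_SCORE):
        if score >= min_score:
            return tier
    return len(TIER_MIN_SCORE) - 1


def assign_tag(score):
    """Hand out the next unused tag in the tier chosen from the score."""
    tier = tier_for(score)
    tag = next_free[tier]
    if tag >= TIER_RANGES[tier][1]:
        raise RuntimeError("no unused tags left in tier %d" % tier)
    next_free[tier] += 1
    return tag


# Documents arrive in crawl order, not in importance order.
crawled = [("a", 0.5), ("b", 0.9), ("c", 0.1), ("d", 0.95), ("e", 0.45)]
tags = {name: assign_tag(score) for name, score in crawled}

# A posting list kept sorted by tag returns results tier by tier.
posting_list = sorted(tags, key=tags.get)
print(posting_list)  # ['b', 'd', 'a', 'e', 'c']
```

In this sketch "b" precedes "d" even though "d" has the higher score, because tags within a tier are handed out in arrival order. The list is therefore ordered by tier but only approximately ordered by importance.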
122 Citations
35 Claims
1. A computer-implemented method of assigning a document identifier to a new document, the new document to be added to a collection of documents, the method being performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:
- partitioning a set of document identifiers into a plurality of segments, each segment associated with a respective subset of the set of document identifiers, wherein the document identifiers comprise a predetermined set of monotonically ordered document identification tags;
- subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identifiers, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric;
- receiving query-independent information about the new document, the information including a value of the query-independent document importance metric and a unique document identifier for the new document;
- selecting, based at least in part on the unique document identifier, one of the segments;
- selecting, based at least on the query-independent information, one of the tiers associated with the selected segment;
- assigning to the new document a document identifier from the respective subset of document identifiers associated with the selected tier, the assigned document identifier not previously assigned to any of the documents in the collection of documents, and
- repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
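The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `TagAssigner`, hashing the unique identifier with SHA-256 to pick a segment, equal-sized contiguous segments, score thresholds for tiers, and handing out the next unused tag in a tier are all hypothetical choices that the claim language does not specify.

```python
import hashlib


class TagAssigner:
    """Illustrative segment/tier allocator; all parameters are hypothetical."""

    def __init__(self, total_tags, num_segments, tier_sizes, tier_min_scores):
        seg_size = total_tags // num_segments
        if sum(tier_sizes) != seg_size:
            raise ValueError("tier sizes must fill a segment exactly")
        if len(tier_sizes) != len(tier_min_scores):
            raise ValueError("one minimum score is needed per tier")
        self.num_segments = num_segments
        self.tier_min_scores = tier_min_scores
        # ranges[s][t] = [next_free, end) for tier t of segment s.
        self.ranges = []
        for s in range(num_segments):
            start = s * seg_size
            tiers = []
            for size in tier_sizes:
                tiers.append([start, start + size])
                start += size
            self.ranges.append(tiers)

    def select_segment(self, doc_id):
        # Assumption: a stable hash of the unique identifier picks the segment.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_segments

    def select_tier(self, importance):
        # Assumption: tier 0 is the most important; thresholds are descending.
        for tier, min_score in enumerate(self.tier_min_scores):
            if importance >= min_score:
                return tier
        return len(self.tier_min_scores) - 1

    def assign(self, doc_id, importance):
        seg = self.select_segment(doc_id)
        tier = self.select_tier(importance)
        slot = self.ranges[seg][tier]
        if slot[0] >= slot[1]:
            raise RuntimeError("no unused tags left in this tier")
        tag = slot[0]
        slot[0] += 1  # never hand out the same tag twice
        return tag


assigner = TagAssigner(total_tags=4000, num_segments=4,
                       tier_sizes=[100, 300, 600],
                       tier_min_scores=[0.8, 0.4, 0.0])
docs = [("http://example.com/a", 0.9), ("http://example.com/b", 0.5),
        ("http://example.com/c", 0.1), ("http://example.com/d", 0.95)]
assigned = {url: assigner.assign(url, score) for url, score in docs}
for url, tag in assigned.items():
    print(url, tag)
```

Each tag falls inside the range of the segment chosen from the URL, and its offset within that segment reflects the tier chosen from the importance score. The "repeating" step of the claim corresponds to calling `assign` once per new document.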
12. A computer-implemented method of assigning a plurality of document identification tags to a plurality of new documents, the plurality of new documents to be added to a collection of documents, the method comprising:
- on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors;
- partitioning a set of valid globally unique document identifiers into a plurality of segments, each segment associated with a respective subset of the set of valid globally unique document identifiers;
- subdividing each of the segments into a plurality of tiers, wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric, each segment having an associated, predetermined set of monotonically ordered document identification tags, and each tier of a respective segment having an associated subset of the set of document identification tags for the respective segment;
- receiving query-independent information about a new document, the information including a value of the query-independent document importance metric and a globally unique document identifier for the new document;
- selecting, based at least in part on the globally unique document identifier, one of the segments;
- selecting, based at least on the query-independent information, one of the tiers associated with the selected segment;
- assigning to the new document a document identification tag from the subset of document identification tags associated with the selected tier, wherein the document identification tag assigned to the new document is unique with respect to document identification tags assigned to other documents in the collection of documents; and
- repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents;
- wherein the assigned document identification tags are assigned to documents in the collection of documents having globally unique document identifiers associated with the respective segment.
Dependent claims: 13
14. A system for assigning a document identification tag to a new document, the new document to be added to a collection of documents, the system comprising:
- one or more processors; and
- memory storing one or more programs to be executed by the one or more processors;
the one or more programs comprising instructions for:
- partitioning a set of monotonically ordered document identification tags into a plurality of segments, each segment associated with a respective subset of the set of monotonically ordered document identification tags;
- subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identification tags, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric;
- receiving query-independent information about the new document, the information including a value of the query-independent document importance metric and a globally unique document identifier for the new document;
- selecting, based at least in part on the globally unique document identifier, one of the segments;
- selecting, based at least on the query-independent information, one of the tiers associated with the selected segment;
- assigning to the new document a document identification tag from the subset of document identification tags associated with the selected tier, the assigned document identification tag not previously assigned to any of the documents in the collection of documents;
- repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents; and
- wherein the assigned document identification tags are assigned to documents in the collection of documents having globally unique document identifiers associated with the respective segment.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
- partitioning a set of monotonically ordered document identification tags into a plurality of segments, each segment associated with a respective subset of the set of monotonically ordered document identification tags;
- subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identification tags, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric;
- receiving query-independent information about a new document, the information including a value of the query-independent document importance metric and a globally unique document identifier for the new document;
- selecting, based at least in part on the globally unique document identifier, one of the segments;
- selecting, based at least on the query-independent information, one of the tiers associated with the selected segment;
- assigning to the new document a document identification tag from the respective subset of document identification tags associated with the selected tier, the assigned document identification tag not previously assigned to any of the documents in a collection of documents;
- repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents; and
- wherein the assigned document identification tags are assigned to documents in the collection of documents having globally unique document identifiers associated with the respective segment.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
Specification