Assigning document identification tags
Abstract
Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
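As a rough illustration of why tag order can stand in for relevance order, the following sketch builds a tiny inverted index whose posting lists are kept sorted by document identification tag. It assumes, purely for illustration, that lower tags were given to more important documents; the function names `build_index` and `query` are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_index(docs):
    """docs maps an integer document tag to its text.
    Returns a mapping from term to a list of tags in ascending tag order."""
    index = defaultdict(list)
    for tag in sorted(docs):  # visiting tags in order keeps every posting list sorted
        for term in set(docs[tag].lower().split()):
            index[term].append(tag)
    return index

def query(index, terms):
    """Return the tags of documents containing every term, in ascending tag order.
    If low tags mean high importance (an assumption), this order is also an
    approximate importance order, with no per-query sorting by score."""
    postings = [index.get(term, []) for term in terms]
    if not postings:
        return []
    common = set(postings[0]).intersection(*postings[1:])
    return sorted(common)

# Hypothetical collection: tag 5 is the most important document, tag 900 the least.
docs = {5: "web crawler index", 120: "web index", 900: "web page"}
index = build_index(docs)
results = query(index, ["web", "index"])
```

Here `results` lists tag 5 before tag 120, so the more important document comes first simply because of how the tags were assigned.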
41 Claims
1. A computer-implemented method of assigning a document identification tag to a new document, the new document to be added to a collection of documents, the method comprising:
subdividing a predetermined set of monotonically ordered document identification tags into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identification tags, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric;
receiving query-independent information about the new document, the information including the query-independent document importance metric;
selecting, based at least on the query-independent information, one of the tiers;
assigning to the new document a document identification tag from the respective subset of document identification tags associated with the selected tier, the assigned document identification tag not previously assigned to any of the documents in the collection of documents; and
storing an assignment of the document identification tag from the respective subset of document identification tags associated with the selected tier to the new document in a computer-readable medium.
Dependent claims: 2-13
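The steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `TieredTagAssigner`, the equal-sized tiers, the threshold-based tier selection, and the choice that more important documents receive lower tags are all hypothetical. The assignment is stored in an in-memory dictionary standing in for the computer-readable medium.

```python
from bisect import bisect_right

class TieredTagAssigner:
    """Hypothetical sketch: splits the tag range [first_tag, last_tag] into tiers."""

    def __init__(self, first_tag, last_tag, thresholds):
        # thresholds: importance cut points; n thresholds give n + 1 tiers.
        self.thresholds = sorted(thresholds)
        n_tiers = len(self.thresholds) + 1
        size = (last_tag - first_tag + 1) // n_tiers
        if size == 0:
            raise ValueError("not enough tags for the requested number of tiers")
        # Tier k covers the half-open range [lo, hi); the last tier absorbs any remainder.
        self.ranges = []
        for k in range(n_tiers):
            lo = first_tag + k * size
            hi = last_tag + 1 if k == n_tiers - 1 else lo + size
            self.ranges.append((lo, hi))
        self.next_free = [lo for lo, _ in self.ranges]  # next unused tag per tier
        self.assignments = {}  # stands in for the stored assignments

    def select_tier(self, importance):
        # Assumption: tier 0 (lowest tags) holds the most important documents.
        rank = bisect_right(self.thresholds, importance)  # 0 means least important
        return len(self.thresholds) - rank

    def assign(self, doc_key, importance):
        tier = self.select_tier(importance)
        tag = self.next_free[tier]
        if tag >= self.ranges[tier][1]:
            raise RuntimeError(f"tier {tier} has no unused tags")
        self.next_free[tier] = tag + 1   # a tag is never handed out twice
        self.assignments[doc_key] = tag  # store the assignment
        return tag

assigner = TieredTagAssigner(0, 299, thresholds=[0.3, 0.7])
tag_a = assigner.assign("doc-a", 0.9)   # high importance
tag_b = assigner.assign("doc-b", 0.5)   # middle importance
tag_c = assigner.assign("doc-c", 0.95)  # high importance
tag_d = assigner.assign("doc-d", 0.1)   # low importance
```

With these hypothetical thresholds the three tiers cover tags 0-99, 100-199, and 200-299, so both high-importance documents receive tags below every tag in the other tiers. Within a tier, tags are handed out in arrival order, which is one reason the resulting order is only approximately sorted by importance.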
14. A computer-implemented method of assigning a plurality of document identification tags to a plurality of new documents, the plurality of new documents to be added to a collection of documents, the method comprising:
partitioning a set of valid globally unique document identifiers into a plurality of segments, each segment associated with a respective subset of the set of valid globally unique document identifiers;
subdividing each of the segments into a plurality of tiers, wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric, each segment having an associated, predetermined set of monotonically ordered document identification tags, and each tier of a respective segment having an associated subset of the set of document identification tags for the respective segment;
receiving query-independent information about a new document, the information including the query-independent document importance metric and a globally unique document identifier;
selecting, based at least in part on the globally unique document identifier, one of the segments;
selecting, based at least on the query-independent information, one of the tiers associated with the selected segment;
assigning to the new document a document identification tag from the subset of document identification tags associated with the selected tier, wherein the document identification tag assigned to the new document is unique with respect to document identification tags assigned to other documents in the collection of documents;
storing an assignment of the document identification tag from the subset of document identification tags associated with the selected tier to the new document in a computer-readable medium; and
repeating the receiving, selecting a segment, selecting a tier, assigning, and storing with respect to one or more additional new documents;
wherein the assigned document identification tags are assigned to documents in the collection of documents having globally unique document identifiers associated with the respective segment.
Dependent claim: 15
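The two-level selection in claim 14 (first a segment, then a tier within it) can be sketched as below. Everything named here is an assumption made for illustration: the class `SegmentedTagAssigner`, integer document identifiers drawn from `[0, 2**id_bits)`, segments formed as equal contiguous ranges of that identifier space, disjoint tag ranges per segment, and lower tags for more important documents.

```python
from bisect import bisect_right

class SegmentedTagAssigner:
    """Hypothetical sketch: picks a segment from the document identifier,
    then a tier inside that segment from the importance metric."""

    def __init__(self, n_segments, tags_per_segment, thresholds, id_bits=64):
        self.n_segments = n_segments
        self.id_space = 1 << id_bits  # number of valid document identifiers
        self.thresholds = sorted(thresholds)
        n_tiers = len(self.thresholds) + 1
        tier_size = tags_per_segment // n_tiers
        if tier_size == 0:
            raise ValueError("not enough tags per segment for the tiers")
        # next_free[s][t] is the next unused tag in tier t of segment s;
        # limit[s][t] is one past the last tag of that tier.
        self.next_free = []
        self.limit = []
        for s in range(n_segments):
            base = s * tags_per_segment  # segments own disjoint tag ranges
            self.next_free.append([base + t * tier_size for t in range(n_tiers)])
            self.limit.append([base + (t + 1) * tier_size for t in range(n_tiers)])
        self.assignments = {}  # stands in for the stored assignments

    def select_segment(self, doc_id):
        # Equal contiguous slices of the identifier space, one per segment.
        return doc_id * self.n_segments // self.id_space

    def select_tier(self, importance):
        # Assumption: tier 0 of each segment holds the most important documents.
        return len(self.thresholds) - bisect_right(self.thresholds, importance)

    def assign(self, doc_id, importance):
        s = self.select_segment(doc_id)
        t = self.select_tier(importance)
        tag = self.next_free[s][t]
        if tag >= self.limit[s][t]:
            raise RuntimeError(f"segment {s}, tier {t} has no unused tags")
        self.next_free[s][t] = tag + 1
        self.assignments[doc_id] = tag
        return tag

# Small identifier space (8 bits) so the segment boundaries are easy to follow.
assigner = SegmentedTagAssigner(n_segments=4, tags_per_segment=1000,
                                thresholds=[0.5], id_bits=8)
tags = [assigner.assign(doc_id, importance)
        for doc_id, importance in [(10, 0.9), (200, 0.9), (201, 0.1), (11, 0.9)]]
```

In this sketch, identifiers 10 and 11 fall in segment 0 and identifiers 200 and 201 in segment 3, so each segment's tags can be assigned without consulting the others while remaining unique across the collection.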
16. A system for assigning a document identification tag to a new document, the new document to be added to a collection of documents, the system comprising:
at least one central processing unit; and
a communications bus for connecting the central processing unit to a computer readable medium;
the computer readable medium comprising:
a data structure representing a subdivision of a predetermined set of monotonically ordered document identification tags into a plurality of tiers, wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric, wherein each tier is associated with a respective subset of the set of document identification tags;
an interface configured to receive query-independent information about the new document, the information including the query-independent document importance metric; and
a tag assignment module configured to
select, based at least on the query-independent information, one of the tiers;
assign to the new document a document identification tag from the respective subset of document identification tags associated with the selected tier, the assigned document identification tag not previously assigned to any of the documents in the collection of documents; and
store an assignment of the document identification tag from the respective subset of document identification tags associated with the selected tier to the new document in the computer readable medium.
Dependent claims: 17-28
29. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism therein, the computer program mechanism comprising:
a data structure representing a subdivision of a predetermined set of monotonically ordered document identification tags into a plurality of tiers, wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric, wherein each tier is associated with a respective subset of the set of document identification tags, and wherein the data structure includes a representation of a monotonic ordering of the tiers;
an interface configured to receive query-independent information about a new document, the information including the query-independent document importance metric; and
a tag assignment module including instructions for
selecting, based at least on the query-independent information, one of the tiers; and
assigning to the new document a document identification tag from the respective subset of document identification tags associated with the selected tier, the assigned document identification tag not previously assigned to any of the documents in a collection of documents.
Dependent claims: 30-41
Specification