Method and system for calculating document importance using document classifications
First Claim
1. A method in a computer system with a processor and a memory for calculating importance of documents, the documents having inter-document links, the method comprising:
providing an organization of the documents into collections, each collection including a plurality of documents;
for each collection, identifying inter-collection links of the documents within the collection, an inter-collection link being a link between a document in one collection and a document in another collection;
calculating by the processor importance of each collection by applying a page ranking algorithm to the collections wherein the page ranking algorithm operates on nodes and links, each collection represented by a node and the inter-collection links represented by links between the nodes, to calculate importance for each collection of documents based on the inter-collection links from a document in one collection to a document in another collection;
dividing by the processor the collections into a high importance set of collections and a low importance set of collections, wherein the collections of the high importance set of collections have a calculated importance that is above the calculated importance of the collections in the low importance set of collections;
after dividing the collection into the high importance set of collections and the low importance set of collections, calculating by the processor importance of the documents in the high importance set of collections by applying a page ranking algorithm to the documents in the high importance set of collections wherein each document is represented by a node and the inter-document links between documents within the high importance set of collections are represented by links between the nodes to calculate the importance of each document in the high importance set of collections based on the inter-document links between the documents of the high importance set of collections wherein the documents of the low importance set of collections are not factored into calculating importance of the documents of the high importance set of collections; and
calculating importance of the documents in the low importance set of collections by applying a ranking algorithm to the documents in the low importance set of collections to calculate the importance of each document in the low importance set of collections by, for each collection of the low importance set of collections,
calculating a local importance of each document in the collection of the low importance set of collections by applying a page ranking algorithm to the documents in the collection wherein each document in the collection is represented by a node and the intra-collection links between documents of the collection are represented by links between the nodes; and
for each document in the collection of the low importance set of collections, calculating a combined importance for that document based on the calculated importance of the collection and the calculated local importance of the document; and
presenting a combined ranking of the documents in the collections based on the calculated importance of the documents of the high importance set of collections and the calculated importance of the documents of the low importance set of collections.
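The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (pagerank, rank_documents), the damping factor, the rule that a collection is "high importance" when its rank is above the average collection rank, and the use of a product to combine collection importance with local importance are all hypothetical choices that the claim leaves open.

```python
from collections import defaultdict


def pagerank(nodes, links, damping=0.85, iterations=50):
    """Power-iteration page rank. `links` is a list of (src, dst) pairs
    whose endpoints must all be in `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    if n == 0:
        return {}
    out = defaultdict(list)
    for src, dst in links:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def rank_documents(collection_of, doc_links):
    """collection_of maps document -> collection; doc_links is a list of
    (src_doc, dst_doc) pairs. Returns (importance per document, high set)."""
    doc_links = list(doc_links)
    collections = sorted(set(collection_of.values()))

    # Collections are nodes; inter-collection links are the edges.
    inter = [(collection_of[s], collection_of[d]) for s, d in doc_links
             if collection_of[s] != collection_of[d]]
    coll_rank = pagerank(collections, inter)

    # Assumed split rule: "high" means above the average collection rank.
    mean_rank = 1.0 / len(collections)
    high = {c for c in collections if coll_rank[c] > mean_rank}

    # High set: one page rank over its documents, ignoring low-set documents.
    high_docs = [d for d, c in collection_of.items() if c in high]
    high_links = [(s, d) for s, d in doc_links
                  if collection_of[s] in high and collection_of[d] in high]
    importance = pagerank(high_docs, high_links)

    # Low set: local page rank per collection over intra-collection links,
    # then an assumed combination (collection importance * local importance).
    for c in collections:
        if c in high:
            continue
        docs = [d for d, cc in collection_of.items() if cc == c]
        intra = [(s, d) for s, d in doc_links
                 if collection_of[s] == c and collection_of[d] == c]
        local = pagerank(docs, intra)
        for d in docs:
            importance[d] = coll_rank[c] * local[d]
    return importance, high
```

Sorting the returned importance values in descending order gives one possible combined ranking. The scores of the high set and the low set come from different computations, so how they are placed on a common scale is a further design choice that this sketch does not settle.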
Abstract
A system for calculating the importance of web pages is provided. The web pages are organized hierarchically into collections. The system calculates the importance of each collection based on inter-collection links from a web page in one collection to a web page in another collection. The system then calculates the importance of web pages in the collections with a high calculated importance based on links between the web pages in those collections using, for example, a conventional page rank algorithm. The system may also calculate the importance of web pages in each collection with a low calculated importance separately based on the links between the web pages in the collection using, for example, a conventional page rank algorithm.
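As a small illustration of the first step in the abstract, the sketch below groups web pages into collections and counts the inter-collection links between them. The grouping rule (one collection per host name) and the function names collection_of and inter_collection_links are assumptions made for this example; the abstract only says that pages are organized hierarchically into collections.

```python
from collections import Counter
from urllib.parse import urlsplit


def collection_of(url):
    """Hypothetical grouping rule: one collection per host name."""
    return urlsplit(url).hostname


def inter_collection_links(page_links):
    """Count links whose source and target pages lie in different collections.

    `page_links` is an iterable of (source_url, target_url) pairs. The result
    maps (source_collection, target_collection) to the number of such links.
    """
    counts = Counter()
    for src, dst in page_links:
        a, b = collection_of(src), collection_of(dst)
        if a != b:  # links inside one collection are not inter-collection links
            counts[(a, b)] += 1
    return counts
```

The resulting pairs can serve as the edges of a collection-level graph to which a page rank algorithm is applied.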
10 Claims
6. A computer system for calculating the importance of documents that are organized into collections, comprising:
a memory storing computer-executable instructions of:
a component that calculates importance for each collection of documents by applying a page ranking algorithm to the collections, the page ranking algorithm operating on nodes and links, wherein each collection is represented by a node and inter-collection links are represented by links between the nodes, an inter-collection link being an inter-document link between a document in one collection and a document in another collection;
a component that divides the collections into a high importance set of collections and a low importance set of collections, wherein the collections of the high importance set of collections have a calculated importance that is above the calculated importance of the collections in the low importance set of collections;
a component that calculates importance of the documents of the high importance set of collections by applying a page ranking algorithm, wherein the documents are represented by nodes and the inter-document links between the documents of the high importance set of collections are represented by links between the nodes, wherein the documents of the low importance set of collections are not factored into the calculating of the importance of the documents of the high importance set of collections;
a component that calculates importance of the documents in the low importance set of collections by applying a ranking algorithm to the documents in the low importance set of collections to calculate the importance of each document in the low importance set of collections by, for each collection in the low importance set of collections,
calculating a local importance of each document in the collection by applying a page ranking algorithm to the documents in the collection wherein each document in the collection is represented by a node and the intra-collection links between documents of the collection are represented by links between the nodes; and
for each document in the collection, calculating a combined importance for that document based on the calculated importance of the collection and the calculated local importance of the document; and
a component that outputs an indication of the ranking of the documents of the collections based on the calculated importance of the documents of the high importance set of collections and the combined importance of the documents in the low importance set of collections; and
a processor for executing the computer-executable instructions stored in the memory.
Specification