System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents
Abstract
A system for extracting and analyzing information from a collection of linked documents at a locality to enable categorization of documents and prediction of documents relevant to a focus document. The system obtains and analyzes topology, usage and path information for a collection at a locality, e.g., a web locality on the World Wide Web. For categorization, document meta information is represented as document vectors. Predefined criteria are applied to the document vectors to create lists of "similar" types of documents. For relevance prediction, networks representing topology, usage path and text similarity amongst the documents in the collection are created. A spreading activation technique is applied to the networks, starting at a focus document, to predict the documents relevant to the focus document. Using category and relevance prediction information, tools can be built to enable a user to traverse the collection of linked documents more efficiently.
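The categorization step described above can be illustrated with a minimal sketch. The abstract does not enumerate the meta information or the criteria, so the feature names (`size`, `inlinks`, `outlinks`, `visits`), the category names, the weights, and the helper functions below are all hypothetical assumptions chosen only for illustration. Each document becomes a vector of features, each criterion is a weight vector, and a category list is the set of top-scoring documents.

```python
from dataclasses import dataclass

# Hypothetical meta-information features; the abstract does not list them.
FEATURES = ("size", "inlinks", "outlinks", "visits")


@dataclass(frozen=True)
class DocVector:
    url: str
    values: tuple  # one number per entry in FEATURES


# Hypothetical "predefined criteria": one weight per feature, per category.
CRITERIA = {
    "index": (0.0, 0.0, 1.0, 0.0),    # favors many outgoing links
    "popular": (0.0, 0.5, 0.0, 1.0),  # favors inlinks and visits
}


def normalize(docs):
    """Scale each feature to [0, 1] by its maximum across the collection."""
    maxima = [max(d.values[i] for d in docs) or 1 for i in range(len(FEATURES))]
    return {d.url: [v / m for v, m in zip(d.values, maxima)] for d in docs}


def categorize(docs, criteria=CRITERIA, top_k=2):
    """Return, per category, the top_k document URLs by weighted score."""
    scaled = normalize(docs)
    result = {}
    for name, weights in criteria.items():
        scores = {
            url: sum(w * v for w, v in zip(weights, vec))
            for url, vec in scaled.items()
        }
        # Sort by descending score, breaking ties by URL for determinism.
        ranked = sorted(scores, key=lambda u: (-scores[u], u))
        result[name] = ranked[:top_k]
    return result


docs = [
    DocVector("/index.html", (2000, 1, 40, 90)),
    DocVector("/paper.html", (50000, 12, 3, 100)),
    DocVector("/contact.html", (800, 2, 1, 5)),
]
lists = categorize(docs, top_k=1)
```

With this toy data, `lists` maps the hypothetical `"index"` category to `/index.html` and `"popular"` to `/paper.html`. A weighted sum is only one way to apply criteria to document vectors; threshold tests on individual features would fit the abstract's wording equally well.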
13 Claims
1. A system for identifying documents relevant to a focus document in a linked collection of documents, said system comprising:
means for obtaining raw data for said linked collection of documents, said raw data including usage data, topology data and content data;
means for creating usage, topology and text similarity maps for said linked collection of documents from said raw data; and
means for predicting a relevant set of documents for a subset of said linked collection of documents using one or more of said usage, topology and text similarity maps.
Dependent claims: 2, 3, 4, 5, 6
7. A method for identifying documents relevant to a focus document in a linked collection of documents, said method comprising the steps of:
a) obtaining raw data for said linked collection of documents, said raw data including topology information and usage information;
b) generating text similarity information between documents in said linked collection of documents;
c) creating a plurality of characteristic maps from said raw data and text similarity information, each of said plurality of characteristic maps indicating relationships between documents in said linked collection of documents;
d) selecting one or more focus documents from said linked collection of documents;
e) spreading activation starting at said one or more focus documents through one or more of said plurality of characteristic maps until activation settles into an asymptotic pattern; and
f) identifying relevant documents as those meeting a predetermined activation criteria.
Dependent claims: 8, 9, 10, 11, 12, 13
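Steps a) through f) of claim 7 can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the claimed implementation. The claims do not fix an update rule, so the sketch assumes one common formulation, A(t) = C + (1 - decay) * A(t-1) + alpha * R * A(t-1), where C pumps activation into the focus documents and R is a column-normalized map. The parameters `decay`, `alpha`, `tol` and `threshold`, the toy data, and all function names are hypothetical. Text similarity is assumed to be cosine similarity over raw term counts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_maps(docs, links, paths):
    """Steps a-c: build topology, usage and text maps as n x n lists.

    m[i][j] is the strength of the association from doc j to doc i.
    """
    ids = sorted(docs)
    idx = {d: k for k, d in enumerate(ids)}
    n = len(ids)
    topo = [[0.0] * n for _ in range(n)]
    usage = [[0.0] * n for _ in range(n)]
    text = [[0.0] * n for _ in range(n)]
    for src, dst in links:  # hyperlink src -> dst
        topo[idx[dst]][idx[src]] = 1.0
    for path in paths:  # count observed page-to-page transitions
        for src, dst in zip(path, path[1:]):
            usage[idx[dst]][idx[src]] += 1.0
    tf = {d: Counter(docs[d].lower().split()) for d in ids}
    for a in ids:
        for b in ids:
            if a != b:
                text[idx[a]][idx[b]] = cosine(tf[a], tf[b])
    return ids, topo, usage, text


def column_normalize(m):
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def spread(m, focus, decay=0.3, alpha=0.2, tol=1e-9, max_iter=1000):
    """Steps d-e: iterate until activation settles (assumes alpha < decay)."""
    n = len(m)
    r = column_normalize(m)
    source = [1.0 if i in focus else 0.0 for i in range(n)]
    act = source[:]
    for _ in range(max_iter):
        nxt = [
            source[i] + (1 - decay) * act[i]
            + alpha * sum(r[i][j] * act[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(x - y) for x, y in zip(nxt, act)) < tol:
            return nxt
        act = nxt
    return act


def relevant(ids, act, focus, threshold):
    """Step f: non-focus documents at or above the threshold, strongest first."""
    order = sorted(range(len(ids)), key=lambda i: -act[i])
    return [ids[i] for i in order if i not in focus and act[i] >= threshold]


docs = {
    "a": "spreading activation network",
    "b": "activation network model",
    "c": "cooking recipes",
}
links = [("a", "b"), ("b", "c")]
paths = [["a", "b"], ["a", "b", "c"]]

ids, topo, usage, text = build_maps(docs, links, paths)
focus = {ids.index("a")}
activation = spread(topo, focus)
hits = relevant(ids, activation, focus, threshold=2.0)
```

Because each column of R sums to at most 1 and `alpha` is smaller than `decay`, the iteration is a contraction and the activation vector converges, which is one way to realize the "asymptotic pattern" of step e). Here activation is spread through the topology map only; passing `usage` or `text` to `spread` instead, or a weighted sum of the three, would correspond to the "one or more" maps in the claim.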
Specification