Systems and methods for employing an orthogonal corpus for document indexing
First Claim
1. A method for topical indexing a document collection that is initially unconnected with a body of textual reference material, comprising:
processing the body of textual reference material into a plurality of text portions, each text portion being associated with a single topic from a plurality of topics,
processing said plurality of text portions to derive keywords for each topic,
assigning a weight to each keyword in a text portion;
associating a keyword with a corresponding text portion if the weight of said keyword in said corresponding text portion is equal to or greater than a weight of said keyword in the text portions other than the corresponding text portion, or is equal to or greater than a predetermined threshold value;
forming first keyword-weight pairs of said associated keywords;
applying the associated keywords to at least one document from said initially unconnected document collection and forming second keyword-weight pairs associated with said at least one document;
forming a numeric score between the first and second keyword-weight pairs; and
based on said score, associating said at least one document from said initially unconnected document collection and said single topic from said plurality of topics.
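The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim only requires "a weight" and "a numeric score", so TF-IDF weighting of the reference portions, term-frequency weighting of the document, and cosine similarity as the score are swapped-in choices. The function names, the two-topic toy corpus, and the `threshold` and `min_score` values are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def portion_weights(portions):
    """ASSUMPTION: weight = TF-IDF of each word within each topic's text portion."""
    counts = {topic: Counter(tokenize(text)) for topic, text in portions.items()}
    n = len(counts)
    df = Counter(word for c in counts.values() for word in c)  # portions containing word
    return {
        topic: {w: (k / sum(c.values())) * math.log(n / df[w]) for w, k in c.items()}
        for topic, c in counts.items()
    }


def associate_keywords(weights, threshold):
    """Keep a keyword for a topic if its weight there is >= its weight in every
    other portion, or >= threshold. Returns the first keyword-weight pairs."""
    first_pairs = {}
    for topic, tw in weights.items():
        kept = {}
        for w, wt in tw.items():
            best_elsewhere = max(
                (other.get(w, 0.0) for t, other in weights.items() if t != topic),
                default=0.0,
            )
            # Extra assumption: skip zero-weight words, which carry no topical signal.
            if wt > 0 and (wt >= best_elsewhere or wt >= threshold):
                kept[w] = wt
        first_pairs[topic] = kept
    return first_pairs


def document_pairs(document, keywords):
    """Second keyword-weight pairs. ASSUMPTION: weight = term frequency of each
    associated keyword in the document."""
    tokens = tokenize(document)
    c = Counter(tokens)
    total = len(tokens) or 1
    return {w: c[w] / total for w in keywords if c[w]}


def cosine(a, b):
    """ASSUMPTION: numeric score = cosine similarity of two sparse weight vectors."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def index_document(document, first_pairs, min_score=0.0):
    """Associate the document with the best-scoring topic if that score exceeds
    min_score (a hypothetical cut-off); otherwise return None."""
    vocabulary = {w for pairs in first_pairs.values() for w in pairs}
    second = document_pairs(document, vocabulary)
    scores = {t: cosine(pairs, second) for t, pairs in first_pairs.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > min_score else None), scores


# Hypothetical reference corpus: one text portion per topic.
corpus = {
    "Astronomy": "Planets orbit stars. Telescopes observe planets, stars and galaxies.",
    "Cooking": "Recipes combine flour, butter and sugar. Ovens bake bread and cakes.",
}
first = associate_keywords(portion_weights(corpus), threshold=0.05)
topic, scores = index_document("A new telescope image shows distant galaxies and stars.", first)
print(topic)  # Astronomy
```

The association test mirrors the claim's either/or condition. With only two topics, any word shared by both portions gets an IDF of zero, so the threshold branch only starts to matter once a word appears in some, but not all, of three or more portions.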
3 Assignments
0 Petitions
Abstract
Indexing and cataloging of content on the Internet, as well as of content from other stores of information, may be performed by applying a process that employs an orthogonal corpus (or corpora) of information, such as an encyclopedia. To this end, the processes described herein identify the topics discussed within the corpus. The process also identifies within the corpus a set of keywords that are relevant to the topics presented in the corpus. The keywords associated with a topic may be employed to identify documents stored in another database that are related to the topic. A graphical representation of the index of topics found in the corpus may then be generated, with individual topics operating as links to these related documents. Thus, a user interested in reviewing content in the corpus related to a certain topic may also activate a link in the graphical representation of the index to access other documents that have been identified as related to that topic.
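One possible rendering of the linked topic index described in the abstract is sketched below. It assumes, hypothetically, that the topic-to-document associations are already available as a mapping from topic name to `(title, url)` pairs, and it emits nested HTML lists in which each topic entry carries links to its related documents. The function name, the data shape, and the `example.com` URLs are illustrative only.

```python
from html import escape


def render_topic_index(index):
    """Hypothetical renderer: nested HTML lists, one entry per topic, each
    holding links to the documents associated with that topic.
    `index` maps topic -> list of (title, url) pairs."""
    lines = ["<ul>"]
    for topic in sorted(index):
        lines.append(f"  <li>{escape(topic)}")
        lines.append("    <ul>")
        for title, url in index[topic]:
            # escape() also escapes quotes by default, so URLs are safe in href="...".
            lines.append(f'      <li><a href="{escape(url)}">{escape(title)}</a></li>')
        lines.append("    </ul>")
        lines.append("  </li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical associations produced by an indexing step.
related = {
    "Astronomy": [("Galaxy survey", "https://example.com/docs/galaxies")],
    "Cooking": [("Bread basics", "https://example.com/docs/bread")],
}
html_index = render_topic_index(related)
print(html_index)
```

Plain nested lists keep the sketch dependency-free; a real interface might instead render each topic as a collapsible node or a link to a results page.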
102 Citations
15 Claims
Dependent claims: 2 through 15.
Specification