System and method for topic-based document analysis for information filtering
First Claim
1. A Topic Analysis System for analyzing documents to assess the relevance of documents with respect to a topic of interest to a user, said system comprising:
(a) a document evaluation subsystem for evaluating a document, said document evaluation subsystem comprises means for computing inter-relation among a plurality of key-phrases within the document, wherein the computation comprises the steps of:
computing the probability of occurrence of a key-phrase in the document depending on the occurrence of said plurality of key-phrases in the document, and computing the probability of non-occurrence of a key-phrase in the document depending on the occurrence of said plurality of key-phrases in the document; and
(b) a clustering subsystem for representing a plurality of documents with respect to the topic associated with the user, said clustering subsystem comprising means for clustering the plurality of documents, wherein the plurality of documents includes a plurality of documents analyzed by said system and a plurality of documents analyzed by the user, and the representing comprises the steps of:
assigning a document of the plurality of documents to a hierarchically related plurality of positive clusters and plurality of negative clusters.
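As an illustration of step (a), the sketch below estimates how likely one key-phrase is to occur, or not occur, given that another key-phrase is present. The claim does not state how these probabilities are computed. The co-occurrence counting over a small reference collection, the add-one smoothing, and the names `cooccurrence_probabilities` and `corpus` are all assumptions made for this example only.

```python
from itertools import permutations


def cooccurrence_probabilities(corpus, key_phrases):
    """Estimate P(target occurs | given occurs) for each ordered pair.

    corpus: list of sets, each holding the key-phrases found in one document.
    Returns {(target, given): (p_occurs, p_not_occurs)}.
    Add-one (Laplace) smoothing is an assumption, not taken from the claim.
    """
    probs = {}
    for target, given in permutations(key_phrases, 2):
        docs_with_given = [doc for doc in corpus if given in doc]
        both = sum(1 for doc in docs_with_given if target in doc)
        p_occurs = (both + 1) / (len(docs_with_given) + 2)
        probs[(target, given)] = (p_occurs, 1.0 - p_occurs)
    return probs


# Hypothetical reference collection: one set of key-phrases per document.
corpus = [
    {"neural network", "clustering"},
    {"neural network", "clustering", "feedback"},
    {"neural network"},
    {"feedback"},
]
probs = cooccurrence_probabilities(
    corpus, ["neural network", "clustering", "feedback"]
)
p_yes, p_no = probs[("clustering", "neural network")]
print(round(p_yes, 2), round(p_no, 2))
```

The two values in each pair always sum to one, so the non-occurrence probability is stored only for convenience.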
Abstract
An information filtering process designed to sort through large volumes of dynamically generated textual information. It is an incremental learning process that learns as new text documents arrive and the user grades them by providing feedback. Text-based documents, either dynamically retrieved from the Web or available in a textual repository on an intranet, are represented by applying key-word weightings after capturing the user's reasoning for classifying the document as relevant or irrelevant. When a new item (document) arrives, the learning agent suggests a classification and also provides an explanation by pointing out the main features (key-phrases) of the item (document) responsible for its classification. The user reviews this and provides hints by showing a list of features (key-phrases) that are truly responsible for a particular way of classifying the document. This interaction method contributes to the learning process. The apparatus includes a feedback-based clustering scheme that models the user's interest profiles and a simple neural adaptation method for learning the cluster centers, to provide personalized information filtering for information seekers.
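The suggest-and-explain step described in the abstract can be pictured with a minimal sketch: score an incoming document against each cluster center and report the key-phrases that contributed most to the winning score. The dot-product scoring, the data layout, and the name `suggest_classification` are illustrative assumptions; the abstract does not specify them.

```python
def suggest_classification(doc_weights, clusters, top_n=3):
    """Pick the best-matching cluster and report the key-phrases behind it.

    doc_weights: {key_phrase: weight} for the incoming document.
    clusters: non-empty list of (label, center), center = {key_phrase: weight}.
    The dot-product score is an illustrative choice, not the patented method.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    best = None
    for label, center in clusters:
        # Per-phrase contribution to the match score.
        contributions = {
            kp: w * center[kp] for kp, w in doc_weights.items() if kp in center
        }
        score = sum(contributions.values())
        if best is None or score > best[1]:
            best = (label, score, contributions)
    label, score, contributions = best
    explanation = sorted(contributions, key=contributions.get, reverse=True)[:top_n]
    return label, score, explanation


# Hypothetical clusters and document.
clusters = [
    ("relevant", {"information filtering": 0.9, "clustering": 0.6}),
    ("irrelevant", {"sports": 0.8, "weather": 0.7}),
]
doc = {"information filtering": 1.0, "clustering": 0.5, "weather": 0.2}
label, score, explanation = suggest_classification(doc, clusters)
print(label, explanation)
```

In the process described above, the user would then confirm or correct the listed key-phrases, and that feedback would drive the cluster updates.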
115 Citations
13 Claims
and the updating comprises the steps of: evaluating the said cluster type with respect to evaluation of the document by said system, and performing neural adaptation process.
9. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of the document by the user, wherein the evaluation of the document includes the cluster type, and a plurality of absent key-phrases, and
the updating comprises the steps of:
forming soft-rules based on said plurality of absent key-phrases, and forming a new cluster of type opposite to said cluster type.
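One possible reading of the two steps in claim 9 is sketched below. The claim does not define the form of a soft rule or where it is stored, so the `(key_phrase, "absent", cluster_type)` tuple, the `Cluster` class, and the decision to attach the rules to the new cluster are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    kind: str                      # "positive" or "negative"
    center: dict                   # {key_phrase: weight}
    soft_rules: list = field(default_factory=list)


def opposite(kind):
    return "negative" if kind == "positive" else "positive"


def update_from_absent_phrases(clusters, doc_weights, cluster_type, absent_phrases):
    """Form soft rules from absent key-phrases and add an opposite-type cluster.

    The rule format and its attachment to the new cluster are assumptions.
    """
    rules = [(kp, "absent", cluster_type) for kp in absent_phrases]
    new_cluster = Cluster(
        kind=opposite(cluster_type),
        center=dict(doc_weights),
        soft_rules=rules,
    )
    clusters.append(new_cluster)
    return new_cluster


# Hypothetical feedback: the evaluation carries cluster type "positive"
# and names one key-phrase as absent.
clusters = [Cluster(kind="positive", center={"filtering": 1.0})]
created = update_from_absent_phrases(
    clusters, {"filtering": 0.8, "web": 0.3}, "positive", ["user profile"]
)
print(created.kind, created.soft_rules)
```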
10. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of a document by the user, wherein the evaluation of the document includes cluster type, plurality of present key-phrases, and plurality of absent key-phrases, and
the updating comprises the steps of:
forming a new cluster of type opposite of said cluster type based on said cluster type and said system evaluation of the document, deleting the document from cluster containing the document based on change in interest in said topic, performing neural adaptation process based on best possible matching of said plurality of present key-phrases with a cluster of clusters associated with said topic and the user, and forming a new cluster with said cluster type based on best possible matching of said plurality of present key-phrases with a cluster of clusters associated with the topic and the user.
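The last two steps of claim 10 both depend on a best possible matching between the present key-phrases and the existing clusters. The sketch below shows one way such a decision could be made: adapt the best-matching cluster when the match is good enough, otherwise form a new cluster. Cosine similarity, the `match_threshold` and `rate` parameters, and the dictionary layout are assumptions; the first two steps of the claim are not modelled here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {key_phrase: weight} vectors."""
    dot = sum(w * b.get(kp, 0.0) for kp, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adapt_or_create(present, cluster_type, clusters, match_threshold=0.5, rate=0.2):
    """Adapt the best-matching cluster of the given type, or create a new one."""
    same_type = [c for c in clusters if c["kind"] == cluster_type]
    best = max(same_type, key=lambda c: cosine(present, c["center"]), default=None)
    if best is not None and cosine(present, best["center"]) >= match_threshold:
        # Move the center a fraction `rate` of the way toward the document.
        for kp in set(best["center"]) | set(present):
            old = best["center"].get(kp, 0.0)
            best["center"][kp] = old + rate * (present.get(kp, 0.0) - old)
        return "adapted"
    clusters.append({"kind": cluster_type, "center": dict(present)})
    return "created"


# Hypothetical clusters and user-confirmed key-phrases.
clusters = [{"kind": "positive", "center": {"a": 1.0, "b": 1.0}}]
first = adapt_or_create({"a": 1.0}, "positive", clusters)
second = adapt_or_create({"z": 1.0}, "positive", clusters)
print(first, second, len(clusters))
```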
11. The system of claim 6, wherein said clustering subsystem further comprises means for forming a new positive cluster of a document, wherein the formation comprises the steps of:
attaching strategic weights to strategic key-phrases of the document with respect to the topic, calculation of threshold, identification of overlap with any negative clusters of clusters associated with the topic and the user, and forming cluster embeddings.
12. The system of claim 6, wherein said clustering subsystem further comprises means for forming a new negative cluster of a document, wherein said formation comprises the steps of:
attaching strategic weights to strategic key-phrases of the document with respect to the topic, calculation of threshold, identification of overlap with any positive clusters of clusters associated with the topic and the user, and forming cluster embeddings.
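Claims 11 and 12 recite the same steps for the two cluster types, so a single sketch can illustrate both. The multiplicative strategic weights, the threshold computed as a fraction of the total weight, and the name `form_cluster` are assumptions. The "cluster embeddings" step is not defined in the claims and is represented here only by recording which opposite-type clusters share key-phrases with the new one.

```python
def form_cluster(kind, doc_weights, strategic_weights, opposite_clusters,
                 threshold_fraction=0.5):
    """Form a new cluster of the given kind from one document.

    kind: "positive" or "negative".
    strategic_weights: {key_phrase: multiplier}; unlisted phrases keep weight.
    opposite_clusters: clusters of the other kind, each {"center": {...}}.
    """
    # 1. Attach strategic weights to the document's key-phrases.
    center = {kp: w * strategic_weights.get(kp, 1.0) for kp, w in doc_weights.items()}
    # 2. Calculate a threshold (here: a fixed fraction of the total weight).
    threshold = threshold_fraction * sum(center.values())
    # 3. Identify overlap with clusters of the opposite kind.
    overlaps = []
    for index, other in enumerate(opposite_clusters):
        shared = set(center) & set(other["center"])
        if shared:
            overlaps.append((index, sorted(shared)))
    return {"kind": kind, "center": center, "threshold": threshold,
            "overlaps": overlaps}


# Hypothetical document, strategic weights, and existing negative cluster.
negatives = [{"kind": "negative", "center": {"web": 0.9, "sports": 0.4}}]
new_cluster = form_cluster(
    "positive", {"filtering": 1.0, "web": 0.5}, {"filtering": 2.0}, negatives
)
print(new_cluster["threshold"], new_cluster["overlaps"])
```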
13. The system of claim 6, wherein said clustering subsystem further comprises means for performing neural adaptation process of a cluster with respect to the document, wherein said performing comprises the steps of:
attaching the document to said cluster, determining new cluster center of said cluster, reclassifying plurality of documents based on new threshold of said cluster, and forming plurality of new clusters based on plurality of documents that fall outside said cluster.
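The four steps of claim 13 can be traced in a short sketch. The claim does not say how the new center or threshold is determined; a plain mean of the member vectors and a caller-supplied Euclidean distance threshold stand in for the neural adaptation here, and the names `adapt_cluster` and `distance` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two sparse {key_phrase: weight} vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def adapt_cluster(cluster, doc, new_threshold):
    """Attach doc, recompute the center, reclassify, and split off outliers.

    cluster: {"center": dict, "members": list of dict}; modified in place.
    Returns the list of new single-document clusters formed from outliers.
    """
    members = cluster["members"] + [doc]                       # 1. attach
    keys = set().union(*members)
    center = {k: sum(m.get(k, 0.0) for m in members) / len(members)
              for k in keys}                                    # 2. new center (mean)
    kept = [m for m in members if distance(m, center) <= new_threshold]
    outliers = [m for m in members if distance(m, center) > new_threshold]  # 3.
    cluster["center"], cluster["members"] = center, kept
    return [{"center": dict(m), "members": [m]} for m in outliers]  # 4. new clusters


# Hypothetical one-dimensional example.
cluster = {"center": {"x": 0.1}, "members": [{"x": 0.0}, {"x": 0.2}]}
new_clusters = adapt_cluster(cluster, {"x": 4.0}, new_threshold=2.0)
print(len(cluster["members"]), len(new_clusters))
```

The center is not recomputed after the outliers are split off; an iterative variant would repeat steps 2 to 4 until membership stops changing.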
Specification