System and method for topic-based document analysis for information filtering

US 6,751,614 B1
Filed: 11/09/2000
Issued: 06/15/2004
Est. Priority Date: 11/09/2000
Status: Expired due to Term

Abstract

An information filtering process designed to sort through large volumes of dynamically generated textual information. It is an incremental learning process that learns as new text documents arrive and the user grades them by providing feedback. Text-based documents, either dynamically retrieved from the Web or available in a textual repository on an intranet, are represented by applying key-word weightings after capturing the user's reasoning for classifying the document as relevant or irrelevant. When a new item (document) arrives, the learning agent suggests a classification and also provides an explanation by pointing out the main features (key-phrases) of the item (document) responsible for its classification. The user reviews this and provides hints by listing the features (key-phrases) that are truly responsible for classifying the document in a particular way. This interaction method contributes to the learning process. The apparatus includes a feedback-based clustering scheme that models the user's interest profiles and a simple neural adaptation method for learning the cluster centers, to provide personalized information filtering for information seekers.
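The classify-explain-adapt loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: documents and cluster centers are modeled as dictionaries of key-phrase weights, similarity is cosine similarity, the explanation is the set of key-phrases with the largest weight products, and the "neural adaptation" is approximated by a simple move-toward-the-document update. The names `cosine`, `classify`, and `adapt` are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse key-phrase weight vectors (dicts)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, clusters, top_n=3):
    """Suggest a label and explain it (assumed scheme, not the patent's).

    clusters is a list of (label, center) pairs. The explanation lists the
    key-phrases that contribute most to the match with the chosen center.
    """
    label, center = max(clusters, key=lambda c: cosine(doc, c[1]))
    contrib = {k: w * center.get(k, 0.0) for k, w in doc.items()}
    ranked = sorted((k for k in contrib if contrib[k] > 0),
                    key=lambda k: -contrib[k])
    return label, ranked[:top_n]

def adapt(center, doc, rate=0.2):
    """Move a cluster center toward a document: c <- c + rate * (x - c)."""
    keys = set(center) | set(doc)
    return {k: center.get(k, 0.0) + rate * (doc.get(k, 0.0) - center.get(k, 0.0))
            for k in keys}

# Hypothetical interest profile: one positive and one negative cluster.
clusters = [
    ("relevant", {"neural network": 1.0, "clustering": 0.8}),
    ("irrelevant", {"stock price": 1.0, "earnings": 0.7}),
]
doc = {"neural network": 0.9, "earnings": 0.1}

label, why = classify(doc, clusters)   # suggested class and its key-phrases
updated = adapt(clusters[0][1], doc)   # user confirms: pull the center closer
print(label, why, updated)
```

In this sketch the user's feedback enters only through the choice of which center to adapt; a fuller treatment would also let the user supply the list of key-phrases that truly drove the decision, as the abstract describes.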

115 Citations

13 Claims

1. A Topic Analysis System for analyzing documents to assess the relevance of documents with respect to a topic of interest to a user, said system comprising:
- (a) a document evaluation subsystem for evaluating a document, said document evaluation subsystem comprises means for computing inter-relation among a plurality of key-phrases within the document, wherein the computation comprises the steps of;
  - computing the probability of occurrence of a key-phrase in the document depending on the occurrence of said plurality of key-phrases in the document, and computing the probability of non-occurrence of a key-phrase in the document depending on the occurrence of said plurality of key-phrases in the document; and
- (b) a clustering subsystem for representing a plurality of documents with respect to the topic associated with the user, said clustering subsystem comprising means for clustering the plurality of documents, wherein the plurality of documents includes a plurality of documents analyzed by said system and a plurality of documents analyzed by the user, and the representing comprises the steps of;
  - assigning a document of the plurality of documents to a hierarchically related plurality of positive clusters and plurality of negative clusters.
2. The system of claim 1, wherein the document evaluation subsystem comprises means for identifying a plurality of key-phrases of the topic present in a document, wherein said identification is based on text processing of the document.
3. The system of claim 2, wherein the document evaluation subsystem further comprises means for assigning strength value to a key-phrase of the document with respect to the topic based on the frequency of occurrence of said key-phrase in the document.
4. The system of claim 2, wherein the document evaluation subsystem further comprises means for computing plurality of probabilities of a key-phrase of the document with respect to the topic based on co-occurrence of said key-phrase with a key-phrase of the document with respect to the topic, wherein the computation is based on co-occurrence matrix of strength values of plurality of key-phrases of the topic.
5. The system of claim 2, wherein the document evaluation subsystem further comprises means for computing evaluation of the document with respect to the topic based on plurality of probabilities associated with plurality of key-phrases of the document with respect to the topic.
6. The system of claim 1, wherein said clustering subsystem comprises means for updating clusters based on evaluation of the document by said system and evaluation of the document by a user in a consistent way.
7. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of the document by said system, wherein said updating comprises the means for identifying a cluster containing the document, removing the document from said cluster, and recomputing a cluster center of said cluster.
8. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of the document by the user, wherein said evaluation of the document includes cluster type,
9. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of the document by the user, wherein the evaluation of the document includes the cluster type, and a plurality of absent key-phrases, and the updating comprises the steps of:
- forming soft-rules based on said plurality of absent key-phrases, and forming a new cluster of type opposite to said cluster type.
10. The system of claim 6, wherein said clustering subsystem further comprises means for updating clusters based on evaluation of a document by the user, wherein the evaluation of the document includes cluster type, plurality of present key-phrases, and plurality of absent key-phrases, and the updating comprises the steps of:
- forming a new cluster of type opposite of said cluster type based on said cluster type and said system evaluation of the document, deleting the document from cluster containing the document based on change in interest in said topic, performing neural adaptation process based on best possible matching of said plurality of present key-phrases with a cluster of clusters associated with said topic and the user, and forming a new cluster with said cluster type based on best possible matching of said plurality of present key-phrases with a cluster of clusters associated with the topic and the user.
11. The system of claim 6, wherein said clustering subsystem further comprises means for forming a new positive cluster of a document, wherein the formation comprises the steps of:
- attaching strategic weights to strategic key-phrases of the document with respect to the topic, calculation of threshold, identification of overlap with any negative clusters of clusters associated with the topic and the user, and forming cluster embeddings.
12. The system of claim 6, wherein said clustering subsystem further comprises means for forming a new negative cluster of a document, wherein said formation comprises the steps of:
- attaching strategic weights to strategic key-phrases of the document with respect to the topic, calculation of threshold, identification of overlap with any positive clusters of clusters associated with the topic and the user, and forming cluster embeddings.
13. The system of claim 6, wherein said clustering subsystem further comprises means for performing neural adaptation process of a cluster with respect to the document, wherein said performing comprises the steps of:
- attaching the document to said cluster, determining new cluster center of said cluster, reclassifying plurality of documents based on new threshold of said cluster, and forming plurality of new clusters based on plurality of documents that fall outside said cluster.
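
The document-evaluation steps recited in claim 1(a) and claims 3 and 4 (strength values from key-phrase frequency, a co-occurrence matrix of those strengths, and probabilities of occurrence and non-occurrence) might be sketched as follows. The claims do not give formulas, so every formula here is an assumption: strength is taken as relative frequency, a co-occurrence entry is the sum over documents of the smaller of two strengths, and the probability of one key-phrase given another is a ratio of matrix entries. The function names are hypothetical.

```python
from itertools import product

def strengths(text, key_phrases):
    """Assumed strength value: relative frequency of each key-phrase in text.

    Matching is a crude case-insensitive substring count.
    """
    lowered = text.lower()
    counts = {p: lowered.count(p) for p in key_phrases}
    total = sum(counts.values())
    return {p: (counts[p] / total if total else 0.0) for p in key_phrases}

def cooccurrence(docs, key_phrases):
    """Assumed co-occurrence matrix: C[a][b] = sum over docs of min(s_a, s_b)."""
    matrix = {a: {b: 0.0 for b in key_phrases} for a in key_phrases}
    for text in docs:
        s = strengths(text, key_phrases)
        for a, b in product(key_phrases, repeat=2):
            matrix[a][b] += min(s[a], s[b])
    return matrix

def p_occurrence(matrix, a, b):
    """Estimate P(a occurs | b occurs) as C[a][b] / C[b][b], a value in [0, 1]."""
    return matrix[a][b] / matrix[b][b] if matrix[b][b] else 0.0

def p_non_occurrence(matrix, a, b):
    """Estimate P(a does not occur | b occurs) as the complement."""
    return 1.0 - p_occurrence(matrix, a, b)

# Hypothetical topic vocabulary and previously analyzed documents.
key_phrases = ["information filtering", "cluster", "user feedback"]
docs = [
    "Information filtering uses a cluster per topic. Each cluster has a center.",
    "User feedback improves information filtering.",
    "A cluster groups similar documents.",
]

C = cooccurrence(docs, key_phrases)
print(p_occurrence(C, "cluster", "information filtering"))
print(p_non_occurrence(C, "cluster", "information filtering"))
```

Using the minimum of two strengths keeps each off-diagonal entry no larger than the matching diagonal entry, so the ratio stays between 0 and 1. This is a convenience of the sketch, not something the claims require.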

Current Assignee: Tech Mahindra Limited
Original Assignee: Satyam Computer Services Limited of Mayfair Centre
Inventors: Rao, Kalyan
Primary Examiner(s): Robinson, Greta
Assistant Examiner(s): Rayyan, Susan F
Application Number: US09/708,580
Time in Patent Office: 1,314 Days
Field of Search: 707/5, 707/104.1, 707/102, 704/245
US Class Current: 1/1
CPC Class Codes:
G06F 16/355 Class or cluster creation o...
Y10S 707/99935 Query augmenting and refini...
