Method and system for filtering content in a discovered topic
2 Assignments
Abstract
A method of filtering content in a discovered topic. In one embodiment, a method for filtering content in a discovered topic comprises preprocessing querying data. The querying data has caused retrieval of a collection of documents. The collection includes documents containing subject matter related to the querying data and documents containing subject matter extraneous to the querying data. The querying data is clustered, and this clustering enables the discovered topic to be identified. The collection of documents is then postfiltered to generate a collection of documents having the related subject matter, with the extraneous subject matter excluded.
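The abstract does not say what preprocessing involves. The sketch below is a minimal illustration under stated assumptions: preprocessing is taken to mean lowercasing, tokenizing, removing stopwords, and counting terms, and cosine similarity is assumed as the document-similarity measure. `preprocess`, `cosine`, and `STOPWORDS` are hypothetical names, not taken from the patent.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def preprocess(text: str) -> Counter:
    """Lowercase, tokenize on letters/digits, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = preprocess("Jaguar speed in the wild")
    docs = {
        "d1": "The jaguar is a wild cat with great speed",
        "d2": "Jaguar car dealership prices and financing",
    }
    for name, text in docs.items():
        print(name, round(cosine(query, preprocess(text)), 3))
```

Under these assumptions, the query scores higher against the document about the animal than against the one about car financing, which is the kind of related/extraneous split the later clustering and postfiltering steps act on.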
78 Citations
25 Claims
1. A method of content filtering in a discovered topic comprising:
preprocessing querying data, said querying data having caused a retrieval of a collection of documents, said collection of documents comprising documents having content comprising related subject matter and comprising documents having content comprising extraneous subject matter, relative to said querying data;
clustering said collection of documents in accordance with said querying data, said clustering enabling said discovered topic to be identified, said discovered topic relative to said querying data; and
postfiltering said collection of documents to generate a collection of documents having content comprising said related subject matter relative to said topic, wherein extraneous subject matter is excluded.
Dependent claims: 2-11.
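Claim 1 does not name a clustering algorithm. As one possible reading, the sketch below assumes a simple single-pass (leader) clustering over cosine similarity, takes the cluster whose centroid is closest to the query as the discovered topic, and postfilters by keeping only that cluster's documents. The algorithm choice, the `threshold` value, and all function names are assumptions for illustration, not the claimed method itself.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for a simple bag-of-words preprocessor.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def preprocess(text: str) -> Counter:
    """Lowercase, tokenize, drop stopwords, and count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(cluster: list[str], vectors: dict[str, Counter]) -> Counter:
    """Summed term counts of a cluster (cosine ignores scale, so no averaging)."""
    total = Counter()
    for doc_id in cluster:
        total.update(vectors[doc_id])
    return total


def single_pass_cluster(vectors: dict[str, Counter], threshold: float) -> list[list[str]]:
    """Join each document to the most similar existing cluster whose similarity
    reaches the threshold; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for doc_id, vec in vectors.items():
        best, best_sim = None, threshold
        for i, cluster in enumerate(clusters):
            sim = cosine(vec, centroid(cluster, vectors))
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            clusters.append([doc_id])
        else:
            clusters[best].append(doc_id)
    return clusters


def discover_and_postfilter(query: str, docs: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Cluster the retrieved documents, take the cluster closest to the query as
    the discovered topic, and keep only that cluster's documents."""
    qvec = preprocess(query)
    vectors = {doc_id: preprocess(text) for doc_id, text in docs.items()}
    clusters = single_pass_cluster(vectors, threshold)
    if not clusters:
        return []
    topic = max(clusters, key=lambda c: cosine(qvec, centroid(c, vectors)))
    return sorted(topic)


if __name__ == "__main__":
    retrieved = {
        "d1": "jaguar wild cat speed hunting",
        "d2": "jaguar car dealer price",
        "d3": "wild cat hunting prey jaguar",
        "d4": "used car price dealer",
    }
    print(discover_and_postfilter("jaguar wild cat", retrieved))  # ['d1', 'd3']
```

With this sample data the documents split into an animal cluster (`d1`, `d3`) and a car cluster (`d2`, `d4`); the query selects the first, so the car documents are excluded as extraneous. A single-pass algorithm is used here only because it needs no preset cluster count; any clustering method could fill this role.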
12. In a web-based environment, a method of topic discovery and content filtering comprising:
receiving a query causing a retrieval of a collection of documents related to said query, said collection of documents comprising documents having content comprising extraneous subject matter and documents having content comprising desired subject matter;
preprocessing query data, said query data comprising information pertaining to said query;
clustering said query data enabling discovery of a topic relative to said query; and
postfiltering said retrieved collection of documents to generate a collection of documents having content comprising said desired subject matter relative to a discovered topic.
Dependent claims: 13-21.
22. A computer system comprising:
a bus;
a display device coupled to said bus;
a storage device coupled to said bus; and
a processor coupled to said bus, said processor for:
receiving a query, said query causing retrieval of a collection of documents related to said query, wherein said collection of documents comprises documents comprising extraneous subject matter and documents comprising desired subject matter;
preprocessing data relative to said query;
clustering said data relative to said query to enable identification of a discovered topic relative to said query;
postfiltering said retrieved documents in said collection of documents, wherein said postfiltering generates a collection of documents comprising documents having content comprising said desired subject matter relative to said discovered topic; and
labeling said discovered topic in accordance with a metric based upon document similarity, said metric used to measure cohesion between said documents in said collection of documents, wherein a high measure of cohesion indicates a document containing subject matter relative to said topic, said topic displayed to a user via said display device.
Dependent claims: 23-25.
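Claim 22 states that the labeling metric is based on document similarity and measures cohesion, but it does not define the formula. The sketch below assumes a per-document cohesion score equal to the mean cosine similarity between that document and the other members of its cluster. Documents at or above a hypothetical `min_cohesion` cutoff are treated as on-topic, and the topic is labeled with the `num_terms` most frequent terms across those documents. Every name and parameter here is illustrative, and the display step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for a simple bag-of-words preprocessor.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def preprocess(text: str) -> Counter:
    """Lowercase, tokenize, drop stopwords, and count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_cohesion(doc_id: str, cluster: list[str], vectors: dict[str, Counter]) -> float:
    """Mean cosine similarity between one document and the rest of its cluster."""
    others = [d for d in cluster if d != doc_id]
    if not others:
        return 0.0
    return sum(cosine(vectors[doc_id], vectors[o]) for o in others) / len(others)


def label_topic(
    cluster: list[str],
    vectors: dict[str, Counter],
    min_cohesion: float = 0.4,
    num_terms: int = 3,
) -> tuple[str, list[str]]:
    """Keep documents whose cohesion reaches min_cohesion, then label the topic
    with the most frequent terms across them (ties broken alphabetically)."""
    cohesive = [d for d in cluster if doc_cohesion(d, cluster, vectors) >= min_cohesion]
    totals = Counter()
    for d in cohesive:
        totals.update(vectors[d])
    top = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    return " ".join(term for term, _ in top), cohesive


if __name__ == "__main__":
    texts = {
        "a": "jaguar wild cat hunting",
        "b": "wild cat jaguar prey",
        "c": "jaguar car price",
    }
    vectors = {k: preprocess(v) for k, v in texts.items()}
    label, kept = label_topic(list(texts), vectors)
    print(label, kept)  # cat jaguar wild ['a', 'b']
```

In this example, documents `a` and `b` share most of their terms and score about 0.52, while `c` shares only "jaguar" and scores about 0.29. So `c` falls below the cutoff and does not influence the label.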
Specification