Method and system for discovering significant subsets in collection of documents
First Claim
1. A method of discovering a subset in a collection of documents, comprising:
identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
analyzing a first document in the collection of documents to determine a characteristic feature of said first document;
generating a profile of said first document based on said characteristic feature; and
comparing a subsequent document in the collection of documents to said profile,
wherein said set of documents comprises a cluster of documents in said plurality of documents, and
wherein when said subsequent document matches said profile, said subsequent document is included in said set of documents and a next subsequent document is compared at least to said subsequent document.
Abstract
A method (and system) of discovering a significant subset in a collection of documents includes identifying a set of documents from a plurality of documents based on a likelihood that documents in the set of documents carry an instance of information that is characteristic to the documents in the set of documents.
146 Citations
7 Claims
1. A method of discovering a subset in a collection of documents, comprising:
identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
analyzing a first document in the collection of documents to determine a characteristic feature of said first document;
generating a profile of said first document based on said characteristic feature; and
comparing a subsequent document in the collection of documents to said profile,
wherein said set of documents comprises a cluster of documents in said plurality of documents, and
wherein when said subsequent document matches said profile, said subsequent document is included in said set of documents and a next subsequent document is compared at least to said subsequent document.
Dependent claims: 2, 3, 4
5. A method of discovering a subset in a collection of documents, comprising:
identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
analyzing a first document in the collection of documents to determine a characteristic feature of said first document;
generating a profile of said first document based on said characteristic feature; and
comparing a subsequent document in the collection of documents to said profile,
wherein said set of documents comprises a cluster of documents in said plurality of documents, and
wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.
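Read together, claims 1 and 5 describe a single pass over the collection: a document that matches an existing profile joins that set, and a document that does not match gets a new profile of its own. The Python sketch below illustrates that control flow only. The word-set profile, the Jaccard overlap test, the 0.5 threshold, and the names `make_profile`, `matches`, and `discover_subsets` are all assumptions made for illustration; the claims do not say how a characteristic feature is extracted, how the likelihood is estimated, or how a match is decided.

```python
def make_profile(text):
    """Hypothetical profile: the distinct lowercase words of length >= 4."""
    return frozenset(w for w in text.lower().split() if len(w) >= 4)


def matches(profile_a, profile_b, threshold=0.5):
    """Assumed match test: Jaccard overlap of two profiles meets the threshold."""
    if not profile_a or not profile_b:
        return False
    overlap = len(profile_a & profile_b) / len(profile_a | profile_b)
    return overlap >= threshold


def discover_subsets(documents, threshold=0.5):
    """Single pass over the documents, returning clusters of document indices."""
    clusters = []  # each cluster is a list of (index, profile) pairs
    for index, text in enumerate(documents):
        profile = make_profile(text)
        for cluster in clusters:
            # Compare against every member already in the set, so a later
            # document is also compared to earlier documents that were added.
            if any(matches(profile, member, threshold) for _, member in cluster):
                cluster.append((index, profile))
                break
        else:
            # No existing profile matched: the document is excluded from the
            # existing sets and a new profile (new cluster) is started for it.
            clusters.append([(index, profile)])
    return [[index for index, _ in cluster] for cluster in clusters]
```

Calling `discover_subsets` on a list of strings returns one list of document indices per discovered set, in the order the sets were first created. Comparing a new document against every member of a set, rather than only the first document's profile, is one way to read "compared at least to said subsequent document"; other readings are possible.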
6. A method of discovering a subset in a collection of documents, comprising:
identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
generating a profile for a first document based on characteristic features of the first document; and
comparing a subsequent document in the collection of documents to said profile,
wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.
7. A method of discovering a subset in a collection of documents, comprising:
identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
generating a profile for a first document based on characteristic features of the first document; and
comparing a subsequent document in the collection of documents to said profile,
wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.
Specification