Method and system for discovering significant subsets in collection of documents

US 7,360,686 B2
Filed: 05/11/2005
Issued: 04/22/2008
Est. Priority Date: 05/11/2005
Status: Expired due to Fees

First Claim

1. A method of discovering a subset in a collection of documents, comprising:

identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;

analyzing a first document in the collection of documents to determine a characteristic feature of said first document;

generating a profile of said first document based on said characteristic feature; and

comparing a subsequent document in the collection of documents to said profile, wherein said set of documents comprises a cluster of documents in said plurality of documents, and wherein when said subsequent document matches said profile, said subsequent document is included in said set of documents and a next subsequent document is compared at least to said subsequent document.

Abstract

A method (and system) of discovering a significant subset in a collection of documents includes identifying a set of documents from a plurality of documents based on a likelihood that documents in the set of documents carry an instance of information that is characteristic to the documents in the set of documents.

146 Citations

7 Claims

1. A method of discovering a subset in a collection of documents, comprising:
- identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
  
  analyzing a first document in the collection of documents to determine a characteristic feature of said first document;
  
  generating a profile of said first document based on said characteristic feature; and
  
  comparing a subsequent document in the collection of documents to said profile, wherein said set of documents comprises a cluster of documents in said plurality of documents, and wherein when said subsequent document matches said profile, said subsequent document is included in said set of documents and a next subsequent document is compared at least to said subsequent document.
- Dependent claims (2, 3, 4):
  - 2. A signal-bearing medium tangibly embodying a program of machine readable instructions executable by a digital processing apparatus to perform the method of discovering a subset in a collection of documents according to claim 1.
  - 3. The signal-bearing medium according to claim 2, further comprising:
    - isolating, after said identifying, said set of documents from the collection of documents.
  - 4. A method of deploying computing infrastructure, comprising integrating computer-readable code into a computing system, wherein the computer readable code in combination with the computing system is capable of performing the method of discovering a subset in a collection of documents, according to claim 1.
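
The matching branch recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say what a "characteristic feature" is or how a match is decided, so everything concrete here is an assumption made for illustration: documents are plain strings, the feature is the set of lowercase word tokens, a match is a Jaccard similarity at or above a hypothetical `threshold`, and the names `features`, `jaccard`, and `grow_subset` are invented. After a document is admitted, the next document is compared to that most recently admitted document, which is one reading of "compared at least to said subsequent document".

```python
import re


def features(text):
    """Hypothetical characteristic feature: the set of lowercase word tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def grow_subset(docs, threshold=0.5):
    """Return indices of documents chained to the first document.

    The first document is profiled; each later document that matches the
    current reference is included, and then becomes the reference for the
    next comparison. The threshold value is an assumption, not from the claim.
    """
    if not docs:
        return []
    members = [0]
    reference = features(docs[0])  # profile of the first document
    for i in range(1, len(docs)):
        candidate = features(docs[i])
        if jaccard(candidate, reference) >= threshold:
            members.append(i)
            reference = candidate  # next document is compared to this one
    return members


docs = [
    "pay to the order of acme corp",
    "pay to the order of acme inc",
    "pay to order of acme inc ltd",
    "grocery list milk eggs",
]
print(grow_subset(docs))
```

Chaining the comparison to the latest admitted document lets the subset follow gradual drift in wording; comparing only to the original profile would be a stricter alternative that the claim language also permits.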

5. A method of discovering a subset in a collection of documents, comprising:
- identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
  
  analyzing a first document in the collection of documents to determine a characteristic feature of said first document;
  
  generating a profile of said first document based on said characteristic feature; and
  
  comparing a subsequent document in the collection of documents to said profile, wherein said set of documents comprises a cluster of documents in said plurality of documents, and wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.

6. A method of discovering a subset in a collection of documents, comprising:
- identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
  
  generating a profile for a first document based on characteristic features of the first document; and
  
  comparing a subsequent document in the collection of documents to said profile, wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.

7. A method of discovering a subset in a collection of documents, comprising:
- identifying a set of documents from a plurality of documents based on a likelihood that documents in said set of documents carry an instance of information that is characteristic to the documents in said set of documents;
  
  generating a profile for a first document based on characteristic features of the first document; and
  
  comparing a subsequent document in the collection of documents to said profile, wherein when said subsequent document does not match said profile, said subsequent document is excluded from said set of documents and a new profile is generated for said subsequent document.
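
The non-matching branch recited in claims 5 to 7 (exclude the document and generate a new profile for it) can be sketched as a single-pass grouping loop. This is an illustrative assumption, not the patented implementation: the claims do not define the features, the match test, or whether later documents are compared against every stored profile. Here the feature is the set of lowercase word tokens, a match is a Jaccard similarity at or above a hypothetical `threshold`, each document is checked against all profiles created so far, and the names `features`, `similarity`, and `cluster_by_profile` are invented.

```python
import re


def features(text):
    """Hypothetical characteristic features: lowercase word tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_profile(docs, threshold=0.5):
    """Group document indices by the first profile each document matches.

    A document that matches no existing profile is excluded from those sets
    and a new profile is generated from it, starting a new set.
    """
    profiles = []  # one feature set per group, taken from its first document
    clusters = []  # clusters[k] holds indices of documents matching profiles[k]
    for i, text in enumerate(docs):
        candidate = features(text)
        for k, profile in enumerate(profiles):
            if similarity(candidate, profile) >= threshold:
                clusters[k].append(i)
                break
        else:
            # No profile matched: generate a new profile for this document.
            profiles.append(candidate)
            clusters.append([i])
    return clusters


docs = [
    "pay to the order of acme corp",
    "grocery list milk eggs",
    "pay to the order of acme inc",
    "grocery list milk bread",
]
print(cluster_by_profile(docs))
```

In this sketch each profile stays fixed at the features of the document that created it; updating a profile as members are added would be a different design that the claims neither require nor rule out.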

Current Assignee
International Business Machines Corporation, JP Morgan Chase Bank, N.A. (JP Morgan Chase & Co)
Original Assignee
International Business Machines Corporation, JP Morgan Chase Bank, N.A. (JP Morgan Chase & Co)
Inventors
Walach, Eugene; Reilly, Micheal J.; Nowicki, Tomasz J.; Hoch, Robert; Ibikunle, Tayo; Tresser, Charles P.; Sachar, Howard E.; Karnin, Ehud; Liberis, William A.
Primary Examiner(s)
HESS, DANIEL A

Application Number

US11/126,211
Publication Number

US 20060255124A1
Time in Patent Office

1,077 Days
Field of Search

235/380, 235/379, 382/137, 382/139, 705/45, 705/39, 705/44
US Class Current

235/379
CPC Class Codes

G06Q 20/042 characterized in that the p...

G07F 19/00 Complete banking systems; C...
