SYSTEM, METHOD AND COMPUTER EXECUTABLE PROGRAM FOR INFORMATION TRACKING FROM HETEROGENEOUS SOURCES

US 20090006377A1
Filed: 01/23/2008
Published: 01/01/2009
Est. Priority Date: 01/23/2007
Status: Active Grant

Abstract

A system for information clustering comprising a data accumulation part for accumulating documents in a document repository, the documents having loosely related attributes, and defining a cluster between the documents being time sliced so as to define chunks of the documents; a vector space generation part for generating document-keyword vectors, the document-keyword vectors consisting of sparse numeral values depending on presence of keywords; a dimension reduction part for reducing dimensions of the keywords to create a dimension reduction matrix of the document-keyword matrix; a centroid vector determination part for generating a centroid vector of the cluster, the centroid vectors being defined from keywords and weight of documents within the cluster; and an item repository for storing the centroid vectors together with the keywords and the weights of the centroid vector.
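
The first two parts named in the abstract (time slicing into chunks, then sparse document-keyword vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the fixed seven-day window, whitespace tokenization, the binary 0/1 weighting, and the names `time_slice` and `keyword_vector`.

```python
from collections import defaultdict
from datetime import date


def time_slice(documents, days_per_chunk=7):
    """Group (date, text) pairs into chunks of fixed-width time windows.

    The window width is a hypothetical parameter chosen for this sketch.
    """
    if not documents:
        return []
    start = min(when for when, _ in documents)
    chunks = defaultdict(list)
    for when, text in documents:
        chunks[(when - start).days // days_per_chunk].append(text)
    return [chunks[k] for k in sorted(chunks)]


def keyword_vector(text, keywords):
    """Sparse vector {keyword_index: 1.0} holding only the keywords present."""
    tokens = set(text.lower().split())
    return {i: 1.0 for i, kw in enumerate(keywords) if kw in tokens}


# Made-up sample documents, for illustration only.
docs = [
    (date(2007, 1, 1), "engine recall announced"),
    (date(2007, 1, 3), "brake recall expands"),
    (date(2007, 1, 10), "engine fire reported"),
]
keywords = ["engine", "brake", "recall", "fire"]

chunks = time_slice(docs)
vectors = [[keyword_vector(text, keywords) for text in chunk] for chunk in chunks]
```

Storing only the non-zero entries keeps each vector small when the keyword list is long and most keywords are absent from any one document.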

18 Claims

1. A system for information clustering, said system comprising;
- a data accumulation part for accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  a vector space generation part for generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  a dimension reduction part for reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix;
  
  a centroid vector determination part for generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
  
  an item repository for storing said centroid vectors together with said keywords and said weights of said centroid vector.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, wherein said centroid vector determination part retrieves a principal document in said document using said principal component as a first query vector and subsequently retrieves documents defining said clusters using said principal document as a second query vector.
  - 3. The system of claim 1, wherein said vector space generation part executes dimension reduction to each of said chunks of said dimension reduction matrix and said centroid vector generation part generates clusters for every chunk of said dimension reduction matrix.
  - 4. The system of claim 1, wherein said system further comprises an item analyzer part for analyzing evolution of items with respect to said chunk of said document and for information tracking.
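
The two-step retrieval of claim 2 and the centroid vector of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not name a dimension reduction technique, so power iteration for the leading right singular vector is swapped in here; cosine similarity, the 0.5 threshold, the mean-based centroid weights, and all function names are likewise hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def principal_component(rows, iterations=100):
    """Leading right singular vector of the document-keyword matrix.

    Power iteration on A^T A is an assumption of this sketch.
    """
    dim = len(rows[0])
    v = normalize([1.0] * dim)
    for _ in range(iterations):
        # Apply A^T A to v without forming the product matrix.
        scores = [dot(row, v) for row in rows]
        v = normalize(
            [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        )
    return v


def retrieve_cluster(rows, threshold=0.5):
    """Two-step retrieval: component -> principal document -> cluster."""
    pc = principal_component(rows)
    # First query: the document closest to the principal component.
    principal = max(range(len(rows)), key=lambda i: cosine(rows[i], pc))
    # Second query: documents close to that principal document.
    members = [
        i for i, row in enumerate(rows) if cosine(row, rows[principal]) >= threshold
    ]
    return principal, members


def centroid(rows, members, keywords):
    """Mean keyword weight over the cluster, keeping non-zero entries."""
    means = [sum(rows[i][j] for i in members) / len(members)
             for j in range(len(keywords))]
    return {kw: w for kw, w in zip(keywords, means) if w > 0}


# Made-up document-keyword matrix: one row per document.
keywords = ["engine", "recall", "brake", "weather"]
rows = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
principal, members = retrieve_cluster(rows)
item = centroid(rows, members, keywords)
```

With this sample data the unrelated "weather" document is left out of the cluster, and the centroid carries each keyword together with its weight, which is the pairing the item repository is said to store.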

5. A computer executable method for information clustering, said method making said computer execute the steps of;
- accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
  
  generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
  
  storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector.
- Dependent claims (6, 7, 8):
  - 6. The method of claim 5, said method further comprising the steps of;
    - retrieving a principal document in said document using said principal component as a first query vector and subsequently retrieving documents defining said clusters using said principal document as a second query vector.
  - 7. The method of claim 5, said method further comprising the steps of;
    - executing dimension reduction to each of said chunks of said dimension reduction matrix and generating clusters for every chunk of said dimension reduction matrix.
  - 8. The method of claim 5, said method further comprising the steps of;
    - analyzing evolution of items with respect to said chunk of said document and for information tracking.

9. A system for information tracking, said system comprising;
- a data accumulation part for accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  a vector space generation part for generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  a dimension reduction part for reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix;
  
  a centroid vector determination part for generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
  
  an item analyzer part for analyzing evolution of items with respect to said chunk of said document and for information tracking; and
  
  an item repository for storing said centroid vectors together with said keywords and said weights of said centroid vector.
- Dependent claims (10):
  - 10. The system of claim 9, wherein said centroid vector determination part retrieves a principal document in said document using said principal component as a first query vector and subsequently retrieves documents defining said clusters using said principal document as a second query vector.
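
The item analyzer part of claim 9 is described only as analyzing the evolution of items with respect to chunks. One possible reading, offered as a sketch and not as the patent's method, is to link each centroid vector in one chunk to its most similar centroid in the next chunk. The cosine measure, the 0.5 threshold, the `{keyword: weight}` representation, and the name `track_items` are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two {keyword: weight} centroid vectors."""
    num = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0


def track_items(repository, threshold=0.5):
    """Link items across consecutive chunks.

    `repository` is a list of chunks; each chunk is a list of centroids.
    Returns (chunk_index, item_index, next_item_index) triples for items
    whose best match in the following chunk meets the threshold.
    """
    links = []
    for t in range(len(repository) - 1):
        for i, item in enumerate(repository[t]):
            score, j = max(
                ((cosine(item, nxt), j) for j, nxt in enumerate(repository[t + 1])),
                default=(0.0, None),
            )
            if j is not None and score >= threshold:
                links.append((t, i, j))
    return links


# Made-up centroids for two consecutive chunks.
repository = [
    [{"engine": 1.0, "recall": 1.0}, {"weather": 1.0}],
    [{"engine": 1.0, "recall": 0.5, "fire": 0.5}],
]
links = track_items(repository)
```

In this sample the engine-recall item continues into the second chunk with a new "fire" keyword, while the weather item has no successor, which is the kind of change over time that tracking is meant to surface.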

11. A computer executable method for information tracking, said method making said computer execute the steps of;
- accumulating documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
  
  generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
  
  storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector; and
  
  analyzing evolution of items with respect to said chunk of said document and for information tracking.
- Dependent claims (12):
  - 12. The method of claim 11, said method further comprising the steps of;
    - retrieving a principal document in said document using said principal component as a first query vector and subsequently retrieving documents defining said clusters using said principal document as a second query vector.

13. A computer executable program for making a computer execute a method for information clustering, said method making said computer execute the steps of;
- accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
  
  generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
  
  storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector.
- Dependent claims (14, 15, 16):
  - 14. The program of claim 13, wherein said method further comprises the steps of;
    - retrieving a principal document in said document using said principal component as a first query vector and subsequently retrieving documents defining said clusters using said principal document as a second query vector.
  - 15. The program of claim 13, wherein the method further comprises the steps of;
    - executing dimension reduction to each of said chunks of said dimension reduction matrix and generating clusters for every chunk of said dimension reduction matrix.
  - 16. The program of claim 13, wherein the method further comprises the steps of;
    - analyzing evolution of items with respect to said chunk of said document and for information tracking.

17. A computer executable program for making a computer execute a method for information tracking, said method making said computer execute the steps of;
- accumulating documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
  
  generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
  
  reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
  
  generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
  
  storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector; and
  
  analyzing evolution of items with respect to said chunk of said document and for information tracking.
- Dependent claims (18):
  - 18. The program of claim 17, said method further comprising the steps of;
    - retrieving a principal document in said document using said principal component as a first query vector and subsequently retrieving documents defining said clusters using said principal document as a second query vector.
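
Every independent claim ends by storing centroid vectors in an item repository together with their keywords and weights. A minimal in-memory sketch of such a store might look like the following. The class layout, the per-chunk keying, the method names, and the JSON serialization are hypothetical choices for illustration, since the claims do not specify a storage format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ItemRepository:
    # Maps chunk index -> list of centroid vectors ({keyword: weight} dicts).
    items: dict = field(default_factory=dict)

    def store(self, chunk, centroid):
        """Store a copy of the centroid's keywords and weights for a chunk."""
        self.items.setdefault(chunk, []).append(dict(centroid))

    def top_keywords(self, chunk, item, n=3):
        """Return the n heaviest keywords of one stored item."""
        centroid = self.items[chunk][item]
        return sorted(centroid, key=lambda k: (-centroid[k], k))[:n]

    def to_json(self):
        """Serialize the repository; chunk indices become string keys."""
        return json.dumps(self.items, sort_keys=True)


repo = ItemRepository()
repo.store(0, {"engine": 1.0, "recall": 1.0, "brake": 0.5})
top = repo.top_keywords(0, 0, n=2)
```

Keeping keywords and weights together in one record means a later analysis step can read an item without going back to the original documents.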

Current Assignee: Domo, Inc.
Original Assignee: International Business Machines Corporation
Inventors: Kobayashi, Mei; Kay Yung, Raylene

Granted Patent: US 7,996,407 B2
US Class Current: 1/1
CPC Class Codes: G06F 16/355 Class or cluster creation o...
