SYSTEM, METHOD AND COMPUTER EXECUTABLE PROGRAM FOR INFORMATION TRACKING FROM HETEROGENEOUS SOURCES
Abstract
A system for information clustering comprising a data accumulation part for accumulating documents in a document repository, the documents having loosely related attributes, and defining a cluster between the documents being time sliced so as to define chunks of the documents; a vector space generation part for generating document-keyword vectors, the document-keyword vectors consisting of sparse numeral values depending on presence of keywords; a dimension reduction part for reducing dimensions of the keywords to create a dimension reduction matrix of the document-keyword matrix; a centroid vector determination part for generating a centroid vector of the cluster, the centroid vectors being defined from keywords and weight of documents within the cluster; and an item repository for storing the centroid vectors together with the keywords and the weights of the centroid vector.
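The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes binary keyword presence for the document-keyword vectors, a truncated singular value decomposition as the dimension reduction, and assignment of each document to the principal component on which its projection is largest. The function names doc_keyword_matrix and centroids_by_principal_component are hypothetical, as is the use of a plain mean of member vectors as the centroid.

```python
import numpy as np

def doc_keyword_matrix(docs, vocab):
    """Rows are documents, columns are keywords; 1.0 marks keyword presence."""
    index = {kw: j for j, kw in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, text in enumerate(docs):
        for token in text.lower().split():
            if token in index:
                A[i, index[token]] = 1.0
    return A

def centroids_by_principal_component(A, k):
    """Reduce A to k dimensions and derive one centroid per component.

    Assumption: a document belongs to the component on which its
    projection has the largest magnitude.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    reduced = U[:, :k] * s[:k]              # document coordinates in k dimensions
    labels = np.argmax(np.abs(reduced), axis=1)
    centroids = {}
    for c in range(k):
        members = A[labels == c]
        if len(members):
            # Centroid in keyword space: mean keyword weight over member documents.
            centroids[c] = members.mean(axis=0)
    return labels, centroids

vocab = ["flood", "rain", "river", "election", "vote", "party"]
docs = [
    "flood rain river",
    "rain flood",
    "flood rain",
    "election vote party",
    "vote party",
]
A = doc_keyword_matrix(docs, vocab)
labels, centroids = centroids_by_principal_component(A, k=2)
# A stand-in for the item repository: keyword/weight pairs per centroid.
items = {c: dict(zip(vocab, v.tolist())) for c, v in centroids.items()}
```

Each entry of items pairs a centroid's keywords with their weights, which is the kind of record the abstract describes the item repository as storing.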
18 Claims
1. A system for information clustering, said system comprising:
a data accumulation part for accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
a vector space generation part for generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
a dimension reduction part for reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix;
a centroid vector determination part for generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
an item repository for storing said centroid vectors together with said keywords and said weights of said centroid vector.
Dependent claims: 2, 3, 4
5. A computer executable method for information clustering, said method making said computer execute the steps of:
accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector.
Dependent claims: 6, 7, 8
9. A system for information tracking, said system comprising:
a data accumulation part for accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
a vector space generation part for generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
a dimension reduction part for reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix;
a centroid vector determination part for generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
an item analyzer part for analyzing evolution of items with respect to said chunk of said document and for information tracking; and
an item repository for storing said centroid vectors together with said keywords and said weights of said centroid vector.
Dependent claims: 10
11. A computer executable method for information tracking, said method making said computer execute the steps of:
accumulating documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector; and
analyzing evolution of items with respect to said chunk of said document and for information tracking.
Dependent claims: 12
13. A computer executable program for making a computer execute a method for information clustering, said method making said computer execute the steps of:
accumulating and clustering documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster; and
storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector.
Dependent claims: 14, 15, 16
17. A computer executable program for making a computer execute a method for information tracking, said method making said computer execute the steps of:
accumulating documents in a document repository, said documents including loosely related clusters between said documents being time sliced so as to define chunks of said documents;
generating document-keyword vectors, said document-keyword vectors consisting of sparse numeral values depending on presence of keywords in said documents;
reducing dimensions of said keywords to create a dimension reduction matrix of said document-keyword matrix; and
generating a centroid vector of said cluster, said cluster being retrieved from said document-keyword vector using a principal component in a same line of said dimension reduction matrix, said centroid vectors being defined from keywords and weight of documents within said cluster;
storing said centroid vectors in an item repository together with said keywords and said weights of said centroid vector; and
analyzing evolution of items with respect to said chunk of said document and for information tracking.
Dependent claims: 18
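The tracking claims add a step of analyzing the evolution of items across time-sliced chunks. The claims do not say how that analysis is performed, so the following is only one plausible sketch: it assumes that an item in one chunk continues an item from the immediately preceding chunk when the cosine similarity of their centroid vectors meets a threshold. The function name track_items, the threshold value of 0.5, and the example vectors are all hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))

def track_items(chunks, threshold=0.5):
    """Link centroid vectors across consecutive time-sliced chunks.

    chunks: time-ordered list; each entry is a list of centroid vectors.
    Returns a list of tracks, each a list of (chunk_index, item_index).
    """
    tracks = []
    for t, centroids in enumerate(chunks):
        for i, c in enumerate(centroids):
            best, best_sim = None, threshold
            for track in tracks:
                last_t, last_i = track[-1]
                if last_t != t - 1:
                    continue  # only extend tracks that ended in the previous chunk
                sim = cosine(chunks[last_t][last_i], c)
                if sim >= best_sim:
                    best, best_sim = track, sim
            if best is None:
                tracks.append([(t, i)])   # no match: a new item appears
            else:
                best.append((t, i))       # match: the item continues
    return tracks

# Three chunks over a four-keyword vocabulary (illustrative values only).
chunks = [
    [np.array([1.0, 1.0, 0.0, 0.0])],
    [np.array([1.0, 0.8, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])],
    [np.array([0.0, 0.0, 1.0, 0.9])],
]
tracks = track_items(chunks)
```

A track that stops receiving new entries corresponds to an item that has faded, and a track that starts in a later chunk corresponds to a newly emerging item. Matching only against the previous chunk is a simplifying choice; a longer look-back window would tolerate items that skip a chunk.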
Specification