Method of analyzing documents
Abstract
A method for analyzing documents is disclosed. The method compares concepts, each consisting of a group of terms, for similarity within a corpus of documents and clusters together the documents that contain certain concept term sets. It may also rank the documents within each cluster according to the frequency of term co-occurrence within the concepts.
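As an illustration of the clustering step the abstract describes, the following is a minimal sketch, not the patented implementation. It assumes the concept term sets are already known and that a document joins a cluster when it contains at least `min_terms` of that concept's terms. The function name, the `min_terms` cutoff, and the sample data are all hypothetical.

```python
from collections import defaultdict

def cluster_by_concept(docs, concepts, min_terms=1):
    """Group document ids under each concept whose terms they contain.

    docs: dict mapping document id -> list of already-filtered tokens.
    concepts: dict mapping concept name -> set of terms.
    min_terms: hypothetical cutoff (an assumption, not from the source):
        how many distinct concept terms a document must contain.
    """
    clusters = defaultdict(list)
    for doc_id, tokens in docs.items():
        present = set(tokens)
        for name, terms in concepts.items():
            if len(present & terms) >= min_terms:
                clusters[name].append(doc_id)
    return dict(clusters)

# Hypothetical sample corpus and concept term sets.
docs = {
    "d1": ["paper", "jam", "tray"],
    "d2": ["toner", "streak", "drum"],
    "d3": ["jam", "tray", "roller"],
}
concepts = {"feed": {"paper", "jam", "tray"}, "print": {"toner", "drum"}}
clusters = cluster_by_concept(docs, concepts, min_terms=2)
```

With this sample data, documents d1 and d3 fall under the "feed" concept and d2 under "print". A document may belong to more than one cluster if it contains terms from several concepts.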
20 Claims
1. A method of analyzing a plurality of documents, comprising:
collecting and filtering terms from a plurality of documents;
identifying a term-frequency vector for each of the documents;
identifying a term-frequency matrix, wherein rows of the matrix comprise values for the term-frequency vectors;
projecting the term-frequency matrix onto a lower dimensional space using latent semantic analysis, to create a transformed term matrix;
developing a correlation matrix using the rows of the transformed term matrix;
creating a concept graph of connected components using a concept threshold, where each connected component is a set of terms that corresponds to a concept; and
clustering documents that contain concept term sets together.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16
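The steps recited in claim 1 can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the method as the patent implements it. The matrix is laid out as terms by documents so that each row of the transformed matrix represents a term. Latent semantic analysis is done with a truncated SVD. The "correlation" is taken to be the uncentered correlation (cosine similarity) between term rows. A document joins a concept's cluster when it contains at least one of the concept's terms. The function names, the values of `k` and `concept_threshold`, and the sample counts are all hypothetical.

```python
import numpy as np

def concept_term_sets(tf, k, concept_threshold):
    """Return groups of term indices that form connected components.

    tf: terms x documents count matrix (this orientation is an assumption).
    k: number of latent dimensions kept by the truncated SVD.
    concept_threshold: minimum similarity for an edge between two terms.
    """
    # Latent semantic analysis: keep the k largest singular triplets.
    u, s, _ = np.linalg.svd(np.asarray(tf, dtype=float), full_matrices=False)
    term_matrix = u[:, :k] * s[:k]  # one k-dimensional row per term

    # Uncentered correlation (cosine similarity) between term rows.
    norms = np.linalg.norm(term_matrix, axis=1, keepdims=True)
    unit = term_matrix / np.where(norms == 0, 1.0, norms)
    corr = unit @ unit.T

    # Concept graph: an edge joins two terms whose correlation passes the threshold.
    adjacent = corr >= concept_threshold

    # Connected components by depth-first search.
    seen, components = set(), []
    for start in range(adjacent.shape[0]):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in np.flatnonzero(adjacent[node]).tolist():
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(component))
    return components

def cluster_documents(tf, components):
    """Map each concept index to the documents containing any of its terms."""
    tf = np.asarray(tf)
    return {i: np.flatnonzero(tf[rows].sum(axis=0) > 0).tolist()
            for i, rows in enumerate(components)}

# Hypothetical counts: five terms (rows) across four documents (columns).
terms = ["paper", "jam", "tray", "toner", "drum"]
tf = [[2, 1, 0, 0],
      [2, 1, 0, 0],
      [1, 1, 0, 0],
      [0, 0, 3, 1],
      [0, 0, 2, 1]]
components = concept_term_sets(tf, k=2, concept_threshold=0.9)
concepts = [[terms[i] for i in group] for group in components]
clusters = cluster_documents(tf, components)
```

In this toy corpus the terms split into two concepts, one for paper, jam and tray and one for toner and drum, and the documents split accordingly. Real corpora would need a tuned `k` and threshold, and a Pearson correlation could be substituted for the cosine similarity when `k` is large enough for it to be meaningful.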
14. A method for determining product problems from field service logs, comprising:
collecting and filtering terms from a plurality of service logs;
identifying a term-frequency vector for each of the logs;
identifying a term-frequency matrix, wherein rows of the matrix comprise values for the term-frequency vectors;
projecting the term-frequency matrix onto a lower dimensional space using latent semantic analysis, to create a transformed term matrix;
determining a correlation matrix using the rows of the transformed term matrix;
creating a concept graph of connected components using a concept threshold, where each connected component comprises a set of terms that corresponds to a concept;
clustering documents that contain concept term sets together;
for each cluster and corresponding term sets, ranking logs in each cluster by frequency of occurrence of terms in the concept term set.
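The final ranking step of claim 14 might look like the sketch below. It assumes each service log has already been tokenized and filtered, and it scores a log by the total number of occurrences of the concept's terms. The function name, the tie-breaking rule, and the sample logs are hypothetical.

```python
from collections import Counter

def rank_logs(cluster, concept_terms):
    """Sort log ids by how often the concept terms occur in each log.

    cluster: dict mapping log id -> list of filtered tokens.
    concept_terms: set of terms belonging to the cluster's concept.
    """
    def score(tokens):
        counts = Counter(tokens)
        return sum(counts[term] for term in concept_terms)

    # Highest count first; ties are broken by log id (an assumed rule)
    # so that the ordering is deterministic.
    return sorted(cluster, key=lambda log_id: (-score(cluster[log_id]), log_id))

# Hypothetical cluster of three service logs for a "fuser error" concept.
cluster = {
    "log-a": ["fuser", "error", "replaced", "fuser"],
    "log-b": ["fuser", "noise"],
    "log-c": ["fuser", "error", "fuser", "error", "overheat"],
}
ranking = rank_logs(cluster, {"fuser", "error"})
```

Here log-c ranks first with four concept-term occurrences, followed by log-a with three and log-b with one.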
17. A method of analyzing a plurality of documents, comprising:
collecting and filtering terms from a plurality of documents;
identifying a term-frequency vector for each of the documents;
identifying a term-frequency matrix, wherein rows of the matrix comprise values for the term-frequency vectors;
projecting the term-frequency matrix onto a lower dimensional space using latent semantic analysis, to create a transformed term matrix;
developing a correlation matrix using the rows of the transformed term matrix;
creating a dendrogram of related concepts using a function of the correlation matrix;
identifying branches of the dendrogram corresponding to related concepts; and
clustering documents that contain concept term sets together.
Dependent claims: 18, 19, 20
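The dendrogram variant in claim 17 can be illustrated with a small agglomerative clustering sketch. The claim only says "a function of the correlation matrix". This example assumes that function is the distance 1 - correlation, assumes single linkage, and finds branches by cutting the dendrogram at a fixed height. The function names, the `cut` value, and the sample correlations are hypothetical.

```python
def single_linkage(corr, labels):
    """Build a dendrogram as a list of merges (left, right, height).

    corr: square symmetric list of lists of term correlations.
    Distance is taken as 1 - correlation, an assumed choice of function.
    """
    index = {label: i for i, label in enumerate(labels)}
    clusters = [(label,) for label in labels]

    def distance(a, b):
        # Single linkage: the closest pair of members decides.
        return min(1.0 - corr[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        pairs = [(distance(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        height, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

def branches(merges, labels, cut):
    """Return the term groups that remain when merges above `cut` are ignored."""
    groups = {label: {label} for label in labels}
    for left, right, height in merges:
        if height <= cut:
            joined = groups[left[0]] | groups[right[0]]
            for label in joined:
                groups[label] = joined
    unique = {frozenset(group) for group in groups.values()}
    return sorted(sorted(group) for group in unique)

# Hypothetical correlations between four terms.
labels = ["jam", "tray", "toner", "drum"]
corr = [[1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0]]
merges = single_linkage(corr, labels)
groups = branches(merges, labels, cut=0.5)
```

Cutting at 0.5 leaves two branches, one holding jam and tray and one holding toner and drum. Each branch plays the role that a connected component plays in claim 1.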
Specification