SYSTEMS AND METHODS FOR METRIC DATA SMOOTHING
Abstract
An exemplary method may comprise: receiving a matrix for a set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document; receiving an indication of a relationship between two text segments, the two text segments being associated with a first column and a second column, respectively, of the matrix; adjusting, for each document, the frequency value of the second column based on the frequency value of the first column; projecting each frequency value into a reference space to generate a set of projection values; identifying a plurality of subsets of the reference space; clustering, for each subset of the plurality of subsets, at least some documents that correspond to projection values in that subset; and generating a graph of nodes, each of the nodes identifying the one or more documents of a corresponding cluster.
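The first steps of the method (building the document-by-text-segment frequency matrix and adjusting one column from a related column) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: text segments are taken to be single lowercase whitespace-separated words, and the adjustment rule (adding a hypothetical `weight` times the first column's count to the second column) is invented for illustration, since the abstract does not say how the adjustment is computed. `build_frequency_matrix` and `smooth_related_columns` are hypothetical names.

```python
from collections import Counter


def build_frequency_matrix(documents, segments):
    """Rows are documents, columns are text segments, cells are occurrence counts.

    Assumption: a text segment is a single lowercase whitespace-separated word.
    """
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        matrix.append([counts[seg] for seg in segments])
    return matrix


def smooth_related_columns(matrix, first_col, second_col, weight=0.5):
    """Adjust the second column of every row based on the first column.

    Hypothetical rule: second += weight * first. Returns a new matrix and
    leaves the input unchanged.
    """
    return [
        [value + weight * row[first_col] if j == second_col else value
         for j, value in enumerate(row)]
        for row in matrix
    ]
```

With the documents "the car drove fast", "an automobile and a car", and "fast fast car" and the segments `["car", "automobile", "fast"]`, the matrix is `[[1, 0, 1], [1, 1, 0], [1, 0, 2]]`; declaring a relationship from "car" (column 0) to "automobile" (column 1) with `weight=0.5` gives `[[1, 0.5, 1], [1, 1.5, 0], [1, 0.5, 2]]`, so documents that mention "car" now also register partial evidence for "automobile".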
27 Claims
1. A method comprising:
receiving a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
receiving an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix;
adjusting, for each document, a frequency value of the second column based on the frequency value of the first column;
projecting each frequency value of the matrix into a reference space to generate a set of projection values in the reference space;
identifying a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values;
clustering, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents; and
generating a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
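The later steps of claim 1 (projection, overlapping subsets, per-subset clustering, graph generation) resemble the Mapper construction from topological data analysis. The sketch below is one possible reading, not the claimed implementation, and every concrete choice in it is an assumption: each document row is projected to a single number by a hypothetical lens `project` (the sum of the row's frequency values), the subsets are equal-width overlapping intervals of the real line, clustering is single-linkage with a Euclidean distance threshold, and two nodes are joined by an edge when their clusters share a document. The claim itself only requires a graph of nodes; the edge rule is added for illustration. It takes the already-adjusted matrix as input.

```python
import math
from itertools import combinations


def project(matrix):
    """Hypothetical lens: map each document row to one number (its total count)."""
    return [sum(row) for row in matrix]


def overlapping_intervals(values, n_intervals, overlap):
    """Cover [min, max] of the projection values with equal-width, overlapping intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against all-equal values
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n_intervals)]


def single_linkage(indices, matrix, threshold):
    """Union documents whose Euclidean distance is <= threshold; return the components."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if math.dist(matrix[i], matrix[j]) <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), []).append(i)
    return [frozenset(g) for g in groups.values()]


def build_graph(matrix, n_intervals=3, overlap=0.25, threshold=1.5):
    """Return (nodes, edges): nodes are document clusters, edges join clusters that overlap."""
    values = project(matrix)
    nodes = []
    for lo, hi in overlapping_intervals(values, n_intervals, overlap):
        members = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(single_linkage(members, matrix, threshold))
    nodes = list(dict.fromkeys(nodes))  # the same cluster can appear in two intervals
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges
```

For five one-column documents `[[0], [1], [2], [3], [4]]` with `n_intervals=2`, the two intervals overlap around the value 2, producing the clusters `{0, 1, 2}` and `{2, 3, 4}` and a single edge between them because both contain document 2. Overlapping intervals are what let a single document link nodes built from different subsets.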
2. The method of claim 1, wherein clustering, for each subset of the plurality of subsets, at least some documents of the set of documents comprises:
determining a distance between at least two documents of the set of documents corresponding to at least two projection values in a first subset of the plurality of subsets;
comparing the distance to a threshold value; and
clustering the at least two documents into two different clusters or into one cluster based on the comparison.
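The distance-and-threshold test of claim 2 can be illustrated for a single pair of documents. The Euclidean metric (computed with `math.dist`, available in Python 3.8+) and the rule that a distance exactly equal to the threshold joins the documents are both assumptions, since the claim names neither a metric nor a tie rule; `cluster_pair` is a hypothetical helper.

```python
import math


def cluster_pair(doc_a, doc_b, vec_a, vec_b, threshold):
    """Place two documents in one cluster or in two, based on their distance."""
    distance = math.dist(vec_a, vec_b)  # Euclidean metric: an assumption
    if distance <= threshold:  # ties join the documents: also an assumption
        return [{doc_a, doc_b}]
    return [{doc_a}, {doc_b}]
```

For example, rows `[1, 0, 1]` and `[1, 0, 2]` are 1.0 apart, so with a threshold of 1.2 they share one cluster, while `[1, 0, 1]` and `[1, 1, 0]` are about 1.414 apart and end up in two clusters.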
14. A system comprising:
an input module configured to receive a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
a smoothing module configured to receive an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix, and to adjust, for each document, a frequency value of the second column based on the frequency value of the first column; and
an analysis module configured to project each frequency value of the matrix into a reference space to generate a set of projection values in the reference space, to identify a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values, to cluster, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents, and to generate a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.
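The three modules of claim 14 can be mapped onto separate components. In this sketch the class names mirror the claim language, but the validation checks, the `weight` parameter, the additive adjustment rule, and the pluggable `analyze` callable are hypothetical choices made for illustration, not details taken from the claim.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Matrix = List[List[float]]


class InputModule:
    """Receives a documents x text-segments frequency matrix (rows are documents)."""

    def receive(self, matrix: Matrix) -> Matrix:
        # Hypothetical validation: rectangular shape and non-negative counts.
        if len({len(row) for row in matrix}) > 1:
            raise ValueError("every row needs one cell per text segment")
        if any(value < 0 for row in matrix for value in row):
            raise ValueError("frequency values cannot be negative")
        return [list(row) for row in matrix]  # copy so the caller's data is untouched


@dataclass
class SmoothingModule:
    weight: float = 0.5  # hypothetical strength of the relationship

    def adjust(self, matrix: Matrix, first_col: int, second_col: int) -> Matrix:
        # For each document, raise the second column using the first column's count.
        for row in matrix:
            row[second_col] += self.weight * row[first_col]
        return matrix


@dataclass
class AnalysisModule:
    # Stand-in for projection, subset identification, clustering and graph generation.
    analyze: Callable[[Matrix], Any]


@dataclass
class DocumentAnalysisSystem:
    input_module: InputModule
    smoothing_module: SmoothingModule
    analysis_module: AnalysisModule

    def run(self, matrix: Matrix, relationship: Tuple[int, int]) -> Any:
        received = self.input_module.receive(matrix)
        smoothed = self.smoothing_module.adjust(received, *relationship)
        return self.analysis_module.analyze(smoothed)
```

Keeping the analysis step behind a callable means the projection, covering, clustering, and graph-building routine can be swapped without touching input handling or smoothing; passing `analyze=lambda m: m` simply returns the smoothed matrix, which is convenient for checking the first two modules in isolation.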
27. A computer readable medium comprising instructions, the instructions being executable by a processor to perform a method, the method comprising:
receiving a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
receiving an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix;
adjusting, for each document, a frequency value of the second column based on the frequency value of the first column;
projecting each frequency value of the matrix into a reference space to generate a set of projection values in the reference space;
identifying a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values;
clustering, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents; and
generating a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
Specification