System and method for analysis and navigation of data
Abstract
Systems and methods for analyzing a large number of textual passages are described. A computing device receives the textual passages as input and generates a Raw Pair Distance (RPD) table. The device then determines a Nodes table and a Node-Node Distance (NND) matrix from the RPD table. An energy reduction process generates an NSPACE matrix from the NND matrix. Finally, a 3D visualizer displays aspects of the Nodes table and the NSPACE matrix to a user. The systems and methods may enable a user to quickly search and understand the text relationships within the large number of textual passages.
20 Claims
1. A data analysis system, comprising:
a central processing unit (CPU);
a Raw Pair Distance (RPD) module operatively coupled to the CPU and configured to receive a corpus of data comprising a plurality of sequential terms and output a raw pair distance table listing each occurrence in the corpus of two different terms separated by no more than a predetermined number of other terms, wherein each row of the raw pair distance table includes the two different terms and the number of other terms separating the two different terms;
a Mean Pair Distance (MPD) module operatively coupled to the CPU and configured to receive the raw pair distance table, select a plurality of nodes from the terms included in the raw pair distance table;
output a nodes table wherein each row of the nodes table includes one node, a corresponding unique numerical node ID number, and a corresponding mass value of the node, and output a node-node distance matrix using the raw pair distance table wherein each row of the node-node distance matrix includes a pair of terms from the raw pair distance table wherein each of the terms is a node, a calculated distance value of the pair of terms, and a calculated strength of the pair of terms;
an Energy Reduction module operatively coupled to the CPU and configured to receive the node-node distance matrix and output an NSPACE matrix for a predetermined number of dimensions n, wherein each row includes one node numerical ID number and coordinates specifying a location of the corresponding node in n-dimensions; and
a 3D visualizer operatively coupled to the CPU and configured to receive the NSPACE matrix and communicate with a display to provide a graphical representation of selected nodes and coordinate relationships between the selected nodes.
Dependent claims: 2, 3.
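The raw pair distance table produced by the RPD module can be illustrated with a short Python sketch. It assumes the corpus has already been split into a list of terms, since the claims do not specify tokenization. The function name `raw_pair_distances` and the parameter `max_gap`, which stands in for the "predetermined number of other terms," are hypothetical. Each row records the two different terms in order of occurrence and the number of terms between them.

```python
from typing import List, Tuple


def raw_pair_distances(terms: List[str], max_gap: int) -> List[Tuple[str, str, int]]:
    """Hypothetical RPD sketch: one (term_a, term_b, gap) row per co-located pair.

    `gap` is the number of other terms strictly between the two occurrences.
    Pairs separated by more than `max_gap` terms are not listed, and pairs of
    identical terms are skipped because the claims require two different terms.
    """
    rows = []
    for i, left in enumerate(terms):
        # Look ahead at most max_gap + 1 positions, so that no more than
        # max_gap other terms separate the two members of a pair.
        for j in range(i + 1, min(i + max_gap + 2, len(terms))):
            right = terms[j]
            if left != right:
                rows.append((left, right, j - i - 1))
    return rows
```

For `"the cat sat on the mat".split()` with `max_gap=1`, this sketch yields nine rows, beginning with `("the", "cat", 0)` and `("the", "sat", 1)`.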
4. A method for analyzing a plurality of text passages from a corpus of text using a text analysis system including at least one computing device including a processor, non-transitory memory, and at least one application configured to run on the processor, wherein the corpus is searchable and accessible by the system, comprising the steps of:
compiling a list of all terms included in the plurality of text passages;
determining all co-located term pairs in the plurality of text passages, wherein each co-located term pair comprises one occurrence of two different terms separated by no more than a first predetermined number of other terms;
creating a raw pair distance table including each co-located term pair and the number of other terms separating each co-located term pair;
selecting a plurality of nodes from the plurality of terms, wherein the nodes are selected by an importance algorithm;
calculating a mass value for each node;
creating a nodes table including each node and the corresponding mass for each node;
creating a node-node distance matrix including each co-located term pair of the raw pair distance table where the co-located term pair includes two nodes, a calculated distance value of the co-located term pair, and a calculated strength value of each pair;
running an energy reduction algorithm on the node-node distance matrix using a predetermined number of dimensions n, whereby a point in n-dimensional space is calculated for each node; and
creating an NSPACE matrix including n-dimensional coordinates for each node.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13.
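The claims leave the importance algorithm and the mass, distance, and strength formulas unspecified. The Python sketch below shows one way a nodes table and node-node distance matrix could be derived from raw pair distance rows, under explicitly assumed choices:
* Importance means the `top_k` most frequent terms.
* Mass is the term's occurrence count.
* Distance is the mean of (gap + 1) over all co-occurrences of a pair. This is an inference from the "Mean Pair Distance" module name.
* Strength is the number of co-occurrences.

Node pairs are treated as unordered. The function `build_nodes_and_nnd` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (term_a, term_b, gap) from a raw pair distance table


def build_nodes_and_nnd(rpd: List[Row], term_counts: Counter, top_k: int):
    """Return (nodes_table, nnd_matrix) under the assumptions stated above.

    nodes_table rows: (node, node_id, mass)
    nnd_matrix rows:  (node_id_a, node_id_b, distance, strength)
    """
    # Hypothetical importance algorithm: keep the top_k most frequent terms,
    # breaking ties alphabetically so the result is deterministic.
    ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    # Mass is assumed to be the term's occurrence count.
    nodes = [(term, node_id, count) for node_id, (term, count) in enumerate(ranked)]
    ids = {term: node_id for term, node_id, _ in nodes}

    # Collect every gap observed for each unordered pair of nodes.
    gaps: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for a, b, gap in rpd:
        if a in ids and b in ids:
            i, j = sorted((ids[a], ids[b]))
            gaps[(i, j)].append(gap)

    # Distance: assumed mean positional separation (gap + 1), so adjacent
    # terms are 1 apart. Strength: assumed number of co-occurrences.
    nnd = [
        (i, j, sum(g + 1 for g in gs) / len(gs), len(gs))
        for (i, j), gs in sorted(gaps.items())
    ]
    return nodes, nnd
```

In this sketch the `term_counts` argument could be built with `Counter(terms)` from the same term list used to create the raw pair distance table.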
14. A method for creating an optimized node coordinate matrix in a predetermined number of dimensions n from a node-node distance matrix, wherein the node-node distance matrix includes a plurality of node pairs and a distance associated with each node pair, using a text analysis system comprising a CPU and at least one software module operatively coupled to the CPU and configured to perform the text analysis method, comprising the steps of:
assigning a coordinate location in n-space to each node;
setting a stress value;
creating an offset array based on the number of dimensions;
for each node, performing the steps of:
for each row of the offset array, setting the row equal to a current offset array row and performing the steps of:
offsetting the coordinate location of the node based on the current offset array row;
determining, based on the offset coordinate location for each node pair including the selected node, a trial distance between the nodes in the node pair based on the offset coordinate location of the node and the coordinate location of the other node;
comparing the trial distance for each node pair with the corresponding node pair distance from the node-node distance matrix;
assigning a stress value to each node pair wherein the larger the difference between the compared distances, the larger the stress value;
summing the node pair stresses; and
setting, if the sum of the node pair stresses is lower than the stress value, the stress value equal to the sum of the node pair stresses and setting the node coordinate location equal to the offset coordinate location, thereby determining an optimized coordinate location for each node.
Dependent claims: 15, 16, 17, 18, 19, 20.
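The offset-based stress reduction of claim 14 can be sketched in Python as a greedy coordinate search. Several details are assumptions not stated in the claim:
* Starting coordinates are random.
* The offset array holds +step and -step along each of the n axes, giving 2n rows.
* Pair stress is the squared difference between the trial and target distances.
* Before trying offsets, the reference stress is reset to the node's current local stress. This is one reading of "setting a stress value."
* The per-node pass is repeated for a fixed number of sweeps.

All function names and parameters (`offset_array`, `local_stress`, `energy_reduction`, `step`, `sweeps`, `seed`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Pair = Tuple[int, int, float]  # (node_a, node_b, target distance) from an NND matrix


def offset_array(n: int, step: float) -> List[List[float]]:
    """Hypothetical offset array: +step and -step along each of the n axes."""
    rows = []
    for axis in range(n):
        for sign in (1.0, -1.0):
            row = [0.0] * n
            row[axis] = sign * step
            rows.append(row)
    return rows


def local_stress(node, pos, coords, pairs_by_node) -> float:
    """Sum of squared (trial distance - target distance) over pairs touching node."""
    return sum(
        (math.dist(pos, coords[other]) - target) ** 2
        for other, target in pairs_by_node[node]
    )


def energy_reduction(num_nodes: int, pairs: List[Pair], n: int = 3,
                     step: float = 0.1, sweeps: int = 200, seed: int = 0):
    rng = random.Random(seed)
    # Assign a random starting coordinate location in n-space to each node.
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(num_nodes)]
    pairs_by_node: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(num_nodes)}
    for a, b, d in pairs:
        pairs_by_node[a].append((b, d))
        pairs_by_node[b].append((a, d))
    offsets = offset_array(n, step)
    for _ in range(sweeps):
        for node in range(num_nodes):
            # Reference stress: the node's stress at its current location.
            stress = local_stress(node, coords[node], coords, pairs_by_node)
            for off in offsets:
                # Each offset is applied to the node's current (possibly updated) location.
                trial = [c + o for c, o in zip(coords[node], off)]
                trial_stress = local_stress(node, trial, coords, pairs_by_node)
                if trial_stress < stress:
                    stress = trial_stress
                    coords[node] = trial
    # NSPACE rows: (node ID, coordinate_1, ..., coordinate_n).
    return [(i, *coords[i]) for i in range(num_nodes)]
```

A move is accepted only when it strictly lowers the node's local stress, so the total stress never increases. With a fixed step, the layout settles to within roughly one step of a local minimum. A shrinking step size would be a natural refinement, though the claim does not specify one.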
Specification