Data processing, analysis, and visualization system for use with disparate data types
Abstract
A system or method consistent with an embodiment of the present invention is useful in analyzing large volumes of different types of data, such as textual data, numeric data, categorical data, or sequential string data, for use in identifying relationships among the data types or different operations that have been performed on the data. A system or method consistent with the present invention determines and displays the relative content and context of related information and is operative to aid in identifying relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information (such as research papers or annotations associated with the numerical data), may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or various operations performed on the data.
Furthermore, the user may explore the information contained in sets of records and their associated attributes through the use of interactive 2-D line charts and interactive summary miniplots.
332 Citations
15 Claims
1. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
(1) selecting a set of attributes associated with an object, the attributes selected comprising a text data type and one other data type chosen from a biopolymer sequence data type, a numerical data type, and a categorical data type;
(2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
(3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
wherein the transformation operations for the attributes of the text data type comprise;
(a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
(b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
(c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
(d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and
(e) providing said matrix entries from step (d) for creating the high dimensional vector.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
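Steps (c) and (d) of claim 1 can be illustrated with a small sketch. It assumes that steps (a) and (b) have already produced a filtered concept list and a topic set, and that each document is represented simply as a list of the concepts it contains. The function name `conditional_probability_matrix` and the sample concepts are hypothetical and are not taken from the patent; the probability is estimated here by plain document counts, which is only one possible reading of the claim.

```python
def conditional_probability_matrix(documents, filtered_concepts, topic_set):
    """Rows follow filtered_concepts, columns follow topic_set.

    Entry [i][j] estimates P(topic j appears in a document | concept i appears in it)
    as (documents containing both) / (documents containing concept i).
    """
    doc_sets = [set(doc) for doc in documents]
    matrix = []
    for concept in filtered_concepts:
        containing = [d for d in doc_sets if concept in d]
        row = []
        for topic in topic_set:
            if not containing:
                # Concept never occurs: no evidence, so use 0.0 (an assumption).
                row.append(0.0)
            else:
                both = sum(1 for d in containing if topic in d)
                row.append(both / len(containing))
        matrix.append(row)
    return matrix


# Hypothetical toy corpus: each document is the list of concepts it contains.
docs = [
    ["kinase", "binding", "membrane"],
    ["kinase", "binding"],
    ["membrane", "transport"],
    ["kinase", "transport"],
]
filtered = ["kinase", "binding", "membrane", "transport"]
topics = ["kinase", "membrane"]

m = conditional_probability_matrix(docs, filtered, topics)
for concept, row in zip(filtered, m):
    print(concept, row)
```

Each row of the result is a candidate contribution to the high dimensional vector of step (e); how rows are assigned to individual objects is not fixed by the claim and is left open here.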
2. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
(1) selecting a set of attributes associated with an object, the attributes selected comprising a biopolymer sequence data type and one other data type chosen from a text data type, a numerical data type, and a categorical data type;
(2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
(3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
wherein the transformation operations for the attributes of the biopolymer sequence data type comprise;
(i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results;
(ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and
(iii) providing the square matrix entries for creating the high dimensional vector.
Dependent claims: 3, 4, 5, 6
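Steps (i) to (iii) of claim 2 amount to an all-against-all comparison stored in a square matrix. The sketch below is a minimal illustration under one loud assumption: the claim does not name a comparison method, so the standard-library `difflib.SequenceMatcher` ratio is swapped in as a stand-in similarity score. A real system would more plausibly use a sequence alignment score. The function name `pairwise_similarity_matrix` and the sample sequences are hypothetical.

```python
from difflib import SequenceMatcher


def pairwise_similarity_matrix(sequences):
    """Return an n x n matrix where entry [i][j] compares sequence i with sequence j.

    The score is difflib's ratio in [0, 1], used here only as a placeholder
    for whatever comparison the actual system applies.
    """
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a sequence is identical to itself
        for j in range(i + 1, n):
            score = SequenceMatcher(
                None, sequences[i], sequences[j], autojunk=False
            ).ratio()
            # Compute each pair once and mirror it so the matrix is symmetric.
            matrix[i][j] = score
            matrix[j][i] = score
    return matrix


# Hypothetical short DNA fragments.
seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
sim = pairwise_similarity_matrix(seqs)
for name, row in zip(seqs, sim):
    print(name, [round(x, 3) for x in row])
```

Row i of the matrix describes sequence i by its similarity to every other sequence, which is one way the square matrix entries could feed the high dimensional vector of step (iii).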
7. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
(1) selecting a set of attributes associated with an object, the attributes selected comprising any three data types chosen from a text data type, a biopolymer sequence data type, a numerical data type, and a categorical data type;
(2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
(3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
wherein the transformation operations for the attributes of the text data type, if selected, comprise;
(a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
(b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
(c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
(d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and
(e) providing said matrix entries from step (d) for creating the high dimensional vector; and
wherein the transformation operations for the attributes of the biopolymer sequence data type, if selected, comprise;
(i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results;
(ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and
(iii) providing the square matrix entries for creating the high dimensional vector.
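Steps (2) and (3), common to the independent claims, can be sketched as concatenating per-type features into one vector per object and projecting the vectors to two dimensions. Everything specific here is an assumption for illustration: the claims do not name a projection, so a principal component projection computed by power iteration is swapped in; numerical attributes are z-scored and categorical attributes one-hot encoded; and the field names (`text_row`, `sequence_row`, `numeric`, `category`) and sample values are hypothetical.

```python
import math
import random


def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]


def z_scores(values):
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def build_vectors(objects, categories):
    """Concatenate text, sequence, numerical and categorical features per object."""
    numeric = z_scores([o["numeric"] for o in objects])
    return [
        o["text_row"] + o["sequence_row"] + [numeric[i]] + one_hot(o["category"], categories)
        for i, o in enumerate(objects)
    ]


def project_2d(rows, iterations=200, seed=0):
    """Project rows onto two approximate principal axes (power iteration with deflation)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(iterations):
            # Apply X^T X to v without forming the covariance matrix.
            scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
            # Remove the directions already found so the axes stay orthogonal.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                v = [0.0] * dim  # no variance left along any new direction
                break
            v = [a / norm for a in w]
        components.append(v)
    return [[sum(x * y for x, y in zip(row, c)) for c in components] for row in centered]


# Hypothetical objects: two enzyme-like records and two receptor-like records.
objects = [
    {"text_row": [0.9, 0.1], "sequence_row": [1.0, 0.9, 0.1, 0.2], "numeric": 1.0, "category": "enzyme"},
    {"text_row": [0.8, 0.2], "sequence_row": [0.9, 1.0, 0.2, 0.1], "numeric": 1.2, "category": "enzyme"},
    {"text_row": [0.1, 0.9], "sequence_row": [0.1, 0.2, 1.0, 0.9], "numeric": 5.0, "category": "receptor"},
    {"text_row": [0.2, 0.8], "sequence_row": [0.2, 0.1, 0.9, 1.0], "numeric": 5.3, "category": "receptor"},
]
vectors = build_vectors(objects, ["enzyme", "receptor"])
coords = project_2d(vectors)
for o, xy in zip(objects, coords):
    print(o["category"], [round(c, 3) for c in xy])
```

In this toy data the two enzyme-like objects should land close together and far from the receptor-like pair, which is the kind of relationship the projected view is meant to expose. Equal weighting of the data types is a simplification; the claims do not say how the contributions are balanced.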
Specification