Data processing, analysis, and visualization system for use with disparate data types

US 6,990,238 B1
Filed: 09/30/1999
Issued: 01/24/2006
Est. Priority Date: 09/30/1999
Status: Expired due to Term

Abstract

A system or method consistent with an embodiment of the present invention is useful in analyzing large volumes of different types of data, such as textual data, numeric data, categorical data, or sequential string data, for use in identifying relationships among the data types or different operations that have been performed on the data. A system or method consistent with the present invention determines and displays the relative content and context of related information and is operative to aid in identifying relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information, such as annotations associated with the numerical data or research papers may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or various operations performed on the data.

Furthermore, the user may explore the information contained in sets of records and their associated attributes through the use of interactive 2-D line charts and interactive summary miniplots.

332 Citations

15 Claims

1. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
- (1) selecting a set of attributes associated with an object, the attributes selected comprising a text data type and one other data type chosen from a biopolymer sequence data type, a numerical data type, and a categorical data type;
  
  (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
  
  (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
  
  wherein the transformation operations for the attributes of the text data type comprise:
  
  (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
  
  (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
  
  (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
  
  (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and
  
  (e) providing said matrix entries from step (d) for creating the high dimensional vector.
- Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15)
  - 8. A computer-readable medium storing a software that when executed by a computer performs the method of any one of claims 1–7.
  - 9. A device adapted to perform the method of claims 1–7.
  - 10. The method of any of claims 1–7, wherein said application of transformation application on said selected attributes produces a vector representation of said object in correspondence with a uniform data structure.
  - 11. A computer-readable medium storing a software that when executed by a computer performs the method of claim 10.
  - 12. A device adapted to perform the method of claim 10.
  - 13. The method of claim 10, further comprising using said representation to identify cluster groups of related objects.
  - 14. A computer-readable medium storing a software that when executed by a computer performs the method of claim 13.
  - 15. A device adapted to perform the method of claim 13.
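
Steps (c) and (d) of claim 1 can be illustrated with a small sketch. It is not taken from the patent specification: the function name `conditional_probability_matrix`, the toy documents, and the choice to estimate each probability from simple document counts (with 0.0 used when a concept appears in no document) are all assumptions made for illustration.

```python
from typing import List, Set


def conditional_probability_matrix(
    docs: List[Set[str]], concepts: List[str], topics: List[str]
) -> List[List[float]]:
    """Rows are filtered concepts, columns are topic-set concepts.

    Entry [i][j] estimates P(document contains topics[j] | document
    contains concepts[i]) from document counts (an assumed estimator).
    """
    matrix = []
    for concept in concepts:
        docs_with_concept = [d for d in docs if concept in d]
        row = []
        for topic in topics:
            if not docs_with_concept:
                # Assumption: an undefined probability is recorded as 0.0.
                row.append(0.0)
            else:
                both = sum(1 for d in docs_with_concept if topic in d)
                row.append(both / len(docs_with_concept))
        matrix.append(row)
    return matrix


# Hypothetical toy corpus: each document is reduced to its set of concepts.
docs = [
    {"protein", "kinase", "binding"},
    {"protein", "binding"},
    {"kinase", "signal"},
    {"protein", "signal"},
]
concepts = ["protein", "kinase", "binding", "signal"]  # filtered concepts
topics = ["protein", "kinase"]                         # assumed topic set

matrix = conditional_probability_matrix(docs, concepts, topics)
for concept, row in zip(concepts, matrix):
    print(concept, [round(p, 3) for p in row])
```

Under these assumptions, each row of `matrix` could serve as the text-derived part of an object's high dimensional vector, as step (e) describes.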

2. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
- (1) selecting a set of attributes associated with an object, the attributes selected comprising a biopolymer sequence data type and one other data type chosen from a text data type, a numerical data type, and a categorical data type;
  
  (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
  
  (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
  
  wherein the transformation operations for the attributes of the biopolymer sequence data type comprise:
  
  (i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results;
  
  (ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and
  
  (iii) providing the square matrix entries for creating the high dimensional vector.
- Dependent Claims (3, 4, 5, 6)
  - 3. The computer-implemented method of claim 2, wherein the attributes selected in step (1) comprise a text data type and a biopolymer sequence data type.
  - 4. The computer-implemented method of claim 2, wherein the transformation operations for the attributes of the text data type comprise:
    - (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
      
      (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
      
      (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
      
      (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and
      
      (e) providing said matrix entries from step (d) for creating the high dimensional vector.
  - 5. The computer-implemented method of claim 3, wherein the attributes selected in step (1) comprise a text data type, a biopolymer sequence data type, and one other data type chosen from a numerical data type and a categorical data type.
  - 6. The computer-implemented method of claim 3, wherein the attributes selected in step (1) comprise a text data type, a biopolymer sequence data type, a numerical data type, and a categorical data type.
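
Steps (i) through (iii) of claim 2 amount to an all-against-all comparison arranged as a square matrix. The sketch below is illustrative only. The claim does not say how two sequences are compared, so Python's `difflib.SequenceMatcher` ratio is used as a stand-in similarity score; a real system would more plausibly use a sequence alignment score. The function name and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def pairwise_similarity_matrix(seqs: List[str]) -> List[List[float]]:
    """Square matrix indexed by sequence: entry [i][j] compares seqs[i] to seqs[j].

    The score is difflib's ratio in [0, 1], a stand-in for whatever
    comparison a real implementation would use.
    """
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a sequence is identical to itself
        for j in range(i + 1, n):
            score = SequenceMatcher(None, seqs[i], seqs[j], autojunk=False).ratio()
            # Compute once and mirror so the matrix is symmetric.
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Hypothetical short protein fragments (one-letter amino acid codes).
sequences = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSGGSGGSG"]
sim = pairwise_similarity_matrix(sequences)
for name, row in zip(sequences, sim):
    print(name, [round(s, 2) for s in row])
```

Row `i` of `sim` describes sequence `i` by its similarity to every other sequence, which is one way the square matrix entries could feed the high dimensional vector in step (iii).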

7. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising:
- (1) selecting a set of attributes associated with an object, the attributes selected comprising any three data types chosen from a text data type, a biopolymer sequence data type, a numerical data type, and a categorical data type;
  
  (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and
  
  (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected;
  
  wherein the transformation operations for the attributes of the text data type, if selected, comprise:
  
  (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
  
  (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
  
  (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
  
  (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and
  
  (e) providing said matrix entries from step (d) for creating the high dimensional vector;
  
  and wherein the transformation operations for the attributes of the biopolymer sequence data type, if selected, comprise:
  
  (i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results;
  
  (ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and
  
  (iii) providing the square matrix entries for creating the high dimensional vector.
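
Steps (2) and (3), shared by claims 1, 2, and 7, combine the per-type outputs into one high dimensional vector and project it for display. The claims name neither a combination rule nor a projection method, so the sketch below makes two labeled assumptions: each data type's block is scaled to unit length before concatenation, and the projection is a two-component principal component analysis computed by power iteration. All function names and input values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine(blocks: List[Vector]) -> Vector:
    """Concatenate per-data-type blocks, each scaled to unit length (assumption)."""
    combined: Vector = []
    for block in blocks:
        combined.extend(l2_normalize(block))
    return combined


def top_component(rows: List[Vector], iterations: int = 200) -> Vector:
    """Dominant principal direction of centered rows, via power iteration."""
    dim = len(rows[0])
    v = l2_normalize([1.0 + k for k in range(dim)])  # deterministic start
    for _ in range(iterations):
        scores = [sum(r[k] * v[k] for k in range(dim)) for r in rows]
        w = [sum(s * r[k] for s, r in zip(scores, rows)) for k in range(dim)]
        if all(abs(x) < 1e-12 for x in w):
            break  # no variance left to capture
        v = l2_normalize(w)
    return v


def project_2d(vectors: List[Vector]) -> List[List[float]]:
    """Project vectors onto their first two principal components."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    rows = [[v[k] - mean[k] for k in range(dim)] for v in vectors]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        comp = top_component(rows)
        for i, r in enumerate(rows):
            s = sum(r[k] * comp[k] for k in range(dim))
            coords[i][axis] = s
            rows[i] = [r[k] - s * comp[k] for k in range(dim)]  # deflate
    return coords


# Hypothetical objects: [text block], [sequence block], [numeric block].
objects = [
    combine([[1.0, 0.33], [1.0, 0.9, 0.0], [0.2, 0.4]]),
    combine([[1.0, 0.50], [0.9, 1.0, 0.0], [0.3, 0.5]]),
    combine([[0.5, 0.50], [0.0, 0.0, 1.0], [5.0, 1.0]]),
]
coords = project_2d(objects)
for point in coords:
    print([round(c, 3) for c in point])
```

Scaling each block before concatenation keeps one data type from dominating the projection simply because its raw values are larger; other weightings are equally possible.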

Current Assignee: Battelle Memorial Institute
Original Assignee: Battelle Memorial Institute
Inventors: Havre, Susan L.; Crow, Vernon L.; Chen, Guang; Calapristi, Augustin J.; Saffer, Jeffrey D.; Sofia, Heidi J.; Thurston, Sarah J.; Zabriskie, Sean J.; Decker, Scott D.; Malard, Joel M.; Miller, Nancy E.; Payne, Deborah A.; Scarberry, Randall E.; Albright, Cory L.; Stillwell, Lisa C.; Thomas, Gregory S.; Nowell, Lucille T.; Groch, Kevin M.
Primary Examiner(s): AHMED, SAMIR ANWAR
Application Number: US09/410,367
Time in Patent Office: 2,308 Days
Field of Search: 382/190, 382/195, 382/186, 382/187, 382/189, 382/224, 382/225, 382/228, 382/229, 382/159, 707/1, 707/2, 707/3, 707/4, 707/5, 707/6, 707/101, 707/102, 707/103.R, 707/104.1, 707/200, 707/522, 707/526, 707/532, 707/501.1, 345/427, 345/440, 345/619, 345/649, 345/712, 345/848, 704/9, 705/7, 705/14, 705/35
US Class Current: 382/224

CPC Class Codes:
G06F 16/34   Browsing; Visualisation the...
G06F 16/9038   Presentation of query results
G06F 18/23   Clustering techniques
G06F 18/40   Software arrangements speci...
G06V 10/762   using clustering, e.g. of s...
Y02A 90/10   Information and communicati...
Y10S 707/99931   Database or file accessing
