METHOD OF INDEXED STORAGE AND RETRIEVAL OF MULTIDIMENSIONAL INFORMATION

US 20080172402A1
Filed: 08/10/2007
Published: 07/17/2008
Est. Priority Date: 09/28/1999
Status: Active Grant

First Claim

1. A computer-implemented method of partitioning data records of a multi-dimensional database into groups comprising:

defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor; and partitioning the values of the designated variable into two or more groups.

Abstract

A tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine utilizes hierarchical decomposition of the database by identifying clusters of similar DNA profiles and maps to parallel computer architecture, allowing scale up past previously feasible limits. Key benefits of the new method are logarithmic scale up and parallelization. These benefits are achieved by identification and utilization of naturally occurring patterns and clusters within stored data. The patterns and clusters enable the stored data to be partitioned into subsets of roughly equal size. The method can be applied recursively, resulting in a database tree that is balanced, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that maintains balanced trees. Implementation of the method maps naturally to parallel computer architectures, allowing scale up to very large databases.

28 Claims

1. A computer-implemented method of partitioning data records of a multi-dimensional database into groups comprising:
- defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor; and partitioning the values of the designated variable into two or more groups.
  - 2. A computer-implemented method as recited in claim 1 wherein a value of said function is determined by applying an optimization procedure.
  - 3. A computer-implemented method as recited in claim 1 wherein the partitioning of the values into two or more groups is determined responsive to an optimization procedure.
  - 4. A computer-implemented method as recited in claim 1 further comprising assigning a data record to a group according to the values of the designated variable.

5. A computer-implemented method of partitioning data records of a multi-dimensional database in a computer into groups of approximately equal size, comprising the steps of:
- (a) defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor;
  
  (b) partitioning the values of the designated variable into two or more groups, wherein a value of the function is determined by applying an optimization procedure; and
  
  (c) assigning a data record to a group according to the values of the designated variable.
  - 6. A computer-implemented method as recited in claim 5 further comprising selecting a partition from a set of computed solutions yielding acceptable performance.
  - 7. A computer-implemented method as recited in claim 5 wherein said optimization procedure results in an optimal assignment.
  - 8. A computer-implemented method as recited in claim 5 wherein said combination is linear.
  - 9. A computer-implemented method as recited in claim 5 wherein the designated variable simultaneously comprises a plurality of values.
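
The entropy/adjacency partitioning recited in claims 1 through 9 can be illustrated with a minimal Python sketch. The claims do not define the entropy or adjacency measures, so everything specific here is an assumption made for illustration: entropy is taken over the share of records in each group (so it rewards groups of approximately equal size), adjacency is taken as a penalty counting neighbouring values, in sorted order, that land in different groups, the combination is linear as in claim 8, and the optimization procedure is a plain exhaustive search. The names `partition_values`, `adjacency_cost`, and `assign_record` are hypothetical.

```python
import itertools
import math


def entropy(group_sizes):
    """Shannon entropy (bits) of the share of records in each group."""
    total = sum(group_sizes)
    return -sum((n / total) * math.log2(n / total) for n in group_sizes if n > 0)


def adjacency_cost(assignment):
    """Assumed adjacency measure: count neighbouring values split across groups."""
    return sum(1 for a, b in zip(assignment, assignment[1:]) if a != b)


def partition_values(value_counts, n_groups=2, weight=0.1):
    """Exhaustively search assignments of values to groups.

    value_counts maps each value of the designated variable to the number
    of records holding it. The score is a linear combination:
    entropy minus weight * adjacency_cost (weight must be non-zero).
    Returns (best_score, {value: group_index}).
    """
    values = sorted(value_counts)
    best = None
    for assignment in itertools.product(range(n_groups), repeat=len(values)):
        if len(set(assignment)) < n_groups:
            continue  # every group must receive at least one value
        sizes = [0] * n_groups
        for value, group in zip(values, assignment):
            sizes[group] += value_counts[value]
        score = entropy(sizes) - weight * adjacency_cost(assignment)
        if best is None or score > best[0]:
            best = (score, dict(zip(values, assignment)))
    return best


def assign_record(record, variable, groups):
    """Step (c): place a record in a group by its value of the variable."""
    return groups[record[variable]]
```

Exhaustive search is only practical for a handful of distinct values; it stands in here for whatever optimization procedure an implementation would actually use.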

10. A parallel data processing system comprising first and second computer processors for implementing a method of partitioning data records of a multi-dimensional database into groups comprising:
- defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor; and partitioning the values of the designated variable into two or more groups.
  - 11. A parallel data processing system as recited in claim 10 comprising first and second panels each comprising first and second computer processors, each of said first and second panels having a bus control module, the bus control module of said first and second panels being linked to a bus control module of a control panel, the control panel having at least one control host computer processor linked to said bus control module of said control panel.

12. A computer-implemented method of creating an index to multidimensional data of a database for use with a search engine, the multidimensional data changing over time, comprising:
- identifying clusters of said multidimensional data according to features,
  
  coding said features according to their binary presence or absence,
  
  typing said features by degree,
  
  identifying clusterable patterns of said multidimensionable data via principal component analysis,
  
  segmenting at nodes of a tree of said database, and
  
  recursively applying said method to balance the tree as said multidimensional data changes over time.
  - 13. A computer-implemented method of creating an index as recited in claim 12, said segmenting comprising entropy-adjacency partition assignment.
  - 14. A computer-implemented method of creating an index as recited in claim 12, said segmenting comprising multivariate statistical analysis.
  - 15. A computer-implemented method of creating an index as recited in claim 12, said identifying clusters of said multidimensional data according to features comprising applying logical tests on said data to determine cluster assignments.
  - 16. A computer-implemented method of creating an index as recited in claim 12 comprising limiting application of principal component analysis to a subset of said multidimensional data.
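
Two of the steps in claim 12, binary presence/absence coding and principal component analysis, can be sketched in a few lines of Python. This is an illustration under assumptions, not the patented procedure: records are modelled as sets of feature names, the leading principal component is found by power iteration on the covariance matrix, and records are split by the sign of their score on that component. The step of typing features by degree is not covered. All function names are hypothetical.

```python
import math


def code_presence(records, features):
    """Code each record as a 0/1 vector: 1.0 where the feature is present."""
    return [[1.0 if f in rec else 0.0 for f in features] for rec in records]


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    Uses power iteration on the covariance matrix of the centred rows.
    The fixed start vector is an arbitrary choice; power iteration fails
    if it happens to be orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # no variance along the current direction
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


def split_by_score(scores):
    """Segment records into two clusters by the sign of their PCA score."""
    return [0 if s < 0 else 1 for s in scores]
```

For example, four records holding features {A, B}, {A, B}, {C, D}, {C, D} project onto a single component that separates the first pair from the second, so `split_by_score` places each pair in its own cluster.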

17. A computer-implemented method of partitioning data records of a multi-dimensional database in a computer, wherein the database is indexed using a tree of nodes, wherein the tree of nodes comprises a root node which is connected to two or more branches originating at the root node, wherein each branch terminates at a node, wherein each node other than the root node is a non-terminal node or a leaf node, wherein each non-terminal node is connected to two or more branches originating at the non-terminal node and terminating at a node, wherein the tree-structured index comprises one or more tests associated with each non-terminal node, said method comprising:
- (a) identifying naturally occurring sets of clusters in the data records of the database;
  
  (b) defining for each identified set of clusters a query that evaluates one of a Boolean expression or a decision tree and assigns each data record within the set of clusters; and
  
  (c) associating each query defined in step (b) with a non-terminal node and an associated set of clusters defined in step (a), and associating with each cluster within the set of clusters one branch originating at the non-terminal node, said branch forming part of one or more paths leading to leaf nodes comprising the data records assigned to the cluster by the query.
  - 18. A computer-implemented method as recited in claim 17 wherein said partitioning comprises partitioning of data records into groups of approximately equal size.
  - 19. A computer-implemented method as recited in claim 17 wherein said queries are determined by a combination of entropy and adjacency.
  - 20. A computer-implemented method as recited in claim 17 wherein said combination is linear.
  - 21. A computer-implemented method as recited in claim 17 wherein the data corresponds to DNA.
  - 22. A computer-implemented method as recited in claim 17 wherein the database is applicable to agriculture.
  - 23. A method as recited in claim 17 wherein the database is applicable to forensic science.
  - 24. A method as recited in claim 17 wherein the database is applicable to space science.
  - 25. A computer-implemented method as recited in claim 17 further comprising recursively creating a tree-structured index for said database of a computer as said multi-dimensional database changes in content over time.
  - 26. A computer-implemented method as recited in claim 17 comprising defining a partition of data records of the database using entropy/adjacency partition assignment.
  - 27. A computer-implemented method as recited in claim 17 comprising defining a partition of data records of the database using multivariable statistical analysis.
  - 28. A computer-implemented method as recited in claim 17, both data clustering and entropy-adjacency partitioning being used in the same tree of nodes.
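
The tree of nodes described in claim 17 can be pictured with a small Python sketch: each non-terminal node carries a test (a query evaluated on a record) and one branch per cluster, and each leaf holds the records routed to it. The `Node` class, the `insert` and `search` helpers, and the field names `locus_a` and `locus_b` are all hypothetical and chosen only for illustration; the tests here are simple Boolean expressions rather than queries derived from real cluster analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    # Non-terminal nodes carry a test that names the branch (cluster) to follow.
    test: Optional[Callable[[dict], str]] = None
    branches: Dict[str, "Node"] = field(default_factory=dict)
    # Leaf nodes hold the records assigned to them.
    records: List[dict] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.test is None


def descend(node: Node, record: dict) -> Node:
    """Follow the test at each non-terminal node until a leaf is reached."""
    while not node.is_leaf():
        node = node.branches[node.test(record)]
    return node


def insert(root: Node, record: dict) -> None:
    """Store a record in the leaf selected by the node tests."""
    descend(root, record).records.append(record)


def search(root: Node, probe: dict) -> List[dict]:
    """Return the candidate records sharing a leaf with the probe."""
    return descend(root, probe).records


def build_example_index() -> Node:
    """Two-level example: split on locus_a, then on parity of locus_b."""
    high = Node(
        test=lambda r: "odd" if r["locus_b"] % 2 else "even",
        branches={"odd": Node(), "even": Node()},
    )
    return Node(
        test=lambda r: "low" if r["locus_a"] <= 12 else "high",
        branches={"low": Node(), "high": high},
    )
```

A search touches one node per level, which is why the abstract stresses keeping the tree balanced: with groups of roughly equal size, the number of tests per lookup grows with the depth of the tree rather than with the number of stored records.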

Current Assignee: University of Tennessee Research Foundation (University of Teesside)
Original Assignee: University of Tennessee Research Foundation (University of Teesside)
Inventors: Horn, Roger D.; Wang, Tse-Wei; Yadav, Puneet; Birdwell, John D.; Icove, David J.

Granted Patent: US 7,882,106 B2
US Class Current: 707/101
CPC Class Codes

G06F 16/2246   Trees, e.g. B+trees

G06F 16/2264   Multidimensional index stru...

G06F 16/285   Clustering or classification

G16B 40/00   ICT specially adapted for b...

G16B 40/30   Unsupervised data analysis

G16B 50/00   ICT programming tools or da...

G16B 50/20   Heterogeneous data integration

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99942   Manipulating data structure...

Y10S 707/99945   Object-oriented database st...
