METHOD OF INDEXED STORAGE AND RETRIEVAL OF MULTIDIMENSIONAL INFORMATION
First Claim
1. A computer-implemented method of partitioning data records of a multi-dimensional database into groups comprising:
- defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor andpartitioning the values of the designated variable into two or more groups.
0 Assignments
0 Petitions
Accused Products
Abstract
A tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine utilizes hierarchical decomposition of the database by identifying clusters of similar DNA profiles and maps to parallel computer architecture, allowing scale up past previously feasible limits. Key benefits of the new method are logarithmic scale up and parallelization. These benefits are achieved by identification and utilization of naturally occurring patterns and clusters within stored data. The patterns and clusters enable the stored data to be partitioned into subsets of roughly equal size. The method can be applied recursively, resulting in a database tree that is balanced, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that maintains balanced trees. Implementation of the method maps naturally to parallel computer architectures, allowing scale up to very large databases.
61 Citations
28 Claims
-
1. A computer-implemented method of partitioning data records of a multi-dimensional database into groups comprising:
-
defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor and partitioning the values of the designated variable into two or more groups. - View Dependent Claims (2, 3, 4)
-
-
5. A computer-implemented method of partitioning data records of a multi-dimensional database in a computer into groups of approximately equal size, comprising the steps of:
-
(a) defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor; (b) partitioning the values of the designated variable into two or more groups, wherein a value of the function is determined by applying an optimization procedure; and (c) assigning a data record to a group according to the values of the designated variable. - View Dependent Claims (6, 7, 8, 9)
-
-
10. A parallel data processing system comprising first and second computer processors for implementing a method of partitioning data records of a multi-dimensional database into groups comprising:
-
defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of non-zero measures of entropy and adjacency, adjacency being weighted by a non-zero weighting factor and partitioning the values of the designated variable into two or more groups. - View Dependent Claims (11)
-
-
12. A computer-implemented method of creating an index to multidimensional data of a database for use with a search engine, the multidimensional data changing over time, comprising
identifying clusters of said multidimensional data according to features, coding said features according to their binary presence or absence, typing said features by degree, identifying clusterable patterns of said multidimensionable data via principal component analysis, segmenting at nodes of a tree of said database, and recursively applying said method to balance the tree as said multidimensional data changes over time.
-
17. A computer-implemented method of partitioning data records of a multi-dimensional database in a computer, wherein the database is indexed using a tree of nodes, wherein the tree of nodes comprises a root node which is connected to two or more branches originating at the root node, wherein each branch terminates at a node, wherein each node other than the root node is a non-terminal node or a leaf node, wherein each non-terminal node is connected to two or more branches originating at the non-terminal node and terminating at a node, wherein the tree-structured index comprises one or more tests associated with each non-terminal node, said method comprising:
-
(a) identifying naturally occurring sets of clusters in the data records of the database; (b) defining for each identified set of clusters a query that evaluates one of a Boolean expression or a decision tree and assigns each data record within the set of clusters; and (c) associating each query defined in step (b) with a non-terminal node and an associated set of clusters defined in step (a), and associating with each cluster within the set of clusters one branch originating at the non-terminal node, said branch forming part of one or more paths leading to leaf nodes comprising the data records assigned to the cluster by the query. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
-
Specification