Parallel data processing architecture
DCFirst Claim
1. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor;
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and
c) host processor memory, a method of balancing workload between said plurality of host processors comprising;
each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database;
each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; and
each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor thereby minimizing a time required to respond to client queries for retrieval of responsive data from said database, said bringing into balance comprising exchanging a block of search requests from host processors having long average waiting times to host processors having short waiting times and adjusting a size of a block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors.
0 Assignments
Litigations
0 Petitions
Accused Products
Abstract
A tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine utilizes hierarchical decomposition of the database by identifying clusters of similar DNA profiles and maps to parallel computer architecture, allowing scale up past to previously feasible limits. Key benefits of the new method are logarithmic scale up and parallelization. These benefits are achieved by identification and utilization of naturally occurring patterns and clusters within stored data. The patterns and clusters enable the stored data to be partitioned into subsets of roughly equal size. The method can be applied recursively, resulting in a database tree that is balanced, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that maintains balanced trees. Implementation of the method maps naturally to parallel computer architectures, allowing scale up to very large databases.
-
Citations
19 Claims
-
1. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor;
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and
c) host processor memory, a method of balancing workload between said plurality of host processors comprising;each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor thereby minimizing a time required to respond to client queries for retrieval of responsive data from said database, said bringing into balance comprising exchanging a block of search requests from host processors having long average waiting times to host processors having short waiting times and adjusting a size of a block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors. - View Dependent Claims (2, 3, 4)
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and
-
5. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor;
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and
c) host processor memory, a method of balancing workload between said plurality of host processors comprising;each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor thereby minimizing a time required to respond to client queries for retrieval of responsive data from said database, said bringing a search queue of client queries into balance further comprising exchanging a block of search requests between at least two of said plurality of host processors, thereby minimizing a time required to respond to client queries by said exchanging of said block of search requests and adjusting a size of the block of exchanged search requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors. - View Dependent Claims (6, 7, 8)
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another; and
-
9. In a parallel data processing architecture for search, storage and retrieval of specific data of a database responsive to queries for said specific data, said parallel data processing architecture comprising a) a plurality of available host processors comprising at least one root host processor responsive to a client query for said specific data and at least one other host processor;
- b) a communication system coupling said plurality of available host processors, said plurality of available host processors being capable of communicating with one another; and
c) host processor memory, a method of storing information of said plurality of available host processors for responding to client queries for specific data of said database comprising;at least two of said plurality of available host processors each having a search engine and maintaining information of said plurality of said available host processors and a processor capacity and load of said plurality of available host processors, said information including information of a search queue length of client queries; each of said plurality of available host processors broadcasting its capacity and load information to each other of said plurality of available host processors and bringing each search queue of client queries into balance with another of said plurality of available host processors according to a time constant in response to receipt of said broadcast capacity and load information, said bringing its search queue into balance comprising equalizing average waiting times for service and computation between said search engines, exchanging a block of search requests from host processors having long average waiting times to host processors having short waiting times and adjusting a size of the block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors; stochastically selecting among said plurality of available host processors to determine a recipient host processor for an exchanged search request from said root host processor for searching said database, said recipient host processor storing a database index for said database comprising nodes of said database index for said database and data accessible via said nodes; and each of said plurality of available host processors reconfiguring information on said plurality of available host processors responsive to the receipt of said broadcast information of said capacity and load information. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17)
- b) a communication system coupling said plurality of available host processors, said plurality of available host processors being capable of communicating with one another; and
-
18. In a parallel data processing architecture for search, storage and retrieval of data of a database responsive to queries for specific data of said database, said parallel data processing architecture comprising a) a plurality of host processors comprising at least one root host processor responsive to a client query for said specific data of said database and at least one other host processor;
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another;
and c) host processor memory, a method of balancing workload between said plurality of host processors comprising; each of said plurality of host processors maintaining load information of processor capacity and search queue length of said host processor, each of said plurality of host processors further maintaining information of a search queue of client queries at said host processor for said specific data of said database; each of said plurality of host processors broadcasting said load information of its processor capacity and search queue length to at least one other of said plurality of host processors; each of said plurality of host processors reconfiguring information on available host processors responsive to the receipt of said broadcast information of said capacity and load information; and each of said plurality of host processors bringing its search queue of client queries into balance with another of said plurality of host processors according to a time constant responsive to receipt of said broadcast capacity and load information, said balancing including exchanging unprocessed search requests with a recipient host processor responsive to a stochastic selection process to determine the recipient host processor of an exchanged search request between said root host processor and said recipient host processor, said bringing a search queue of client queries into balance further comprising exchanging a block of search requests between said plurality of host processor and adjusting a size of the block of exchanged requests according to relative processing speeds of host processors and inter-processor communications protocol between the host processors. - View Dependent Claims (19)
- b) a communication system coupling said plurality of host processors, said plurality of host processors being capable of communicating with one another;
Specification