Optimizing data analysis through directional dependencies of a graph including plurality of nodes and attributing threading models and setting status to each of the nodes
First Claim
1. A method for processing data comprising:
- defining a plurality of tasks requiring execution;
identifying one or more dependencies for each respective task of the plurality of tasks, the one or more dependencies comprising at least one of a resource dependency and a memory dependency;
determining dependencies between for each respective task of the plurality of tasks wherein the dependencies include directional dependencies;
creating a directed graph, wherein the directed graph includes;
a plurality of nodes, the nodes respectively corresponding to the plurality of tasks requiring execution; and
a plurality of directed edges that respectively correspond to the directional dependencies, each of said directed edges originating from a node of the plurality of nodes and terminating on another node of the plurality of nodes; and
analyzing the directed graph to assign execution of tasks to a plurality of processors to maximize computational efficiency;
performing the following steps until all data is processed;
setting status of all nodes of the plurality of nodes to processed or unprocessed status;
when a task corresponding to a node of the plurality of nodes has been completely computed, setting status of the node to processed status;
when data corresponding to a node of the plurality of nodes has been changed, setting status of the node to unprocessed status; and
executing tasks corresponding to all unprocessed nodes of the plurality of nodes in accordance with respectively assigned threading models.
1 Assignment
0 Petitions
Accused Products
Abstract
There is provided an adaptive semi-synchronous parallel processing system and method, which may be adapted to various data analysis applications such as flow cytometry systems. By identifying the relationship and memory dependencies between tasks that are necessary to complete an analysis, it is possible to significantly reduce the analysis processing time by selectively executing tasks after careful assignment of tasks to one or more processor queues, where the queue assignment is based on an optimal execution strategy. Further strategies are disclosed to address optimal processing once a task undergoes computation by a computational element in a multiprocessor system. Also disclosed is a technique to perform fluorescence compensation to correct spectral overlap between different detectors in a flow cytometry system due to emission characteristics of various fluorescent dyes.
42 Citations
40 Claims
-
1. A method for processing data comprising:
-
defining a plurality of tasks requiring execution; identifying one or more dependencies for each respective task of the plurality of tasks, the one or more dependencies comprising at least one of a resource dependency and a memory dependency; determining dependencies between for each respective task of the plurality of tasks wherein the dependencies include directional dependencies; creating a directed graph, wherein the directed graph includes; a plurality of nodes, the nodes respectively corresponding to the plurality of tasks requiring execution; and a plurality of directed edges that respectively correspond to the directional dependencies, each of said directed edges originating from a node of the plurality of nodes and terminating on another node of the plurality of nodes; and analyzing the directed graph to assign execution of tasks to a plurality of processors to maximize computational efficiency; performing the following steps until all data is processed; setting status of all nodes of the plurality of nodes to processed or unprocessed status; when a task corresponding to a node of the plurality of nodes has been completely computed, setting status of the node to processed status; when data corresponding to a node of the plurality of nodes has been changed, setting status of the node to unprocessed status; and executing tasks corresponding to all unprocessed nodes of the plurality of nodes in accordance with respectively assigned threading models. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A method for optimizing utilization of processor resources comprising:
-
determining a plurality of processing tasks required to perform a requested computation and obtaining from the plurality of processing tasks a set of task dependencies; determining, from the plurality of processing tasks and task dependencies, a blocking state for each of the plurality of processing tasks and setting the blocking state to blocked or non-blocked; instantiating a plurality of graphing nodes respectively representing the plurality of processing tasks wherein said plurality of nodes are initialized to a dirty state; defining a set of directed edges between pairs of individual nodes from the plurality of graphing nodes; counting references for each node of the plurality of graphing nodes; adding blocking attributes to one or more nodes of the plurality of graphing nodes; partitioning the plurality of graphing nodes into a blocked set and an non-blocked set; within each of the blocked set partition and the non-blocked set partition, assigning a generation number to each node of the plurality of graphing nodes; attributing a threading model to each node of the plurality of graphing nodes; assigning each node of the plurality of graphing nodes to one or more processing queues; executing tasks assigned to the one or more processing queues; performing the following steps until all data is processed; setting status of all nodes of the plurality of graphing nodes to clean or dirty status; when a task corresponding to a node of the plurality of graphing nodes has been completely computed, setting status of the node to clean status; when a data corresponding to a node of the plurality of graphing nodes has been changed, setting status of the node to dirty status; and executing all dirty nodes of the plurality of graphing nodes in accordance with respectively assigned threading models. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
-
-
31. A system for optimizing data computed by multiple processors comprising:
-
a plurality of processors wherein; the processors are respectively coupled to a plurality of cache memories; and the processors are coupled to a common data bus; a user interface comprising a user data entry interface and a display interface; a memory for storing data and one or more instructions for execution by the plurality of processors to implement one or more functions of the data processing system to; define a plurality of tasks requiring execution; identify one or more dependencies for each respective task of the plurality of tasks, the one or more dependencies comprising at least one of a resource dependency and a memory dependency; determine dependencies between for each respective task of the plurality of tasks wherein the dependencies include directional dependencies; create a directed graph, wherein the directed graph includes; a plurality of nodes, the nodes respectively corresponding to the plurality of tasks requiring execution; and a plurality of directed edges that respectively correspond to the directional dependencies, each of said directed edges originating from a node of the plurality of nodes and terminating on another node of the plurality of nodes; and analyze the directed graph to assign execution of tasks to a plurality of processors to maximize computational efficiency; attribute a threading model to each node of the plurality of nodes; assign a parallel threading model to each said node where said node corresponds to a task that is independent, may operate in parallel, and has a non-blocked status; assign a strip-mine threading model to each node in a collection of nodes that correspond to tasks that operate on common data; and assign a non-threading model is assigned to nodes corresponding to tasks that must be executed sequentially; wherein a strip-mine threading model comprises; partitioning a data set into a plurality of subsets, the size of each subset of the plurality of subsets chosen to be containable within a cache memory of predetermined size associated with a processor of the plurality of processors; allocating processing of a subset of the plurality of subsets to the processor; loading the subset of the plurality of subsets into the cache memory associated with the processor; executing a task utilizing the subset of the plurality of subsets; and indicating, by the processor, that another subset of the plurality of subsets may be allocated. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39)
-
-
40. A system for optimizing data computed by multiple processors comprising:
-
a plurality of processors wherein; the processors are respectively coupled to a plurality of cache memories; and the processors are coupled to a common data bus; a user interface comprising a user data entry interface and a display interface; a memory for storing data and one or more instructions for execution by the plurality of processors to implement one or more functions of the data processing system to; define a plurality of tasks requiring execution; identify one or more dependencies for each respective task of the plurality of tasks, the one or more dependencies comprising at least one of a resource dependency and a memory dependency; determine dependencies between for each respective task of the plurality of tasks wherein the dependencies include directional dependencies; create a directed graph, wherein the directed graph includes; a plurality of nodes, the nodes respectively corresponding to the plurality of tasks requiring execution; and a plurality of directed edges that respectively correspond to the directional dependencies, each of said directed edges originating from a node of the plurality of nodes and terminating on another node of the plurality of nodes; and analyze the directed graph to assign execution of tasks to a plurality of processors to maximize computational efficiency; perform the following steps until all data is processed; setting status of all nodes of the plurality of nodes to processed or unprocessed status; when a task corresponding to a node of the plurality of nodes has been completely computed, setting status of the node to processed status; when data corresponding to a node of the plurality of nodes has been changed, setting status of the node to unprocessed status; and executing tasks corresponding to all unprocessed nodes of the plurality of nodes in accordance with respectively assigned threading models.
-
Specification