Systems and methods for processing complex data sets
Abstract
Various systems and methods of the present invention provide for distributing access to a dataset to a plurality of processing nodes where the dataset is processed to produce node-specific outputs. Distribution can be accomplished by a chain or star-chain distribution model. Some systems and methods of the present invention provide for check-pointing and restarting improperly terminated processes. Other systems and methods provide for computing a coherent result using a cluster of heterogeneous nodes.
62 Claims
1. A processing cluster for executing a distributed processing operation on a large dataset, wherein multiple processing platforms perform separate, coordinated processing steps relative to portions of the dataset so as to collectively execute the distributed processing operation, the processing cluster comprising:
a database comprising said dataset;
a first node, associated with a first processing platform, communicably coupled to the database;
a second node, associated with a second processing platform, communicably coupled to the first node, wherein the second node is configured to receive at least a first portion of the dataset from the first node via a first communication channel between said first node and said second node; and
a third node, associated with a third processing platform, communicably coupled to the second node, wherein the third node is configured to receive at least a second portion of the dataset from the second node via a second communication channel between said second node and said third node;
said first, second and third processing platforms thereby being operative for serial transfer of data of said dataset therebetween via said first and second communication channels free from direct transfer of data from said database to either of said second and third nodes;
wherein said first, second and third platforms are operative for executing respective first, second and third separate, coordinated processing steps of said distributed processing operation for said large dataset. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A data processing system, the system comprising:
a database comprising input data;
a master node communicably coupled to a chain of at least two sub-nodes, wherein the chain of at least two sub-nodes includes at least one sub-node configured to receive the input data from a preceding sub-node such that said master node, said one sub-node and said preceding sub-node are operative for serial transfer of said input data therebetween free from direct transfer of said input data between said master node and said at least one sub-node;
wherein said preceding sub-node and said at least one sub-node are operative for executing respective first and second separate, coordinated processing steps of a distributed processing operation with respect to said input data. - View Dependent Claims (13)
14. A seismic data processing cluster for executing a distributed processing operation on a large dataset, wherein multiple processing platforms perform separate, coordinated processing steps relative to portions of the dataset so as to collectively execute the distributed processing operation, the processing cluster comprising:
a database comprising said dataset of input seismic trace data;
a first node, associated with a first processing platform, communicably coupled to the database;
a second node, associated with a second processing platform, communicably coupled to the first node, wherein the second node is configured to receive at least a first portion of the dataset from the first node via a first communication channel between said first node and said second node; and
a third node, associated with a third processing platform, communicably coupled to the second node, wherein the third node is configured to receive at least a second portion of the dataset from the second node via a second communication channel between said second node and said third node;
said first, second and third processing platforms thereby being operative for serial transfer of data of said dataset therebetween via said first and second communication channels free from direct transfer of data from said database to either of said second and third nodes;
wherein said first, second and third platforms are operative for executing respective first, second and third separate, coordinated processing steps of said distributed processing operation for said large dataset. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
25. A seismic data imaging system, the system comprising:
a database comprising input seismic trace data;
a master node communicably coupled to a chain of at least two sub-nodes, wherein the chain of at least two sub-nodes includes at least one sub-node configured to receive the input seismic trace data from a preceding sub-node such that said master node, said one sub-node and said preceding sub-node are operative for serial transfer of said input data therebetween free from direct transfer of said input data between said master node and said at least one sub-node;
wherein said preceding sub-node and said at least one sub-node are operative for executing respective first and second separate, coordinated processing steps of a distributed processing operation with respect to said input data. - View Dependent Claims (26)
27. A computer readable medium, the computer readable medium comprising computer executable instructions to:
receive input seismic data from an upstream node, wherein the upstream node is one of a chain of nodes serially communicably coupled to a master node such that said data is obtained from said upstream node free from direct transfer of data from said master node independent of said upstream node;
compute an image of a physical location based at least in part on the input seismic data; and
provide a first output, wherein said first output is combined with a second output of said upstream node to yield a composite result based on a distributed processing operation performed on said input seismic data. - View Dependent Claims (28, 29, 30, 31, 32)
33. A computer readable medium, the computer readable medium comprising computer executable instructions to:
access an output trace file, wherein the output trace file identifies a plurality of output seismic traces to be computed;
access a node file, wherein the node file includes an attribute about each node in a chain of nodes; and
assign each of the plurality of output seismic traces to a node in the chain of nodes, wherein computation of each of the plurality of output seismic traces on the assigned nodes completes within a balanced time. - View Dependent Claims (34, 35, 36, 37, 38)
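The assignment step of claim 33 can be illustrated with a minimal sketch. It assumes the node attribute is a relative processing speed and that every output trace costs about the same to compute; neither assumption comes from the claim text, and the names `assign_traces` and `node_speeds` are hypothetical. Traces are split into contiguous blocks in proportion to node speed, so that all nodes would finish at roughly the same time.

```python
def assign_traces(trace_ids, node_speeds):
    """Split trace_ids into contiguous blocks sized in proportion to node speed.

    trace_ids   -- list of output trace identifiers (stand-in for the output trace file)
    node_speeds -- dict of node name -> relative speed (stand-in for the node file)
    """
    total_speed = sum(node_speeds.values())
    n = len(trace_ids)
    # Ideal, possibly fractional, number of traces per node.
    ideal = {node: n * speed / total_speed for node, speed in node_speeds.items()}
    counts = {node: int(share) for node, share in ideal.items()}
    # Give the traces lost to rounding to the nodes with the largest remainders.
    leftover = n - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda node: ideal[node] - counts[node], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    # Cut the trace list into consecutive blocks, one block per node.
    assignment, start = {}, 0
    for node in node_speeds:
        assignment[node] = trace_ids[start:start + counts[node]]
        start += counts[node]
    return assignment


plan = assign_traces(list(range(8)), {"n1": 1.0, "n2": 2.0, "n3": 1.0})
print(plan)  # {'n1': [0, 1], 'n2': [2, 3, 4, 5], 'n3': [6, 7]}
```

Here the node that is twice as fast receives twice as many traces. A real scheduler would likely weigh further attributes, such as memory or network position, which this sketch ignores.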
39. A method of computing, the method comprising:
accessing a dataset;
designating a plurality of nodes as a node chain, wherein the node chain comprises a first node, a second node, and a third node;
serially transferring the dataset from the first node to the second node, and from the second node to the third node such that said third node receives said dataset substantially free from direct communication between said first node and said third node;
processing the dataset on the first node to create a first output, on the second node to create a second output, and on the third node to create a third output; and
assembling the first, second and third outputs to form a coherent output. - View Dependent Claims (40)
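The method of claim 39 can be sketched as a small in-process simulation. The `ChainNode` class and the per-node processing functions are hypothetical stand-ins, and the "transfer" is an ordinary method call rather than a network channel. The point illustrated is that each node hands the dataset only to its immediate successor, so the third node never communicates with the first.

```python
class ChainNode:
    """One node in a chain; it knows only its immediate downstream neighbour."""

    def __init__(self, name, process, downstream=None):
        self.name = name
        self.process = process        # function applied to the dataset on this node
        self.downstream = downstream  # next ChainNode in the chain, or None

    def receive(self, dataset, outputs):
        # Process the dataset locally and record this node's output.
        outputs[self.name] = self.process(dataset)
        # Serial transfer: forward the dataset to the next node only.
        if self.downstream is not None:
            self.downstream.receive(dataset, outputs)
        return outputs


# Build the chain back to front: first -> second -> third.
third = ChainNode("third", max)
second = ChainNode("second", min, downstream=third)
first = ChainNode("first", sum, downstream=second)

outputs = first.receive([3, 1, 2], {})
# Assemble the three node outputs into one result, in chain order.
coherent = [outputs[name] for name in ("first", "second", "third")]
print(coherent)  # [6, 1, 3]
```

In an actual cluster, forwarding would probably overlap with local processing instead of following it; the sequential version is used here only to keep the data path easy to follow.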
41. A method for use in controlling a distributed processing operation for processing seismic data to yield geologic information regarding a subterranean geologic formation, said distributed processing operation involving execution, on multiple platforms, of separated, coordinated processing steps with respect to a common processing job, said method comprising the steps of:
monitoring said distributed processing operation for processing seismic data to identify a malfunction;
identifying a portion of said processing job affected by said malfunction; and
automatically re-tasking at least one processing platform substantially free from any concurrent prompts by a human operator related to said re-tasking, so as to complete said portion of said processing job. - View Dependent Claims (42, 43, 44, 45, 46, 47)
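The three steps of claim 41 (monitor, identify the affected portion, re-task without operator prompts) can be sketched as follows. This is an illustration under stated assumptions: the malfunction is represented simply as a set of failed platform names, job portions are plain integers, and the round-robin redistribution policy and the name `retask` are hypothetical choices, not something the claim specifies.

```python
def retask(assignments, failed, completed):
    """Move unfinished portions of failed platforms onto healthy ones.

    assignments -- dict of platform name -> list of job portions assigned to it
    failed      -- set of platform names reported as malfunctioning by the monitor
    completed   -- set of portions already finished before the malfunction
    Returns (new_assignments, affected_portions).
    """
    healthy = [p for p in assignments if p not in failed]
    if not healthy:
        raise RuntimeError("no healthy platform available for re-tasking")
    new_assignments = {p: list(assignments[p]) for p in healthy}
    # Affected portions: assigned to a failed platform and not yet completed.
    affected = [portion
                for p in assignments if p in failed
                for portion in assignments[p] if portion not in completed]
    # Re-task round-robin across the healthy platforms, with no operator input.
    for i, portion in enumerate(affected):
        new_assignments[healthy[i % len(healthy)]].append(portion)
    return new_assignments, affected


plan, affected = retask({"a": [0, 1], "b": [2, 3], "c": [4, 5]},
                        failed={"b"}, completed={2})
print(affected)  # [3]
print(plan)      # {'a': [0, 1, 3], 'c': [4, 5]}
```

Portion 2 is not re-run because it finished before platform "b" failed; only portion 3 is reassigned.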
48. A method for use in implementing a distributed processing operation for processing seismic data to yield geologic information regarding a subterranean geologic formation, wherein the seismic data includes data corresponding to a number of traces where each trace reflects a seismic signal received at a sensor location, said method comprising the steps of:
providing a cluster of processing platforms for executing said distributed processing operation, wherein said processing platforms execute separate, coordinated processing steps so as to collectively yield said geologic information;
establishing a first data type for a transfer of data from a first processing platform to a second processing platform of said cluster of processing platforms, wherein said first data type relates to the way that data is represented within a content of said transfer; and
operating said second processing platform to convert said content of said transfer from said first data type to a second data type different than said first data type;
wherein said seismic data is processed in a distributed processing environment involving heterogeneous processing platforms. - View Dependent Claims (49, 50)
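One way to picture the data type conversion of claim 48 is a byte-order mismatch between heterogeneous platforms. The claim says only that the data type relates to how data is represented within the transfer; treating it as byte order, using 32-bit floats for trace samples, and the one-byte tag format are all assumptions made for this sketch, as are the function names.

```python
import struct


def pack_trace(samples, byte_order=">"):
    """Sender side: tag the payload with its byte order, then pack float32 samples.

    byte_order is ">" (big-endian) or "<" (little-endian); the one-byte tag is
    how this sketch establishes the first data type for the transfer.
    """
    header = byte_order.encode("ascii")
    return header + struct.pack(f"{byte_order}{len(samples)}f", *samples)


def unpack_trace(payload):
    """Receiver side: read the tag and convert the content to native Python floats."""
    order = payload[:1].decode("ascii")
    body = payload[1:]
    count = len(body) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"{order}{count}f", body))


samples = [1.0, -2.5, 0.25]
big = pack_trace(samples, ">")
little = pack_trace(samples, "<")
print(unpack_trace(big) == unpack_trace(little) == samples)  # True
```

The two payloads differ byte for byte, yet the receiving platform recovers the same samples from either because it converts according to the declared representation.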
51. An apparatus for use in controlling a distributed processing operation for processing seismic data to yield geologic information regarding a subterranean geologic formation, said distributed processing operation involving execution, on multiple platforms, of separated, coordinated processing steps with respect to a common processing job, said apparatus comprising:
a monitoring module for monitoring said distributed processing operation for processing seismic data to identify a malfunction and a portion of said processing job affected by said malfunction; and
a re-tasking module for automatically re-tasking at least one processing platform of said multiple processing platforms substantially free from any concurrent prompts by a human operator related to said re-tasking, so as to complete said portion of said processing job. - View Dependent Claims (52, 53, 54, 55, 56, 57)
58. An apparatus for use in implementing a distributed processing operation for processing seismic data to yield geologic information regarding a subterranean geologic formation, wherein the seismic data includes data corresponding to a number of traces where each trace reflects a seismic signal received at a sensor location, said apparatus comprising:
a cluster of processing platforms configured for executing said distributed processing operation, wherein said processing platforms execute separate, coordinated processing steps so as to collectively yield said geologic information, wherein said cluster is configured to establish a first data type for transfer of data from a first processing platform to a second processing platform of said cluster of processing platforms, said first data type relating to the way that data is represented within a content of said transfer;
said second processing platform being operative to convert said content of said transfer from said first data type to a second data type different than said first data type, wherein said seismic data is processed in a distributed processing environment involving heterogeneous platforms. - View Dependent Claims (59, 60)
61. A processing cluster for executing a distributed processing operation on a large dataset of seismic data, wherein multiple processing platforms perform separate, coordinated processing steps relative to portions of the dataset so as to collectively execute the distributed processing operation, the processing cluster comprising:
a database comprising said dataset of seismic data;
a first node, associated with a first processing platform, communicably coupled to the database;
a second node, associated with a second processing platform, communicably coupled to the first node, wherein the second node is configured to receive at least a first portion of the dataset of seismic data from the first node via a first communication channel between said first node and said second node; and
a third node, associated with a third processing platform, communicably coupled to the second node, wherein the third node is configured to receive at least a second portion of the dataset of seismic data from the second node via a second communication channel between said second node and said third node;
said first, second and third processing platforms thereby being operative for serial transfer of data of said dataset of seismic data therebetween via said first and second communication channels free from direct transfer of data from said database to either of said second and third nodes;
wherein said first, second and third platforms are operative for executing respective first, second and third separate, coordinated processing steps of said distributed processing operation for said dataset of seismic data;
wherein said first node and second node are configured for transfer of data of a first data type therebetween, said first data type relating to the way that data is represented within a content of said transfer; and
said second node is operative to convert said content of said transfer from said first data type to a second data type different than said first data type, wherein said seismic data is processed in a distributed processing environment involving heterogeneous platforms.
62. A processing cluster for executing a distributed processing operation on a large dataset of seismic data, wherein multiple processing platforms perform separate, coordinated processing steps relative to portions of the dataset so as to collectively execute the distributed processing operation, the processing cluster comprising:
a database comprising said dataset of seismic data;
a first node, associated with a first processing platform, communicably coupled to the database;
a second node, associated with a second processing platform, communicably coupled to the first node, wherein the second node is configured to receive at least a first portion of the dataset of seismic data from the first node via a first communication channel between said first node and said second node; and
a third node, associated with a third processing platform, communicably coupled to the second node, wherein the third node is configured to receive at least a second portion of the dataset of seismic data from the second node via a second communication channel between said second node and said third node;
said first, second and third processing platforms thereby being operative for serial transfer of data of said dataset of seismic data therebetween via said first and second communication channels free from direct transfer of data from said database to either of said second and third nodes;
wherein said first, second and third platforms are operative for executing respective first, second and third separate, coordinated processing steps of said distributed processing operation for said dataset of seismic data;
a monitoring module for monitoring said distributed processing operation to identify a malfunction and one of said first, second and third nodes affected by said malfunction; and
a re-tasking module for automatically re-tasking one of said first, second and third nodes substantially free from any concurrent prompts by a human operator related to said re-tasking.
Specification