Modular architecture for extreme-scale distributed processing applications
Abstract
Embodiments of the present invention relate to a new data center architecture that provides for efficient processing in distributed analytics applications. In one embodiment, a subnode of a distributed processing node is provided. The subnode includes at least one processor core operatively connected to a memory. A first interconnect operatively connects to the subnode. A second interconnect operatively connects the subnode to a storage. The storage includes a first storage unit and a second storage unit. The second storage unit has lower access time and latency than the first storage unit. A storage manager is provided that is operative to allocate data between the first and second storage units based on access patterns. The storage manager preferentially relocates non-sequentially accessed data to the second storage unit from the first storage unit.
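The storage manager behavior summarized in the abstract can be illustrated with a minimal sketch. The patent text does not say how access patterns are detected or how relocation is scheduled, so everything here is an assumption made for illustration: the `Block` record, the `is_sequential` heuristic with its 0.8 threshold, and the `fast_capacity` limit are all hypothetical names and values.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical record of one stored block and its recent read offsets."""
    name: str
    tier: str = "first"  # "first" = slower unit, "second" = faster unit
    offsets: list = field(default_factory=list)


def is_sequential(offsets, threshold=0.8):
    """Assumed heuristic: a trace is sequential when at least `threshold`
    of consecutive reads advance by exactly one position."""
    if len(offsets) < 2:
        return True  # too little history; leave the block where it is
    steps = sum(1 for a, b in zip(offsets, offsets[1:]) if b - a == 1)
    return steps / (len(offsets) - 1) >= threshold


def rebalance(blocks, fast_capacity):
    """Move non-sequentially accessed blocks from the first unit to the
    second (faster) unit, up to an assumed capacity counted in blocks.
    Returns the names of the blocks that were moved."""
    used = sum(1 for b in blocks if b.tier == "second")
    moved = []
    for b in blocks:
        if used >= fast_capacity:
            break
        if b.tier == "first" and not is_sequential(b.offsets):
            b.tier = "second"
            used += 1
            moved.append(b.name)
    return moved


blocks = [
    Block("scan_log", offsets=[0, 1, 2, 3, 4, 5]),  # sequential scan
    Block("index", offsets=[7, 2, 9, 0, 4, 1]),     # random reads
    Block("lookup", offsets=[3, 8, 1, 6]),          # random reads
]
moved = rebalance(blocks, fast_capacity=1)
```

With room for only one block in the faster unit, the sketch moves the first non-sequentially read block it encounters and leaves the sequential scan on the slower unit. A real policy would likely rank candidates rather than take them in list order.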
18 Claims
1. A system comprising:
a subnode of a distributed processing node, the subnode including:
at least one processor core operatively connected to a memory, the memory being managed by Memcached;
a first interconnect operatively connected to the subnode;
a second interconnect operatively connected to the subnode and to a storage, the storage comprising a first storage unit and a second storage unit, the second storage unit having lower access time and latency than the first storage unit, the storage being accessed via a Hadoop Distributed File System;
a process running on the subnode, the process being operative to retrieve data from the memory of the subnode;
wherein:
the process interrogates the memory of the subnode for requested data;
if the requested data is not found in the memory of the subnode, the process interrogates the memory of at least one additional subnode of the distributed processing node via the first interconnect;
if the requested data is found in the memory of the additional subnode, the process copies the requested data to the memory of the subnode; and
if the requested data is not found in the memory of the subnode or the memory of the additional subnode, the process interrogates the storage via the second interconnect;
a storage manager allocates data between the first and second storage units based on access patterns, the storage manager preferentially relocating nonsequentially accessed data to the second storage unit from the first storage unit.
Dependent claims: 2-13.
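The lookup order recited in claim 1 (local memory, then the memory of another subnode, then storage) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Subnode` class and `fetch` function are hypothetical, plain dictionaries stand in for Memcached-managed memory and for HDFS-backed storage, and the two interconnects are represented only by comments.

```python
class Subnode:
    """Hypothetical subnode; a dict stands in for its Memcached-managed memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}


def fetch(key, local, peers, storage):
    """Return (value, source) following the lookup order of claim 1."""
    # 1. Interrogate the memory of the local subnode.
    if key in local.memory:
        return local.memory[key], "local"
    # 2. Interrogate the memory of additional subnodes (first interconnect);
    #    on a hit, copy the data into the local subnode's memory.
    for peer in peers:
        if key in peer.memory:
            local.memory[key] = peer.memory[key]
            return local.memory[key], "peer:" + peer.name
    # 3. Interrogate the storage (second interconnect). The claim does not
    #    recite caching a storage hit, so this sketch does not cache it.
    if key in storage:
        return storage[key], "storage"
    raise KeyError(key)


a, b = Subnode("a"), Subnode("b")
b.memory["k1"] = "v1"
storage = {"k2": "v2"}  # stand-in for data reachable through HDFS

first = fetch("k1", a, [b], storage)   # served from peer b, then copied to a
second = fetch("k1", a, [b], storage)  # now served from a's own memory
third = fetch("k2", a, [b], storage)   # served from storage
```

The second call hits local memory only because the first call copied the value over from the peer, which is the copy step the claim recites.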
14. A method comprising:
allocating a task to a subnode of a distributed processing node, the subnode including at least one processor core operatively connected to a memory, the memory being managed by Memcached;
determining data requested by the task;
interrogating the memory of the subnode for the requested data;
if the requested data is not found in the memory of the subnode, interrogating the memory of at least one additional subnode of the distributed processing node via a first interconnect;
if the requested data is found in the memory of the additional subnode, copying the requested data from the memory of the additional subnode to the memory of the subnode;
if the requested data is not found in the memory of the subnode or the memory of the additional subnode, interrogating a storage via a second interconnect, the storage comprising a first storage unit and a second storage unit, the second storage unit having lower access time and latency than the first storage unit, the storage being accessed via a Hadoop Distributed File System; and
processing the task on the at least one processor core of the subnode;
allocating data between the first and second storage units based on access patterns, preferentially relocating non-sequentially accessed data to the second storage unit from the first storage unit.
Dependent claims: 15, 16.
17. A computer program product for distributed data processing, the computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code executable by a processor to:
allocate the task to a subnode of a distributed processing node, the subnode including at least one processor core operatively connected to a memory, the memory being managed by Memcached;
determine data requested by the task;
interrogate the memory of the subnode for the requested data;
if the requested data is not found in the memory of the subnode, interrogate the memory of at least one additional subnode of the distributed processing node via a first interconnect;
if the requested data is found in the memory of the additional subnode, copy the requested data from the memory of the additional subnode to the memory of the subnode;
if the requested data is not found in the memory of the subnode or the memory of the additional subnode, interrogate a storage via a second interconnect, the storage comprising a first storage unit and a second storage unit, the second storage unit having lower access time and latency than the first storage unit, the storage being accessed via a Hadoop Distributed File System; and
process the task on the at least one processor core of the subnode;
allocate data between the first and second storage units based on access patterns, preferentially relocating non-sequentially accessed data to the second storage unit from the first storage unit.
Dependent claim: 18.
Specification