Peer-to-peer architecture for processing big data

  • US 10,291,696 B2
  • Filed: 04/28/2015
  • Issued: 05/14/2019
  • Est. Priority Date: 04/28/2014
  • Status: Active Grant
First Claim

1. A system for managing large datasets comprising:

  • a physical network comprising a plurality of computing devices and a plurality of processors;

    a peer-to-peer (P2P) network, the P2P network comprising a plurality of nodes and a logical network derived from the physical network;

    a distributed file system for distributing data and jobs randomly across the plurality of nodes in the P2P network, by:

    receiving a file by an originating data node of the plurality of nodes, with a first processor of the plurality of processors assigned to the originating data node and being configured to:

    divide the file into a plurality of pages, assign a hash value to a first page of the plurality of pages, and transfer the first page of the plurality of pages to an initial responsible data node, the initial responsible data node including a name defining a string of characters that shares a predetermined number of values with the hash value of the first page of the plurality of pages;

    replicating the first page of the plurality of pages from the initial responsible data node to a first responsible data node and a second responsible data node;

    receiving a job at an originating job node and dividing the job into a plurality of tasks, wherein the job comprises an input file name, a map function to process the plurality of tasks, and a reduce function to generate a set of results for the plurality of tasks from values derived from the map function;

    routing a first task of the plurality of tasks to an initial responsible job node; and

    assigning the first task of the plurality of tasks to a first processing node using the initial responsible job node; and

    a task scheduler for delegating the first task of the plurality of tasks as necessary to optimize load distribution, by:

    assigning the first task of the plurality of tasks to a first queue of at least two queues in the first processing node, and

    forwarding the first task of the plurality of tasks to a second processing node remote from the first processing node,

    wherein each of the plurality of nodes in the P2P network is configured to perform data storage, task execution, and job delegation, and

    wherein a distributed hash table is utilized to form the P2P network, and

    wherein globally unique arbitrary keys are mapped to individual nodes of the plurality of nodes to allow for a node lookup by referencing the unique arbitrary keys to accommodate random distribution of node names within the P2P network.
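
The storage and scheduling steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. It assumes SHA-1 as the hash function (the claim says only "a hash value"), ranks nodes by longest shared name prefix rather than by a fixed "predetermined number" of matching values, picks the two replica holders as the next-best matches, and treats the "at least two queues" as one queue for local execution and one for tasks awaiting forwarding.

```python
import hashlib
from collections import deque

PAGE_SIZE = 4   # bytes per page; tiny so the example stays readable (assumed)
REPLICAS = 2    # the claim names a first and a second responsible data node


def split_into_pages(data: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Divide a file's bytes into fixed-size pages (the last may be shorter)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def page_hash(page: bytes) -> str:
    """Hash a page; SHA-1 hex is an assumption, not taken from the claim."""
    return hashlib.sha1(page).hexdigest()


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def responsible_nodes(key: str, node_names: list[str], count: int) -> list[str]:
    """Return the `count` nodes whose names share the longest prefix with key.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(node_names,
                    key=lambda name: (-shared_prefix_len(name, key), name))
    return ranked[:count]


def store_file(data: bytes, node_names: list[str]) -> dict:
    """Map each page hash to its initial responsible node and its replicas."""
    placement = {}
    for page in split_into_pages(data):
        key = page_hash(page)
        initial, *replicas = responsible_nodes(key, node_names, 1 + REPLICAS)
        placement[key] = {"initial": initial, "replicas": replicas}
    return placement


class ProcessingNode:
    """A node with two queues: tasks to run locally and tasks to forward."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity          # max tasks kept for local execution
        self.local_queue = deque()
        self.forward_queue = deque()

    def assign(self, task: str) -> None:
        """Queue a task locally if there is room, otherwise mark it to forward."""
        if len(self.local_queue) < self.capacity:
            self.local_queue.append(task)
        else:
            self.forward_queue.append(task)

    def delegate(self, remote: "ProcessingNode") -> int:
        """Forward all waiting tasks to a remote node; return how many moved."""
        moved = 0
        while self.forward_queue:
            remote.assign(self.forward_queue.popleft())
            moved += 1
        return moved
```

In this sketch, calling `store_file(b"abcdefghij", names)` with a list of hex-like node names yields one placement entry per page, and a `ProcessingNode` whose local queue is full hands its overflow to a peer through `delegate`. A real distributed hash table would resolve the responsible node through multi-hop routing rather than by sorting a global list of names.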

  • 2 Assignments