
Data analytics platform over parallel databases and distributed file systems

  • US 9,858,315 B2
  • Filed: 12/22/2016
  • Issued: 01/02/2018
  • Est. Priority Date: 02/25/2013
  • Status: Active Grant
First Claim

1. A method, comprising:

    embedding in each of a plurality of distributed processing segments a library or other shared object comprising one or more data analytical functions, wherein the library or other shared object is included in the processing segments as deployed;

    receiving, by a master node, a data analysis request;

    creating, by the master node, a plan to generate a response to the request;

    selecting one or more of the plurality of distributed processing segments to process a corresponding portion of the plan, wherein selecting the one or more of the plurality of distributed processing segments to process the corresponding portion of the plan includes:

    obtaining, by the master node, metadata associated with one or more portions of the plan to be performed by the one or more corresponding distributed processing segments of the plurality of distributed processing segments; and

    embedding the metadata in an assignment communication to be sent to the one or more of the plurality of distributed processing segments, wherein the metadata indicates a location, within a distributed data storage layer, of data to be processed by the corresponding one or more distributed processing segments;

    sending, by the master node, to each of the plurality of distributed processing segments for which a portion of the plan is to be processed, the corresponding portion of the plan to be processed by that corresponding segment and the metadata, wherein the metadata is used to locate or access a subset of data on which the segment is to perform an indicated processing, and wherein the metadata sent to each of the plurality of distributed processing segments for which the corresponding portion of the plan is assigned includes an identification of at least one data analytics function to be used to process the portion of the plan of the one or more data analytics functions that is embedded in each of the plurality of distributed processing segments;

    receiving, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and

    generating a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
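
The flow recited in claim 1 can be illustrated with a minimal, single-process sketch. It is not taken from the patent and is not the patented implementation: the names `ANALYTICS_LIBRARY`, `STORAGE`, `Assignment`, `Segment`, and `Master`, the round-robin segment selection, and the "mean" request are all hypothetical choices made only to show how a master might embed location metadata and function identifiers in the assignments it sends to segments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical analytics library, standing in for the shared object that is
# embedded in every segment as deployed.
ANALYTICS_LIBRARY: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda xs: float(sum(xs)),
    "count": lambda xs: float(len(xs)),
}

# Hypothetical stand-in for the distributed data storage layer:
# location identifier -> block of data stored there.
STORAGE: Dict[str, List[float]] = {
    "block-0": [1.0, 2.0, 3.0],
    "block-1": [4.0, 5.0],
    "block-2": [6.0],
}


@dataclass
class Assignment:
    """Assignment communication: a plan portion plus its embedded metadata."""
    plan_portion: str            # description of the portion of the plan
    location: str                # metadata: where the data lives in storage
    functions: Tuple[str, ...]   # metadata: embedded functions to apply


class Segment:
    """A processing segment carrying its own copy of the analytics library."""

    def __init__(self, library: Dict[str, Callable[[List[float]], float]]):
        self.library = dict(library)

    def process(self, assignment: Assignment) -> Dict[str, float]:
        # Use the metadata to locate the data subset, then run each
        # identified library function on it.
        data = STORAGE[assignment.location]
        return {name: self.library[name](data) for name in assignment.functions}


class Master:
    """Master node: plans, assigns portions, and combines segment results."""

    def __init__(self, segments: List[Segment]):
        self.segments = segments

    def handle(self, request: str) -> float:
        if request != "mean":
            raise ValueError("this sketch only plans a 'mean' request")
        # Plan: one portion per storage block, each computing a partial
        # sum and count.
        assignments = [
            Assignment(f"partial aggregate over {loc}", loc, ("sum", "count"))
            for loc in sorted(STORAGE)
        ]
        # Select a segment for each portion (round-robin, an assumption)
        # and collect the per-portion results.
        results = [
            self.segments[i % len(self.segments)].process(a)
            for i, a in enumerate(assignments)
        ]
        # Generate the master response from the partial results.
        total = sum(r["sum"] for r in results)
        count = sum(r["count"] for r in results)
        return total / count


master = Master([Segment(ANALYTICS_LIBRARY) for _ in range(2)])
mean_value = master.handle("mean")
```

In this sketch the segments never receive the data itself from the master, only the location and the names of the functions to run, which mirrors the claim's distinction between the plan portion and the metadata that accompanies it.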

  • 7 Assignments