
Split elimination in mapreduce systems

  • US 10,691,646 B2
  • Filed: 03/05/2018
  • Issued: 06/23/2020
  • Est. Priority Date: 02/06/2014
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving a query comprising at least one predicate, the query referring to data comprising a plurality of records, each record comprising a plurality of values in a plurality of attributes and each record being located in at least one of a plurality of blocks of a distributed file system, each block having a unique identifier;

    determining a block count indicating the number of blocks in which each of the values of the data appear;

    determining a record count indicating the number of instances of each of the values in each of the attributes;

    based on the block count, determining a profit value associated with copying each of the values of the data to a materialized view;

    based on the record count, determining a cost value associated with copying each of the values of the data to a materialized view;

    selecting a predetermined number of values such that the profit to cost ratio is maximal for the predetermined number of values;

    providing a materialized view comprising the predetermined number of values;

    determining whether the query is applicable to the materialized view;

    wherein the query comprises more than one predicate and determining whether the query is applicable to the materialized view comprises:

    determining whether the predicates comprise a conjunction of a predicate met by one of the values of the materialized view;

    executing the query against the materialized view if it is applicable to the materialized view.
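
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and rests on assumptions the claim does not state: profit is taken to be the block count itself, cost is taken to be the record count itself, the "predetermined number" is a parameter `k` filled greedily by profit-to-cost ratio, predicates are equality tests written as `(attribute, value)` pairs, and a conjunctive query is treated as applicable when at least one conjunct targets a value held in the view. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_counts(blocks):
    """Return (block_count, record_count), both keyed by (attribute, value).

    blocks maps a unique block id to the list of records stored in that block;
    each record is a dict of attribute -> value.
    """
    block_ids = defaultdict(set)      # blocks in which each value appears
    record_count = defaultdict(int)   # records in which each value appears
    for block_id, records in blocks.items():
        for record in records:
            for attr, value in record.items():
                block_ids[(attr, value)].add(block_id)
                record_count[(attr, value)] += 1
    block_count = {key: len(ids) for key, ids in block_ids.items()}
    return block_count, dict(record_count)


def select_values(block_count, record_count, k):
    """Pick the k values with the highest profit/cost ratio.

    Assumption: profit = block count, cost = record count.
    """
    def ratio(key):
        return block_count[key] / record_count[key]
    return set(sorted(block_count, key=ratio, reverse=True)[:k])


def build_view(blocks, selected):
    """Copy every record that carries at least one selected value."""
    return [record
            for records in blocks.values()
            for record in records
            if any(item in selected for item in record.items())]


def is_applicable(predicates, selected):
    """A conjunction is applicable if one conjunct is met by a view value."""
    return any(pred in selected for pred in predicates)


def run_query(predicates, blocks, view, selected):
    """Evaluate the conjunction on the view if applicable, else on all blocks."""
    if is_applicable(predicates, selected):
        source = view
    else:
        source = [r for records in blocks.values() for r in records]
    return [r for r in source if all(r.get(a) == v for a, v in predicates)]


# Hypothetical data: "err" is rare but scattered over every block.
blocks = {
    "b1": [{"city": "Oslo", "status": "ok"},
           {"city": "Oslo", "status": "err"},
           {"city": "Oslo", "status": "ok"}],
    "b2": [{"city": "Rome", "status": "ok"},
           {"city": "Rome", "status": "err"},
           {"city": "Rome", "status": "ok"}],
    "b3": [{"city": "Lima", "status": "err"},
           {"city": "Lima", "status": "ok"},
           {"city": "Lima", "status": "ok"}],
}

block_count, record_count = build_counts(blocks)
selected = select_values(block_count, record_count, k=1)
view = build_view(blocks, selected)
result = run_query([("status", "err"), ("city", "Rome")], blocks, view, selected)
```

In this sample, the value that appears in every block but in few records has the best ratio, so a query with a conjunct on that value can be answered from the small view instead of scanning all blocks. A query with no conjunct on a selected value falls back to the full scan.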
