Preview data aggregation

US 10,621,153 B2
Filed: 05/16/2017
Issued: 04/14/2020
Est. Priority Date: 05/16/2017
Status: Active Grant

Abstract

In one respect, there is provided a method. The method can include processing a first data chunk to generate a first intermediate result. A key map can be generated based on a determination that a quantity of the key-value pairs in the first intermediate result exceeds a threshold. The key map can be generated to include keys in the first intermediate result. A second data chunk can be processed to generate a second intermediate result. The second data chunk can be processed based on the key map. The processing of the second data chunk can include omitting a key-value pair in the second data chunk from being inserted into the second intermediate result based on a key associated with the key-value pair being absent from the key map. A preview of the processing of the dataset can be generated based on the first intermediate result and the second intermediate result.
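
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the use of summation as the aggregation, and the representation of the key map as a key-to-position dictionary are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Pair = Tuple[Hashable, float]


def process_first_chunk(chunk: Iterable[Pair]) -> Dict[Hashable, float]:
    """Insert every pair, aggregating values that share a key (sum is assumed)."""
    result: Dict[Hashable, float] = {}
    for key, value in chunk:
        result[key] = result.get(key, 0.0) + value
    return result


def build_key_map(result: Dict[Hashable, float],
                  threshold: int) -> Optional[Dict[Hashable, int]]:
    """Return key -> position once the result holds more pairs than the threshold."""
    if len(result) <= threshold:
        return None  # not enough keys yet to restrict later chunks
    return {key: pos for pos, key in enumerate(result)}


def process_with_key_map(chunk: Iterable[Pair],
                         key_map: Dict[Hashable, int]) -> List[float]:
    """Keep only pairs whose key is in the key map, stored in key-map order."""
    slots = [0.0] * len(key_map)
    for key, value in chunk:
        pos = key_map.get(key)
        if pos is None:
            continue  # key absent from the key map: omit the pair
        slots[pos] += value
    return slots


def preview(first: Dict[Hashable, float],
            second_slots: List[float],
            key_map: Dict[Hashable, int]) -> List[Pair]:
    """Merge by position; the two results are never searched for shared keys."""
    return [(key, first[key] + second_slots[pos]) for key, pos in key_map.items()]


chunk1 = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0)]
chunk2 = [("b", 10.0), ("z", 99.0), ("a", 5.0)]

first = process_first_chunk(chunk1)
key_map = build_key_map(first, threshold=2)
second = process_with_key_map(chunk2, key_map)
print(preview(first, second, key_map))
# [('a', 9.0), ('b', 12.0), ('c', 4.0)]
```

Because the second result is laid out in key-map order, the merge in this sketch is a positional addition; the pair keyed `"z"` never enters the second result at all.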

14 Claims

1. A computer implemented method, comprising:
- processing, at a first worker node, a first data chunk of a dataset to generate a first intermediate result, the processing of the first data chunk comprising inserting a first plurality of key-value pairs from the first data chunk into the first intermediate result, the dataset being partitioned into the first data chunk and a second data chunk;
  
  generating, at a merger node, a key map based at least on a determination that a quantity of the first plurality of key-value pairs in the first intermediate result exceeds a threshold value, the key map being generated to include one or more keys of the key-value pairs in the first intermediate result;
  
  processing, at a second worker node, the second data chunk to generate a second intermediate result, the processing of the second data chunk includes inserting, into the second intermediate result, a first key-value pair and a second key-value pair based at least on a first key associated with the first key-value pair and a second key associated with the second key-value pair being present in the key map, the processing of the second data chunk further includes omitting, from the second intermediate result, a third key-value pair based at least on a third key associated with the third key-value pair being absent from the key map, the first key-value pair and the second key-value pair being inserted in a same order as an order of the first key and the second key in the key map; and
  
  generating a preview of the processing of the dataset, the preview being generated by at least merging the first intermediate result and the second intermediate result without identifying one or more key-value pairs from each of the first intermediate result and the second intermediate result that share a same key.
  - 2. The computer-implemented method of claim 1, wherein the threshold value corresponds to a quantity of key-value pairs required to be present in the preview.
  - 3. The computer-implemented method of claim 1, wherein the inserting of the key-value pairs from the first data chunk into the first intermediate result comprises:
    - selecting a fourth key-value pair from the first data chunk; and
      
      determining whether the first key-value pair is associated with a same key as a fifth key-value pair in the first intermediate result.
  - 4. The computer-implemented method of claim 3, further comprising:
    - aggregating the fourth key-value pair and the fifth key-value pair based at least in part on a determination that the fourth key-value pair and the fifth key-value pair are associated with the same key.
  - 5. The computer-implemented method of claim 4, wherein the aggregating comprises aggregating a first value of the fourth key-value pair and a second value of the fifth key-value pair, and wherein the first value and the second value are aggregated by addition, multiplication, division, subtraction, and/or comparison.
  - 6. The computer-implemented method of claim 1, wherein the first intermediate result and the second intermediate result are merged without determining whether a fourth key-value pair from the first intermediate result shares a same key as the first key-value pair or the second key-value pair from the second intermediate result.
  - 7. The computer-implemented method of claim 1, wherein the preview is further generated by merging, with the first intermediate result and/or the second intermediate result, a third intermediate result generated based on the key map.
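
The insert-or-aggregate step recited in claims 3 to 5 could look something like the sketch below. The helper names `insert_pair` and `build_intermediate` are hypothetical, and passing the aggregation as a Python callable is an assumption chosen to show addition, multiplication, and comparison side by side.

```python
import operator
from typing import Callable, Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, int]
Aggregate = Callable[[int, int], int]


def insert_pair(result: Dict[Hashable, int], pair: Pair,
                aggregate: Aggregate = operator.add) -> None:
    """Insert one selected pair, aggregating if its key is already present."""
    key, value = pair
    if key in result:
        # A pair with the same key already exists: combine the two values.
        result[key] = aggregate(result[key], value)
    else:
        result[key] = value


def build_intermediate(chunk: Iterable[Pair],
                       aggregate: Aggregate = operator.add) -> Dict[Hashable, int]:
    """Build an intermediate result by inserting each pair of a chunk in turn."""
    result: Dict[Hashable, int] = {}
    for pair in chunk:
        insert_pair(result, pair, aggregate)
    return result


chunk = [("x", 3), ("y", 7), ("x", 5)]
print(build_intermediate(chunk))                # {'x': 8, 'y': 7}
print(build_intermediate(chunk, operator.mul))  # {'x': 15, 'y': 7}
print(build_intermediate(chunk, max))           # {'x': 5, 'y': 7}
```

Here `max` stands in for aggregation by comparison; any two-argument function over the values would slot in the same way.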

8. A system, comprising:
- at least one data processor; and
  
  at least one memory storing instructions which, when executed by the at least one data processor, result in operations comprising:
  
  processing, at a first worker node, a first data chunk of a dataset to generate a first intermediate result, the processing of the first data chunk comprising inserting a first plurality of key-value pairs from the first data chunk into the first intermediate result, the dataset being partitioned into the first data chunk and a second data chunk;
  
  generating, at a merger node, a key map based at least on a determination that a quantity of the first plurality of key-value pairs in the first intermediate result exceeds a threshold value, the key map being generated to include one or more keys of the key-value pairs in the first intermediate result;
  
  processing, at a second worker node, the second data chunk to generate a second intermediate result, the processing of the second data chunk includes inserting, into the second intermediate result, a first key-value pair and a second key-value pair based at least on a first key associated with the first key-value pair and a second key associated with the second key-value pair being present in the key map, the processing of the second data chunk further includes omitting, from the second intermediate result, a third key-value pair based at least on a third key associated with the third key-value pair being absent from the key map, the first key-value pair and the second key-value pair being inserted in a same order as an order of the first key and the second key in the key map; and
  
  generating a preview of the processing of the dataset, the preview being generated by at least merging the first intermediate result and the second intermediate result without identifying one or more key-value pairs from each of the first intermediate result and the second intermediate result that share a same key.
  - 9. The system of claim 8, wherein the threshold value corresponds to a quantity of key-value pairs required to be present in the preview.
  - 10. The system of claim 8, wherein the inserting of the key-value pairs from the first data chunk into the first intermediate result comprises:
    - selecting a fourth key-value pair from the first data chunk; and
      
      determining whether the first key-value pair is associated with a same key as a fifth key-value pair in the first intermediate result; and
      
      aggregating the fourth key-value pair and the fifth key-value pair based at least in part on a determination that the fourth key-value pair and the fifth key-value pair are associated with the same key.
  - 11. The system of claim 10, wherein the aggregating comprises aggregating a first value of the fourth key-value pair and a second value of the fifth key-value pair, and wherein the first value and the second value are aggregated by addition, multiplication, division, subtraction, and/or comparison.
  - 12. The system of claim 8, wherein the first intermediate result and the second intermediate result are merged without determining whether a fourth key-value pair from the first intermediate result shares a same key as the first key-value pair or the second key-value pair from the second intermediate result.
  - 13. The system of claim 8, wherein the preview is further generated by merging, with the first intermediate result and/or the second intermediate result, a third intermediate result generated based on the key map.
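
As a rough illustration of the worker and merger roles in the system claims, including a third intermediate result as in claim 13, the sketch below uses threads in a single process to stand in for worker nodes. That substitution, the function names, and summation as the aggregation are assumptions for the example; the claims do not prescribe any of them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, int]


def worker(chunk: Sequence[Pair], key_map: Dict[Hashable, int]) -> List[int]:
    """Process one chunk against the key map; output is in key-map order."""
    slots = [0] * len(key_map)
    for key, value in chunk:
        pos = key_map.get(key)
        if pos is not None:  # keys absent from the key map are omitted
            slots[pos] += value
    return slots


def merger(partials: Sequence[List[int]], width: int) -> List[int]:
    """Merge intermediate results position by position, without key lookups."""
    totals = [0] * width
    for partial in partials:
        for pos, value in enumerate(partial):
            totals[pos] += value
    return totals


def preview_from_chunks(key_map: Dict[Hashable, int],
                        chunks: Sequence[Sequence[Pair]]) -> List[Pair]:
    """Run one worker per chunk in parallel, then merge into a preview."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: worker(c, key_map), chunks))
    totals = merger(partials, len(key_map))
    return [(key, totals[pos]) for key, pos in key_map.items()]


key_map = {"a": 0, "b": 1}
chunks = [
    [("a", 1), ("c", 9)],
    [("b", 2), ("a", 3)],
    [("b", 4)],  # yields a third intermediate result
]
print(preview_from_chunks(key_map, chunks))
# [('a', 4), ('b', 6)]
```

Since every worker emits values in the same key-map order, adding another chunk only adds another list to the positional merge.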

14. A non-transitory computer-readable storage medium including program code, which when executed by at least one data processor, cause operations comprising:
- processing, at a first worker node, a first data chunk of a dataset to generate a first intermediate result, the processing of the first data chunk comprising inserting a first plurality of key-value pairs from the first data chunk into the first intermediate result, the dataset being partitioned into the first data chunk and a second data chunk;
  
  generating, at a merger node, a key map based at least on a determination that a quantity of the first plurality of key-value pairs in the first intermediate result exceeds a threshold value, the key map being generated to include one or more keys of the key-value pairs in the first intermediate result;
  
  processing, at a second worker node, the second data chunk to generate a second intermediate result, the processing of the second data chunk includes inserting, into the second intermediate result, a first key-value pair and a second key-value pair based at least on a first key associated with the first key-value pair and a second key associated with the second key-value pair being present in the key map, the processing of the second data chunk further includes omitting, from the second intermediate result, a third key-value pair based at least on a third key associated with the third key-value pair being absent from the key map, the first key-value pair and the second key-value pair being inserted in a same order as an order of the first key and the second key in the key map; and
  
  generating a preview of the processing of the dataset, the preview being generated by at least merging the first intermediate result and the second intermediate result without identifying one or more key-value pairs from each of the first intermediate result and the second intermediate result that share a same key.


Current Assignee: SAP SE
Original Assignee: SAP SE
Inventors: Transier, Frederik; Stammerjohann, Kai; Bohnsack, Nico
Primary Examiner(s): Brooks, David T.
Application Number: US15/596,954
Publication Number: US 20180336230A1
Time in Patent Office: 1,064 Days
CPC Class Codes:
- G06F 16/22   Indexing; Data structures t...
- G06F 16/24556   Aggregation; Duplicate elim...
- G06F 16/24561   Intermediate data storage t...
- G06F 16/285   Clustering or classification
- G06F 16/287   Visualization; Browsing
