Distributed processing of streaming data records
First Claim
1. A method for distributed processing of streaming data records, the method comprising:
- parsing information in a received stream of data records to thereby identify a subset of the information relevant to a set of predetermined dimensions;
- receiving only the subset of the information in the received streaming data records at a plurality of distributed computational nodes based at least in part on workloads of the distributed computational nodes, properties of the streaming data records, or a random assignment, each node comprising a processor and a storage element;
- converting, at each node, a portion of the subset of the information in the received streaming data records into key-value pairs;
- parsing, at each node, the converted key-value pairs of the subset of the information in the received streaming data records received at each said node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys;
- re-distributing the keys of the converted subset of the received streaming data records among the distributed computational nodes in accordance with the predetermined dimensions stored on the nodes, wherein each distributed computational node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received streaming data records received at each node;
- updating a database storing measures of the dimensions by collecting data from the computational nodes in accordance with the parsed and re-distributed streaming data records; and
- using the database to respond to a query based on measures associated with one or more of the dimensions.
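The claimed steps can be read as a filter, local combine, shuffle, and aggregate pipeline. The following is a minimal single-process sketch of that reading, not the patented implementation. The dimension names (`country`, `device`), the node count, the use of a count as the measure, and every function name are assumptions made for illustration; only the random-assignment option for distribution is shown.

```python
import random
from collections import Counter, defaultdict

# Hypothetical predetermined dimensions; the claim does not name any.
DIMENSIONS = ("country", "device")
NUM_NODES = 3  # assumed cluster size


def parse_subset(record):
    """Keep only the fields relevant to the predetermined dimensions."""
    return {k: v for k, v in record.items() if k in DIMENSIONS}


def to_key_value_pairs(subset):
    """Convert one filtered record into ((dimension, value), count) pairs."""
    return [((dim, val), 1) for dim, val in subset.items()]


def combine(pairs):
    """Combine key-value pairs that share an identical key by summing values."""
    combined = Counter()
    for key, value in pairs:
        if key[0] in DIMENSIONS:  # match the key against a dimension
            combined[key] += value
    return combined


def owner_of(key):
    """Node assumed to store the dimension that a key belongs to."""
    return DIMENSIONS.index(key[0]) % NUM_NODES


def process(stream, rng):
    # Distribute only the parsed subsets; random assignment is one claimed option.
    inboxes = defaultdict(list)
    for record in stream:
        inboxes[rng.randrange(NUM_NODES)].append(parse_subset(record))

    # Each node converts its portion to key-value pairs and combines locally.
    local = {
        node: combine(p for s in subsets for p in to_key_value_pairs(s))
        for node, subsets in inboxes.items()
    }

    # Re-distribute the combined keys to the node owning each dimension.
    shuffled = defaultdict(Counter)
    for partial in local.values():
        for key, value in partial.items():
            shuffled[owner_of(key)][key] += value

    # Collect from the nodes into the database of measures.
    database = Counter()
    for partial in shuffled.values():
        database.update(partial)
    return database


def query(database, dimension):
    """Return the measures recorded for one dimension."""
    return {val: n for (dim, val), n in database.items() if dim == dimension}
```

Calling `process(stream, random.Random(0))` on a list of dictionaries and then `query(db, "country")` returns per-country counts. Because combining is a sum, the result does not depend on which node received which record.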
Abstract
Representative embodiments of a distributed processing method facilitate interactive analytics of streaming data records by receiving the data records at a plurality of distributed computational nodes, establishing and storing dimensions corresponding to attributes of the data records, parsing the streaming data records to identify matches to at least one of the dimensions and, based thereon, reducing the number of data records to create a targeted subset of the data, re-distributing the targeted subsets of the streaming data records among the distributed computational nodes in accordance with the dimensions stored on the nodes, updating a database storing measures of the dimensions in accordance with the targeted subsets of the streaming data records, and using the database to respond to a query based on measures associated with one or more of the dimensions.
13 Claims
1. A method for distributed processing of streaming data records, the method comprising the steps recited under First Claim. Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for distributed processing of streaming data records, the system comprising:
- a data collector for (i) receiving the streaming data records, (ii) parsing information in the received streaming data records to thereby identify a subset of the information relevant to a set of predetermined dimensions, and (iii) distributing only the subset of the information in the received streaming data records to a plurality of distributed computational nodes based at least in part on workloads of the distributed computational nodes, properties of the streaming data records, or a random assignment;
- a data reorganizer for converting a portion of the subset of the information in the received streaming data records into key-value pairs, parsing the converted key-value pairs of the subset of the information in the received streaming data records received at each said node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys, and re-distributing the keys of the converted subset of the received streaming data records among the distributed computational nodes based on the predetermined dimensions stored thereon, wherein each distributed computational node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received streaming data records received at each node;
- a cube constructor for constructing an OLAP cube based on the re-distributed streaming data records by collecting data from the distributed computational nodes; and
- an interactive cube manipulator for querying information within the OLAP cube and receiving a response in return.
Dependent claims: 9, 10
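The cube constructor and interactive cube manipulator of claim 8 might be sketched as follows. This is an illustration under stated assumptions, not the claimed system: the two dimensions, the `"*"` marker for a rolled-up dimension, the count measure, and the names `build_cube` and `query_cube` are all hypothetical. The sketch precomputes every roll-up, which is only practical for a small number of dimensions.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "device")  # hypothetical dimensions


def build_cube(node_outputs):
    """Merge per-node {(country, device): count} cells and precompute roll-ups.

    A rolled-up dimension is marked with "*", so ("US", "*") holds the total
    for country US across all devices.
    """
    cube = defaultdict(int)
    for cells in node_outputs:
        for coords, measure in cells.items():
            # Each subset of dimensions that is kept yields one cuboid cell.
            for r in range(len(DIMENSIONS) + 1):
                for kept in combinations(range(len(DIMENSIONS)), r):
                    cell = tuple(
                        c if i in kept else "*" for i, c in enumerate(coords)
                    )
                    cube[cell] += measure
    return dict(cube)


def query_cube(cube, **fixed):
    """Look up the measure for the given dimension values; others roll up."""
    cell = tuple(fixed.get(dim, "*") for dim in DIMENSIONS)
    return cube.get(cell, 0)
```

With this layout, a query such as `query_cube(cube, country="US")` is a single dictionary lookup, which is one way to keep interactive queries fast at the cost of a larger stored cube.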
11. A system for distributed processing of streaming data records, the system comprising:
- a plurality of computing nodes, each node comprising a processor and a local storage device, wherein each node receives only a subset of information relevant to a set of predetermined dimensions in an input stream of data records based at least in part on workloads of the computing nodes, properties of the input stream of the data records, or a random assignment, converts a portion of the subset of the information in the received streaming data records into key-value pairs, parses the converted key value pairs of the subset of the information in the received input stream of the data records received at each node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys, and selectively redistributes some or all of the keys to another node in the plurality of computing nodes in accordance with the predetermined dimensions stored thereon, wherein each distributed computing node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received input stream of the data records received at each node;
- a network connecting the plurality of computing nodes, the network distributing and re-distributing the input stream of data records amongst the plurality of computing nodes; and
- a database for storing an OLAP cube having data collected from the computing nodes based on the parsed and re-distributed input stream of the data records.
Dependent claims: 12, 13
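One way to picture the per-node behavior in claim 11 is a node object that combines key-value pairs locally and then forwards only the keys whose dimension is stored on another node. The sketch below is hypothetical throughout: the `Node` class, the `owner_by_dimension` mapping, the CRC32 hash used for the property-based option, and the in-memory message passing that stands in for the network are assumptions, not details from the claim.

```python
import random
import zlib
from collections import Counter


class Node:
    def __init__(self, node_id, owned_dimensions):
        self.node_id = node_id
        self.owned = set(owned_dimensions)  # dimensions stored on this node
        self.local = Counter()              # combined key-value pairs
        self.workload = 0                   # records received so far

    def ingest(self, subset):
        """Convert a filtered record to key-value pairs and combine by key."""
        self.workload += 1
        for dim, val in subset.items():
            self.local[(dim, val)] += 1

    def redistribute(self, owner_by_dimension):
        """Keep owned keys; return {destination_id: Counter} for the rest."""
        outbound = {}
        for key in list(self.local):
            dest = owner_by_dimension[key[0]]
            if dest != self.node_id:
                outbound.setdefault(dest, Counter())[key] += self.local.pop(key)
        return outbound

    def receive(self, pairs):
        """Merge key-value pairs forwarded by another node."""
        self.local.update(pairs)


def pick_node(nodes, subset, policy, rng=random):
    """Choose a receiving node by workload, record property, or at random."""
    if policy == "workload":
        return min(nodes, key=lambda n: n.workload)
    if policy == "property":
        digest = zlib.crc32(repr(sorted(subset.items())).encode())
        return nodes[digest % len(nodes)]
    return rng.choice(nodes)
```

After every node has called `redistribute` and the resulting messages have been delivered with `receive`, each node holds only the keys for the dimensions it owns, which is the size reduction the claim describes.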
Specification