Distributed processing of streaming data records

US 8,738,650 B2
Filed: 05/23/2013
Issued: 05/27/2014
Est. Priority Date: 05/22/2012
Status: Active Grant

Abstract

Representative embodiments of a distributed processing method facilitate interactive analytics of streaming data records by receiving the data records at a plurality of distributed computational nodes, establishing and storing dimensions corresponding to attributes of the data records, parsing the streaming data records to identify matches to at least one of the dimensions and, based thereon, reducing the number of data records to create a targeted subset of the data, re-distributing the targeted subsets of the streaming data records among the distributed computational nodes in accordance with the dimensions stored on the nodes, updating a database storing measures of the dimensions in accordance with the targeted subsets of the streaming data records, and using the database to respond to a query based on measures associated with one or more of the dimensions.
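
The reduction step in the abstract (matching records against stored dimensions and keeping only a targeted subset) can be illustrated with a minimal sketch. The dimension names `region` and `device`, the measure `bytes`, and the function `targeted_subset` are hypothetical and chosen only for illustration; the patent does not name specific dimensions or prescribe this code.

```python
from typing import Iterable, Iterator

# Hypothetical dimension and measure names; the abstract does not name any.
DIMENSIONS = ("region", "device")
MEASURE = "bytes"


def targeted_subset(records: Iterable[dict]) -> Iterator[dict]:
    """Yield a reduced record for each input that matches at least one dimension."""
    for record in records:
        matched = {dim: record[dim] for dim in DIMENSIONS if dim in record}
        if not matched:
            continue  # no dimension matched: the record is dropped entirely
        # Keep only the matched dimensions plus the measure; other fields are discarded.
        yield {**matched, MEASURE: record.get(MEASURE, 0)}


stream = [
    {"region": "EU", "device": "phone", "bytes": 120, "user_agent": "x"},
    {"user_agent": "y", "bytes": 50},  # matches no dimension
    {"region": "US", "bytes": 30, "session": "abc"},
]
subset = list(targeted_subset(stream))
print(subset)
```

Here the second record is discarded and the other two lose their irrelevant fields, so both the number of records and the size of each record shrink before any further processing.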

13 Claims

1. A method for distributed processing of streaming data records, the method comprising:
- parsing information in a received stream of data records to thereby identify a subset of the information relevant to a set of predetermined dimensions;
  
  receiving only the subset of the information in the received streaming data records at a plurality of distributed computational nodes based at least in part on workloads of the distributed computational nodes, properties of the streaming data records, or a random assignment, each node comprising a processor and a storage element;
  
  converting, at each node, a portion of the subset of the information in the received streaming data records into key-value pairs;
  
  parsing, at each node, the converted key-value pairs of the subset of the information in the received streaming data records received at each said node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys;
  
  re-distributing the keys of the converted subset of the received streaming data records among the distributed computational nodes in accordance with the predetermined dimensions stored on the nodes, wherein each distributed computational node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received streaming data records received at each node;
  
  updating a database storing measures of the dimensions by collecting data from the computational nodes in accordance with the parsed and re-distributed streaming data records; and
  
  using the database to respond to a query based on measures associated with one or more of the dimensions.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, further comprising sending a pull request for the received streaming data records or receiving a push notification associated with the received streaming data records.
  - 3. The method of claim 1, further comprising labeling the received streaming data record with a time period.
  - 4. The method of claim 1, wherein re-distributing the keys of the received streaming data records comprises sending key-value pairs having the same key to one of the plurality of distributed computational nodes.
  - 5. The method of claim 1, wherein the database comprises an OLAP cube or a plurality of redundant OLAP cubes.
  - 6. The method of claim 5, wherein a cell in the OLAP cube comprises information derived from the received streaming data records.
  - 7. The method of claim 1, further comprising receiving a request for information from the database.
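
The conversion, combining, and re-distribution steps of claim 1, together with the same-key routing of claim 4, resemble a map, combine, and shuffle sequence. The following is a rough sketch under stated assumptions: the field names, the node count `NUM_NODES`, and the use of a CRC32 hash to pick the node that owns a key are all illustrative choices, not details taken from the patent.

```python
import zlib
from collections import Counter, defaultdict

NUM_NODES = 3  # hypothetical cluster size


def to_key_value(record):
    """Convert a record into a (key, value) pair: dimension values -> measure."""
    return (record["region"], record["device"]), record["bytes"]


def combine(pairs):
    """Per-node combine: sum the values of pairs that share an identical key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def owner(key):
    """Pick the node that owns a key (assumption: a stable hash of the key)."""
    return zlib.crc32(repr(key).encode("utf-8")) % NUM_NODES


def redistribute(per_node_totals):
    """Send every combined key to its owning node and merge the values there."""
    merged = defaultdict(Counter)
    for totals in per_node_totals:
        for key, value in totals.items():
            merged[owner(key)][key] += value
    return {node: dict(counts) for node, counts in merged.items()}


# Records as they might arrive at two nodes (made-up data).
node_inputs = [
    [{"region": "EU", "device": "phone", "bytes": 120},
     {"region": "EU", "device": "phone", "bytes": 30}],
    [{"region": "EU", "device": "phone", "bytes": 10},
     {"region": "US", "device": "tablet", "bytes": 50}],
]
local_totals = [combine(to_key_value(r) for r in records) for records in node_inputs]
shuffled = redistribute(local_totals)
```

Combining before re-distribution means the first node sends one pair for `("EU", "phone")` instead of two, which is one way to read the claim's "reducing a size" language. `zlib.crc32` is used rather than Python's built-in `hash` because string hashes vary between interpreter runs.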

8. A system for distributed processing of streaming data records, the system comprising:
- a data collector for (i) receiving the streaming data records, (ii) parsing information in the received streaming data records to thereby identify a subset of the information relevant to a set of predetermined dimensions, and (iii) distributing only the subset of the information in the received streaming data records to a plurality of distributed computational nodes based at least in part on workloads of the distributed computational nodes, properties of the streaming data records, or a random assignment;
  
  a data reorganizer for converting a portion of the subset of the information in the received streaming data records into key-value pairs, parsing the converted key-value pairs of the subset of the information in the received streaming data records received at each said node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys, and re-distributing the keys of the converted subset of the received streaming data records among the distributed computational nodes based on the predetermined dimensions stored thereon, wherein each distributed computational node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received streaming data records received at each node;
  
  a cube constructor for constructing an OLAP cube based on the re-distributed streaming data records by collecting data from the distributed computational nodes; and
  
  an interactive cube manipulator for querying information within the OLAP cube and receiving a response in return.
- Dependent claims (9, 10):
  - 9. The system of claim 8, wherein re-distributing the streaming data records comprises sending similar records to the same distributed computational nodes.
  - 10. The system of claim 8, wherein the key in the key-value pairs comprises OLAP values used to identify the value in the key-value pair.
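
The cube constructor and interactive cube manipulator of claim 8, with keys that carry OLAP dimension values as in claim 10, could be approximated as follows. This is only a sketch: the `Cube` class, its `update` and `query` methods, and the sample dimensions are hypothetical, and a production OLAP store would look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    """Toy OLAP cube: each cell is addressed by a tuple of dimension values."""
    dimensions: tuple
    cells: dict = field(default_factory=dict)

    def update(self, node_totals):
        """Fold the combined key-value pairs collected from one node into the cube."""
        for key, value in node_totals.items():
            self.cells[key] = self.cells.get(key, 0) + value

    def query(self, **filters):
        """Sum the measure over all cells whose coordinates match the filters."""
        unknown = set(filters) - set(self.dimensions)
        if unknown:
            raise KeyError(f"unknown dimensions: {sorted(unknown)}")
        total = 0
        for key, value in self.cells.items():
            coords = dict(zip(self.dimensions, key))
            if all(coords[dim] == wanted for dim, wanted in filters.items()):
                total += value
        return total


cube = Cube(dimensions=("region", "device"))
cube.update({("EU", "phone"): 150, ("US", "phone"): 50})  # collected from one node
cube.update({("EU", "tablet"): 20})                       # collected from another node

eu_total = cube.query(region="EU")        # slice on one dimension
phone_total = cube.query(device="phone")  # slice on the other dimension
grand_total = cube.query()                # no filter: sum over every cell
```

Because the key itself holds the dimension values, a query only needs to compare cell coordinates against the requested filters; no separate index is assumed.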

11. A system for distributed processing of streaming data records, the system comprising:
- a plurality of computing nodes, each node comprising a processor and a local storage device, wherein each node receives only a subset of information relevant to a set of predetermined dimensions in an input stream of data records based at least in part on workloads of the computing nodes, properties of the input stream of the data records, or a random assignment, converts a portion of the subset of the information in the received streaming data records into key-value pairs, parses the converted key value pairs of the subset of the information in the received input stream of the data records received at each node to (i) identify matches of the keys to at least one predetermined dimension and (ii) based thereon, combine the key-value pairs having identical keys, and selectively redistributes some or all of the keys to another node in the plurality of computing nodes in accordance with the predetermined dimensions stored thereon, wherein each distributed computing node receives the key corresponding to the predetermined dimension, thereby reducing a size of the portion of the subset of information in the received input stream of the data records received at each node;
  
  a network connecting the plurality of computing nodes, the network distributing and re-distributing the input stream of data records amongst the plurality of computing nodes; and
  
  a database for storing an OLAP cube having data collected from the computing nodes based on the parsed and re-distributed input stream of the data records.
- Dependent claims (12, 13):
  - 12. The system of claim 11, further comprising a user interface for sending a query to, and receiving a response from, the database.
  - 13. The system of claim 11, wherein the plurality of computing nodes comprises a Hadoop cluster.
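
The independent claims recite three alternative bases for the initial distribution of records: node workloads, properties of the records, or random assignment. One possible reading of each option is sketched below. The function names, the workload representation (a list of pending-record counts), and the CRC32 hash are assumptions made for illustration only.

```python
import random
import zlib


def least_loaded(workloads):
    """Workload-based: pick the index of the node with the smallest pending load."""
    return min(range(len(workloads)), key=workloads.__getitem__)


def by_property(record, field_name, num_nodes):
    """Property-based: records sharing a field value always go to the same node."""
    digest = zlib.crc32(str(record[field_name]).encode("utf-8"))
    return digest % num_nodes


def at_random(num_nodes, rng):
    """Random assignment: pick any node uniformly."""
    return rng.randrange(num_nodes)


workloads = [4, 1, 3]           # hypothetical pending-record counts per node
chosen = least_loaded(workloads)
workloads[chosen] += 1          # the chosen node now carries one more record

node_a = by_property({"region": "EU", "bytes": 10}, "region", 3)
node_b = by_property({"region": "EU", "bytes": 99}, "region", 3)
node_r = at_random(3, random.Random(0))
```

Workload-based assignment evens out load but scatters related records, while property-based assignment keeps related records together at the risk of skew; the claims leave the choice open.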

Current Assignee: Guavus, Inc. (Thales SA)
Original Assignee: Guavus, Inc. (Thales SA)
Inventors: Bawa, Jaskaran Singh; Bisht, Bijay Singh; Srivastava, Anand Vivek; Bhowmik, Sumanta Kumar; Saraf, Atul Kumar
Primary Examiner(s): Singh, Amresh
Application Number: US13/901,291
Publication Number: US 20130318034A1
Time in Patent Office: 369 Days
Field of Search: 707/603, 707/770
US Class Current: 707/770
CPC Class Codes:
G06F 16/27 Replication, distribution o...
G06F 16/283 Multi-dimensional databases...
