Distributed data stream processing method and system
First Claim
Patent Images
1. A method, comprising:
- determining a number of a plurality of division modules based on flow volume of a raw data stream;
dividing the raw data stream into a real-time data stream and one or more historical data streams based on the number of the plurality of division modules;
processing the real-time data stream and the one or more historical data streams in parallel, comprising;
dividing the real-time data stream into a plurality of data units based on a plurality of dimensions, the plurality of dimensions including a first dimension and a second dimension, wherein a first data unit is associated with the first dimension, and wherein a second data unit is associated with the second dimension;
determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data units and resources available to be used to process the plurality of data units;
processing, via the plurality of functional modules, the first data unit and the second data unit in parallel, the plurality of functional modules including a first functional module and a second functional module, wherein the first data unit is processed by the first functional module, and wherein the second data unit is processed by the second functional module; and
aggregating results of the processing performed by the first functional module and the second functional module;
separately generating respective results of the processing of the real-time data stream and the one or more historical data streams; and
integrating the respective generated processing results.
0 Assignments
0 Petitions
Accused Products
Abstract
Embodiments of the present application relate to a distributed data stream processing method, a distributed data stream processing device, a computer program product for processing a raw data stream and a distributed data stream processing system. A distributed data stream processing method is provided. The method includes dividing a raw data stream into a real-time data stream and historical data streams, processing the real-time data stream and the historical data streams in parallel, separately generating respective results of the processing of the real-time data stream and the historical data streams, and integrating the generated processing results.
30 Citations
23 Claims
-
1. A method, comprising:
-
determining a number of a plurality of division modules based on flow volume of a raw data stream; dividing the raw data stream into a real-time data stream and one or more historical data streams based on the number of the plurality of division modules; processing the real-time data stream and the one or more historical data streams in parallel, comprising; dividing the real-time data stream into a plurality of data units based on a plurality of dimensions, the plurality of dimensions including a first dimension and a second dimension, wherein a first data unit is associated with the first dimension, and wherein a second data unit is associated with the second dimension; determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data units and resources available to be used to process the plurality of data units; processing, via the plurality of functional modules, the first data unit and the second data unit in parallel, the plurality of functional modules including a first functional module and a second functional module, wherein the first data unit is processed by the first functional module, and wherein the second data unit is processed by the second functional module; and aggregating results of the processing performed by the first functional module and the second functional module; separately generating respective results of the processing of the real-time data stream and the one or more historical data streams; and integrating the respective generated processing results. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A device, comprising:
-
a processor; and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions which when executed cause the processor to; determine a number of a plurality of division modules based on flow volume of a raw data stream; divide the raw data stream into a real-time data stream and one or more historical data streams based on the number of the plurality of division modules; process the real-time data stream and the one or more historical data streams in parallel and separately generate respective processing results, comprising to; divide the real-time data stream into a plurality of data units based on a plurality of dimensions, the plurality of dimensions including a first dimension and a second dimension, wherein a first data unit is associated with the first dimension, and wherein a second data unit is associated with the second dimension; determine a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data units and resources available to be used to process the plurality of data units; process, via the plurality of functional modules, the first data unit and the second data unit in parallel, the plurality of functional modules including a first functional module and a second functional module, wherein the first data unit is processed by the first functional module, and wherein the second data unit is processed by the second functional module; and aggregate results of the processing performed by the first functional module and the second functional module; and integrate the respective generated processing results. - View Dependent Claims (10, 11, 12, 13)
-
-
14. A computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
-
determining a number of a plurality of division modules based on flow volume of a raw data stream; dividing the raw data stream into a real-time data stream and one or more historical data streams based on the number of the plurality of division modules; processing the real-time data stream and the one or more historical data streams in parallel, comprising; dividing the real-time data stream into a plurality of data units based on a plurality of dimensions, the plurality of dimensions including a first dimension and a second dimension, wherein a first data unit is associated with the first dimension, and wherein a second data unit is associated with the second dimension; determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data units and resources available to be used to process the plurality of data units; processing, via the plurality of functional modules, the first data unit and the second data unit in parallel, the plurality of functional modules including a first functional module and a second functional module, wherein the first data unit is processed by the first functional module, and wherein the second data unit is processed by the second functional module; and aggregating results of the processing performed by the first functional module and the second functional module; and integrating the respective generated processing results. - View Dependent Claims (15, 16, 17, 18)
-
-
19. A system, comprising:
a plurality of application servers comprising; a determination module configured to determine a number of a plurality of division modules based on flow volume of a raw data stream; a data recognition module configured to divide the raw data stream into a real-time data stream and one or more historical data streams based on the number of the plurality of division modules; a parallel processing module configured to process the real-time data stream and the one or more historical data streams in parallel, and separately generate respective processing results, comprises to; divide the real-time data stream into a plurality of data units based on a plurality of dimensions, the plurality of dimensions including a first dimension and a second dimension, wherein a first data unit is associated with the first dimension, and wherein a second data unit is associated with the second dimension; determine a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data units and resources available to be used to process the plurality of data units; process, via the plurality of functional modules, the first data unit and the second data unit in parallel, the plurality of functional modules including a first functional module and a second functional module, wherein the first data unit is processed by the first functional module, and wherein the second data unit is processed by the second functional module; and aggregate results of the processing performed by the first functional module and the second functional module; and a data integration module configured to integrate the respective generated results of processing. - View Dependent Claims (20, 21, 22, 23)
Specification