Distributed data stream processing method and system
First Claim
1. A distributed data stream processing method, the method comprising:
determining a number of a plurality of division modules based on flow volume of a raw data stream;
dividing the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
processing the real-time data stream and the historical data streams in parallel, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises:
dividing the real-time data stream into a plurality of data blocks based on a first dimension;
dividing each data block into a plurality of data sub-blocks based on a second dimension;
determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks;
processing the plurality of data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises:
transmitting a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
transmitting a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
aggregating the results of the processing of the plurality of data sub-blocks;
separately generating respective results of the processing of the real-time data stream and the historical data streams; and
integrating the respective generated processing results.
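The real-time branch recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the record layout `(timestamp, user, value)`, the choice of a time window as the first dimension and the user as the second dimension, the summing `functional_module`, and the first-come module assignment are all assumptions made for illustration. The one property taken from the claim is that sub-blocks relating to the same user are sent to the same functional module.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def divide(records, key):
    """Group records by key(record), keeping arrival order inside each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)


def functional_module(sub_blocks):
    """Hypothetical processing step: total the values of each sub-block received."""
    return [(user, sum(value for _, _, value in recs)) for user, recs in sub_blocks]


def process_real_time(records, window_seconds=60, num_modules=2):
    # First dimension (assumed: time window) -> data blocks.
    blocks = divide(records, key=lambda r: r[0] // window_seconds)
    # Second dimension (assumed: user) -> data sub-blocks within each block.
    sub_blocks = [
        (user, recs)
        for block in blocks.values()
        for user, recs in divide(block, key=lambda r: r[1]).items()
    ]
    # Route every sub-block of a given user to the same functional module.
    module_of = {}
    inboxes = [[] for _ in range(num_modules)]
    for user, recs in sub_blocks:
        index = module_of.setdefault(user, len(module_of) % num_modules)
        inboxes[index].append((user, recs))
    # Run the functional modules in parallel, then aggregate their results.
    with ThreadPoolExecutor(max_workers=num_modules) as pool:
        partials = list(pool.map(functional_module, inboxes))
    totals = defaultdict(int)
    for partial in partials:
        for user, subtotal in partial:
            totals[user] += subtotal
    return dict(totals)
```

With records spanning two windows for two users, each user ends up with two sub-blocks, and both of a user's sub-blocks reach the same module before the per-user totals are aggregated.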
1 Assignment
0 Petitions
Abstract
Embodiments of the present application relate to a distributed data stream processing method, a distributed data stream processing device, a computer program product for processing a raw data stream and a distributed data stream processing system. A distributed data stream processing method is provided. The method includes dividing a raw data stream into a real-time data stream and historical data streams, processing the real-time data stream and the historical data streams in parallel, separately generating respective results of the processing of the real-time data stream and the historical data streams, and integrating the generated processing results.
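The overall flow described in the abstract (divide, process both branches in parallel, generate results separately, integrate) might look roughly like the following sketch. The age-based cutoff used to tell real-time records from historical ones, the dictionary record layout, the single historical branch, and the per-user counting are assumptions for illustration only; the source does not state how the division is decided.

```python
from concurrent.futures import ThreadPoolExecutor


def split_stream(records, now, real_time_horizon):
    """Assumed rule: records no older than the horizon count as real-time."""
    real_time = [r for r in records if now - r["ts"] <= real_time_horizon]
    historical = [r for r in records if now - r["ts"] > real_time_horizon]
    return real_time, historical


def count_by_user(records):
    """Hypothetical processing step applied to each branch separately."""
    counts = {}
    for r in records:
        counts[r["user"]] = counts.get(r["user"], 0) + 1
    return counts


def integrate(real_time_result, historical_result):
    """Merge the two separately generated results into one view."""
    merged = dict(historical_result)
    for user, n in real_time_result.items():
        merged[user] = merged.get(user, 0) + n
    return merged


def run(records, now, real_time_horizon=60):
    real_time, historical = split_stream(records, now, real_time_horizon)
    # The two branches are submitted together so they can run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        rt_future = pool.submit(count_by_user, real_time)
        hist_future = pool.submit(count_by_user, historical)
        return integrate(rt_future.result(), hist_future.result())
```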
28 Citations
17 Claims
1. A distributed data stream processing method, the method comprising:
determining a number of a plurality of division modules based on flow volume of a raw data stream;
dividing the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
processing the real-time data stream and the historical data streams in parallel, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises:
dividing the real-time data stream into a plurality of data blocks based on a first dimension;
dividing each data block into a plurality of data sub-blocks based on a second dimension;
determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks;
processing the plurality of data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises:
transmitting a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
transmitting a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
aggregating the results of the processing of the plurality of data sub-blocks;
separately generating respective results of the processing of the real-time data stream and the historical data streams; and
integrating the respective generated processing results.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
9. A distributed data stream processing device, the device comprising:
at least one hardware processor configured to:
determine a number of a plurality of division modules based on flow volume of a raw data stream;
divide the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
process the real-time data stream and the historical data streams in parallel and separately generate respective processing results, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises:
divide the real-time data stream into a plurality of data blocks based on a first dimension;
divide each data block into a plurality of data sub-blocks based on a second dimension;
determine a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks;
process the plurality of data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises:
transmit a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
transmit a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
aggregate the results of the processing of the plurality of data sub-blocks;
integrate the respective generated processing results; and
a memory coupled to the at least one processor and configured to provide the at least one processor with instructions.
View Dependent Claims (10, 11)
12. A computer program product for processing a raw data stream, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions to be executed by a hardware processor for:
determining a number of a plurality of division modules based on flow volume of a raw data stream;
dividing the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
processing the real-time data stream and the historical data streams in parallel, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises:
dividing the real-time data stream into a plurality of data blocks based on a first dimension;
dividing each data block into a plurality of data sub-blocks based on a second dimension;
determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks;
processing the plurality of data units in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises:
transmitting a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
transmitting a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
aggregating the results of the processing of the plurality of data sub-blocks;
separately generating respective results of the processing of the real-time data stream and the historical data streams; and
integrating the respective generated processing results.
View Dependent Claims (13, 14)
15. A distributed data stream processing system, the system comprising:
a plurality of application servers comprising:
a data recognition module configured to:
determine, using a hardware processor, a number of a plurality of division modules based on flow volume of a raw data stream;
divide, using the hardware processor, the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
a parallel processing module configured to process, using a hardware processor, the real-time data stream and the historical data streams in parallel, and separately generate respective processing results, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the parallel processing module comprises:
a horizontal division module configured to divide, using a hardware processor, the real-time data stream into a plurality of data blocks based on a first dimension;
a plurality of vertical division modules configured to:
divide, using a hardware processor, each data block in parallel into a plurality of data sub-blocks based on a second dimension,
determine, using the hardware processor, a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks, and
transmit, using the hardware processor, the plurality of data sub-blocks to a corresponding plurality of functional modules to process the data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing of the plurality of data sub-blocks in parallel comprises:
transmit, using the hardware processor, a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
transmit, using the hardware processor, a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
a result aggregation module configured to aggregate, using a hardware processor, the results of data processing by the plurality of functional modules; and
a data integration module configured to integrate, using a hardware processor, the respective generated results of processing.
View Dependent Claims (16, 17)
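The independent claims recite two sizing steps: the number of division modules is determined from the flow volume of the raw data stream, and the number of functional modules per group and the number of groups are determined from the number of data sub-blocks and the available resources. The claims do not give formulas, so the sketch below uses hypothetical rules (ceiling division against an assumed per-module capacity, and an assumed cap on modules per group) purely to show the shape of such a calculation.

```python
import math


def division_module_count(records_per_second, capacity_per_module):
    """Assumed rule: enough division modules to absorb the observed flow volume."""
    return max(1, math.ceil(records_per_second / capacity_per_module))


def functional_layout(num_sub_blocks, available_workers, max_modules_per_group):
    """Assumed rule: one module per sub-block, limited by the available workers,
    packed into groups of at most max_modules_per_group modules each.

    Returns (number_of_groups, modules_per_group).
    """
    modules_needed = max(1, min(num_sub_blocks, available_workers))
    modules_per_group = min(modules_needed, max_modules_per_group)
    num_groups = math.ceil(modules_needed / modules_per_group)
    return num_groups, modules_per_group
```

For example, under these assumptions a flow of 2,500 records per second with a capacity of 1,000 per module yields three division modules, and ten sub-blocks with six available workers and at most four modules per group yield two groups of up to four modules.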
Specification