Real-time partitioned processing streaming
First Claim
Patent Images
1. A system for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the system comprising:
- a message allocator processor that;
receives a plurality of data sets from one or more producer devices;
for each of the plurality of data sets;
identifies a tag or characteristic of the data set;
identifies an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or the characteristic; and
appends the data set to the identified initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the identified initial partition stream;
a partition controller processor that, for the identified initial partition stream of the plurality of initial partition streams, manages a set of task processors such that;
each task processor hi the set of task processors is designated to perform a task hi a workflow so as to process data sets in the identified initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes;
a first task processor designated to perform a first task;
a second task processor designated to perform a second task; and
a third task processor designated to perform a third task;
the first task processor hi the set of task processors is configured to;
generate, via performance of the first task, processed data sets corresponding to the data sets hi the identified initial partition stream;
facilitate storing the processed data sets at a first data store;
generate a processed partition stream that includes the processed data sets in the identified initial partition stream; and
facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks;
the second task processor in the set of task processors is configured to;
generate, via performance of the second task, a score corresponding to each data set in the identified initial partition stream; and
facilitate storing the scores at a second data store; and
the third task processor in the set of task processors is configured to repeatedly;
retrieve a plurality of scores from the second data store, for each score in the plurality of scores;
generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and
facilitate providing the real-time analytic variable to a client device,wherein the repeated retrieval of the plurality of scores and the repeated generation of the real-time analytic variable causes the real-time analytic variable to be updated in response to appending and task-performance processing of new data appended to the identified initial partition stream.
1 Assignment
0 Petitions
Accused Products
Abstract
Embodiments related to processing data sets in real-time by using a distributed network to generate and process partitioned streams. Messages are assigned to partition streams. Within each stream, each of a set of processors perform a designated task. Results from the task are transmitted (directly or indirectly) to another processor in the stream. The distributed and ordered processing can allow results to be transmitted while or before the results are stored.
5 Citations
20 Claims
-
1. A system for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the system comprising:
-
a message allocator processor that; receives a plurality of data sets from one or more producer devices; for each of the plurality of data sets; identifies a tag or characteristic of the data set; identifies an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or the characteristic; and appends the data set to the identified initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the identified initial partition stream; a partition controller processor that, for the identified initial partition stream of the plurality of initial partition streams, manages a set of task processors such that; each task processor hi the set of task processors is designated to perform a task hi a workflow so as to process data sets in the identified initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes; a first task processor designated to perform a first task; a second task processor designated to perform a second task; and a third task processor designated to perform a third task; the first task processor hi the set of task processors is configured to; generate, via performance of the first task, processed data sets corresponding to the data sets hi the identified initial partition stream; facilitate storing the processed data sets at a first data store; generate a processed partition stream that includes the processed data sets in the identified initial partition stream; and facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks; the second task processor in the set of task processors is configured to; generate, via performance of the second task, a score corresponding to each data set in the identified initial partition stream; and facilitate storing the scores at a second data store; and the third task processor in the set of task processors is configured to repeatedly; retrieve a plurality of scores from the second data store, for each score in the plurality of scores; generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and facilitate providing the real-time analytic variable to a client device, wherein the repeated retrieval of the plurality of scores and the repeated generation of the real-time analytic variable causes the real-time analytic variable to be updated in response to appending and task-performance processing of new data appended to the identified initial partition stream. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the method comprising:
-
receiving, at a message allocator, a plurality of data sets from one or more producer devices; for each of the plurality of data sets, the message allocator is configured to; identifying a tag or characteristic of the data set; identifying an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or the characteristic; and appending the data set to the identified initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the identified initial partition stream; for the identified initial partition stream of the plurality of initial partition streams; managing a set of task processors such that; each task processor in the set of task processors is designated to perform a task in a workflow so as to process data sets in the identified initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes; a first task processor designated to perform a first task; a second task processor designated to perform a second task; and a third task processor designated to perform a third task; the first task processor hi the set of task processors is configured to; generate, via performance of the first task, processed datasets corresponding to the data sets in the identified initial partition stream; facilitate storing the processed data sets in a first data store; generate a processed partition stream that includes the processed data sets hi the identified initial partition stream; and facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks; the second task processor hi the set of task processors is configured to; generate, via performance of the second task, a score corresponding to each data set hi the identified initial partition stream; and facilitate storing the scores in a second data store; and the third task processor in the set of processors is configured to repeatedly; retrieve a plurality of scores from the second data store, and for each score in the plurality of scores; generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and facilitate providing the real-time analytic variable to a client device, wherein the repeated retrieval of the plurality of scores and the repeated generation of the real-time analytic variable causes the real-time analytic variable to be updated in response to appending and task performance processing of new data appended to the identified initial partition stream. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
-
Specification