Real-time partitioned processing streaming

US 9,940,169 B2
Filed: 07/23/2015
Issued: 04/10/2018
Est. Priority Date: 07/23/2015
Status: Active Grant

Abstract

Embodiments relate to processing data sets in real-time by using a distributed network to generate and process partitioned streams. Messages are assigned to partition streams. Within each stream, each of a set of processors performs a designated task. Results from the task are transmitted (directly or indirectly) to another processor in the stream. The distributed and ordered processing can allow results to be transmitted while or before the results are stored.
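The assignment of messages to partition streams can be illustrated with a minimal sketch. It is not the patented implementation: the `MessageAllocator` class, the `"tag"` field, and the use of in-memory lists as streams are all assumptions made for illustration. It shows only the idea that each data set is routed by its tag and appended with a rank higher than every earlier rank in the same stream.

```python
from collections import defaultdict
from itertools import count


class MessageAllocator:
    """Hypothetical allocator: routes each data set to a partition stream by tag."""

    def __init__(self):
        # One append-only list per tag stands in for an initial partition stream.
        self.streams = defaultdict(list)
        # One counter per stream hands out strictly increasing ranks.
        self._ranks = defaultdict(count)

    def append(self, data_set):
        tag = data_set["tag"]          # assumed field that carries the tag
        rank = next(self._ranks[tag])  # higher than all earlier ranks in this stream
        self.streams[tag].append((rank, data_set))
        return tag, rank


allocator = MessageAllocator()
allocator.append({"tag": "course-101", "payload": "a"})
allocator.append({"tag": "course-202", "payload": "b"})
allocator.append({"tag": "course-101", "payload": "c"})
```

Because ranks are kept per stream, data sets with different tags never affect each other's ordering, which is one way to picture the isolation between partition streams.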


20 Claims

1. A system for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the system comprising:
- a message allocator processor that:
  - receives a plurality of data sets from one or more producer devices;
  - for each of the plurality of data sets:
    - identifies a tag or characteristic of the data set;
    - identifies an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or the characteristic; and
    - appends the data set to the identified initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the identified initial partition stream;
- a partition controller processor that, for the identified initial partition stream of the plurality of initial partition streams, manages a set of task processors such that:
  - each task processor in the set of task processors is designated to perform a task in a workflow so as to process data sets in the identified initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes:
    - a first task processor designated to perform a first task;
    - a second task processor designated to perform a second task; and
    - a third task processor designated to perform a third task;
  - the first task processor in the set of task processors is configured to:
    - generate, via performance of the first task, processed data sets corresponding to the data sets in the identified initial partition stream;
    - facilitate storing the processed data sets at a first data store;
    - generate a processed partition stream that includes the processed data sets in the identified initial partition stream; and
    - facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks;
  - the second task processor in the set of task processors is configured to:
    - generate, via performance of the second task, a score corresponding to each data set in the identified initial partition stream; and
    - facilitate storing the scores at a second data store; and
  - the third task processor in the set of task processors is configured to repeatedly:
    - retrieve a plurality of scores from the second data store, for each score in the plurality of scores;
    - generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and
    - facilitate providing the real-time analytic variable to a client device, wherein the repeated retrieval of the plurality of scores and the repeated generation of the real-time analytic variable causes the real-time analytic variable to be updated in response to appending and task-performance processing of new data appended to the identified initial partition stream.
- - 2. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein each of the first task processor and the third task processor includes a virtual server.
  - 3. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the set of task processors is managed such that a single stream is sent to a plurality of task processors in the set of task processors for parallel performance of tasks designated to be performed by the plurality of task processors, wherein the single stream includes a particular processed version of the identified initial partition stream.
  - 4. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein each of the first data store and the second data store is a part of a same network attached storage or storage area network.
  - 5. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the managing the set of task processors further includes:
    - monitoring a latency of completing performance of one or more tasks using the data set relative to a time at which the data set was received or appended to the identified initial partition stream;
    - comparing the latency to a threshold; and
    - when it is determined that the latency exceeds the threshold:
      - identifying a position in the workflow as a potential source of the latency exceeding the threshold, the position corresponding to a task processor designated to perform one or more tasks in the workflow;
      - identifying a new task processor to be included in the set of task processors;
      - designating the new task processor for performing part of the one or more tasks in the workflow; and
      - modifying the designation of the new task processor so as to be designated to perform at least part of a remainder of the one or more tasks in the workflow.
  - 6. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the partition controller further updates the identified initial partition stream so as to remove the data sets in the identified initial partition stream that have been processed by the first task processor via performance of the first task to generate corresponding processed data sets.
  - 7. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the partition controller further streams the identified initial partition stream to the first task processor.
  - 8. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the third task processor is controlled so as to further generate a second real-time analytic variable based on a subset of the plurality of scores;
    - and wherein the system further includes:
      - a transceiver that transmits the second real-time analytic variable to the client device.
  - 9. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein the real-time analytic variable does not depend on data sets included in any partition stream, other than the identified initial partition stream, of the plurality of initial partition streams such that the identified initial partition stream facilitates data isolation in workflow processing.
  - 10. The system for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 1, wherein, for each of the plurality of data sets, the tag or the characteristic for the data set is identified based on an identifier associated with the producer device from which the data set was received.
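The latency-driven scaling recited in claim 5 can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the function names, the per-stage timing dictionary, and the "blame the slowest stage" heuristic are all hypothetical.

```python
def check_latency(received_at, completed_at, stage_durations, threshold):
    """Return the workflow position suspected of causing excess latency, or None."""
    latency = completed_at - received_at
    if latency <= threshold:
        return None
    # Heuristic (an assumption): blame the stage that took the longest.
    return max(stage_durations, key=stage_durations.get)


def add_processor(assignments, position, new_processor):
    """Return a copy of assignments with new_processor sharing the given position."""
    updated = {pos: list(procs) for pos, procs in assignments.items()}
    updated[position].append(new_processor)
    return updated


# Hypothetical workflow positions and task processor names.
assignments = {"parse": ["tp-1"], "score": ["tp-2"], "aggregate": ["tp-3"]}
durations = {"parse": 0.2, "score": 2.0, "aggregate": 0.3}

slow = check_latency(0.0, 2.5, durations, threshold=1.0)
if slow is not None:
    assignments = add_processor(assignments, slow, "tp-4")
```

Here the end-to-end latency of 2.5 exceeds the threshold of 1.0, so the `"score"` position is flagged and a new task processor is designated to share its work.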

11. A method for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the method comprising:
- receiving, at a message allocator, a plurality of data sets from one or more producer devices;
- for each of the plurality of data sets, the message allocator is configured to:
  - identifying a tag or characteristic of the data set;
  - identifying an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or the characteristic; and
  - appending the data set to the identified initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the identified initial partition stream;
- for the identified initial partition stream of the plurality of initial partition streams:
  - managing a set of task processors such that:
    - each task processor in the set of task processors is designated to perform a task in a workflow so as to process data sets in the identified initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes:
      - a first task processor designated to perform a first task;
      - a second task processor designated to perform a second task; and
      - a third task processor designated to perform a third task;
    - the first task processor in the set of task processors is configured to:
      - generate, via performance of the first task, processed datasets corresponding to the data sets in the identified initial partition stream;
      - facilitate storing the processed data sets in a first data store;
      - generate a processed partition stream that includes the processed data sets in the identified initial partition stream; and
      - facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks;
    - the second task processor in the set of task processors is configured to:
      - generate, via performance of the second task, a score corresponding to each data set in the identified initial partition stream; and
      - facilitate storing the scores in a second data store; and
    - the third task processor in the set of processors is configured to repeatedly:
      - retrieve a plurality of scores from the second data store, and for each score in the plurality of scores;
      - generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and
      - facilitate providing the real-time analytic variable to a client device, wherein the repeated retrieval of the plurality of scores and the repeated generation of the real-time analytic variable causes the real-time analytic variable to be updated in response to appending and task performance processing of new data appended to the identified initial partition stream.
- - 12. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein each of the first task processor and the third task processor includes a virtual server.
  - 13. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein the set of task processors is managed such that a single stream is sent to a plurality of task processors in the set of task processors for parallel performance of tasks designated to be performed by the plurality of task processors, wherein the single stream includes a particular processed version of the identified initial partition stream.
  - 14. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein each of the first data store and the second data store is a part of a same network attached storage or storage area network.
  - 15. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein the managing the set of task processors further includes:
    - monitoring a latency of completing performance of one or more tasks using the data set relative to a time at which the data set was received or appended to the identified initial partition stream;
    - comparing the latency to a threshold; and
    - when it is determined that the latency exceeds the threshold:
      - identifying a position in the workflow as a potential source of the latency exceeding the threshold, the position corresponding to a task processor designated to perform one or more tasks in the workflow;
      - identifying a new task processor to be included in the set of task processors;
      - designating the new task processor for performing part of the one or more tasks in the workflow; and
      - modifying the designation of the new task processor so as to be designated to perform at least part of a remainder of the one or more tasks in the workflow.
  - 16. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, further comprising updating, via a partition controller, the identified initial partition stream so as to remove the data sets in the identified initial partition stream that have been processed by the first task processor via performance of the first task to generate corresponding processed data sets.
  - 17. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein the method further includes streaming the identified initial partition stream to the first task processor.
  - 18. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein the third task processor is controlled so as to further generate a second real-time analytic variable based on a subset of the plurality of scores;
    - and wherein the method further includes:
      - facilitating providing the second real-time analytic variable to the client device.
  - 19. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein the real-time analytic variable does not depend on data sets included in any partition stream, other than the identified initial partition stream, of the plurality of initial partition streams such that the identified initial partition stream facilitates data isolation in workflow processing.
  - 20. The method for processing data sets in real-time by using the distributed network to generate and process partitioned streams as recited in claim 11, wherein, for each of the plurality of data sets, the tag or the characteristic for the data set is identified based on an identifier associated with the producer device from which the data set was received.
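The three-task workflow recited in claim 11 can be pictured with a short sketch. Everything concrete here is an assumption chosen for illustration: the in-memory lists standing in for the two data stores, the string normalization used as the first task, string length as the score, the mean as the analytic variable, and the choice to score the processed stream.

```python
from statistics import mean

first_store = []   # stands in for the first data store (processed data sets)
second_store = []  # stands in for the second data store (scores)


def first_task(stream):
    """Process data sets in rank order, store them, and return the processed stream."""
    processed = [(rank, value.strip().lower()) for rank, value in sorted(stream)]
    first_store.extend(processed)
    return processed


def second_task(stream):
    """Score each data set (here, by its length) and store the scores."""
    for _rank, value in sorted(stream):
        second_store.append(len(value))


def third_task():
    """Recompute the analytic variable (here, the mean score) from the stored scores."""
    return mean(second_store)


# A partition stream of (rank, data set) pairs; the values are made up.
processed = first_task([(0, " Alpha "), (1, "Be")])
second_task(processed)
before = third_task()

# Newly appended data changes the variable the next time it is recomputed.
second_task(first_task([(2, "Gamma")]))
after = third_task()
```

Calling `third_task` repeatedly is what keeps the analytic variable current: each call re-reads the score store, so data appended and processed since the last call is reflected in the next value.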


Current Assignee: Pearson Education Incorporated (Pearson plc)
Original Assignee: Pearson Education Incorporated (Pearson plc)
Inventors: Moudy, Christopher; Berns, Kevin
Primary Examiner(s): Woolcock, Madhu
Application Number: US14/806,755
Publication Number: US 20170026441A1
Time in Patent Office: 992 Days
Field of Search: 709231, 709201
US Class Current:
CPC Class Codes: G06F 9/5027 the resource being a machin...
