LARGE SCALE REAL-TIME MULTISTAGED ANALYTIC SYSTEM USING DATA CONTRACTS
First Claim
1. At least one computing device implementing a large-scale real-time multistaged analytic system, each of the at least one computing device comprising:
at least one processor; and
a memory operatively connected with the at least one processor, the memory having instructions for the at least one processor to perform a method comprising:
executing a plurality of processing stages of a pipeline for analyzing data, each processing stage of the plurality of processing stages using a respective data contract that specifies a description of inter-stage data to be exchanged across a respective processing stage boundary of the each processing stage, such that the each processing stage is completely agnostic with respect to a data schema and semantics of the inter-stage data consumed by the each processing stage, wherein when a change occurs with respect to data provided to a processing stage of the pipeline, in order to accommodate the change, a data contract that specifies a description of the data to be provided to the processing stage is changed and processing logic of the processing stage of the pipeline remains unchanged.
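The following Python sketch illustrates one possible reading of the final limitation of this claim: when the data provided to a stage changes, only the data contract is edited and the stage's processing logic is left alone. The contract representation (a mapping from a logical field name to a source key and a converter), the function names, and the latency fields are all hypothetical and are not taken from the patent.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical contract format: logical field name -> (source key, converter).
Contract = Dict[str, Tuple[str, Callable[[Any], Any]]]


def read(record: Record, contract: Contract) -> Record:
    """Interpret a raw record strictly through the contract."""
    return {name: convert(record[key]) for name, (key, convert) in contract.items()}


def mean_value_stage(records: List[Record], contract: Contract) -> float:
    """Stage logic: averages the logical field 'value'.

    The stage never names a source key or a unit; both come from the contract.
    """
    values = [read(r, contract)["value"] for r in records]
    return sum(values) / len(values)


# Original upstream data: latency in milliseconds, as floats.
contract_v1: Contract = {"value": ("latency_ms", float)}
batch_v1 = [{"latency_ms": 10.0}, {"latency_ms": 30.0}]

# Upstream change (assumed for illustration): latency now arrives as
# microsecond strings. Only the contract is edited to accommodate it.
contract_v2: Contract = {"value": ("latency_us", lambda us: int(us) / 1000)}
batch_v2 = [{"latency_us": "10000"}, {"latency_us": "30000"}]

result_v1 = mean_value_stage(batch_v1, contract_v1)
result_v2 = mean_value_stage(batch_v2, contract_v2)
```

In this sketch `mean_value_stage` is identical for both batches, and both calls yield the same average in milliseconds; the difference between the two upstream formats is absorbed entirely by `contract_v2`.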
Abstract
An analytic system may have a number of processing stages. One or more data sources may provide data to a first processing stage. The first processing stage may specify one or more data contracts having a schema describing a layout and types of data provided by the one or more data sources. Each of the processing stages may specify a respective data contract having a schema such that the processing stages may understand a layout and types of data provided as input to the processing stages. The data contracts may further specify a valid range of values for various items of data described by schemas. Data not conforming to a data contract may be automatically filtered out such that a corresponding processing stage may not be provided with the non-conforming data.
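As a rough illustration of the filtering behavior the abstract describes, the sketch below checks records against a contract that gives a layout, types, and optional valid ranges, and drops anything that does not conform. The names `FieldSpec`, `conforms`, and `filter_conforming`, and the sensor fields, are assumptions made for this example; the patent does not prescribe this representation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one item of data in a contract."""
    type_: type
    valid_range: Optional[Tuple[float, float]] = None  # inclusive (low, high)


# Hypothetical contract: field name -> expected type and optional range.
DataContract = Dict[str, FieldSpec]


def conforms(record: Dict[str, Any], contract: DataContract) -> bool:
    """Return True if the record matches the contract's layout, types, and ranges."""
    if set(record) != set(contract):
        return False  # layout mismatch: missing or unexpected fields
    for name, spec in contract.items():
        value = record[name]
        if not isinstance(value, spec.type_):
            return False  # type mismatch
        if spec.valid_range is not None:
            low, high = spec.valid_range
            if not (low <= value <= high):
                return False  # value outside the valid range
    return True


def filter_conforming(records: Iterable[Dict[str, Any]],
                      contract: DataContract) -> List[Dict[str, Any]]:
    """Keep only conforming records, so a stage never sees the rest."""
    return [r for r in records if conforms(r, contract)]


contract: DataContract = {
    "sensor_id": FieldSpec(str),
    "temperature_c": FieldSpec(float, valid_range=(-50.0, 150.0)),
}
records = [
    {"sensor_id": "a1", "temperature_c": 21.5},   # conforming
    {"sensor_id": "a2", "temperature_c": 900.0},  # out of range
    {"sensor_id": "a3", "temperature_c": "hot"},  # wrong type
    {"sensor_id": "a4"},                          # missing field
]
accepted = filter_conforming(records, contract)
```

Here only the first record reaches `accepted`; the other three are filtered out before any stage logic would run.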
20 Claims
1. At least one computing device implementing a large-scale real-time multistaged analytic system, each of the at least one computing device comprising:
at least one processor; and
a memory operatively connected with the at least one processor, the memory having instructions for the at least one processor to perform a method comprising:
executing a plurality of processing stages of a pipeline for analyzing data, each processing stage of the plurality of processing stages using a respective data contract that specifies a description of inter-stage data to be exchanged across a respective processing stage boundary of the each processing stage, such that the each processing stage is completely agnostic with respect to a data schema and semantics of the inter-stage data consumed by the each processing stage, wherein when a change occurs with respect to data provided to a processing stage of the pipeline, in order to accommodate the change, a data contract that specifies a description of the data to be provided to the processing stage is changed and processing logic of the processing stage of the pipeline remains unchanged.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A machine-readable storage medium for storing data, instructions, or other information for more than a transitory time period, the machine-readable storage medium having instructions recorded thereon for at least one processor of a computing device, when the at least one processor executes the instructions, the computing device performs a method comprising:
executing an analytic system having a plurality of processing stages of a pipeline, each processing stage of the plurality of processing stages having a respective data contract with a corresponding immediately adjacent processing stage within the pipeline, the respective data contract describing inter-stage data provided by either the corresponding immediately adjacent processing stage to the each processing stage or from the each processing stage to the corresponding immediately adjacent processing stage, the each processing stage being completely agnostic with respect to a data schema and semantics of the inter-stage data, wherein when a change occurs with respect to data provided to a processing stage of the pipeline, in order to accommodate the change, a data contract that specifies a description of the data to be provided to the processing stage is changed and processing logic of the processing stage of the pipeline remains unchanged.
Dependent claims: 9, 10, 11, 12, 13.
14. A machine-implemented method for analyzing data provided from at least one data source, the machine-implemented method comprising:
executing an analytic system having a plurality of processing stages of a pipeline, each processing stage of the plurality of processing stages having a respective data contract with a corresponding immediately adjacent processing stage within the pipeline, the respective data contract specifying a data schema for inter-stage data produced either from the corresponding immediately adjacent processing stage to the each processing stage or from the each processing stage to the corresponding immediately adjacent processing stage, the each processing stage being completely agnostic with respect to the data schema and semantics of the inter-stage data, wherein:
the machine-implemented method is implemented by at least one computing device, and
when a change occurs with respect to data provided to a processing stage of the pipeline, in order to accommodate the change, a data contract that specifies a description of the data to be provided to the processing stage is changed and processing logic of the processing stage of the pipeline remains unchanged.
Dependent claims: 15, 16, 17, 18, 19, 20.
Specification