METHOD, APPARATUS AND SYSTEM FOR DATA ANALYSIS
1 Assignment
0 Petitions
Abstract
Embodiments of the disclosure provide a method, apparatus and system for data analysis that may address the low efficiency of data analysis. The disclosed method includes: retrieving pipeline data from a pipeline data set piece by piece, wherein each piece of pipeline data includes attribute values of multiple views; performing normalization sorting of the retrieved pipeline data based on the attribute value in a predefined view; obtaining an attribute value entry list by extracting attribute value entries from the normalization-sorted pipeline data; obtaining a first characteristic value list by performing a deduplication operation on the attribute value entry list through a mapper operation; obtaining a second characteristic value list by performing an accumulation operation on the first characteristic value list through a reducer operation; and obtaining a result of a predefined indicator by analyzing the second characteristic value list.
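The abstract does not say what "normalization sorting" or "the predefined indicator" concretely are, so the following is only a minimal sketch under stated assumptions. It assumes each piece of pipeline data is a dict mapping a view name to an attribute value, that normalization means trimming and lowercasing every value, that each extracted entry pairs the predefined-view value with one other view's value, and that the indicator is "the key with the most distinct values". All function names (`retrieve`, `normalize_sort`, `extract_entries`, `mapper_dedup`, `reducer_accumulate`, `analyze`) and the sample `channel`/`user` views are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]  # hypothetical: one piece of pipeline data, view name -> attribute value

def retrieve(data_set: Iterable[Record]) -> Iterator[Record]:
    """Yield pipeline data from the data set one piece at a time."""
    for piece in data_set:
        yield piece

def normalize_sort(pieces: Iterable[Record], view: str) -> List[Record]:
    """Assumed normalization: trim and lowercase every value, then sort by the predefined view."""
    normalized = [{k: v.strip().lower() for k, v in p.items()} for p in pieces]
    return sorted(normalized, key=lambda p: p[view])

def extract_entries(pieces: Iterable[Record], view: str, target: str) -> List[Tuple[str, str]]:
    """One (predefined-view value, target-view value) entry per piece."""
    return [(p[view], p[target]) for p in pieces]

def mapper_dedup(entries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """First characteristic value list: entries with duplicates removed, order preserved."""
    seen = set()
    out = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            out.append(entry)
    return out

def reducer_accumulate(first: List[Tuple[str, str]]) -> Dict[str, int]:
    """Second characteristic value list: number of distinct target values per key."""
    return dict(Counter(key for key, _ in first))

def analyze(second: Dict[str, int]) -> Tuple[str, int]:
    """Hypothetical indicator: the key with the most distinct target values."""
    return max(second.items(), key=lambda kv: kv[1])

logs = [
    {"channel": "Web ", "user": "u1"},
    {"channel": "app", "user": "u2"},
    {"channel": "web", "user": "U1"},
    {"channel": "web", "user": "u3"},
]
pieces = normalize_sort(retrieve(logs), "channel")
entries = extract_entries(pieces, "channel", "user")
first = mapper_dedup(entries)        # duplicates of ("web", "u1") collapse after normalization
second = reducer_accumulate(first)   # {"app": 1, "web": 2}
print(analyze(second))               # ('web', 2)
```

Normalizing before deduplicating matters here: without it, `"Web "`/`"u1"` and `"web"`/`"U1"` would be counted as two different users.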
6 Citations
18 Claims
1. A method for data analysis, comprising:
retrieving pipeline data from a pipeline data set piece by piece, wherein each piece of pipeline data includes attribute values of multiple views;
performing normalization sorting of the retrieved pipeline data based on the attribute value in a predefined view;
obtaining an attribute value entry list by extracting attribute value entries from the normalization-sorted pipeline data piece by piece;
obtaining a first characteristic value list by performing a deduplication operation on the attribute value entry list through a mapper operation;
obtaining a second characteristic value list by performing an accumulation operation on the first characteristic value list through a reducer operation; and
obtaining a result of a predefined indicator by analyzing the second characteristic value list.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
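One plausible reading of why claim 1 sorts before the mapper step is that sorted input lets deduplication run as a single streaming pass. The sketch below assumes that reading; it is not confirmed by the claim. Because entries arrive grouped by the predefined-view key, the hypothetical `mapper` only has to hold the current key's values in memory and emits `(key, 1)` once per distinct value; the hypothetical `reducer` then accumulates those 1s per key. `itertools.groupby` is used because it groups only contiguous runs, which is exactly what sorted input guarantees.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[str, str]  # hypothetical: (predefined-view value, attribute value)

def mapper(sorted_entries: Iterable[Entry]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) once per distinct attribute value within each key group.

    Assumes the input is sorted by key, so each key forms one contiguous run
    and only that run's values need to be remembered.
    """
    for key, group in groupby(sorted_entries, key=itemgetter(0)):
        seen = set()  # reset per key: memory is bounded by the largest group
        for _, value in group:
            if value not in seen:
                seen.add(value)
                yield key, 1

def reducer(mapped: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum the emitted 1s per key; mapper output keeps keys contiguous."""
    return [(key, sum(count for _, count in group))
            for key, group in groupby(mapped, key=itemgetter(0))]

entries = [("app", "u2"), ("web", "u1"), ("web", "u1"), ("web", "u3")]
first = list(mapper(entries))
second = reducer(first)
print(first)   # [('app', 1), ('web', 1), ('web', 1)]
print(second)  # [('app', 1), ('web', 2)]
```

In a distributed MapReduce job the framework's shuffle step would sort mapper output by key before the reducer runs; in this single-process sketch the mapper already emits keys in order, so no extra sort is needed.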
10. An apparatus for data analysis, comprising:
a data retrieving unit, configured to retrieve pipeline data from a pipeline data set piece by piece, wherein each piece of pipeline data includes attribute values of multiple views;
a data sorting unit, configured to perform normalization sorting of the retrieved pipeline data based on the attribute value in a predefined view;
an attribute extraction unit, configured to obtain an attribute value entry list by extracting attribute value entries from the normalization-sorted pipeline data;
an attribute deduplication unit, configured to obtain a first characteristic value list by performing a deduplication operation on the attribute value entry list through a mapper operation;
an attribute accumulation unit, configured to obtain a second characteristic value list by performing an accumulation operation on the first characteristic value list through a reducer operation; and
a result analysis unit, configured to obtain a result of a predefined indicator by analyzing the second characteristic value list.
Dependent claims: 11, 12, 13, 14, 15, 16.
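Claim 10 recasts the method as six units chained output to input. The sketch below shows only that wiring, assuming each unit can be modeled as a plain Python callable. The class name `DataAnalysisApparatus`, the field names (which mirror the claim's unit names), and the lambda implementations in the usage part are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

@dataclass
class DataAnalysisApparatus:
    """Hypothetical wiring of the six units of claim 10; each unit is a callable."""
    data_retrieving_unit: Callable[[Iterable[Any]], Iterable[Any]]
    data_sorting_unit: Callable[[Iterable[Any]], List[Any]]
    attribute_extraction_unit: Callable[[List[Any]], List[Any]]
    attribute_deduplication_unit: Callable[[List[Any]], List[Any]]
    attribute_accumulation_unit: Callable[[List[Any]], Any]
    result_analysis_unit: Callable[[Any], Any]

    def run(self, data_set: Iterable[Any]) -> Any:
        # Each unit consumes the previous unit's output, in claim order.
        pieces = self.data_retrieving_unit(data_set)
        ordered = self.data_sorting_unit(pieces)
        entries = self.attribute_extraction_unit(ordered)
        first = self.attribute_deduplication_unit(entries)
        second = self.attribute_accumulation_unit(first)
        return self.result_analysis_unit(second)

# Usage with deliberately simple, hypothetical unit implementations.
apparatus = DataAnalysisApparatus(
    data_retrieving_unit=iter,
    data_sorting_unit=lambda ps: sorted(ps, key=lambda p: p["channel"]),
    attribute_extraction_unit=lambda ps: [(p["channel"], p["user"]) for p in ps],
    attribute_deduplication_unit=lambda es: sorted(set(es)),
    attribute_accumulation_unit=lambda es: Counter(key for key, _ in es),
    result_analysis_unit=lambda counts: counts.most_common(1)[0],
)
logs = [
    {"channel": "web", "user": "u1"},
    {"channel": "app", "user": "u2"},
    {"channel": "web", "user": "u1"},
    {"channel": "web", "user": "u3"},
]
print(apparatus.run(logs))  # ('web', 2)
```

Keeping each unit swappable makes it possible to replace, say, the in-memory deduplication with a distributed mapper without touching the other five units.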
17. A system for data analysis, comprising a control server and a computing server, wherein:
the control server is configured to sort multiple data analysis tasks in priority order, submit the data analysis tasks to the computing server, and receive and record the data analysis status of the computing server; and
the computing server is configured to:
retrieve pipeline data from a pipeline data set piece by piece, wherein each piece of pipeline data includes attribute values of multiple views;
perform normalization sorting of the retrieved pipeline data based on the attribute value in a predefined view;
obtain an attribute value entry list by extracting attribute value entries from the normalization-sorted pipeline data;
obtain a first characteristic value list by performing a deduplication operation on the attribute value entry list through a mapper operation;
obtain a second characteristic value list by performing an accumulation operation on the first characteristic value list through a reducer operation; and
obtain a result of a predefined indicator by analyzing the second characteristic value list.
Dependent claim: 18.
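The control-server role in claim 17 (sort tasks by priority, submit them, record status) can be sketched with a binary heap. This is a single-process illustration under assumptions: lower numbers mean higher priority, a sequence counter breaks ties in submission order, and `ComputingServer.execute` stands in for whatever remote call a real deployment would use. The class names, method names, and status strings are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass(order=True)
class Task:
    priority: int  # assumption: lower number = higher priority
    seq: int       # tie-breaker: equal priorities run in submission order
    name: str = field(compare=False)
    job: Callable[[], object] = field(compare=False)

class ComputingServer:
    """Stand-in for the computing server: runs a task and reports its status."""
    def execute(self, task: Task) -> Tuple[str, object]:
        try:
            return "succeeded", task.job()
        except Exception as exc:  # report the failure instead of stopping the queue
            return "failed", repr(exc)

class ControlServer:
    def __init__(self, computing_server: ComputingServer) -> None:
        self._queue: List[Task] = []
        self._counter = itertools.count()
        self.computing_server = computing_server
        self.status: Dict[str, str] = {}
        self.results: Dict[str, object] = {}

    def add(self, name: str, priority: int, job: Callable[[], object]) -> None:
        heapq.heappush(self._queue, Task(priority, next(self._counter), name, job))
        self.status[name] = "queued"

    def dispatch_all(self) -> List[str]:
        """Submit tasks in priority order and record each reported status."""
        order = []
        while self._queue:
            task = heapq.heappop(self._queue)
            self.status[task.name] = "running"
            state, result = self.computing_server.execute(task)
            self.status[task.name] = state
            self.results[task.name] = result
            order.append(task.name)
        return order

control = ControlServer(ComputingServer())
control.add("daily_uv", 2, lambda: 42)
control.add("hourly_uv", 1, lambda: 7)
control.add("broken", 3, lambda: 1 / 0)
print(control.dispatch_all())  # ['hourly_uv', 'daily_uv', 'broken']
print(control.status)
```

A heap keeps insertion and removal at O(log n), which matters if tasks keep arriving while earlier ones are being dispatched.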
Specification