Normalizing ingested data sets based on fuzzy comparisons to known data sets
First Claim
1. A method for ingesting data for a data model using a network computer that employs one or more processors to execute instructions that perform actions, comprising:
providing one or more raw data sets to an ingestion engine, wherein each raw data set includes one or more raw records;
providing one or more ingestion rules associated with one or more confidence scores and one or more known data sets based on a type of the one or more raw records;
employing the ingestion engine to iteratively execute the one or more ingestion rules, performing further actions, including:
providing a comparison of one or more portions of the one or more raw records to the one or more known data sets;
transforming contents of the one or more raw records into one or more model record values based on the comparison to the one or more known data sets;
storing the one or more model record values in one or more model records;
providing a score value that indicates a confidence level that the one or more model records are correct based on the one or more confidence scores; and
storing an association of the one or more ingestion rules used to transform the raw record contents into the model record values stored in the one or more model records; and
when the score value that indicates the confidence level of the one or more model records is less than a threshold value, performing further actions, including:
providing a user-interface to interactively edit the one or more raw records or the one or more ingestion rules, wherein the edited one or more ingestion rules produce an increase change or a decrease change in the one or more confidence scores, wherein the one or more changed confidence scores are employed to provide the score value; and
storing the one or more model records in a data store, wherein the one or more model records are added to the data model.
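The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `IngestionRule`, `ModelRecord`, `ingest`, `needs_review`, and `THRESHOLD` are hypothetical, the fuzzy comparison uses the standard library's `difflib.SequenceMatcher` as a stand-in, and the scoring formula (the lowest product of rule confidence and match similarity) is an assumption, since the claim does not specify how the score value is computed.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class IngestionRule:
    """Hypothetical rule: compares one raw field against a known data set."""
    name: str
    field_name: str          # portion of the raw record the rule reads
    known_values: list[str]  # known data set (assumed non-empty)
    confidence: float        # confidence score associated with the rule, 0..1


@dataclass
class ModelRecord:
    values: dict[str, str] = field(default_factory=dict)
    rules_used: list[str] = field(default_factory=list)  # rule-to-value association
    score: float = 1.0


def best_match(raw: str, known: list[str]) -> tuple[str, float]:
    """Return the closest known value and its similarity ratio in [0, 1]."""
    scored = [(SequenceMatcher(None, raw.lower(), k.lower()).ratio(), k) for k in known]
    ratio, value = max(scored)
    return value, ratio


def ingest(raw_record: dict[str, str], rules: list[IngestionRule]) -> ModelRecord:
    """Execute each rule in turn and build a model record from the raw record."""
    record = ModelRecord()
    for rule in rules:
        raw = raw_record.get(rule.field_name)
        if raw is None:
            continue  # the rule does not apply to this raw record
        value, ratio = best_match(raw, rule.known_values)
        record.values[rule.field_name] = value  # transformed model record value
        record.rules_used.append(rule.name)     # remember which rule produced it
        # Assumed scoring: the weakest (rule confidence x similarity) wins.
        record.score = min(record.score, rule.confidence * ratio)
    return record


THRESHOLD = 0.8  # hypothetical threshold value


def needs_review(record: ModelRecord) -> bool:
    """True when the record should be routed to the interactive editing step."""
    return record.score < THRESHOLD
```

Under these assumptions, a record for which `needs_review` returns true would be shown to a user, who could edit the raw record or adjust a rule's `confidence`; calling `ingest` again would then produce a score value that reflects the changed confidence score.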
Abstract
Embodiments are directed towards normalizing ingested data sets based on fuzzy comparisons to known data sets. Raw data sets that each include raw records may be provided to an ingestion engine. Ingestion rules and known data sets may be provided based on the raw records. The ingestion engine may be employed to iteratively execute the ingestion rules. A comparison of the raw records to the known data sets may be performed. Contents of the raw records may be transformed into model record values and stored in model records. A score value that indicates a confidence level that the model records are correct may be provided. An association of the one or more ingestion rules used to transform the raw record contents into the model record values for each of the one or more model records may be added to a data model.
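As a rough illustration of the fuzzy comparison the abstract refers to, the sketch below maps a raw value onto the closest entry of a known data set. The abstract does not name a matching algorithm, so `difflib.get_close_matches` from the Python standard library is used here as one possible stand-in; `KNOWN_VENDORS`, `normalize`, and the `cutoff` default are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical known data set used only for illustration.
KNOWN_VENDORS = ["Microsoft", "Oracle", "Red Hat"]


def normalize(raw_value, known, cutoff=0.6):
    """Return the known value closest to raw_value, or None if none is close enough."""
    # Compare case-insensitively, but return the canonical spelling.
    by_folded = {k.casefold(): k for k in known}
    hits = get_close_matches(
        raw_value.strip().casefold(), list(by_folded), n=1, cutoff=cutoff
    )
    return by_folded[hits[0]] if hits else None
```

With this sketch, a misspelled or differently cased raw value such as `"Microsft"` or `" ORACLE "` resolves to the canonical entry, while a value with no sufficiently similar entry yields `None` and could be left for manual review.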
250 Citations
28 Claims
1. A method for ingesting data for a data model using a network computer that employs one or more processors to execute instructions that perform actions, comprising:
providing one or more raw data sets to an ingestion engine, wherein each raw data set includes one or more raw records;
providing one or more ingestion rules associated with one or more confidence scores and one or more known data sets based on a type of the one or more raw records;
employing the ingestion engine to iteratively execute the one or more ingestion rules, performing further actions, including:
providing a comparison of one or more portions of the one or more raw records to the one or more known data sets;
transforming contents of the one or more raw records into one or more model record values based on the comparison to the one or more known data sets;
storing the one or more model record values in one or more model records;
providing a score value that indicates a confidence level that the one or more model records are correct based on the one or more confidence scores; and
storing an association of the one or more ingestion rules used to transform the raw record contents into the model record values stored in the one or more model records; and
when the score value that indicates the confidence level of the one or more model records is less than a threshold value, performing further actions, including:
providing a user-interface to interactively edit the one or more raw records or the one or more ingestion rules, wherein the edited one or more ingestion rules produce an increase change or a decrease change in the one or more confidence scores, wherein the one or more changed confidence scores are employed to provide the score value; and
storing the one or more model records in a data store, wherein the one or more model records are added to the data model.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system for ingesting data for a data model, comprising:
a network computer, comprising:
a transceiver that communicates over the network;
a memory that stores at least instructions; and
a processor device that executes instructions that perform actions, including:
providing one or more raw data sets to an ingestion engine, wherein each raw data set includes one or more raw records;
providing one or more ingestion rules associated with one or more confidence scores and one or more known data sets based on a type of the one or more raw records;
employing the ingestion engine to iteratively execute the one or more ingestion rules, performing further actions, including:
providing a comparison of one or more portions of the one or more raw records to the one or more known data sets;
transforming contents of the one or more raw records into one or more model record values based on the comparison to the one or more known data sets;
storing the one or more model record values in one or more model records;
providing a score value that indicates a confidence level that the one or more model records are correct based on the one or more confidence scores; and
storing an association of the one or more ingestion rules used to transform the raw record contents into the model record values stored in the one or more model records;
when the score value that indicates the confidence level of the one or more model records is less than a threshold value, performing further actions, including:
providing a user-interface to interactively edit the one or more raw records or the one or more ingestion rules, wherein the edited one or more ingestion rules produce an increase change or a decrease change in the one or more confidence scores, wherein the one or more changed confidence scores are employed to provide the score value; and
storing the one or more model records in a data store, wherein the one or more model records are added to the data model; and
a client computer, comprising:
a transceiver that communicates over the network;
a memory that stores at least instructions; and
a processor device that executes instructions that perform actions, including:
providing the user-interface to a user; and
providing one or more user interactions to the ingestion engine.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A processor readable non-transitory storage media that includes instructions for ingesting data for a data model, wherein execution of the instructions by a hardware processor performs actions, comprising:
providing one or more raw data sets to an ingestion engine, wherein each raw data set includes one or more raw records;
providing one or more ingestion rules associated with one or more confidence scores and one or more known data sets based on a type of the one or more raw records;
employing the ingestion engine to iteratively execute the one or more ingestion rules, performing further actions, including:
providing a comparison of one or more portions of the one or more raw records to the one or more known data sets;
transforming contents of the one or more raw records into one or more model record values based on the comparison to the one or more known data sets;
storing the one or more model record values in one or more model records;
providing a score value that indicates a confidence level that the one or more model records are correct based on the one or more confidence scores; and
storing an association of the one or more ingestion rules used to transform the raw record contents into the model record values stored in the one or more model records;
when the score value that indicates the confidence level of the one or more model records is less than a threshold value, performing further actions, including:
providing a user-interface to interactively edit the one or more raw records or the one or more ingestion rules, wherein the edited one or more ingestion rules produce an increase change or a decrease change in the one or more confidence scores, wherein the one or more changed confidence scores are employed to provide the score value; and
storing the one or more model records in a data store, wherein the one or more model records are added to the data model.
Dependent claims: 16, 17, 18, 19, 20, 21.
22. A network computer for ingesting data for a data model, comprising:
a transceiver that communicates over the network;
a memory that stores at least instructions; and
a processor device that executes instructions that perform actions, including:
providing one or more raw data sets to an ingestion engine, wherein each raw data set includes one or more raw records;
providing one or more ingestion rules associated with one or more confidence scores and one or more known data sets based on a type of the one or more raw records;
employing the ingestion engine to iteratively execute the one or more ingestion rules, performing further actions, including:
providing a comparison of one or more portions of the one or more raw records to the one or more known data sets;
transforming contents of the one or more raw records into one or more model record values based on the comparison to the one or more known data sets;
storing the one or more model record values in one or more model records;
providing a score value that indicates a confidence level that the one or more model records are correct based at least on the one or more confidence scores;
storing an association of the one or more ingestion rules used to transform the raw record contents into the model record values stored in the one or more model records;
when the score value that indicates the confidence level of the one or more model records is less than a threshold value, performing further actions, including:
providing a user-interface to interactively edit the one or more raw records or the one or more ingestion rules, wherein the edited one or more ingestion rules produce an increase change or a decrease change in the one or more confidence scores, wherein the one or more changed confidence scores are employed to provide the score value; and
storing the one or more model records in a data store, wherein the one or more model records are added to the data model.
Dependent claims: 23, 24, 25, 26, 27, 28.
Specification