Centralized data reconciliation using artificial intelligence mechanisms
First Claim
1. A centralized data reconciliation system, comprising:
at least one processor;
at least one non-transitory data storage storing thereon a custom dictionary that includes tokens associated with a first self-describing data stream and a second self-describing data stream, the tokens used for the data matching and the custom dictionary being dynamically updateable based on data streams that are received by the data reconciliation system; and
at least one non-transitory computer readable medium storing machine-readable instructions that cause the at least one processor to:
convert at least two data streams originating at a first data system and a second data system into respective at least two self-describing data streams including the first self-describing data stream and the second self-describing data stream, wherein the first self-describing data stream includes respective data records and a first data model and the second self-describing data stream includes respective data records and a second data model;
map the data records in the first self-describing data stream that include entities and entity attributes to entities and entity attributes in the data records of the second self-describing data stream by employing one or more of the custom dictionary and rules of data reconciliation via one or more two-way matchings;
generate respective confidence scores for the mappings wherein the confidence scores indicate a degree of matching between the mapped data records based at least on rules of data reconciliation;
identify one or more of the data records in the second self-describing data stream that match one or more of the data records in the first self-describing data stream from the mappings based at least on the confidence scores;
determine unmatched data records from the data records from the first and the second self-describing data streams based at least on the confidence scores;
classify the unmatched data records into categorized records and irreconcilable records, the categorized records being categorized into one or more reason categories, and the irreconcilable records including the unmatched data records that could not be categorized into the reason categories;
generate one or more of reasons and recommendations for at least a subset of the categorized records; and
automatically update one or more of the custom dictionary, the reason categories and the rules of data reconciliation based on user inputs received for the irreconcilable records for which the reasons and recommendations could not be generated.
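The mapping, scoring, and matching steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the dictionary contents, the field names, the string-similarity scoring, the greedy one-to-one pairing, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical custom dictionary: source-specific field tokens -> shared canonical token.
CUSTOM_DICTIONARY = {
    "cust_name": "customer", "client": "customer",
    "amt": "amount", "total": "amount",
}

def normalize(record):
    """Rename fields to canonical tokens using the custom dictionary."""
    return {CUSTOM_DICTIONARY.get(key, key): value for key, value in record.items()}

def confidence(a, b):
    """Assumed score: mean string similarity over the attributes both records share."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    sims = [
        SequenceMatcher(None, str(a[k]).lower(), str(b[k]).lower()).ratio()
        for k in shared
    ]
    return sum(sims) / len(sims)

def reconcile(first, second, threshold=0.9):
    """Greedily pair each record of `first` with its best unused record of `second`."""
    first = [normalize(r) for r in first]
    second = [normalize(r) for r in second]
    matched, used, unmatched_first = [], set(), []
    for i, a in enumerate(first):
        best_j, best_score = None, 0.0
        for j, b in enumerate(second):
            if j in used:
                continue
            score = confidence(a, b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            matched.append((i, best_j, best_score))
        else:
            unmatched_first.append(i)
    unmatched_second = [j for j in range(len(second)) if j not in used]
    return matched, unmatched_first, unmatched_second

first_stream = [
    {"cust_name": "Acme Corp", "amt": "100.00"},
    {"cust_name": "Globex", "amt": "250.00"},
]
second_stream = [
    {"client": "ACME Corp", "total": "100.00"},
    {"client": "Initech", "total": "75.00"},
]
matched, unmatched_first, unmatched_second = reconcile(first_stream, second_stream)
```

Here the two Acme records pair up after field normalization, while the Globex and Initech records fall below the threshold and remain unmatched on their respective sides.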
Abstract
A centralized data reconciliation system processes at least two data streams transmitting data related to one of a plurality of processes and executes a data reconciliation procedure. Unmatched data records identified during the data reconciliation procedure are further categorized into categorized records based on various reason categories and irreconcilable records which could not be categorized into the reason categories. The irreconcilable records are flagged for user input. The user input is recorded to further train the data reconciliation system. The at least two data streams are initially converted into self-describing data streams from which the entities and entity attributes are extracted using the data models received from the data streams. The data records from the first and second self-describing data streams are mapped. The matched pairs and unmatched pairs are selected from the mappings based on respective confidence scores that are estimated in accordance with the rules of data reconciliation.
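As a rough illustration of the categorization step described in the abstract, the sketch below sorts unmatched records into reason categories and sets aside the ones no category explains. The rule names, record fields, and the idea of pairing each unmatched record with a closest candidate are hypothetical and are not taken from the specification.

```python
# Hypothetical reason categories, each a predicate over (record, closest candidate or None).
REASON_RULES = [
    ("missing_counterpart", lambda rec, cand: cand is None),
    ("amount_mismatch",
     lambda rec, cand: cand is not None
     and rec["id"] == cand["id"] and rec["amount"] != cand["amount"]),
    ("date_mismatch",
     lambda rec, cand: cand is not None
     and rec["id"] == cand["id"] and rec["date"] != cand["date"]),
]

def classify_unmatched(pairs):
    """Split unmatched records into categorized and irreconcilable records."""
    categorized, irreconcilable = [], []
    for record, candidate in pairs:
        reasons = [name for name, rule in REASON_RULES if rule(record, candidate)]
        if reasons:
            categorized.append((record, reasons))
        else:
            # No reason category applies: flag the record for user input.
            irreconcilable.append(record)
    return categorized, irreconcilable

pairs = [
    ({"id": "INV-1", "amount": 100, "date": "2024-01-05"},
     {"id": "INV-1", "amount": 90, "date": "2024-01-05"}),
    ({"id": "INV-2", "amount": 40, "date": "2024-01-07"}, None),
    ({"id": "INV-3", "amount": 5, "date": "2024-01-01"},
     {"id": "INV-9", "amount": 5, "date": "2024-01-01"}),
]
categorized, irreconcilable = classify_unmatched(pairs)
```

In this example INV-1 is categorized as an amount mismatch, INV-2 as a missing counterpart, and INV-3 ends up irreconcilable because none of the assumed rules explains it.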
20 Claims
1. A centralized data reconciliation system, comprising the elements recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for data reconciliation comprising:
receiving at least two data streams including at least a first data stream and a second data stream from a first data system and a second data system respectively;
building respective feature vectors from the first data stream and the second data stream;
converting the first data stream and second data stream into respective self-describing data streams that each includes a respective data model based on the feature vectors;
extracting entities and attributes of entities to be matched from the self-describing data streams using the respective data models;
mapping the entities from a first one of the self-describing data streams to a second one of the self-describing data streams by employing a custom dictionary that enables mapping the entities in one or more two-way matches using rules of data reconciliation;
estimating a confidence score for each of the mappings;
identifying matched records and unmatched records from the mappings based on a comparison of the confidence scores with a confidence score threshold;
categorizing a subset of the unmatched records into one or more reason categories and another subset of the unmatched records that could not be categorized as irreconcilable records;
framing one or more hypotheses and a respective discrepancy confidence score for each of the hypotheses, the discrepancy confidence score being indicative of a confidence level of the hypotheses for the categorized records based on one or more of the rules of data reconciliation that were not fulfilled by the irreconcilable records; and
generating a report including one or more of a reason and a recommendation for the categorized records having the respective discrepancy confidence scores above a confidence threshold; and
flagging for user intervention, the irreconcilable records having the respective discrepancy confidence scores below a confidence threshold.
Dependent claims: 14, 15, 16, 17
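The hypothesis-framing and thresholding steps of claim 13 might look something like the following sketch. The claim does not say how a discrepancy confidence score is computed, so the formula here (each unfulfilled rule's weight divided by the total weight of all unfulfilled rules) is an assumption, as are the `Rule` class, the rule weights, the field names, and the 0.6 threshold.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    weight: float  # hypothetical prior on how often this rule explains a break
    check: Callable[[dict, dict], bool]
    recommendation: str

RULES = [
    Rule("amount_equal", 3.0, lambda a, b: a["amount"] == b["amount"],
         "Review the amount on both sides."),
    Rule("currency_equal", 2.0, lambda a, b: a["currency"] == b["currency"],
         "Check for a currency conversion."),
    Rule("date_equal", 1.0, lambda a, b: a["date"] == b["date"],
         "Check for a posting-date lag."),
]

def triage(record, candidate, rules=RULES, threshold=0.6):
    """Frame one hypothesis per unfulfilled rule, then report or flag for a user."""
    failed = [r for r in rules if not r.check(record, candidate)]
    if not failed:
        return {"status": "no_discrepancy", "hypotheses": []}
    total = sum(r.weight for r in failed)
    hypotheses = sorted(
        ({"reason": r.name, "score": r.weight / total,
          "recommendation": r.recommendation} for r in failed),
        key=lambda h: h["score"], reverse=True,
    )
    # Report only when the leading hypothesis clears the threshold.
    status = "report" if hypotheses[0]["score"] >= threshold else "flag_for_user"
    return {"status": status, "hypotheses": hypotheses}

base = {"amount": 100, "currency": "USD", "date": "2024-01-05"}
one_break = triage(base, {"amount": 90, "currency": "USD", "date": "2024-01-05"})
many_breaks = triage(base, {"amount": 90, "currency": "EUR", "date": "2024-02-01"})
```

With a single unfulfilled rule the leading hypothesis carries all of the weight and the record is reported with a recommendation. When every rule fails, no single hypothesis dominates and the record is flagged for user intervention.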
18. A non-transitory computer-readable storage medium comprising machine-readable instructions that cause a processor to:
convert at least two data streams including a first data stream and a second data stream originating respectively at a first data system and a second data system into respective self-describing data streams including a first self-describing data stream and a second self-describing data stream, wherein the first self-describing data stream includes respective data records and a first data model and the second self-describing data stream includes respective data records and a second data model;
map the data records in the first self-describing data stream that include entities and entity attributes to entities and entity attributes in the data records of the second self-describing data stream by employing one or more of a custom dictionary and rules of data reconciliation via one or more two-way matchings;
generate respective confidence scores for the mappings wherein the confidence scores indicate a degree of matching between the mapped data records based at least on rules of data reconciliation;
identify one or more of the data records in the second self-describing data stream that match one or more of the data records in the first self-describing data stream from the mappings based at least on the confidence scores;
determine unmatched data records from the data records from the first and the second self-describing data streams based at least on the confidence scores;
classify the unmatched records into categorized records and irreconcilable records, the categorized records being categorized into one or more reason categories, and the irreconcilable records include the unmatched data records that could not be categorized into the reason categories;
generate one or more of reasons and recommendations for at least a subset of the irreconcilable records; and
automatically update one or more of the custom dictionary, the reason categories and the rules of data reconciliation based on user inputs received for the irreconcilable records for which the reasons and recommendations could not be generated.
Dependent claims: 19, 20
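The final update step, in which user input on irreconcilable records feeds back into the custom dictionary and the reason categories, could be sketched as below. The feedback format (dicts with optional `token`, `canonical`, and `reason_category` keys) is purely hypothetical; the claims do not specify how user input is represented.

```python
def apply_user_feedback(dictionary, reason_categories, feedback):
    """Return updated copies of the dictionary and reason categories.

    Each feedback item is an assumed dict that may carry a new token mapping
    ('token' -> 'canonical') and/or a new 'reason_category' supplied by a user.
    """
    new_dictionary = dict(dictionary)
    new_categories = set(reason_categories)
    for item in feedback:
        if "token" in item and "canonical" in item:
            new_dictionary[item["token"].lower()] = item["canonical"]
        if item.get("reason_category"):
            new_categories.add(item["reason_category"])
    return new_dictionary, new_categories

dictionary = {"amt": "amount"}
categories = {"amount_mismatch"}
feedback = [
    {"token": "Inv_Total", "canonical": "amount"},
    {"reason_category": "duplicate_entry"},
]
dictionary, categories = apply_user_feedback(dictionary, categories, feedback)
```

Returning copies rather than mutating in place is a design choice of this sketch; it keeps earlier reconciliation runs reproducible against the dictionary version they used.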
Specification