DATA AGGREGATION
Abstract
An apparatus for aggregating data and for building a virtual data model 4 of an organisation's data, which will typically be held in a plurality of different data sources 1. The method and apparatus function by first standardising and splitting 2 the data into different types, then performing a cleaning operation 3 on the standardised and split data, and from this building a virtual data model 4 which includes the cleaned data as well as an audit trail. The process and apparatus then perform matching and de-duplication operations on the cleaned data. This allows the output of a data set which has been improved and standardised and is of known quality.
43 Claims
-
1. A method of linking data records from a plurality of sources, the method comprising the steps of:
-
a) receiving data records from a plurality of data sources;
b) cleaning the received data records;
c) matching received data records together according to matching rules for examining the similarity between respective data records, and building a virtual data model, based on matching results, that represents the data records as a map of linked data records, wherein the data records are not merged by the matching step; and
d) generating a report based on at least the matching results represented by the virtual data model.
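The steps of claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the claims: the record fields (`name`, `email`), the helper names (`clean`, `is_match`, `link_records`, `report`), and the simple matching rule (equal e-mail addresses after cleaning). What it aims to show is that matching yields a map of links keyed by record identifier, while each source record stays a separate, unmerged entry.

```python
from itertools import combinations

def clean(record):
    """Step b): collapse whitespace in the name and lower-case the e-mail (illustrative rules)."""
    return {
        "id": record["id"],
        "source": record["source"],
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

def is_match(a, b):
    """Step c): hypothetical matching rule, equal non-empty e-mail addresses."""
    return bool(a["email"]) and a["email"] == b["email"]

def link_records(records):
    """Build a map of links between record ids; the records themselves are never merged."""
    links = {r["id"]: set() for r in records}
    for a, b in combinations(records, 2):
        if is_match(a, b):
            links[a["id"]].add(b["id"])
            links[b["id"]].add(a["id"])
    return links

def report(records, links):
    """Step d): summarise the matching results held in the link map."""
    linked = sum(1 for ids in links.values() if ids)
    return {"records": len(records), "linked_records": linked}

# Step a): records received from two hypothetical sources.
received = [
    {"id": "crm-1", "source": "crm", "name": "  ann  lee ", "email": "Ann@Example.com "},
    {"id": "bill-7", "source": "billing", "name": "Ann Lee", "email": "ann@example.com"},
    {"id": "crm-2", "source": "crm", "name": "Bob Ray", "email": "bob@example.com"},
]
cleaned = [clean(r) for r in received]
links = link_records(cleaned)
summary = report(cleaned, links)
```

In this sketch the two records for the same person end up linked to each other in `links`, yet `cleaned` still holds three separate records, which is the distinction the "not merged" wording of the claim draws.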
-
-
7. A method of linking data records according to claim 44 wherein the report is based on at least the matching results represented by the virtual data model and the audit trail.
-
11. A method of linking data records according to claim 48 in which the data types comprise names and addresses, and the cleaning step is applied to names and addresses included in the received data records.
-
12. A method of linking data records according to claim 48 in which the data types include at least one of:
- dates;
- reference numbers;
- telephone numbers;
- e-mail addresses;
and cleaning is carried out in respect of any one or any combination of these other data types.
-
14. A method of linking data records according to claim 51 in which the predetermined standard comprises a predetermined list.
-
15. A method of linking data records according to claim 52 which is such as to allow a user to select at least one list against which data records are to be standardized.
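Claims 14 and 15 describe standardizing data records against a predetermined list that a user can select. A minimal sketch of that idea follows. The list contents, the `STANDARD_LISTS` registry, and the `standardize` function are hypothetical, and the closest-entry lookup uses Python's `difflib.get_close_matches` only as a stand-in for whatever comparison an actual system would apply.

```python
from difflib import get_close_matches

# Hypothetical predetermined lists; a user selects one by name.
STANDARD_LISTS = {
    "titles": ["Mr", "Mrs", "Ms", "Dr"],
    "countries": ["United Kingdom", "United States", "France"],
}

def standardize(value, list_name, cutoff=0.8):
    """Return the closest entry in the selected list, or None if nothing is close enough."""
    entries = STANDARD_LISTS[list_name]
    lookup = {entry.lower(): entry for entry in entries}
    hits = get_close_matches(value.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

country = standardize("united kingdm", "countries")  # misspelt input
unknown = standardize("Atlantis", "countries")       # nothing in the list is close
```

Returning `None` for a value with no close entry, instead of guessing, leaves room for the value to be raised as a query later in the process.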
-
17. A method of linking data records according to claim 54 which is such as to allow a user to select at least one rule which is applied to the data records in the cleaning step.
-
18. A method of linking data records according to claim 54 in which the rules are used to at least one of: change the data records to a standardized form, correct data records, and complete data records.
-
22. A method of linking data records according to claim 59 in which the step of matching data records comprises the step of comparing a plurality of data items in respective data records to decide whether the data records relate to a common entity.
-
23. A method of linking data records according to claim 60 in which at least one threshold level of similarity between data items is specified, such that the threshold must be met or exceeded before a match is determined.
-
24. A method of linking data records according to claim 59 in which matching rules specify a plurality of matching criteria at least one of which must be met before a match can be determined.
-
25. A method of linking data records according to claim 62 in which each matching criterion identifies at least one predetermined type of data item and at least one similarity threshold.
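The matching rules of claims 22 to 25 can be sketched as a set of criteria, each naming one or more data item types with a similarity threshold, where a match is declared if at least one criterion is met. The sketch below is illustrative only: the `Criterion` class, the field names, the threshold values, and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure are all assumptions, since the claims do not name a particular measure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Criterion:
    """One matching criterion: data item types mapped to similarity thresholds."""
    thresholds: dict  # e.g. {"name": 0.85, "postcode": 1.0}

def similarity(a, b):
    """Illustrative case-insensitive similarity in the range 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def criterion_met(criterion, rec_a, rec_b):
    """A criterion is met when every listed item meets or exceeds its threshold."""
    return all(
        similarity(rec_a[item], rec_b[item]) >= level
        for item, level in criterion.thresholds.items()
    )

def records_match(criteria, rec_a, rec_b):
    """A match is declared when at least one criterion is met."""
    return any(criterion_met(c, rec_a, rec_b) for c in criteria)

# Hypothetical rules: similar name with identical postcode, or identical e-mail.
criteria = [
    Criterion({"name": 0.85, "postcode": 1.0}),
    Criterion({"email": 1.0}),
]
a = {"name": "Jon Smith", "postcode": "AB1 2CD", "email": "jon@example.com"}
b = {"name": "John Smith", "postcode": "ab1 2cd", "email": "j.smith@example.com"}
c = {"name": "Mary Jones", "postcode": "ZZ9 9ZZ", "email": "mary@example.com"}

same_entity = records_match(criteria, a, b)
different_entity = records_match(criteria, a, c)
```

Here records `a` and `b` satisfy the first criterion even though their e-mail addresses differ, while `a` and `c` satisfy neither.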
-
29. A method of linking data records according to claim 66 in which an audit trail is maintained so as to keep a record of changes made to the data records in the de-duplication step.
-
31. A method of linking data records according to claim 66 in which the de-duplication step is performed iteratively.
-
33. A method of linking data records according to claim 44 in which the audit trail comprises a measure of the quality of the data records.
-
34. A computer program product comprising at least one data carrier carrying a computer program comprising code portions that when loaded and run on a computer cause the computer to carry out a method of:
-
a) receiving data records from a plurality of data sources;
b) cleaning the received data records;
c) matching received data records together according to matching rules for examining the similarity between respective data records, and building a virtual data model, based on matching results, that represents the data records as a map of linked data records, wherein the data records are not merged by the matching step; and
d) generating a report based on at least the matching results represented by the virtual data model.
-
-
35. A method of generating a virtual data model comprising the steps of:
-
a) receiving data from a plurality of data sources;
b) cleaning the received data records; and
c) linking received data records together according to matching rules for examining the similarity between respective data records, and building a virtual data model, based on matching results, that represents the data records as a map of linked data records.
-
-
36. A method of linking data records from a plurality of sources comprising the steps of:
-
a) receiving data records from a plurality of data sources;
b) cleaning the received data records, wherein an audit trail is maintained for any changes made to the received data records during the cleaning step;
c) matching received data records together according to matching rules for examining the similarity between respective data records, and building a virtual data model while maintaining data records as unmerged entities; and
e) generating a report based on at least the virtual data model and the audit trail, the report indicating the quality of the data records from a plurality of sources.
-
-
37. A method of aggregating data comprising the steps of:
-
a) receiving data from a plurality of sources;
b) cleaning the received data, whilst maintaining an audit trail of any changes made to the data in the cleaning step;
c) creating a data set comprising the cleaned data and the audit trail; and
d) generating output data using said data set.
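A minimal sketch of the audit-trail idea in claim 37 follows, assuming a simple in-memory representation. The `AuditEntry` and `DataSet` classes, the field names, and the two cleaning rules are hypothetical. The sketch records one entry for every change made during cleaning and returns a data set holding both the cleaned data and the trail.

```python
from dataclasses import dataclass, field

@dataclass
class AuditEntry:
    """One change made to one data item during cleaning."""
    record_id: str
    field_name: str
    before: str
    after: str
    rule: str

@dataclass
class DataSet:
    """Cleaned records together with the audit trail of changes."""
    records: list = field(default_factory=list)
    audit_trail: list = field(default_factory=list)

# Hypothetical cleaning rules: (rule name, field it applies to, function).
RULES = [
    ("trim_whitespace", "name", lambda v: " ".join(v.split())),
    ("uppercase_postcode", "postcode", lambda v: v.upper()),
]

def clean_with_audit(received):
    data_set = DataSet()
    for record in received:
        cleaned = dict(record)  # copy, so the received record is left untouched
        for rule, field_name, fn in RULES:
            before = cleaned[field_name]
            after = fn(before)
            if after != before:
                cleaned[field_name] = after
                data_set.audit_trail.append(
                    AuditEntry(record["id"], field_name, before, after, rule)
                )
        data_set.records.append(cleaned)
    return data_set

received = [
    {"id": "r1", "name": " Ann   Lee", "postcode": "ab1 2cd"},
    {"id": "r2", "name": "Bob Ray", "postcode": "ZZ9 9ZZ"},
]
data_set = clean_with_audit(received)
```

Because each entry keeps the value before and after the change, output generated from the data set can state what was altered and by which rule.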
-
-
38. Apparatus arranged under the control of software for linking data records from a plurality of sources by:
-
a) receiving data records from a plurality of data sources;
b) cleaning the received data records;
c) matching received data records together according to matching rules for examining the similarity between respective data records, and building a virtual data model, based on matching results, that represents the data records as a map of linked data records, wherein the data records are not merged by the matching step; and
d) generating a report based on at least the matching results represented by the virtual data model.
-
39. Apparatus according to claim 76 which is arranged to output a query notification when unable to automatically clean a data item.
-
40. Apparatus according to claim 77 which is arranged to allow input of a decision to resolve the query, and to complete the cleaning step for that data item based on that decision.
-
41. Apparatus according to claim 76 which is arranged to learn from a decision input to resolve a query to aid in the cleaning of future data items.
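The behaviour described in claims 39 to 41 (raise a query when a data item cannot be cleaned automatically, accept a decision, and reuse that decision later) might be sketched as below. The `Cleaner` class, its lookup-table approach, and the example values are assumptions for illustration; the claims do not say how the learning is implemented.

```python
class Cleaner:
    """Cleans values via a lookup table, queues queries, and remembers decisions."""

    def __init__(self, known):
        self.known = dict(known)  # normalised raw value -> cleaned value
        self.queries = []         # data items awaiting a decision

    def clean(self, value):
        key = value.strip().lower()
        if key in self.known:
            return self.known[key]
        self.queries.append(value)  # query notification: cannot clean automatically
        return None

    def resolve(self, value, decision):
        """Apply a decision to a queried item and keep it for future data items."""
        self.known[value.strip().lower()] = decision
        self.queries.remove(value)
        return decision

cleaner = Cleaner({"uk": "United Kingdom"})
first = cleaner.clean("U.K.")                         # unknown form, so a query is raised
resolved = cleaner.resolve("U.K.", "United Kingdom")  # decision input for that query
second = cleaner.clean(" u.k. ")                      # now cleaned without a query
```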
-
42. Apparatus which is arranged to produce a computer program product comprising at least one data carrier carrying a computer program comprising code portions that when loaded and run on a computer, arrange the computer as an apparatus to:
-
a) receive data records from a plurality of data sources;
b) clean the received data records;
c) match received data records together according to matching rules for examining the similarity between respective data records, and build a virtual data model, based on matching results, that represents the data records as a map of linked data records, wherein the data records are not merged by the matching step; and
d) generate a report based on at least the matching results represented by the virtual data model.
-
-
43. A method of generating a virtual data model representing data held by an organization in a plurality of distinct data sources comprising the steps of:
-
a) receiving data from the plurality of data sources;
b) cleaning the received data, whilst maintaining an audit trail of any changes made to the data in the cleaning step;
c) creating a virtual data model, comprising the cleaned data, linkages between the cleaned data, and the audit trail.
-
Specification