Finding similarities in data records
Abstract
System(s) and/or method(s) (“tools”) are described that enable actions to be reused that are common to multiple similarity functions. The tools may do so, in one embodiment, by composing similarity functions into a single, composed function that performs actions once that are common to multiple similarity functions. This composed function may also permit data to be analyzed in one pass and/or render unnecessary a merge operation. The tools may also enable actions to be reused when a similarity function is performed multiple times. The tools may do so, in one embodiment, by retaining a result of performing an action and using that result when performing the similarity function again. The tools may also enable records to be compared using a flip-window algorithm. This algorithm may be an efficient way in which to compare records in a table to determine which of those records are similar or duplicates.
23 Citations
20 Claims
1. A computer-implemented method comprising:
executing an action on first data and second data as part of a first similarity function, the first similarity function performed to determine a similarity between the first data and the second data; and
using a result of executing the action to enable:
execution of the first similarity function, where the first similarity function is performed to determine a similarity between the first data and third data, without having to execute the action on the first data; or
execution of a second similarity function that:
is different from the first similarity function;
requires execution of the action on the first data or the second data; and
is performed to determine a similarity between the first data and the second data or fourth data without having to execute the action on the first data or the second data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
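
The reuse recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the specification: it assumes the shared "action" is tokenizing a record, and the names `tokenize`, `jaccard`, `overlap`, and the sample records are hypothetical. The result of the action is retained, so a repeat call of the first similarity function, or a call of a different one, does not tokenize the same data again.

```python
from functools import lru_cache

calls = {"tokenize": 0}  # counts how often the action really runs


@lru_cache(maxsize=None)
def tokenize(text: str) -> frozenset:
    """Assumed shared action: split a record into lowercase tokens."""
    calls["tokenize"] += 1
    return frozenset(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """First similarity function (hypothetical): Jaccard index on tokens."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def overlap(a: str, b: str) -> float:
    """Second, different similarity function that needs the same action."""
    ta, tb = tokenize(a), tokenize(b)
    smaller = min(len(ta), len(tb))
    return len(ta & tb) / smaller if smaller else 1.0


first = "Acme Corp Seattle"
second = "ACME Corp"
third = "Acme Corporation Seattle"

j12 = jaccard(first, second)  # runs the action on first and second
j13 = jaccard(first, third)   # reuses the result for first; only third is tokenized
o12 = overlap(first, second)  # different function, no new tokenization
```

In this sketch the action runs three times (once per distinct record) instead of six. An unbounded cache is chosen only for brevity; a real system would presumably bound or scope the retained results.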
10. One or more computer-readable media having computer-readable instructions therein that, when executed by a computer, cause the computer to perform acts comprising:
receiving multiple similarity functions performance of which are capable of producing multiple results, the multiple results capable of being merged into a single result with a merge operation; and
composing the multiple similarity functions into a single function capable of producing the single result.
Dependent claims: 11, 12, 13, 14
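
One way to picture the composition in claim 10 is sketched below in Python. This is an illustration under stated assumptions, not the patented method: it assumes each similarity function can be split into a shared action (here a hypothetical `tokenize`) and a per-function measure on the tokens, and the names `compose`, `jaccard_on_tokens`, and `overlap_on_tokens` are invented for the example. The composed function visits each pair once and returns one combined result, so no separate merge step is needed.

```python
from typing import Callable, Dict, FrozenSet

Tokens = FrozenSet[str]
Measure = Callable[[Tokens, Tokens], float]


def tokenize(text: str) -> Tokens:
    """Assumed shared action: split a record into lowercase tokens."""
    return frozenset(text.lower().split())


def jaccard_on_tokens(ta: Tokens, tb: Tokens) -> float:
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def overlap_on_tokens(ta: Tokens, tb: Tokens) -> float:
    smaller = min(len(ta), len(tb))
    return len(ta & tb) / smaller if smaller else 1.0


def compose(measures: Dict[str, Measure]) -> Callable[[str, str], Dict[str, float]]:
    """Build one function that runs the shared action once per record
    and then applies every measure to the same tokens."""
    def composed(a: str, b: str) -> Dict[str, float]:
        ta, tb = tokenize(a), tokenize(b)  # shared action, done once
        return {name: measure(ta, tb) for name, measure in measures.items()}
    return composed


similarity = compose({"jaccard": jaccard_on_tokens, "overlap": overlap_on_tokens})
pairs = [("Acme Corp Seattle", "ACME Corp")]
results = [similarity(a, b) for a, b in pairs]  # one pass, one combined result per pair
```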
15. A computer-implemented method comprising:
comparing records of a first window to provide one or more first sets of duplicate records;
comparing records of a second window and at least one duplicate record of each set of the first sets of duplicate records to provide one or more second sets of duplicate records; and
comparing records of a third window and at least one duplicate record of each set of the second sets of duplicate records to provide one or more third sets of duplicate records.
Dependent claims: 16, 17, 18, 19, 20
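
The window-by-window comparison in claim 15 can be sketched loosely in Python. This is a reading of the claim text only and should not be taken as the flip-window algorithm described in the specification. The record shape `(id, name)`, the duplicate test `same_name`, the greedy grouping, and the choice to carry forward the first member of each duplicate set are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Record = Tuple[int, str]  # (record id, name); hypothetical record shape


def same_name(a: Record, b: Record) -> bool:
    """Hypothetical duplicate test: names equal after case folding."""
    return a[1].casefold() == b[1].casefold()


def group_duplicates(records: List[Record],
                     is_dup: Callable[[Record, Record], bool]) -> List[List[Record]]:
    """Greedy grouping: a record joins the first group whose first member it matches."""
    groups: List[List[Record]] = []
    for rec in records:
        for group in groups:
            if is_dup(group[0], rec):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def windowed_duplicate_sets(records: List[Record], window_size: int,
                            is_dup: Callable[[Record, Record], bool]
                            ) -> List[List[List[Record]]]:
    """For each window, compare its records plus one carried record per
    duplicate set found in the previous window."""
    carried: List[Record] = []
    per_window: List[List[List[Record]]] = []
    for start in range(0, len(records), window_size):
        window = records[start:start + window_size]
        groups = group_duplicates(carried + window, is_dup)
        dup_sets = [g for g in groups if len(g) > 1]  # singletons are not duplicates
        per_window.append(dup_sets)
        carried = [g[0] for g in dup_sets]  # one representative per set
    return per_window


records = [(1, "Acme Corp"), (2, "acme corp"), (3, "Zeta LLC"),
           (4, "ACME CORP"), (5, "Beta Inc"), (6, "beta inc")]
found = windowed_duplicate_sets(records, window_size=2, is_dup=same_name)
ids = [[[rec[0] for rec in dup_set] for dup_set in sets] for sets in found]
```

Here record 4 in the second window is matched against record 1, which was carried over from the first window's duplicate set, without re-reading the whole first window.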
Specification