System for matching pattern-based data
Abstract
A pattern-based data matching system matches pattern-based data. The data matching system generates a regular expression pattern for input datasets and describes similarity measures between the generated patterns. The data matching system analyzes an input dataset in terms of symbol classes, generalizing input values into a general pattern to allow identification or extrapolation of overlap between input datasets, aiding in matching fields in databases that are being merged and in learning a pattern for an input dataset. For each sequence of data values, the present system computes a compact pattern describing the sequence. Embodiments of the data matching system comprise noise reduction and repetitive pattern discovery in the input dataset and calculation of recall and precision of the generated pattern.
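
The generalization of input values into symbol classes described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the three-way split into digits, ASCII letters, and literal characters, and the function names `symbol_class`, `value_pattern`, and `dataset_patterns`, are assumptions made for illustration only.

```python
import re
import string
from itertools import groupby

def symbol_class(ch):
    """Map one character to a coarse symbol class (assumed taxonomy)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # anything else is kept as an escaped literal

def value_pattern(value):
    """Generalize one value into a regular expression over symbol classes."""
    parts = []
    for cls, run in groupby(symbol_class(ch) for ch in value):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)

def dataset_patterns(values):
    """Collect the distinct generalized patterns seen in a dataset."""
    return {value_pattern(v) for v in values}

# Hypothetical phone-number columns from two databases being merged.
phones_a = ["555-1234", "867-5309"]
phones_b = ["202-0143"]
shared = dataset_patterns(phones_a) & dataset_patterns(phones_b)
```

Here both columns reduce to the same single pattern, so `shared` is non-empty, which hints that the two fields may hold the same kind of data even though no individual value appears in both.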
7 Claims
1. A system for matching pattern-based data, comprising a computer which executes:
a pattern construction module for deriving a first pattern from a first input set of values and a second pattern from a second input set of values;
a similarity computation module for computing a similarity of the first pattern and the second pattern;
a learning module, wherein the system learns the first pattern and the second pattern by constructing an automaton comprised of nodes and transitions between the nodes, and support levels are calculated for the nodes;
a delimiter removal module that removes one or more delimiters from the first pattern by calculating the support level at one of the nodes by summing support values of incoming transitions to that node;
a matching module for matching the first input set of values with the second input set of values based on the similarity computation; and
an output module that outputs the similarity match to a user.
Dependent claims: 2, 3, 4, 5.
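
The support calculation recited in claim 1 (a node's support level is the sum of the support values of its incoming transitions) can be sketched as follows. This is an illustrative reading, not the claimed implementation. The tokenizer, the `(position, symbol)` node layout, the `min_fraction` cutoff, and all function names are assumptions, as is the choice to treat a low-support delimiter as the one to remove.

```python
from collections import Counter
from itertools import groupby

START = (-1, "^")  # hypothetical start node

def tokenize(value):
    """Run-compress a value into symbols: 'D' digits, 'A' letters, else the literal char."""
    def cls(ch):
        return "D" if ch.isdigit() else "A" if ch.isalpha() else ch
    return [sym for sym, _ in groupby(cls(ch) for ch in value)]

def build_automaton(values):
    """Count how many values traverse each transition between (position, symbol) nodes."""
    transitions = Counter()
    for value in values:
        prev = START
        for pos, sym in enumerate(tokenize(value)):
            node = (pos, sym)
            transitions[(prev, node)] += 1
            prev = node
    return transitions

def node_support(transitions):
    """Support level of a node = sum of support values of its incoming transitions."""
    support = Counter()
    for (_, dst), count in transitions.items():
        support[dst] += count
    return support

def weak_delimiters(values, min_fraction=0.5):
    """Delimiter nodes whose support falls below an assumed fraction of the inputs."""
    support = node_support(build_automaton(values))
    cutoff = min_fraction * len(values)
    return {node for node, s in support.items()
            if node[1] not in ("D", "A") and s < cutoff}

values = ["555-1234", "555-1234", "555-9876", "555 1234"]
support = node_support(build_automaton(values))
```

In this sample the final digit node `(2, "D")` is reached through both the hyphen node and the space node, so its support is the sum of the two incoming transition counts (3 + 1). The space delimiter is seen only once and is the sole node returned by `weak_delimiters(values)`.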
6. A computer program product having a plurality of executable instruction codes that are stored on a computer-readable storage medium, for matching pattern-based data, comprising:
instruction codes that derive a first pattern from a first input set of values and a second pattern from a second input set of values;
instruction codes that compute a similarity of the first pattern and the second pattern;
instruction codes that perform a learning of the first pattern and the second pattern by constructing an automaton comprised of nodes and transitions between the nodes;
instruction codes that calculate support levels for the nodes;
instruction codes that remove one or more delimiters from the first pattern by calculating the support level at one of the nodes by summing support values of incoming transitions to that node;
instruction codes that match the first input set of values with the second input set of values based on the similarity computation; and
instruction codes that output the similarity match to a user.
7. A method of matching pattern-based data, comprising:
deriving a first pattern from a first input set of values and a second pattern from a second input set of values;
computing by using a computer processor a similarity of the first pattern and the second pattern;
performing a learning of the first pattern and the second pattern by constructing an automaton comprised of nodes and transitions between the nodes;
calculating support levels for the nodes;
removing one or more delimiters from the first pattern by calculating the support level at one of the nodes by summing support values of incoming transitions to that node;
initializing an expansion factor;
expanding a language of the first and second patterns at the expansion factor, the expanding of the language diminishing a size of the first and second patterns and decreasing a precision of the first and second patterns;
matching the first input set of values with the second input set of values based on the similarity computation; and
outputting the similarity match to a user.
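
The trade-off recited in claim 7, where expanding a pattern's language makes the pattern smaller but less precise, can be shown with a toy example. The claim does not define how the expansion factor is applied, so everything here is an assumption: the factor is treated as the largest gap between observed repetition counts that may be merged into one range, and `expand`, `render`, and `precision` are hypothetical helpers.

```python
def expand(counts, factor):
    """Merge sorted repetition counts into ranges when the gap is <= factor."""
    ranges = []
    for n in sorted(set(counts)):
        if ranges and n - ranges[-1][1] <= factor:
            ranges[-1][1] = n          # extend the current range
        else:
            ranges.append([n, n])      # start a new range
    return [tuple(r) for r in ranges]

def render(ranges, cls=r"\d"):
    """Write the ranges as regular-expression alternatives over one symbol class."""
    alts = [f"{cls}{{{lo}}}" if lo == hi else f"{cls}{{{lo},{hi}}}"
            for lo, hi in ranges]
    return "|".join(alts)

def precision(counts, ranges):
    """Assumed measure: observed lengths divided by lengths the pattern accepts."""
    accepted = sum(hi - lo + 1 for lo, hi in ranges)
    return len(set(counts)) / accepted

observed = [3, 4, 7]                   # digit-run lengths seen in a hypothetical input
tight = expand(observed, factor=0)     # three alternatives, nothing extra accepted
loose = expand(observed, factor=3)     # one range that also accepts lengths 5 and 6
```

With a factor of 0 the pattern has three alternatives and accepts only the observed lengths. With a factor of 3 it collapses to the single range 3 to 7, which is shorter to write but also accepts lengths never seen in the input, so the assumed precision measure drops from 1.0 to 0.6.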
Specification