PATTERN TREE-BASED RULE LEARNING
Abstract
A pattern tree is constructed based on a plurality of key-value pairs representing portions of a data set. In some implementations, the pattern tree may be used for learning one or more rules for interacting with a source of the data set.
20 Claims
1. A method comprising:
obtaining a set of Uniform Resource Locators (URLs) and corresponding content from a targeted website;
decomposing each URL into a group of key-value pairs;
constructing, by a processor, a tree having a plurality of nodes, each node of the tree representing a group of URLs having a common pattern;
identifying one or more pairs of nodes corresponding to duplicate content in which a first node in a pair of nodes corresponds to first content that substantively matches second content corresponding to a second node in the pair of nodes;
generating a candidate rule for each of the one or more pairs of nodes, the candidate rule relating a URL of the first node to a URL of the second node; and
selecting one or more of the candidate rules as one or more deployable rules.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
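The decomposition step recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say how keys are named, so the scheme here is an assumption made for illustration only: `scheme` and `host` for the URL prefix, positional keys `path_0`, `path_1`, ... for path segments, and the parameter names themselves for query parameters. The function name `decompose_url` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def decompose_url(url: str) -> dict[str, str]:
    """Split a URL into key-value pairs (hypothetical key naming)."""
    parts = urlsplit(url)
    pairs = {"scheme": parts.scheme, "host": parts.netloc}
    # Path segments carry no names of their own, so key them by position.
    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        pairs[f"path_{i}"] = segment
    # Query parameters already arrive as name=value pairs.
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        pairs[name] = value
    return pairs


print(decompose_url("http://example.com/images/show?id=7&ref=home"))
```

In this sketch a query parameter that happens to be named `host` or `path_0` would overwrite the structural key of the same name; a fuller version would need to namespace the two kinds of keys.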
9. A computer-implemented method comprising:
obtaining a set of data from a source;
generating a plurality of key-value pairs from the set of data, the key-value pairs representing data portions of the set of data; and
constructing a pattern tree by:
creating a root node for the set of data;
choosing a particular key having a smallest distribution of values;
splitting the data into multiple subgroups according to the values of the particular key identified as having the smallest distribution of values to generate additional nodes of a next hierarchical level.
Dependent claims: 10, 11, 12, 13
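The tree construction recited in claim 9 can be sketched as follows. This is an illustrative reading, not the patented implementation. It assumes that "smallest distribution of values" means the fewest distinct values among the items in a node, that a missing key counts as the value `None`, and that the split is applied recursively to each subgroup up to a fixed depth. The names `Node`, `build_pattern_tree`, and `max_depth` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    items: list                                   # key-value dicts grouped under this node
    pattern: dict = field(default_factory=dict)   # key -> value fixed on the path from the root
    children: list = field(default_factory=list)


def build_pattern_tree(items: list, max_depth: int = 3) -> Node:
    """Create a root node for all items, then split it level by level."""
    root = Node(items=list(items))
    _split(root, frozenset(), max_depth)
    return root


def _split(node: Node, used: frozenset, depth: int) -> None:
    # Only keys not already fixed higher up in the tree are candidates.
    candidates = sorted({k for item in node.items for k in item} - used)
    if depth == 0 or len(node.items) < 2 or not candidates:
        return

    def distinct_values(key: str) -> int:
        return len({item.get(key) for item in node.items})

    # Assumed reading of "smallest distribution": fewest distinct values.
    key = min(candidates, key=distinct_values)

    # Group the items by their value for the chosen key.
    groups: dict[Optional[str], list] = {}
    for item in node.items:
        groups.setdefault(item.get(key), []).append(item)

    # Each subgroup becomes a node of the next hierarchical level.
    for value, members in groups.items():
        child = Node(items=members, pattern={**node.pattern, key: value})
        node.children.append(child)
        _split(child, used | {key}, depth - 1)


urls = [
    {"host": "a.com", "path_0": "img", "id": "1"},
    {"host": "a.com", "path_0": "img", "id": "2"},
    {"host": "a.com", "path_0": "doc", "id": "3"},
]
tree = build_pattern_tree(urls)
```

With these three items, `host` has one distinct value and is chosen first, then `path_0` separates the `img` items from the `doc` item. Keys with few values tend to be structural parts of a URL, which is why splitting on them first groups items that share a pattern.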
14. A computing device comprising:
a processor coupled to computer-readable storage media containing instructions executable by the processor;
a tree construction component implemented by the processor to construct a pattern tree based on a plurality of key-value pairs, the key-value pairs representing portions of a plurality of Uniform Resource Locators (URLs); and
a rule generation component implemented to identify a pair of nodes of the tree relating to duplicate content for generating a rule.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification