Classification method of labeled ordered trees using support vector machines
Abstract
To classify semi-structured data with a kernel method for labeled ordered trees, instances having a labeled ordered tree structure are input and their inner product is computed; the result is then used for classification learning of the instances. In the inner product computation, a sum of matches is computed for the descendant nodes of non-leaf nodes of the labeled ordered trees by applying dynamic programming over correspondences that preserve the order of the nodes.
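The inner product described in the abstract can be illustrated with a small sketch. It follows one common way of counting shared substructures between two labeled ordered trees with dynamic programming over order-preserving child correspondences; it is not taken from the specification, and the claimed method may differ in detail. The tuple encoding of trees and the names `nodes`, `rooted_matches`, and `tree_kernel` are assumptions made for illustration only.

```python
from functools import lru_cache

# Assumed encoding (not from the patent): a tree is (label, (child, ...)),
# with children stored left to right.


def nodes(tree):
    """Yield every node of the tree (as the subtree rooted there), pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


@lru_cache(maxsize=None)
def rooted_matches(a, b):
    """Count matching substructures whose roots are mapped to nodes a and b."""
    if a[0] != b[0]:
        return 0
    ca, cb = a[1], b[1]
    # s[i][j]: weighted number of order-preserving correspondences between
    # the first i children of a and the first j children of b. The empty
    # correspondence counts as 1, so two matching leaves give exactly 1.
    s = [[1] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            s[i][j] = (
                s[i - 1][j] + s[i][j - 1] - s[i - 1][j - 1]
                + s[i - 1][j - 1] * rooted_matches(ca[i - 1], cb[j - 1])
            )
    return s[len(ca)][len(cb)]


def tree_kernel(t1, t2):
    """Inner product of two trees: sum of rooted matches over all node pairs."""
    return sum(rooted_matches(a, b) for a in nodes(t1) for b in nodes(t2))


leaf_b, leaf_c = ("b", ()), ("c", ())
t1 = ("a", (leaf_b, leaf_c))
t2 = ("a", (leaf_c, leaf_b))  # same labels, children in the other order

print(tree_kernel(t1, t1))  # 6
print(tree_kernel(t1, t2))  # 5
```

The second value is smaller because the substructure a(b, c) cannot be matched once the children are reordered, which is the effect of requiring the correspondence to maintain node order.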
11 Claims
1. A data processing method for controlling a computer to classify XML semi-structured data instances in a database, said XML semi-structured data instances each having a vector representable structure having vector representable substructures as attributes, the method comprising:
- a first step of inputting the vector representable structure instances including their substructure instances; and
a second step of successively computing an inner product of successive vector representable structure instances including their substructure instances and storing the successively computed inner products, the method further comprising the steps of;
based on structures of positive instances and structures of negative instances stored in memory, obtaining a classification rule for classifying the positive instances and the negative instances and storing the classification rule in memory, each structure of the positive instances being a labeled ordered tree in which a node indication mark is added to a node to be indicated, and each structure of the negative instances being a labeled ordered tree in which a node indication mark is added to a node not to be indicated;
adding a node indication mark to a node in the labeled ordered tree to be processed and classifying the labeled ordered tree as positive or negative based on the classification rule stored in the memory; and
in response to the classification result, outputting a processing result of the labeled ordered tree with the node indication mark added to a node, to thereby create a classification rule;
wherein the first step further comprises, if the given substructures includes lower substructures, computing a sum of matches for the substructures.
8. A program product encoded on a computer readable medium and comprising code for controlling a computer to execute classifying XML semi-structured data instances in a database, said XML semi-structured data instances each having a vector representable structure having vector representable substructures as attributes, the method comprising:
- a first step of inputting the vector representable structure instances including their substructure instances; and
a second step of successively computing an inner product of successive vector representable structure instances including their substructure instances and storing the successively computed inner products, the method further comprising the steps of;
based on structures of positive instances and structures of negative instances stored in memory, obtaining a classification rule for classifying the positive instances and the negative instances and storing the classification rule in memory, each structure of the positive instances being a labeled ordered tree in which a node indication mark is added to a node to be indicated, and each structure of the negative instances being a labeled ordered tree in which a node indication mark is added to a node not to be indicated;
adding a node indication mark to a node in the labeled ordered tree to be processed and classifying the labeled ordered tree as positive or negative based on the classification rule stored in the memory; and
in response to the classification result, outputting a processing result of the labeled ordered tree with the node indication mark added to a node, to thereby create a classification rule;
wherein the first step further comprises, if the given substructures includes lower substructures, computing a sum of matches for the substructures.
11. A computer readable recording medium in which a program for controlling a computer to classify XML semi-structured data instances in a database, said XML semi-structured data instances each having a vector representable structure having vector representable substructures as attributes, the method comprising:
- a first step of inputting the vector representable structure instances including their substructure instances; and
a second step of successively computing an inner product of successive vector representable structure instances including their substructure instances and storing the successively computed inner products, the method further comprising the steps of;
based on structures of positive instances and structures of negative instances stored in memory, obtaining a classification rule for classifying the positive instances and the negative instances and storing the classification rule in memory, each structure of the positive instances being a labeled ordered tree in which a node indication mark is added to a node to be indicated, and each structure of the negative instances being a labeled ordered tree in which a node indication mark is added to a node not to be indicated;
adding a node indication mark to a node in the labeled ordered tree to be processed and classifying the labeled ordered tree as positive or negative based on the classification rule stored in the memory; and
in response to the classification result, outputting a processing result of the labeled ordered tree with the node indication mark added to a node, to thereby create a classification rule;
wherein the first step further comprises, if the given substructures includes lower substructures, computing a sum of matches for the substructures.
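The procedure recited in the claims (mark a node, store pairwise inner products, learn a rule from positive and negative instances, then classify a newly marked tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: a kernel perceptron is swapped in for the support vector machine named in the title, `toy_kernel` is a stand-in inner product over label counts rather than the ordered-tree inner product, and the tuple tree encoding, the `*` mark, and all function names are hypothetical.

```python
from collections import Counter

# Assumed encoding (not from the patent): a tree is (label, (child, ...)).


def mark(tree, path):
    """Return a copy of tree with '*' appended to the label of the node at path."""
    label, children = tree
    if not path:
        return (label + "*", children)
    i = path[0]
    return (label, children[:i] + (mark(children[i], path[1:]),) + children[i + 1:])


def label_counts(tree):
    """Count how often each label occurs in the tree."""
    counts = Counter([tree[0]])
    for child in tree[1]:
        counts.update(label_counts(child))
    return counts


def toy_kernel(a, b):
    """Stand-in kernel: inner product of the two label-count vectors."""
    ca, cb = label_counts(a), label_counts(b)
    return sum(ca[k] * cb[k] for k in ca)


def train_kernel_perceptron(gram, y, epochs=10):
    """Learn one weight per training instance from the stored Gram matrix."""
    n = len(y)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified (or undecided): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def classify(x, train, y, alpha, kernel):
    """Return +1 or -1 for a marked tree x under the learned rule."""
    score = sum(a * yi * kernel(t, x) for a, yi, t in zip(alpha, y, train))
    return 1 if score > 0 else -1


doc1 = ("item", (("name", ()), ("price", ())))
doc2 = ("item", (("price", ()), ("name", ()), ("note", ())))

# Positive instances mark the node to be indicated (here, "price");
# negative instances mark a node that should not be indicated.
train = [mark(doc1, (1,)), mark(doc2, (0,)),
         mark(doc1, (0,)), mark(doc2, (1,)), mark(doc2, (2,)), mark(doc1, ())]
y = [1, 1, -1, -1, -1, -1]

# Compute and store the inner product of every pair of training instances.
gram = [[toy_kernel(a, b) for b in train] for a in train]
alpha = train_kernel_perceptron(gram, y)

doc3 = ("item", (("note", ()), ("price", ())))
print(classify(mark(doc3, (1,)), train, y, alpha, toy_kernel))  # 1
print(classify(mark(doc3, (0,)), train, y, alpha, toy_kernel))  # -1
```

Because the learner only reads the stored Gram matrix and calls `kernel` at classification time, any other valid inner product on marked trees could be passed in without changing the training code.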
Specification