System and method for analyzing language using supervised machine learning method
First Claim
1. A system for analyzing Japanese language using supervised learning method, the system comprising:
sentence data storage means for storing sentence data which do not include solutions for a target problem;
problem expression storage means for storing problem expression data comprising a problem expression which indicates an object of a language analysis and information of expressions corresponding to said problem expression;
problem expression extraction processing means for extracting a portion which corresponds to any one of the expressions corresponding to the problem expression from said sentence data by using a predetermined language analysis and replacing the extracted portion of the sentence data with the problem expression;
supervised data creation processing means for creating a plurality of supervised data, which is formed as a pair of a problem and either a solution or a solution candidate, wherein the pair comprises the sentence data in which the portion is replaced with the problem expression as the problem and either the portion extracted from said sentence data by the problem expression extracting processing means as the solution or the portion extracted from other sentence data except said sentence data, which are stored in said sentence data storage means as the solution candidate;
supervised data features obtaining processing means for obtaining a plurality of predetermined syntactic supervised data features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure from each sentence of the supervised data using syntactic analysis and then generating solution/features pairs of each sentence of the supervised data, wherein the solution/features pairs are a positive example having the plurality of supervised data features and the solution and negative examples having the plurality of supervised data features and each one of the solution candidates;
machine learning processing means for performing machine learning processing on the solution/features pairs using a kernel function executed as a support vector machine, by classifying the solution based upon generating a hyperplane which maximizes an interval of the positive and negative examples and divides these two examples by the hyperplane on a space having dimensions determined by the plurality of obtained features and storing the hyperplane as the result of the machine learning processing in the learning result storing database;
object sentence data obtaining processing means for inputting object sentence data and obtaining a plurality of syntactic object sentence features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure from the input object sentence data using the syntactic analysis; and
solution extrapolation processing means for using the stored hyperplane to determine which divided part of the space does the plurality of the syntactic object sentence features belong to, and estimates a determined part with highest probability as the solution as classified for the plurality of syntactic object sentence features.
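The supervised data creation recited in the claim can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the problem expression `<PARTICLE>`, the three case particles listed as its corresponding expressions, and the use of plain substring search in place of the "predetermined language analysis". Solution candidates are taken from the portions extracted from the other sentences, and a candidate identical to the sentence's own solution is skipped (also an assumption).

```python
from dataclasses import dataclass

# Hypothetical problem expression and the expressions it stands for.
PROBLEM_EXPRESSION = "<PARTICLE>"
EXPRESSIONS = ["が", "を", "に"]


@dataclass(frozen=True)
class Example:
    problem: str       # sentence with the extracted portion replaced
    candidate: str     # the solution or a solution candidate
    is_solution: bool  # True for the positive example, False for negatives


def extract_problem(sentence):
    """Replace the earliest matching expression with the problem expression.

    Substring search is a stand-in for real language analysis.
    Returns (problem, extracted_portion), or None if nothing matches.
    """
    hits = [(sentence.find(e), e) for e in EXPRESSIONS if e in sentence]
    if not hits:
        return None
    pos, expr = min(hits)  # earliest match in the sentence
    problem = sentence[:pos] + PROBLEM_EXPRESSION + sentence[pos + len(expr):]
    return problem, expr


def create_supervised_data(sentences):
    """Build one positive example per sentence plus negatives from other sentences."""
    extracted = [r for r in map(extract_problem, sentences) if r is not None]
    data = []
    for i, (problem, solution) in enumerate(extracted):
        data.append(Example(problem, solution, True))
        # Portions extracted from every other sentence act as solution candidates.
        others = {s for j, (_, s) in enumerate(extracted) if j != i}
        for candidate in sorted(others - {solution}):
            data.append(Example(problem, candidate, False))
    return data


corpus = ["猫が魚を食べる", "彼は本を読む", "私は学校に行く"]
supervised = create_supervised_data(corpus)
```

No sentence in `corpus` carries hand-made annotation; the solutions come from the text itself, which is the point of building supervised data from sentence data that do not include solutions.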
Abstract
A system for analyzing language using a supervised learning method. The system extracts portions matching the structures of problem expressions from a raw corpus that is not supplemented with analysis information, converts the extracted portions into supervised data comprising problems and solutions, and stores the supervised data in the data storage. The system then extracts sets of solutions and features from the stored supervised data, carries out machine learning processing on those sets, and stores in the learning results database the learned results as to what kind of solution is most likely for which feature. Finally, the system extracts sets of features from the input object data and, based on the learning results database, extrapolates the analysis information that is optimal for those features.
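The learning and extrapolation steps summarized in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: it trains a linear support vector machine by hinge-loss subgradient descent (a Pegasos-style update) rather than a general kernel SVM, it uses hypothetical feature strings such as `verb=食べる`, and it picks the candidate with the highest signed score as a stand-in for the claimed "highest probability".

```python
def pair_features(context, candidate):
    """Conjoin each context feature with the candidate being scored."""
    return {f"{candidate}|{feat}" for feat in context} | {f"{candidate}|bias"}


def train_linear_svm(examples, epochs=200, lam=0.01):
    """Linear SVM via hinge-loss subgradient descent (Pegasos-style).

    examples: list of (feature_set, label) with label +1 or -1.
    Returns a weight dict that describes the separating hyperplane.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for feats, y in examples:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(w.get(f, 0.0) for f in feats)
            for f in w:            # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:       # example lies inside the margin: push it out
                for f in feats:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def extrapolate(w, context, candidates):
    """Return the candidate whose feature vector scores highest."""
    def score(candidate):
        return sum(w.get(f, 0.0) for f in pair_features(context, candidate))
    return max(candidates, key=score)


# Hypothetical supervised data: (context features, solution, other candidates).
supervised = [
    ({"verb=食べる"}, "を", ["に"]),
    ({"verb=行く"}, "に", ["を"]),
]
training = []
for context, solution, others in supervised:
    training.append((pair_features(context, solution), +1))   # positive example
    for cand in others:
        training.append((pair_features(context, cand), -1))   # negative examples

weights = train_linear_svm(training)
answer = extrapolate(weights, {"verb=食べる"}, ["を", "に"])
```

Storing `weights` corresponds to storing the hyperplane in the learning results database; `extrapolate` then only needs the features of the input sentence and the list of candidates.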
7 Claims
1. A system for analyzing Japanese language using supervised learning method (independent claim; recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5
6. A Japanese language ellipsis analysis processing method for carrying out ellipsis analysis including transformation by paraphrasing using machine learning method, the method comprising:
storing sentence data, which do not include solutions for a target problem, in a sentence data storage;
storing problem expression data, each data comprising a problem expression that is the object of language analysis and information of expressions corresponding to that problem expression, in a problem expression storage;
extracting a portion of each sentence data that matches any of the expressions corresponding to the problem expression using a predetermined language analysis method and replacing the extracted portion of the sentence data with the problem expression;
creating supervised data as a pair of a problem and either a solution or a solution candidate for each sentence data, the problem being the sentence data in which the extracted portion has been replaced with the problem expression, the solution being the extracted portion of the sentence data, and the solution candidate being extracted from other sentence data;
obtaining a plurality of predetermined syntactic supervised data features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure, from each sentence of the supervised data using syntactic analysis and then generating solution/features pairs, for each sentence of the supervised data, wherein the solution/features pairs are a positive example having the plurality of supervised data features and the solution and negative examples having the plurality of supervised data features and each one of the solution candidates;
performing machine learning on the solution/features pairs using a kernel function executed as a support vector machine, by classifying the solution based upon generating a hyperplane which maximizes an interval of the positive and negative examples and divides these two examples by the hyperplane on a space having dimensions determined by the plurality of obtained features and storing the hyperplane as a result of the machine learning in a learning result database;
inputting object sentence data and obtaining a plurality of syntactic object sentence features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure from the input object sentence data using syntactic analysis; and
using the stored hyperplane to determine which divided part of the space does the plurality of the syntactic object sentence features belong to, and estimates a determined part with highest probability as the solution as classified for the plurality of syntactic object sentence features.
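The hyperplane "which maximizes an interval of the positive and negative examples" corresponds to the standard maximum-margin formulation of a support vector machine. As a reminder, and assuming the hard-margin case (the claim does not say whether a soft margin is used), the textbook form is given below. The symbols are the usual ones and are not taken from the patent: x_i is the feature vector of the i-th solution/features pair, y_i is +1 for a positive example and -1 for a negative one, φ is the feature map induced by the kernel K, and α_i are the learned dual coefficients.

```latex
\begin{align}
  &\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
   \quad \text{subject to} \quad
   y_i \bigl( \mathbf{w} \cdot \phi(\mathbf{x}_i) + b \bigr) \ge 1,
   \qquad i = 1, \dots, n \\
  &f(\mathbf{x}) = \operatorname{sign}\Bigl( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Bigr),
   \qquad K(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x})
\end{align}
```

Minimizing the norm of w maximizes the interval 2/||w|| between the two classes, and the sign of f(x) indicates on which side of the stored hyperplane the features of an input sentence fall.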
7. An apparatus analyzing Japanese language using supervised learning method, the system comprising:
sentence data storage storing sentence data which do not include solutions for a target problem;
problem expression storage storing problem expression data comprising a problem expression which indicates an object of a language analysis and information of expressions corresponding to said problem expression; and
a controller,
extracting a portion which corresponds to any one of the expressions corresponding to the problem expression from the sentence data by using a predetermined language analysis and replacing the extracted portion of the sentence data with the problem expression,
creating a plurality of supervised data which is formed as a pair of a problem and either a solution or a solution candidate, wherein the pair comprises the sentence data in which the portion is replaced with the problem expression as the problem and the portion extracted from said sentence data by the problem expression extracting processing means as the solution or the portion extracted from other sentence data as the solution candidate,
obtaining a plurality of predetermined syntactic supervised data features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure from each supervised data using syntactic analysis and then generating solution/features pairs for each sentence of the supervised data, wherein the solution/features pairs are a positive example having the plurality of supervised data features and the solution and negative examples having the plurality of supervised data features and each one of the solution candidates,
performing machine learning processing on the solution/features pairs using a kernel function executed as a support vector machine, classifying the solution based upon generating a hyperplane which maximizes an interval of the positive and negative examples and divides these two examples by the hyperplane on a space having dimensions determined by the plurality of obtained features and storing the hyperplane in a learning result database,
inputting object sentence data and obtaining a plurality of syntactic object sentence features, which include one or more of a part of speech, root form, lexical category, dependency structure and modification structure from the input object sentence data using the syntactic analysis, and
using the stored hyperplane in determining which divided part of the space does the plurality of the syntactic object sentence features belong to, and estimating a determined part with highest probability as the solution as classified for the plurality of syntactic object sentence features.
Specification