Keyword extraction apparatus for Japanese texts
First Claim
1. A keyword extraction apparatus for extracting keywords from Japanese text data, comprising:
sentence segmentation means for segmenting the Japanese text data into sentence-by-sentence data;
analytical information storage means for storing information regarding mutual continuation between morphemes;
morpheme analysis means for dividing the sentence-by-sentence data segmented by the sentence segmentation means into morphemes and for analyzing the morphemes;
morpheme information storage means for storing morpheme information on a morpheme-by-morpheme basis, the morpheme information including part of speech information, semantic classification information, sentence pattern information, and noted term information;
morpheme information development means for developing morpheme information with respect to each morpheme analyzed by the morpheme analysis means, on a basis of the morpheme information stored in the morpheme information storage means;
keyword candidate extraction means for extracting keyword candidates from the sentence-by-sentence data, on a basis of the morpheme information developed by the morpheme information development means;
noted term information storage means for storing information regarding case classes of keyword candidates, among all of the keyword candidates, that immediately precede noted terms;
case class conversion information storage means for storing relational information between case types and the case classes;
case information acquisition means for acquiring case classes of the keyword candidates on a basis of the information stored in the noted term information storage means, and for acquiring case types corresponding to the acquired case classes on a basis of the relational information stored in the case class conversion information storage means;
frequency information acquisition means for acquiring an appearance frequency of each keyword candidate by classifying each keyword candidate into the case types obtained from the case information acquisition means, and for acquiring a number of all morphemes in the Japanese text data, the number of all morphemes being indicative of a length of the Japanese text data;
importance calculating means for calculating a frequency score on a basis of the appearance frequency of each keyword candidate and the number of all morphemes in the Japanese text data, for calculating a class-by-class appearance frequency of each keyword candidate in the Japanese text data, and for calculating an overall importance of each keyword candidate on a basis of the corresponding frequency score and the class-by-class appearance frequency; and
keyword finalizing means for determining keywords from the keyword candidates, wherein the keywords have a corresponding overall importance obtained from the importance calculating means which exceeds a predetermined value.
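The last two elements of the claim (importance calculation and keyword finalization) can be illustrated with a minimal Python sketch. The claim states only which quantities the scores are based on and gives no formulas, so everything concrete here is an assumption made for illustration: the frequency score is taken as the appearance count divided by the number of all morphemes, the class-by-class frequencies are combined through the hypothetical weights in `CASE_WEIGHTS`, and the case type names `subject`, `object` and `other` are invented.

```python
# Hypothetical case-type weights; the claim does not specify any values.
CASE_WEIGHTS = {"subject": 1.5, "object": 1.2, "other": 1.0}


def frequency_score(count, total_morphemes):
    """Appearance frequency normalised by text length (assumed formula)."""
    if total_morphemes <= 0:
        raise ValueError("total_morphemes must be positive")
    return count / total_morphemes


def overall_importance(case_counts, total_morphemes, weights=CASE_WEIGHTS):
    """Combine the frequency score with class-by-class frequencies.

    case_counts maps a case type to how often the candidate appeared
    with that case type. The combination rule is an assumption.
    """
    count = sum(case_counts.values())
    if count == 0:
        return 0.0
    score = frequency_score(count, total_morphemes)
    # Average weight of the case types in which the candidate appeared.
    class_factor = sum(weights.get(c, 1.0) * n for c, n in case_counts.items()) / count
    return score * class_factor


def finalize_keywords(candidates, total_morphemes, threshold):
    """Keep only candidates whose overall importance exceeds the threshold."""
    return sorted(
        keyword
        for keyword, case_counts in candidates.items()
        if overall_importance(case_counts, total_morphemes) > threshold
    )


candidates = {
    "特許": {"subject": 2, "object": 1},
    "文書": {"other": 1},
}
keywords = finalize_keywords(candidates, total_morphemes=40, threshold=0.05)
```

Dividing by the morpheme count keeps scores comparable between short and long texts, which matches the claim's statement that the number of all morphemes indicates the length of the text.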
Abstract
Sentence segmentation means performs sentence segmentation on the Japanese text data to be processed. Morpheme analysis means divides the sentence-by-sentence data into morphemes and analyzes the resultant morphemes on the basis of information regarding morpheme-by-morpheme continuation contained in an analytical dictionary. Morpheme dictionary information development means develops the contents of the morpheme dictionary, including part of speech information, semantic classification information, sentence pattern information and noted term information. Keyword candidate extraction means extracts keyword candidates from the sentence-by-sentence data on the basis of the part of speech information and the like of each morpheme. Case information acquisition means acquires case information from information, stored in a noted term table, regarding the case classes of keyword candidates immediately preceding noted terms, and from case class classification information stored in a case class conversion table. Frequency information acquisition means acquires the appearance frequency of each keyword candidate. Importance calculation means calculates the importance of each keyword candidate as a keyword. Keyword finalizing means determines as final keywords only those keyword candidates whose degree of importance exceeds a designated level.
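The two-table lookup performed by the case information acquisition means can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the table contents, the choice of the particles が, を and に as noted terms, the case class labels `GA`, `WO`, `NI` and the case type names are all hypothetical, and the input is assumed to be already divided into morphemes.

```python
# Hypothetical noted term table: noted term -> case class of the
# keyword candidate that immediately precedes it.
NOTED_TERM_TABLE = {"が": "GA", "を": "WO", "に": "NI"}

# Hypothetical case class conversion table: case class -> case type.
CASE_CLASS_CONVERSION = {"GA": "subject", "WO": "object", "NI": "other"}


def acquire_case_types(morphemes, candidates):
    """Return (candidate, case_type) pairs for every keyword candidate
    that immediately precedes a noted term."""
    pairs = []
    for current, following in zip(morphemes, morphemes[1:]):
        if current not in candidates:
            continue
        # First lookup: noted term table gives the case class.
        case_class = NOTED_TERM_TABLE.get(following)
        if case_class is None:
            continue
        # Second lookup: conversion table gives the case type.
        case_type = CASE_CLASS_CONVERSION.get(case_class)
        if case_type is not None:
            pairs.append((current, case_type))
    return pairs


morphemes = ["装置", "が", "キーワード", "を", "抽出", "する"]
pairs = acquire_case_types(morphemes, {"装置", "キーワード", "抽出"})
```

Keeping the case class and the case type in separate tables mirrors the abstract's distinction between the noted term table and the case class conversion table: several case classes can be mapped to one case type without touching the first table.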
3 Claims
3. A keyword extraction method for extracting keywords from Japanese text data, comprising the steps of:
a) segmenting the Japanese text data into sentence-by-sentence data;
b) storing information regarding mutual continuation between morphemes;
c) partitioning the segmented sentence-by-sentence data into morphemes and analyzing the morphemes;
d) storing morpheme information on a morpheme-by-morpheme basis, the morpheme information including part of speech information, semantic classification information, sentence pattern information, and noted term information;
e) developing morpheme information with respect to each analyzed morpheme on a basis of the stored morpheme information;
f) extracting keyword candidates from segmented sentence-by-sentence data on a basis of the stored morpheme information;
g) storing information regarding case classes of keyword candidates, among all of the keyword candidates, that immediately precede noted terms;
h) storing relational information between case types and the case classes;
i) acquiring case classes of the keyword candidates on a basis of the information regarding case classes;
j) acquiring case types corresponding to the acquired case classes on a basis of the relational information;
k) acquiring an appearance frequency of each keyword candidate by classifying each keyword candidate into the acquired case types;
l) acquiring a number of all morphemes in the Japanese text data, the number of all morphemes being indicative of a length of the Japanese text data;
m) calculating a frequency score on a basis of the appearance frequency of each keyword candidate and the number of all morphemes in the Japanese text data;
n) calculating a class-by-class appearance frequency of each keyword candidate in the Japanese text data;
o) calculating an overall importance of each keyword candidate on a basis of the corresponding frequency score and the class-by-class appearance frequency; and
p) determining keywords from all of the keyword candidates, wherein the keywords have a corresponding overall importance which exceeds a threshold value.
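Step a) of the method can be pictured with a short Python sketch. The claim does not say how sentence boundaries are detected, so the rule used here is an assumption: a sentence ends at the full-width terminators 。, ！ or ？, and any trailing text without a terminator is kept as a final sentence. The function name `segment_sentences` is hypothetical.

```python
import re

# A run of non-terminator characters followed by an optional terminator.
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def segment_sentences(text):
    """Split Japanese text into sentence-by-sentence data (assumed rule)."""
    sentences = (match.strip() for match in _SENTENCE.findall(text))
    return [s for s in sentences if s]


sentences = segment_sentences("今日は晴れです。明日は雨ですか？はい")
```

A real system would also need to handle quotations and terminators inside brackets; this sketch deliberately ignores those cases.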
Specification