Keyword extraction apparatus for Japanese texts

  • US 5,619,410 A
  • Filed: 03/29/1994
  • Issued: 04/08/1997
  • Est. Priority Date: 03/29/1993
  • Status: Expired due to Fees
First Claim

1. A keyword extraction apparatus for extracting keywords from Japanese text data, comprising:

    sentence segmentation means for segmenting the Japanese text data into sentence-by-sentence data;

    analytical information storage means for storing information regarding mutual continuation between morphemes;

    morpheme analysis means for dividing the sentence-by-sentence data segmented by the sentence segmentation means into morphemes and for analyzing the morphemes;

    morpheme information storage means for storing morpheme information on a morpheme-by-morpheme basis, the morpheme information including part of speech information, semantic classification information, sentence pattern information, and noted term information;

    morpheme information development means for developing morpheme information with respect to each morpheme analyzed by the morpheme analysis means, on a basis of the morpheme information stored in the morpheme information storage means;

    keyword candidate extraction means for extracting keyword candidates from the sentence-by-sentence data, on a basis of the morpheme information developed by the morpheme information development means;

    noted term information storage means for storing information regarding case classes of keyword candidates, among all of the keyword candidates, that immediately precede noted terms;

    case class conversion information storage means for storing relational information between case types and the case classes;

    case information acquisition means for acquiring case classes of the keyword candidates on a basis of the information stored in the noted term information storage means, and for acquiring case types corresponding to the acquired case classes on a basis of the relational information stored in the case class conversion information storage means;

    frequency information acquisition means for acquiring an appearance frequency of each keyword candidate by classifying each keyword candidate into the case types obtained from the case information acquisition means, and for acquiring a number of all morphemes in the Japanese text data, the number of all morphemes being indicative of a length of the Japanese text data;

    importance calculating means for calculating a frequency score on a basis of the appearance frequency of each keyword candidate and the number of all morphemes in the Japanese text data, for calculating a class-by-class appearance frequency of each keyword candidate in the Japanese text data, and for calculating an overall importance of each keyword candidate on a basis of the corresponding frequency score and the class-by-class appearance frequency; and

    keyword finalizing means for determining keywords from the keyword candidates, wherein the keywords have a corresponding overall importance obtained from the importance calculating means which exceeds a predetermined value.
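
The scoring steps recited in the last three elements of the claim (frequency information acquisition, importance calculation, and keyword finalizing) can be sketched roughly as follows. The claim does not state any formula, so everything concrete here is an assumption made for illustration: the function name `score_candidates`, the case-type labels, the `CASE_TYPE_WEIGHTS` table, and the choice to multiply the frequency score by a weighted class-by-class average. Morpheme analysis and case-class lookup are assumed to have already produced one `(term, case_type)` pair per occurrence of a keyword candidate.

```python
from collections import Counter, defaultdict

# Hypothetical weights per case type; the claim does not specify any values.
CASE_TYPE_WEIGHTS = {"subject": 2.0, "object": 1.5, "other": 1.0}


def score_candidates(candidates, total_morphemes, threshold):
    """Return {term: importance} for candidates whose importance exceeds threshold.

    candidates: list of (term, case_type) pairs, one per occurrence in the text.
    total_morphemes: number of all morphemes in the text (a proxy for its length).
    threshold: the predetermined value that the overall importance must exceed.
    """
    if total_morphemes <= 0:
        raise ValueError("total_morphemes must be positive")

    # Appearance frequency of each candidate, overall and per case type.
    frequency = Counter(term for term, _ in candidates)
    by_case_type = defaultdict(Counter)
    for term, case_type in candidates:
        by_case_type[term][case_type] += 1

    importance = {}
    for term, count in frequency.items():
        # Frequency score: occurrences normalized by text length (assumed form).
        frequency_score = count / total_morphemes
        # Class-by-class component: weighted average over case types (assumed form).
        weighted = sum(
            CASE_TYPE_WEIGHTS.get(case_type, 1.0) * n
            for case_type, n in by_case_type[term].items()
        )
        case_score = weighted / count
        # Overall importance combines both components (assumed to be a product).
        importance[term] = frequency_score * case_score

    # Keyword finalizing: keep only candidates above the predetermined value.
    return {term: score for term, score in importance.items() if score > threshold}


if __name__ == "__main__":
    occurrences = [("特許", "subject"), ("特許", "object"), ("装置", "other")]
    print(score_candidates(occurrences, total_morphemes=10, threshold=0.15))
```

With the sample input in the `__main__` block, only 特許 passes the threshold: it appears twice in highly weighted case types, while 装置 appears once in the lowest-weighted one. A real implementation would derive the case types from the particles or noted terms that follow each candidate, as the claim describes.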
