Cross lingual text classification apparatus and method

  • US 7,467,079 B2
  • Filed: 02/24/2004
  • Issued: 12/16/2008
  • Est. Priority Date: 09/29/2003
  • Status: Expired due to Fees
First Claim

1. A text classification apparatus, comprising:

  • a text input device for receiving an entered text;

    a storage device for storing a concept thesaurus file for use in classifying an entered text to be classified, a cross lingual word sense-based classification knowledge file corresponding to a plurality of languages including a first and a second language, and a word-based classification knowledge file;

    a processing unit for executing a classification of the entered text to be classified to assign a category to the entered text; and

    an output device for outputting the classification result, wherein:

    said text input device receives first entered text to be classified in the first language, said processing unit is configured to:

    extract a word from said first entered text to be classified;

    convert the extracted word into a word sense using said concept thesaurus file;

    compare the word sense resulting from the conversion with information on each category included in said cross lingual word sense-based classification knowledge file to calculate a first score for each category;

    compare the extracted word with word classification information included in said word-based classification knowledge file to calculate a second score for each category; and

    integrate said first and second scores for each category to determine a category for the first text to be classified in the first language for assigning a category to the first entered text, and

    said word-based classification knowledge file is generated by learning a word-based classification knowledge using words included in a labeled text in the first language, wherein:

    said text to be classified which has been assigned the category by said text classification apparatus is used for learning the word-based classification knowledge as a labeled text in the first language used in the generation of said word-based classification knowledge file.
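
The processing steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the toy knowledge tables, the additive scoring, the linear weighting `alpha`, and the helper names `extract_words`, `score`, `classify`, and `learn_from` are all assumptions made for illustration. The claim does not specify how scores are calculated or integrated.

```python
# Hypothetical toy knowledge files; every entry below is an illustrative assumption.
CONCEPT_THESAURUS = {  # first-language word -> language-independent word sense
    "bank": "FINANCIAL_INSTITUTION",
    "loan": "CREDIT",
    "goal": "SCORE_EVENT",
}
SENSE_KNOWLEDGE = {  # cross lingual word sense-based classification knowledge
    "finance": {"FINANCIAL_INSTITUTION": 1.0, "CREDIT": 0.8},
    "sports": {"SCORE_EVENT": 1.0},
}
WORD_KNOWLEDGE = {  # word-based classification knowledge for the first language
    "finance": {"bank": 0.5, "loan": 0.5},
    "sports": {"goal": 0.9},
}


def extract_words(text):
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in stripped if w]


def score(features, knowledge):
    """Sum the per-category weights of the observed features (assumed scoring rule)."""
    return {
        category: sum(weights.get(f, 0.0) for f in features)
        for category, weights in knowledge.items()
    }


def classify(text, alpha=0.5):
    """Return (category, integrated scores) for a first-language text."""
    words = extract_words(text)
    # Convert each extracted word into a word sense via the concept thesaurus.
    senses = [CONCEPT_THESAURUS[w] for w in words if w in CONCEPT_THESAURUS]
    first = score(senses, SENSE_KNOWLEDGE)   # first score: word sense-based
    second = score(words, WORD_KNOWLEDGE)    # second score: word-based
    # Integrate the two scores; a weighted sum is one possible choice.
    combined = {
        c: alpha * first.get(c, 0.0) + (1 - alpha) * second.get(c, 0.0)
        for c in set(first) | set(second)
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(combined), key=combined.get), combined


def learn_from(text, category, step=0.1):
    """Feed a classified text back as labeled text for the word-based knowledge."""
    weights = WORD_KNOWLEDGE.setdefault(category, {})
    for w in extract_words(text):
        weights[w] = weights.get(w, 0.0) + step


category, scores = classify("The bank approved the loan.")
learn_from("The bank approved the loan.", category)
```

After the call to `learn_from`, words such as "approved" that have no entry in the concept thesaurus still carry weight for the assigned category, which mirrors the final wherein clause: text classified by the apparatus becomes labeled first-language text for learning the word-based classification knowledge.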
