
CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA

  • US 20100174528A1
  • Filed: 01/04/2010
  • Published: 07/08/2010
  • Est. Priority Date: 01/05/2009
  • Status: Active Grant
First Claim

1. A method of creating a terms dictionary with named entities or terminologies included in text data, comprising:

    acquiring token sequence data by performing morphological analysis for the text data;

    distinguishing tokens of the token sequence data by using a category dictionary to extract uncategorized words;

    comparing each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word, wherein the uncategorized-word comparison rule includes a token composed of a first character string and a first regular expression for use in extracting the matching uncategorized word;

    comparing a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words, wherein the token-sequence comparison rule includes a token sequence including a second character string and a second regular expression for use in extracting the matching token sequence; and

    permitting a user to select whether to register the registration candidate words in the category dictionary.
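
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace-style tokenizer stands in for real morphological analysis, and the names `tokenize`, `uncategorized_words`, `WordRule`, `SequenceRule`, and `register`, along with the sample dictionary and rules, are hypothetical. The sketch also assumes one reading of the claim, in which a word rule is a literal string followed by a regular expression, and a sequence rule is a list of per-token literals or regular expressions.

```python
import re
from dataclasses import dataclass


def tokenize(text):
    # Stand-in for morphological analysis (assumption): split into word tokens.
    return re.findall(r"\w+", text)


def uncategorized_words(tokens, category_dict):
    # Tokens that have no entry in the category dictionary.
    return [t for t in tokens if t not in category_dict]


@dataclass
class WordRule:
    # Hypothetical uncategorized-word rule: a literal string plus a regex tail.
    literal: str
    pattern: str

    def matches(self, word):
        return re.fullmatch(re.escape(self.literal) + self.pattern, word) is not None


@dataclass
class SequenceRule:
    # Hypothetical token-sequence rule: each element is ("lit", s) or ("re", p).
    elements: list

    @staticmethod
    def _match(element, token):
        kind, value = element
        if kind == "lit":
            return token == value
        return re.fullmatch(value, token) is not None

    def find(self, tokens):
        # Slide a window of the rule's length over the token sequence.
        n = len(self.elements)
        hits = []
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if all(self._match(e, t) for e, t in zip(self.elements, window)):
                hits.append(" ".join(window))
        return hits


def register(candidates, category_dict, category, accept):
    # The user decides per candidate; `accept` models that choice.
    for candidate in candidates:
        if accept(candidate):
            category_dict[candidate] = category


# Sample data (all assumed for illustration).
category_dict = {"visited": "VERB", "the": "DET", "lab": "NOUN",
                 "of": "PREP", "Corp": "ORG_SUFFIX"}
tokens = tokenize("Alice visited the XR200 lab of Acme Corp")
unknown = uncategorized_words(tokens, category_dict)

word_rule = WordRule(literal="XR", pattern=r"\d+")
word_candidates = [w for w in unknown if word_rule.matches(w)]

sequence_rule = SequenceRule([("re", r"[A-Z][a-z]+"), ("lit", "Corp")])
sequence_candidates = sequence_rule.find(tokens)

# Here every candidate is accepted; an interactive prompt could replace this.
register(word_candidates + sequence_candidates, category_dict,
         "NAMED_ENTITY", accept=lambda candidate: True)
```

In this sketch the final step is modeled as a callback so that the user's choice stays separate from candidate extraction; a real system would presumably present the candidates in a user interface instead.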
