
Extracting terms from document data including text segment

  • US 9,043,339 B2
  • Filed: 05/21/2013
  • Issued: 05/26/2015
  • Est. Priority Date: 10/02/2008
  • Status: Expired due to Fees
First Claim

1. A computer-implemented system including a memory and a processor communicatively coupled to the memory for extracting terms from electronic document data that includes a text segment, the computer system comprising:

    a first extraction unit that uses a first text processing information to extract a noun word from the document data;

    a second extraction unit that uses a second text processing information to extract a term candidate in relation to the extracted noun word from the document data or from a corpus that includes text data described in the same language used in the document data;

    a weight assignment unit that, in order to determine which one of a plurality of noun word types the extracted noun word and the extracted term candidate each belong to, uses a third text processing information to select which type to assign a weight from the plurality of types and assigns the weight to the selected type for each of the extracted noun word and the extracted term candidate;

    a determination unit that determines the type to which the extracted noun word and the extracted term candidate each belong, based on the assigned weight; and

    an output unit which follows the determination to output the extracted noun word and the extracted term candidate each in association with the determined type,

    wherein the weight assignment unit uses the third text processing information to select which type to assign the weight from the plurality of types and assigns the weight to the selected type for each of the extracted noun word and the extracted term candidate by:

    obtaining a number of times a genitive case word modifies the extracted noun word and a number of times a genitive case word modifies the extracted term candidate, in the document data or in the corpus including the text data described in the same language used in the document data; and

    selecting the type to be assigned a weight according to whether or not the obtained number of times is in a predetermined threshold value range.
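
The two steps of the wherein clause (counting genitive modifications, then selecting a type by threshold range) can be sketched as follows. This is an illustrative reading of the claim language only, not the patented implementation. The type labels `type_a` and `type_b`, the `GEN` tag, the threshold bounds, the weight value, and all function names are assumptions introduced for the example. The claim itself specifies none of them. For simplicity the sketch treats "modifies" as "immediately precedes" and handles single-token terms only.

```python
from collections import Counter

# Hypothetical type labels; the claim only says "a plurality of noun word types".
TYPE_IN_RANGE = "type_a"
TYPE_OUT_OF_RANGE = "type_b"


def count_genitive_modifications(tagged_tokens, terms):
    """Count how often each term is directly preceded by a genitive-tagged token.

    tagged_tokens: list of (surface, tag) pairs. The tag "GEN" is an assumed
    marker for a genitive case word (for example an English possessive).
    """
    counts = Counter({term: 0 for term in terms})
    for (_, prev_tag), (surface, _) in zip(tagged_tokens, tagged_tokens[1:]):
        if prev_tag == "GEN" and surface in counts:
            counts[surface] += 1
    return dict(counts)


def select_type(count, low, high):
    """Select the type that receives the weight, based on a threshold range."""
    return TYPE_IN_RANGE if low <= count <= high else TYPE_OUT_OF_RANGE


def assign_weights(tagged_tokens, terms, low=2, high=10, weight=1.0):
    """Assign `weight` to the selected type for each term (assumed defaults)."""
    counts = count_genitive_modifications(tagged_tokens, terms)
    weights = {}
    for term in terms:
        selected = select_type(counts[term], low, high)
        weights[term] = {TYPE_IN_RANGE: 0.0, TYPE_OUT_OF_RANGE: 0.0}
        weights[term][selected] += weight
    return weights


# Toy document data, already tokenized and tagged (assumed input format).
tokens = [
    ("customer's", "GEN"), ("account", "NOUN"), ("was", "VERB"),
    ("closed", "VERB"), ("user's", "GEN"), ("account", "NOUN"),
    ("failure", "NOUN"),
]
result = assign_weights(tokens, ["account", "failure"])
```

Here "account" is preceded by a genitive word twice, which falls inside the assumed range of 2 to 10, so its weight goes to `type_a`. "failure" is never preceded by one, so its weight goes to `type_b`. A determination unit in the sense of the claim would then pick the type from these accumulated weights.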
