EXTRACTING TERMS FROM DOCUMENT DATA INCLUDING TEXT SEGMENT
First Claim
1. A computer-implemented system including a memory and a processor communicatively coupled to the memory for extracting terms from electronic document data that includes a text segment, the computer system comprising:
a first extraction unit that uses a first text processing information to extract a noun word from the document data;
a second extraction unit that uses a second text processing information to extract a term candidate in relation to the extracted noun word from the document data or from a corpus that includes text data described in the same language used in the document data;
a weight assignment unit that, in order to determine which one of a plurality of noun word types the extracted noun word and the extracted term candidate each belong to, uses a third text processing information to select which type to assign a weight from the plurality of types and assigns the weight to the selected type for each of the extracted noun word and the extracted term candidate;
a determination unit that determines the type to which the extracted noun word and the extracted term candidate each belong, based on the assigned weight; and
an output unit which follows the determination to output the extracted noun word and the extracted term candidate each in association with the determined type.
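The claim does not say what the first and second "text processing information" actually are. The Python sketch below illustrates only the two extraction units, and it rests on assumptions that are not in the claim. It treats the first text processing information as a tiny noun lexicon (`NOUN_LEXICON`, hypothetical). It treats the second as a hypothetical rule: any two adjacent lexicon nouns that include the extracted noun form a term candidate, searched in the document and in corpus texts. All function names are illustrative.

```python
import re
from typing import Iterable

# Hypothetical "first text processing information": a tiny noun lexicon.
# A production system might use a part-of-speech tagger instead.
NOUN_LEXICON = {"term", "extraction", "document", "data", "text", "segment", "weight"}


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())


def extract_nouns(document: str, lexicon: set[str]) -> list[str]:
    """First extraction unit: lexicon nouns in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order and drops repeats
    for tok in tokenize(document):
        if tok in lexicon:
            seen.setdefault(tok, None)
    return list(seen)


def extract_candidates(noun: str, texts: Iterable[str], lexicon: set[str]) -> set[str]:
    """Second extraction unit (hypothetical rule): a term candidate is any pair
    of adjacent lexicon nouns, in the document or corpus texts, containing `noun`."""
    candidates: set[str] = set()
    for text in texts:
        toks = tokenize(text)
        for left, right in zip(toks, toks[1:]):
            if noun in (left, right) and left in lexicon and right in lexicon:
                candidates.add(f"{left} {right}")
    return candidates


if __name__ == "__main__":
    document = "Term extraction reads document data. Each text segment carries a weight."
    corpus = ["Text data from a corpus supports term extraction."]
    for noun in extract_nouns(document, NOUN_LEXICON):
        print(noun, sorted(extract_candidates(noun, [document, *corpus], NOUN_LEXICON)))
```

On the sample text, the noun `data` yields the candidates `document data` and `text data`. The second comes only from the corpus sentence. Because punctuation is discarded, the pair rule can also join words across a sentence boundary. A fuller sketch would split the text into sentences first.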
Abstract
A computer system, method, and article of manufacture for extracting a term from electronic document data that includes a text segment. The system includes: a first extraction unit that uses a first text processing information to extract a noun word from the document data; a second extraction unit that uses a second text processing information to extract a term candidate in relation to the noun word from the document data or from a corpus that includes text data described in the same language used in the document data; a weight assignment unit that uses a third text processing information to select, from a plurality of noun word types, which type to assign a weight to, and assigns the weight to the selected type for each noun word and term candidate; a determination unit that determines, based on the assigned weight, the type to which the noun word and the term candidate each belong; and an output unit that outputs the noun word and the term candidate in association with the determined type.
5 Citations
20 Claims
20. A computer-implemented method for extracting terms from electronic document data that includes a text segment, the method comprising the steps of:
using a first text processing information to extract a noun word from the document data and storing the extracted noun word in a storage unit;
using a second text processing information to extract a term candidate in relation to the extracted noun word from the document data or from a corpus that includes text data described in the same language used in the document data and storing the extracted term candidate in the storage unit;
in order to determine which noun word type out of a plurality of types the extracted noun word and the extracted term candidate each belong to, using a third text processing information to select which type to assign a weight from the plurality of types, assigning the weight to the selected type for each of the extracted noun word and the extracted term candidate, and storing the assigned weight in the storage unit;
determining the type to which the extracted noun word and the extracted term candidate each belong, based on the assigned weight; and
following the determination to output the extracted noun word and the extracted term candidate each in association with the determined type onto a display device.
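The method claim adds a storage unit and a display step to the weighting and determination stages. The Python sketch below covers only those later stages, and every detail in it is an assumption rather than something the claim recites. It models the third text processing information as a hypothetical cue table (`CUE_RULES`) over three illustrative types: organization, product, and technology. Cues are looked up in a window of two tokens on each side of a term. A hypothetical in-memory `StorageUnit` stands in for the storage unit, and printing stands in for the display device.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical "third text processing information": each cue word, when it
# occurs near a term, selects one noun word type and contributes a weight to it.
CUE_RULES: dict[str, tuple[str, float]] = {
    "inc": ("organization", 2.0),
    "company": ("organization", 1.5),
    "version": ("product", 1.5),
    "released": ("product", 1.0),
    "algorithm": ("technology", 1.5),
    "method": ("technology", 1.0),
}


@dataclass
class StorageUnit:
    """Hypothetical in-memory stand-in for the storage unit in the method claim."""
    terms: list[str] = field(default_factory=list)
    # term -> noun word type -> accumulated weight
    weights: dict[str, dict[str, float]] = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(float))
    )


def tokenize(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def assign_weights(terms: list[str], document: str, rules: dict[str, tuple[str, float]],
                   storage: StorageUnit, window: int = 2) -> None:
    """Weight assignment: at each occurrence of a term (noun word or term
    candidate), every cue within `window` tokens on either side selects a
    type, and its weight is added to that type for the term."""
    toks = tokenize(document)
    for term in terms:
        storage.terms.append(term)
        term_toks = tokenize(term)
        n = len(term_toks)
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] != term_toks:
                continue
            context = toks[max(0, i - window):i] + toks[i + n:i + n + window]
            for cue in context:
                if cue in rules:
                    selected_type, weight = rules[cue]
                    storage.weights[term][selected_type] += weight


def determine_types(storage: StorageUnit) -> dict[str, Optional[str]]:
    """Determination: the type with the largest stored weight, or None when
    no weight was assigned to any type for the term."""
    result: dict[str, Optional[str]] = {}
    for term in storage.terms:
        per_type = storage.weights.get(term)  # .get does not create entries
        result[term] = max(per_type, key=per_type.get) if per_type else None
    return result


def output_types(types: dict[str, Optional[str]]) -> None:
    """Output: print each term with its determined type (display stand-in)."""
    for term, noun_type in types.items():
        print(f"{term}\t{noun_type or 'undetermined'}")
```

Consider the document "Acme Inc released Widget Pro version 2. The ranking algorithm in Widget Pro is a method from Acme Inc." with the terms `acme`, `widget pro`, and `ranking`. The sketch settles on organization, product, and technology for these terms. `widget pro` collects weight for all three types: product 2.5, organization 2.0, and technology 1.5. That is why the determination step compares accumulated weights instead of trusting the first cue it sees. A tie goes to whichever type was weighted first. The claim does not say how ties are resolved.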
Specification