DICTIONARY CREATION DEVICE
First Claim
1. A dictionary creation device comprising:
an input/output process recording means for recording information indicating an input/output process for input words and output words output relating to said input words, in a dictionary growth process for gathering words in each category by repeatedly receiving input of words in each category, outputting from document data words related to the input words that were input, adding the output words to the input words until prescribed conditions are reached and outputting from document data words related to the input words;
a gathered-by-category word memory means for storing words gathered by the dictionary growth process by category;
a boundary word identification means for identifying boundary words belonging to multiple categories out of the words gathered by the dictionary growth process;
a category membership degree calculation means for calculating a category membership degree indicating the extent to which a boundary word belongs to the categories for each category to which the boundary word belongs, so that the category membership degree may become high, when the boundary word turns into an input word of the category, or when the boundary word turns into an output word of the category, on the basis of the information recorded in the input/output process recording means; and
a category update means for determining categories to which the boundary words belong on the basis of category membership degrees calculated by the category membership degree calculation means, and updating information stored in the gathered-by-category word memory means so as to reflect the determination results.
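The dictionary growth process and its input/output recording, as recited in the claim, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the helper related_words (which treats words co-occurring in the same document as "related"), the function grow_dictionary, the record layout (category, round, input word, output word), and the stopping conditions (no new words, or a fixed number of rounds).

```python
def related_words(word, documents):
    """Hypothetical relatedness test: words that co-occur with `word` in a document."""
    found = set()
    for doc in documents:
        tokens = doc.split()
        if word in tokens:
            found.update(t for t in tokens if t != word)
    return found


def grow_dictionary(seeds, documents, max_rounds=3):
    """Grow each category from its seed words and record every input/output step.

    Returns (gathered, log): `gathered` maps category -> set of words, and
    `log` is a list of (category, round, input_word, output_word) records.
    """
    gathered = {cat: set(words) for cat, words in seeds.items()}
    log = []
    for cat, words in seeds.items():
        inputs = set(words)
        for rnd in range(1, max_rounds + 1):
            new_words = set()
            for w in sorted(inputs):
                for out in sorted(related_words(w, documents)):
                    # Record the input/output process for this category.
                    log.append((cat, rnd, w, out))
                    if out not in gathered[cat]:
                        new_words.add(out)
            if not new_words:
                # Assumed prescribed condition: nothing new was gathered.
                break
            gathered[cat] |= new_words
            # Output words are added to the input words for the next round.
            inputs |= new_words
    return gathered, log
```

Under these assumptions, a word reachable from the seeds of two categories ends up in both gathered sets, which is the situation the boundary word identification means is meant to detect.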
Abstract
A boundary word identification unit (103) identifies a boundary word belonging to a plurality of categories among words gathered in dictionary growth processing. Then, a category membership degree calculation unit (104) calculates, for each category to which the boundary word belongs, a category membership degree indicating a degree to which the boundary word belongs to the category on the basis of information recorded in a gathering process memory unit (108). Next, a category update unit (105) determines the category to which the boundary word belongs on the basis of the category membership degree calculated by the category membership degree calculation unit (104) and updates information stored in a gathered-by-category word memory unit (109) so that the determination result is reflected.
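The three units described in the abstract can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the gathering record is taken to be a list of hypothetical (category, round, input word, output word) tuples, the category membership degree is assumed to be a simple count of the records in which the word appears as an input or output word of the category, and a boundary word is assumed to stay only in the category or categories with the highest degree. The patent text quoted here does not fix any of these details.

```python
from collections import Counter


def boundary_words(gathered):
    """Identify words that were gathered into more than one category."""
    counts = Counter(w for words in gathered.values() for w in words)
    return {w for w, n in counts.items() if n > 1}


def membership_degree(word, category, log):
    """Assumed scoring: count the records of `category` in which `word`
    appears as an input word or as an output word."""
    return sum(
        1
        for cat, _rnd, inp, out in log
        if cat == category and word in (inp, out)
    )


def update_categories(gathered, log):
    """Keep each boundary word only in its highest-degree category (ties keep all)."""
    updated = {cat: set(words) for cat, words in gathered.items()}
    for word in boundary_words(gathered):
        cats = [c for c in gathered if word in gathered[c]]
        degrees = {c: membership_degree(word, c, log) for c in cats}
        best = max(degrees.values())
        for c in cats:
            if degrees[c] < best:
                updated[c].discard(word)
    return updated
```

Counting both roles means the degree rises whenever the boundary word becomes an input word or an output word of a category, which matches the direction stated in the claims; a weighted or normalized score would be an equally plausible choice.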
13 Claims
12. A word gathering method comprising:
an input/output process recording step for recording information indicating an input/output process for input words and output words output by said input words, in a dictionary growth process for gathering words in each category by repeatedly receiving input of words in each category, outputting from document data words related to the input words that were input, adding the output words to the input words until prescribed conditions are reached and outputting from document data words related to the input words;
a gathered-by-category word memory step for storing words gathered by the dictionary growth process by category;
a boundary word identification step for identifying boundary words belonging to multiple categories out of the words gathered by the dictionary growth process;
a category membership degree calculation step for calculating a category membership degree indicating the extent to which the boundary words belong to the categories for each category to which the boundary words belong, so that the category membership degree may become high, when the boundary word turns into an input word of the category, or when the boundary word turns into an output word of the category, on the basis of the information recorded in the input/output process recording step; and
a category update step for determining categories to which the boundary words belong on the basis of category membership degrees calculated by the category membership degree calculation step, and updating information stored in the gathered-by-category word memory step so as to reflect the determination results.
13. A computer-readable recording medium on which is recorded a program that causes a computer to function as:
an input/output process recording means for recording information indicating an input/output process for input words and output words output by said input words, in a dictionary growth process for gathering words in each category by repeatedly receiving input of words in each category, outputting from document data words related to the input words that were input, adding the output words to the input words until prescribed conditions are reached and outputting from document data words related to the input words;
a gathered-by-category word memory means for storing words gathered by the dictionary growth process by category;
a boundary word identification means for identifying boundary words belonging to multiple categories out of the words gathered by the dictionary growth process;
a category membership degree calculation means for calculating a category membership degree indicating the extent to which the boundary words belong to the categories for each category to which the boundary words belong, so that the category membership degree may become high, when the boundary word turns into an input word of the category, or when the boundary word turns into an output word of the category, on the basis of the information recorded in the input/output process recording means; and
a category update means for determining categories to which the boundary words belong on the basis of category membership degrees calculated by the category membership degree calculation means, and updating information stored in the gathered-by-category word memory means so as to reflect the determination results.
Specification