Dictionary creation device and dictionary creation method
First Claim
1. A dictionary creation device that creates a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the dictionary creation device comprising:
a classification information acquisition unit that acquires classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
a keyword extraction unit that extracts a keyword from the acquired text information;
a dictionary registration and deletion unit that registers or deletes the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired by said classification information acquisition unit and the keyword extracted by said keyword extraction unit;
a keyword database that stores the extracted keyword, said keyword database being a non-transitory computer-readable storage medium; and
a dictionary combining and editing unit that edits the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said dictionary combining and editing unit (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
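The combining step recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes one reading of the claim: the second category is placed under the first when the intersection of the two keyword sets covers only a small share of the first set but a large share of the second. The function names, the dictionary layout, and the thresholds `low` and `high` are hypothetical; the claim states no numeric values. Category names are also assumed to be distinct across the two sources.

```python
def overlap_ratios(first: set, second: set) -> tuple:
    """Return the share of each set that lies in the intersection."""
    common = first & second
    r_first = len(common) / len(first) if first else 0.0
    r_second = len(common) / len(second) if second else 0.0
    return r_first, r_second


def is_subordinate(first: set, second: set, low: float = 0.5, high: float = 0.8) -> bool:
    """True when the intersection is a small part of `first` and a large part of `second`.

    `low` and `high` are illustrative thresholds, not values from the claim.
    """
    r_first, r_second = overlap_ratios(first, second)
    return r_first <= low and r_second >= high


def combine(first_dict: dict, second_dict: dict, low: float = 0.5, high: float = 0.8) -> dict:
    """Build a combined category hierarchy from two {category: keyword set} dictionaries."""
    combined = {
        name: {"keywords": set(keywords), "children": {}}
        for name, keywords in first_dict.items()
    }
    for name2, keywords2 in second_dict.items():
        node = {"keywords": set(keywords2), "children": {}}
        # Use the first category of the first source that qualifies as a parent.
        parent = next(
            (n1 for n1, k1 in first_dict.items() if is_subordinate(k1, keywords2, low, high)),
            None,
        )
        if parent is not None:
            combined[parent]["children"][name2] = node
        else:
            combined[name2] = node  # no suitable parent: keep at the top level
    return combined


first_source = {
    "sports": {"soccer", "baseball", "tennis", "goal", "pitcher",
               "racket", "league", "match", "stadium", "coach"},
}
second_source = {
    "soccer news": {"soccer", "goal", "league", "match"},
    "cooking": {"recipe", "oven"},
}
hierarchy = combine(first_source, second_source)
```

In this example "soccer news" shares all four of its keywords with "sports" while covering only four of the ten "sports" keywords, so it is placed under "sports"; "cooking" shares nothing and stays at the top level.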
Abstract
A dictionary creation device and dictionary creation method which optimally create and update a dictionary for classifying, searching, or extracting text information in accordance with changes in the content of text information groups. The dictionary creation device includes a keyword extraction unit that extracts a keyword from inputted text information; a keyword statistics unit that finds statistics regarding an appearance of the keyword; a keyword assessment value calculation unit that calculates an assessment value of the extracted keyword based on the statistics regarding the appearance of the keyword; a determination unit that determines whether or not to register or delete the keyword based on the calculated assessment value; a dictionary registration and deletion unit which registers or deletes the keyword in or from a dictionary database based on a result of the determination performed by the determination unit; and the dictionary database.
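The pipeline in the abstract (extract keywords, gather appearance statistics, compute an assessment value, decide, then register or delete) can be sketched as follows. This is a hedged illustration only: the abstract does not define the assessment value, so the sketch assumes it is simply the fraction of input texts in which a keyword appears. The function names, the ASCII-only tokenizer, and the thresholds `register_at` and `delete_below` are all hypothetical.

```python
import re
from collections import Counter


def extract_keywords(text: str) -> list:
    """Naive keyword extraction: lowercase ASCII word tokens (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(texts: list) -> Counter:
    """Appearance statistics: number of texts in which each keyword occurs."""
    df = Counter()
    for text in texts:
        df.update(set(extract_keywords(text)))
    return df


def assess(keyword: str, df: Counter, n_texts: int) -> float:
    """Assumed assessment value: share of texts that contain the keyword."""
    return df[keyword] / n_texts if n_texts else 0.0


def update_dictionary(dictionary: set, texts: list,
                      register_at: float = 0.5, delete_below: float = 0.2) -> set:
    """Register keywords scoring at least `register_at`; delete those below `delete_below`."""
    df = document_frequency(texts)
    updated = set(dictionary)
    for keyword in set(df) | dictionary:
        score = assess(keyword, df, len(texts))
        if score >= register_at:
            updated.add(keyword)
        elif score < delete_below:
            updated.discard(keyword)
    return updated


texts = [
    "soccer match tonight",
    "soccer league results",
    "league table and soccer goal",
    "weather report",
]
result = update_dictionary({"baseball", "league"}, texts)
```

Here "soccer" appears in three of four texts and is registered, "league" appears in two and is kept, and "baseball" appears in none and is deleted. Keywords scoring between the two thresholds are left as they were.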
14 Citations
19 Claims
1. A dictionary creation device that creates a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the dictionary creation device comprising:
a classification information acquisition unit that acquires classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
a keyword extraction unit that extracts a keyword from the acquired text information;
a dictionary registration and deletion unit that registers or deletes the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired by said classification information acquisition unit and the keyword extracted by said keyword extraction unit;
a keyword database that stores the extracted keyword, said keyword database being a non-transitory computer-readable storage medium; and
a dictionary combining and editing unit that edits the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said dictionary combining and editing unit (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
Dependent claims: 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17
8. A dictionary creation method for creating a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the dictionary creation method comprising:
acquiring classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
extracting a keyword from the acquired text information;
registering or deleting the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired and the keyword extracted; and
editing the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said editing (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
9. A program recorded on a non-transitory computer-readable storage medium for creating a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the program causing a dictionary creation device to perform steps comprising:
acquiring classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
extracting a keyword from the acquired text information;
registering or deleting the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired and the keyword extracted; and
editing the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said editing (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
18. A dictionary creation method for creating a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the dictionary creation method comprising:
acquiring classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
extracting a keyword from the acquired text information;
registering or deleting the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired and the keyword extracted;
storing, in a keyword database that is a non-transitory computer-readable storage medium, the extracted keyword; and
editing the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said editing (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
19. A program recorded on a non-transitory computer-readable storage medium for creating a dictionary which is used for searching, classifying, or filtering information written as text and in which keywords are registered per category, the program causing a dictionary creation device to perform steps comprising:
acquiring classification information regarding categories and text information from at least a first information source and a second information source which differ from an information source for information written as text and searched;
extracting a keyword from the acquired text information;
registering or deleting the extracted keyword in dictionaries corresponding to the first information source and the second information source, in accordance with a category of the first information source and a category of the second information source, respectively, based upon the classification information acquired and the keyword extracted;
storing the extracted keyword in a keyword database; and
editing the category of the first information source in the dictionary corresponding to the first information source and the category of the second information source in the dictionary corresponding to the second information source to create, as a category level structure of a combined dictionary, a new category level structure including the category of the first information source and the category of the second information source, based on a degree of overlap between characteristic keywords that are keywords characterizing classification information regarding the category of the first information source and characteristic keywords that are keywords characterizing classification information regarding the category of the second information source,
wherein said editing (i) compares a first set, which is a set of characteristic keywords in a first category included in the first information source, with a second set, which is a set of characteristic keywords in a second category included in the second information source, and (ii) edits and combines the dictionaries corresponding to the first information source and the second information source such that the second category is placed in a lower level subordinate to the first category as an intersecting set of the first set and the second set is less common to the first set and more common to the second set.
Specification