Apparatus and Method of Detecting Community-Specific Expression
First Claim
1. A device for searching for an expression specific to a predetermined community from a set of documents used in the predetermined community, the device comprising the following means (a) to (d):
(a) means for extracting an n-gram collocation specifically used by the community;
(b) means for selecting a first word stem which is a possible core of a specific expression;
(c) means for selecting an extended word stem based on values calculated using a statistical significance of the first word stem, and a statistical significance of a second word stem which contains a previous or subsequent element of the first word stem; and
(d) means for selecting an expression specific to the predetermined community from the extended word stems according to a word formation rule of a certain language.
Abstract
Conventional collections of community-specific expressions, such as glossaries of technical terms, cover nouns and compound nouns in technical fields and are difficult to apply to new expressions other than nouns. Work on collecting unknown words and new words is likewise limited substantially to nouns, and no technique for collecting new expressions systematically has been proposed. The invention addresses this problem by providing (a) means for extracting n-gram collocations specific to a predetermined community from a set of documents used in the community, (b) means for selecting a radical (the word stem of the claims) that may be the core of a specific expression, (c) means for extending the selected radical forward and backward, and (d) means for screening the extended radicals according to grammar.
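The four means (a) to (d) can be illustrated with a minimal Python sketch. It is not the patented implementation: the text above does not say how statistical significance is measured or which word formation rules apply, so everything concrete here is an assumption made for illustration. In particular, the smoothed log-ratio `score` against a separate reference corpus, the `min_count` and `threshold` parameters, the rule "keep an extension only if it scores at least as high as the current stem", and the toy `STOPWORDS` filter standing in for a word formation rule are all hypothetical.

```python
from collections import Counter
import math

# Toy stand-in for a word formation rule (assumption): an expression
# may not begin or end with one of these function words.
STOPWORDS = {"a", "an", "the", "of", "to", "is", "and", "on"}

def count_ngrams(docs, max_n):
    """Count every 1..max_n-gram over whitespace-tokenized documents."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def find_expressions(community_docs, reference_docs,
                     max_n=3, threshold=0.5, min_count=2):
    comm = count_ngrams(community_docs, max_n)
    ref = count_ngrams(reference_docs, max_n)
    comm_size = sum(v for g, v in comm.items() if len(g) == 1)
    ref_size = sum(v for g, v in ref.items() if len(g) == 1)

    def score(gram):
        # Hypothetical significance measure: add-one smoothed log-ratio
        # of relative frequency in the community vs. the reference corpus.
        p_comm = (comm[gram] + 1) / (comm_size + 1)
        p_ref = (ref[gram] + 1) / (ref_size + 1)
        return math.log(p_comm / p_ref)

    # (a) n-grams used specifically by the community
    specific = {g for g, c in comm.items()
                if c >= min_count and score(g) >= threshold}

    # (b) candidate cores: community-specific single tokens
    cores = [g for g in specific if len(g) == 1]

    # (c) extend each core by one preceding or following token while the
    #     extended n-gram scores at least as high as the current one
    extended = set()
    for core in cores:
        current = core
        while len(current) < max_n:
            candidates = [g for g in specific
                          if len(g) == len(current) + 1
                          and (g[1:] == current or g[:-1] == current)]
            better = [g for g in candidates if score(g) >= score(current)]
            if not better:
                break
            current = max(better, key=lambda g: (score(g), g))
        extended.add(current)

    # (d) screen the extended stems with the toy word formation rule
    return sorted(" ".join(g) for g in extended
                  if g[0] not in STOPWORDS and g[-1] not in STOPWORDS)

community = ["we wiped on the raid boss",
             "the raid boss wiped us again",
             "raid boss tonight"]
reference = ["we met the boss on the way",
             "the police raid ended again tonight"]
print(find_expressions(community, reference))  # → ['raid boss', 'wiped']
```

In this toy data the cores "raid" and "boss" both grow into "raid boss" because the bigram scores higher than either token alone, while "the raid boss" is rejected as an extension because it scores lower than "raid boss". The verb "wiped" survives as a non-noun expression, which is the kind of result the abstract says noun-oriented term collection misses.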
11 Claims
1. A device for searching for an expression specific to a predetermined community from a set of documents used in the predetermined community, the device comprising the following means (a) to (d):
(a) means for extracting an n-gram collocation specifically used by the community;
(b) means for selecting a first word stem which is a possible core of a specific expression;
(c) means for selecting an extended word stem based on values calculated using a statistical significance of the first word stem, and a statistical significance of a second word stem which contains a previous or subsequent element of the first word stem; and
(d) means for selecting an expression specific to the predetermined community from the extended word stems according to a word formation rule of a certain language.
Dependent claims: 2, 3, 4, 5
6. A method for searching for an expression specific to a predetermined community from a set of documents used in the predetermined community, the method comprising the steps of:
(a) extracting an n-gram collocation specifically used by the community;
(b) selecting a first word stem which is a possible core of a specific expression;
(c) selecting an extended word stem based on values calculated using a statistical significance of the first word stem, and a statistical significance of a second word stem which contains a previous or subsequent element of the first word stem; and
(d) selecting an expression specific to the predetermined community from the extended word stems according to a word formation rule of a certain language.
Dependent claims: 7, 8
9. A program for searching for an expression specific to a predetermined community from a set of documents used in the predetermined community, the program controlling a computer to operate the following means (a) to (d):
(a) means for extracting an n-gram collocation specifically used by the community;
(b) means for selecting a first word stem which is a possible core of a specific expression;
(c) means for selecting an extended word stem based on values calculated using a statistical significance of the first word stem, and a statistical significance of a second word stem which contains a previous or subsequent element of the first word stem; and
(d) means for selecting an expression specific to the predetermined community from the extended word stems according to a word formation rule of a certain language.
Dependent claims: 10, 11
Specification