Keyword extracting device
1 Assignment
0 Petitions
Abstract
A keyword extracting device extracts keywords collectively and efficiently while improving the descriptiveness and reusability of the information used for keyword extraction. A text data input part inputs a text. A pattern processor matches and replaces character strings based on patterns written as regular expressions or an equivalent notation. A pattern storage stores at least a keyword component pattern, which represents a character string that can be a component of a keyword. Using the pattern processor, a keyword component extractor extracts from the text, as keyword components, all character strings that match a keyword component pattern and do not overlap one another. A keyword candidate set generator generates a keyword candidate set from the extracted keyword components. Finally, a keyword output part outputs each keyword candidate in the keyword candidate set as a keyword.
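The pipeline in the abstract (pattern storage, non-overlapping component extraction, candidate set generation, output) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: Python's `re` module stands in for the pattern processor, the three entries in `COMPONENT_PATTERNS` are hypothetical examples, and the candidate set rule in `generate_candidate_set` (each distinct component becomes one candidate) is assumed, because the abstract does not say how the set is built.

```python
import re
from typing import Iterable, List

# Hypothetical keyword component patterns standing in for the pattern storage.
# They are illustrative assumptions, not patterns taken from the patent.
COMPONENT_PATTERNS = [
    r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)+",  # runs of capitalized words
    r"[A-Z]{2,}",                           # all-caps acronyms such as "XML"
    r"\d+(?:\.\d+)?\s?(?:GHz|MB|GB)",       # numbers followed by a unit
]


def extract_components(text: str, patterns: Iterable[str]) -> List[str]:
    """Return every non-overlapping substring of `text` matched by a pattern.

    The patterns are joined into one alternation, so re.finditer scans the
    text once from left to right; at each position the first alternative
    that matches wins and scanning resumes after it, so matches never overlap.
    """
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


def generate_candidate_set(components: List[str]) -> List[str]:
    """Assumed rule: each distinct component is one candidate, in text order."""
    return list(dict.fromkeys(components))


if __name__ == "__main__":
    text = "We tested the Pentium Processor at 3.2 GHz using XML files and XML schemas."
    for keyword in generate_candidate_set(extract_components(text, COMPONENT_PATTERNS)):
        print(keyword)  # keyword output step: one candidate per line
```

Joining the patterns into a single alternation is one simple way to satisfy the "not overlapped with each other" condition; giving earlier patterns priority at a given position is a design choice of this sketch, not something the abstract specifies.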
159 Citations
17 Claims
1. A computer-implemented keyword extracting device comprising:
a text data input part for inputting a text;
a pattern storage part storing at least a pattern to generate keyword candidates, said pattern being represented by character strings in a regular expression or its equivalent;
an extracting part for extracting character strings from the text through pattern matching using said pattern stored in said pattern storage part;
a keyword candidate generating part generating keyword candidates, the keyword candidates including at least portions of the character strings extracted at the extracting part; and
an output part outputting the keyword candidates as keywords,
wherein the pattern to generate keyword candidates is an unnecessary pattern representing unnecessary head or end character strings,
the keyword candidate generating part includes an unnecessary character string removing part, and
said unnecessary character string removing part extracts, from the keyword candidates, keyword candidates including character strings matched with the unnecessary pattern, and (i) if an extracted keyword candidate is the same as the character string matched with the unnecessary pattern, removes the extracted keyword candidate, and (ii) if an extracted keyword candidate is not the same as the character string matched with the unnecessary pattern, takes the keyword candidate, with said matched character string removed, as a keyword candidate.
(Dependent claims: 2 to 16.)
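The two cases handled by the unnecessary character string removing part in claim 1 can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the claimed implementation: `UNNECESSARY_PATTERN` (a leading article as the unnecessary head, a trailing "system" or "method" as the unnecessary end) is hypothetical, and the sketch removes every matched head and end string at once, treating a candidate that is reduced to nothing as case (i).

```python
import re
from typing import List

# Hypothetical unnecessary pattern: a leading article (head) or a trailing
# generic word (end). The ^ and $ anchors restrict matches to head and end strings.
UNNECESSARY_PATTERN = re.compile(
    r"^(?:the|a|an)\b\s*|\s*\b(?:system|method)$", re.IGNORECASE
)


def remove_unnecessary(candidates: List[str],
                       pattern: re.Pattern = UNNECESSARY_PATTERN) -> List[str]:
    """Apply the two cases of the unnecessary character string removing part."""
    kept: List[str] = []
    for candidate in candidates:
        if not pattern.search(candidate):
            kept.append(candidate)      # no unnecessary head/end: keep unchanged
            continue
        remainder = pattern.sub("", candidate).strip()
        if remainder:
            kept.append(remainder)      # case (ii): keep what is left
        # case (i): the candidate consisted only of unnecessary strings, so drop it
    return kept


if __name__ == "__main__":
    print(remove_unnecessary(["the keyword", "extraction method", "the", "pattern storage"]))
    # prints ['keyword', 'extraction', 'pattern storage']
```

Anchoring the pattern with `^` and `$` is what keeps the removal limited to head and end strings; without the anchors, an occurrence of "method" in the middle of a candidate would also be cut out.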
17. A computer-implemented keyword extracting device comprising:
a text data input part for inputting a text;
a pattern storage part storing at least a pattern to generate keyword candidates, said pattern being represented by character strings in a regular expression or its equivalent;
an extracting part for extracting character strings from the text through pattern matching using said pattern stored in said pattern storage part;
a keyword candidate generating part generating keyword candidates, the keyword candidates including at least portions of the character strings extracted at the extracting part;
a part-of-speech analyzing part for dividing the text into words and analyzing the part of speech of each word; and
an outputting part outputting the keyword candidates as keywords,
wherein the extracting part extracts character strings corresponding to a prescribed sequence of parts of speech from the text, based on a result of the part-of-speech analysis at the part-of-speech analyzing part,
the pattern to generate keyword candidates is an extra keyword component pattern, which represents keyword components that are rarely generated as character strings corresponding to the prescribed sequence of parts of speech,
an extra keyword component extracting part extracts character strings matched with the extra keyword component pattern through pattern matching and replaces the extracted character strings in the text with special character strings,
the extracting part extracts character strings corresponding to the prescribed sequence of parts of speech from the text as replaced at the extra keyword component extracting part, based on a result of the part-of-speech analysis at the part-of-speech analyzing part, and
the keyword candidate generating part takes the character strings extracted at the extra keyword component extracting part, as well as the character strings extracted at the extracting part, as the keyword candidates.
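The interplay in claim 17 between the extra keyword component pattern, the special replacement strings, and part-of-speech sequence extraction can be sketched as follows. Everything concrete here is a hypothetical stand-in: `EXTRA_PATTERN`, the `@EXTRA0@`-style placeholders, the toy `LEXICON` tagger (in place of a real part-of-speech analyzer), whitespace tokenization, and the prescribed sequence "zero or more adjectives followed by one or more nouns" are all assumptions of this sketch, not details from the claim.

```python
import re
from typing import Dict, List

# Hypothetical extra keyword component pattern: terms such as "C++" or "2.1"
# that a word-level part-of-speech analysis would tend to split or mis-tag.
EXTRA_PATTERN = re.compile(r"[A-Za-z]+\+\+|\d+(?:\.\d+)+")
PLACEHOLDER = "@EXTRA{}@"                 # hypothetical special string
PLACEHOLDER_RE = re.compile(r"@EXTRA\d+@")

# Toy lexicon standing in for a real part-of-speech analyzer (assumption).
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET", "for": "ADP", "uses": "VERB", "fast": "ADJ",
    "keyword": "NOUN", "extraction": "NOUN", "device": "NOUN",
    "text": "NOUN", "analysis": "NOUN", "version": "NOUN",
}


def tag(token: str) -> str:
    """Tag one whitespace-separated token; placeholders get their own tag."""
    if PLACEHOLDER_RE.fullmatch(token):
        return "EXTRA"
    return LEXICON.get(token.lower(), "UNK")


def extract_keywords(text: str) -> List[str]:
    # Step 1: extract extra keyword components and replace each with a placeholder.
    extras: List[str] = []

    def hide(match: re.Match) -> str:
        extras.append(match.group(0))
        return PLACEHOLDER.format(len(extras) - 1)

    replaced = EXTRA_PATTERN.sub(hide, text)

    # Step 2: divide the replaced text into words and tag each one.
    tokens = replaced.split()
    tags = [tag(t) for t in tokens]

    # Step 3: assumed prescribed sequence ADJ* NOUN+, found by encoding each tag
    # as one letter (A = adjective, N = noun, O = anything else) and regex-matching.
    # Placeholders are encoded as O, so phrases never run across them.
    code = "".join({"ADJ": "A", "NOUN": "N"}.get(t, "O") for t in tags)
    phrases = [" ".join(tokens[m.start():m.end()]) for m in re.finditer(r"A*N+", code)]

    # Step 4: candidates are the extra components plus the part-of-speech phrases.
    return extras + phrases


if __name__ == "__main__":
    print(extract_keywords("The keyword extraction device uses C++ for fast text analysis"))
```

Encoding one letter per token lets a regular expression over the tag string stand in for the prescribed part-of-speech sequence, since match offsets map directly back to token indices. Tagging placeholders as a separate class is only one choice; a variant could tag them as nouns so they join adjacent noun sequences, and the claim does not say which behavior is intended.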
Specification