Method for segmenting a text into words
First Claim
1. A method of segmenting a text into words for which a dictionary search is carried out using a character string in the text as a search key, and in which it is determined whether a word retrieved from a word dictionary can be grammatically connected to a preceding word, said method comprising the steps of:
segmenting a character string of the text into words using only words registered in the word dictionary;
identifying a word in the text undergoing segmentation processing as a possible unknown word, when said segmentation processing reaches a deadlock;
verifying that the possible unknown word is the unknown word based on character kind information in the text; and
continuing said segmentation processing for a portion of said text which follows the identified unknown word.
Abstract
A method of segmenting a text into words in which a dictionary search is made using a character string in the text as a search key, and in which it is checked whether a word retrieved from the dictionary can be grammatically connected to an adjacent word. Segmentation processing is carried out using only words registered in a word dictionary. When the segmentation processing reaches a deadlock, processing for identifying an unknown word is carried out, and the segmentation processing is then continued for the portion of the text that follows the identified unknown word.
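The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy dictionary, the part-of-speech tags, the connection table, the ASCII-based character kinds, and the rule that an unknown word may connect to anything are all assumptions made for illustration, as are the names `char_kind`, `longest_match`, `unknown_word`, and `segment`. The sketch uses a simple longest-match strategy and treats "no connectable dictionary word at this position" as the deadlock.

```python
# Toy word dictionary: surface form -> part-of-speech tag (assumed data).
DICTIONARY = {"i": "pron", "saw": "verb", "today": "adv"}

# Pairs of tags that may be grammatically connected (assumed data).
CONNECTS = {("START", "pron"), ("pron", "verb"), ("verb", "adv")}

# Character kinds assumed to be typical of unknown words in this toy setup.
UNKNOWN_KINDS = {"upper", "digit"}


def char_kind(ch):
    """Classify a character into a coarse kind (assumed classification)."""
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    return "other"


def can_connect(prev_tag, tag):
    """Grammatical connection check; unknown words connect to anything (assumption)."""
    return "unknown" in (prev_tag, tag) or (prev_tag, tag) in CONNECTS


def longest_match(text, pos, prev_tag):
    """Dictionary search keyed on substrings starting at pos, longest first."""
    for end in range(len(text), pos, -1):
        tag = DICTIONARY.get(text[pos:end])
        if tag is not None and can_connect(prev_tag, tag):
            return text[pos:end], tag
    return None


def unknown_word(text, pos):
    """Verify an unknown-word candidate by character kind.

    Returns the maximal run of same-kind characters starting at pos,
    or None if that kind is not one expected for unknown words.
    """
    kind = char_kind(text[pos])
    if kind not in UNKNOWN_KINDS:
        return None
    end = pos + 1
    while end < len(text) and char_kind(text[end]) == kind:
        end += 1
    return text[pos:end]


def segment(text):
    """Segment with dictionary words only; handle unknown words on deadlock."""
    words, pos, prev_tag = [], 0, "START"
    while pos < len(text):
        match = longest_match(text, pos, prev_tag)
        if match is None:  # deadlock: no registered word fits here
            word = unknown_word(text, pos)
            if word is None:
                raise ValueError(f"deadlock at position {pos}")
            match = (word, "unknown")
        words.append(match)
        prev_tag = match[1]
        pos += len(match[0])  # continue with the text after this word
    return words


print(segment("isawXYZtoday"))
```

In this toy run the dictionary covers "i" and "saw", the search deadlocks at "X", the run "XYZ" is accepted as an unknown word because of its character kind, and segmentation continues with "today".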
194 Citations
8 Claims
1. A method of segmenting a text into words for which a dictionary search is carried out using a character string in the text as a search key, and in which it is determined whether a word retrieved from a word dictionary can be grammatically connected to a preceding word, said method comprising the steps of:
segmenting a character string of the text into words using only words registered in the word dictionary;
identifying a word in the text undergoing segmentation processing as a possible unknown word, when said segmentation processing reaches a deadlock;
verifying that the possible unknown word is the unknown word based on character kind information in the text; and
continuing said segmentation processing for a portion of said text which follows the identified unknown word.
Dependent claims: 2, 3, 4, 5, 6
7. A method of segmenting a text into words comprising:
a first step of carrying out a dictionary search using a character string in said text as a search key;
a second step of checking whether a word of said text retrieved from a dictionary can be grammatically connected to a preceding word of said text;
a third step of preserving a result obtained during the segmenting processing when a backtracking processing is carried out for a deadlock encountered in the processing in at least one of said first and second steps;
a fourth step of determining the presence of an unknown word in the text, based on character kind information of the text; and
a fifth step of resuming the segmenting processing utilizing the preserved result when the presence of the unknown word is determined.
Dependent claims: 8
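One possible reading of the five steps of claim 7 is sketched below. It is an illustration under stated assumptions, not the claimed implementation: "preserving a result" is interpreted here as taking a snapshot of the furthest partial segmentation before backtracking unwinds it, and the toy dictionary, tags, connection table, character kinds, and the names `search`, `unknown_run`, and `segment` are all hypothetical.

```python
# Toy data (assumed): surface form -> tag, and connectable tag pairs.
DICTIONARY = {"them": "pron", "the": "det", "men": "noun", "run": "verb"}
CONNECTS = {("START", "pron"), ("START", "det"), ("det", "noun"),
            ("noun", "verb"), ("pron", "verb")}
UNKNOWN_KINDS = {"upper", "digit"}


def char_kind(ch):
    """Coarse character kind (assumed classification)."""
    if ch.isdigit():
        return "digit"
    return "upper" if ch.isupper() else "lower" if ch.islower() else "other"


def can_connect(prev_tag, tag):
    """Unknown words are allowed to connect to anything (assumption)."""
    return "unknown" in (prev_tag, tag) or (prev_tag, tag) in CONNECTS


def search(text, start, prev_tag, path, preserved):
    """Depth-first segmentation with backtracking.

    Steps 1 and 2: dictionary search plus connection check.
    Step 3: before backtracking from a deadlock, preserve the current
    path if it reached further into the text than any earlier snapshot.
    """
    if start == len(text):
        preserved["path"], preserved["pos"] = list(path), start
        return True
    for end in range(len(text), start, -1):  # longest candidates first
        tag = DICTIONARY.get(text[start:end])
        if tag is None or not can_connect(prev_tag, tag):
            continue
        path.append((text[start:end], tag))
        if search(text, end, tag, path, preserved):
            return True
        path.pop()  # backtrack
    if start > preserved["pos"]:  # deadlock: snapshot before unwinding
        preserved["path"], preserved["pos"] = list(path), start
    return False


def unknown_run(text, pos):
    """Step 4: decide from character kind whether an unknown word starts at pos."""
    kind = char_kind(text[pos])
    if kind not in UNKNOWN_KINDS:
        return None
    end = pos + 1
    while end < len(text) and char_kind(text[end]) == kind:
        end += 1
    return text[pos:end]


def segment(text):
    words, pos, prev_tag = [], 0, "START"
    while pos < len(text):
        preserved = {"pos": pos, "path": []}
        search(text, pos, prev_tag, [], preserved)
        # Step 5: resume from the preserved result, not from scratch.
        words.extend(preserved["path"])
        if preserved["path"]:
            prev_tag = preserved["path"][-1][1]
        pos = preserved["pos"]
        if pos == len(text):
            break
        word = unknown_run(text, pos)
        if word is None:
            raise ValueError(f"deadlock at position {pos}")
        words.append((word, "unknown"))
        prev_tag, pos = "unknown", pos + len(word)
    return words


print(segment("themenrunABC"))
```

In this toy run the longest first candidate "them" leads to a deadlock, backtracking finds "the", "men", "run", and a second deadlock occurs at "ABC". Backtracking then unwinds the whole path, but the snapshot keeps "the / men / run", so processing resumes after the unknown word "ABC" without repeating the earlier segmentation.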
Specification