Morphological analyzer, natural language processor, morphological analysis method and program
Abstract
The invention can include a token list generating unit 11, which decomposes a natural language text to be processed into tokens (the components of that text) and registers them on a token list, and a token string selecting unit 13, which selects, on the basis of the token list generated by the token list generating unit 11, the optimum token strings composing the text. The token list generating unit 11 registers on the token list every token obtained by decomposing the text except those that, under the conditions imposed on the morphological analysis, are decomposable into smaller tokens.
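The behavior of the token list generating unit 11 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the three-entry dictionary, the brute-force lookup of every substring, and the single boolean `decompose_complex_words` standing in for the "conditions imposed on the morphological analysis" are all hypothetical. A token counts as decomposable here when its surface form can be fully covered by two or more shorter dictionary entries.

```python
from dataclasses import dataclass

# Hypothetical toy dictionary; a real analyzer would use a full lexicon.
DICTIONARY = {"株式", "会社", "株式会社"}


@dataclass(frozen=True)
class Token:
    start: int    # offset of the first character in the text
    end: int      # offset one past the last character
    surface: str  # the matched substring


def is_decomposable(word: str, dictionary: set[str]) -> bool:
    """True if `word` can be covered by two or more shorter dictionary entries."""
    n = len(word)
    # seg[i] is True when the proper prefix word[:i] can be covered by entries.
    seg = [False] * (n + 1)
    seg[0] = True
    for i in range(1, n):
        seg[i] = any(seg[j] and word[j:i] in dictionary for j in range(i))
    # Require a split point strictly inside the word, so the word itself
    # does not count as its own decomposition.
    return any(seg[j] and word[j:] in dictionary for j in range(1, n))


def build_token_list(text: str, dictionary: set[str],
                     decompose_complex_words: bool) -> list[Token]:
    # Work area: every dictionary entry found at every position of the text.
    max_len = max(len(w) for w in dictionary)
    work_area = [
        Token(start, end, text[start:end])
        for start in range(len(text))
        for end in range(start + 1, min(len(text), start + max_len) + 1)
        if text[start:end] in dictionary
    ]
    if not decompose_complex_words:
        return work_area
    # Hypothetical condition: register only tokens that cannot be split further.
    return [t for t in work_area if not is_decomposable(t.surface, dictionary)]


print([t.surface for t in build_token_list("株式会社", DICTIONARY, True)])
# → ['株式', '会社']
print([t.surface for t in build_token_list("株式会社", DICTIONARY, False)])
# → ['株式', '株式会社', '会社']
```

With decomposition requested, the complex word 株式会社 ("joint-stock company") is kept off the token list, so only its parts 株式 and 会社 remain; without that condition, all three dictionary matches are registered.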
8 Claims
1. A method of performing morphological analysis on a natural language text string using a computer having a memory, the method comprising:
selecting, using a processor, whether or not to decompose a decomposable complex word in response to a request from a natural language processing application that utilizes a morphological analysis result;
receiving the natural language text string to be processed, wherein the text string is in an agglutinative language and comprises more than one complex word, wherein each complex word comprises a linguistic unit having a semantic meaning;
decomposing the received text string into tokens and storing the tokens in a work area of the memory;
when it is selected not to decompose a decomposable complex word, determining whether each token is decomposable;
if a token is not decomposable, registering the non-decomposable token on a token list stored in a given area of the memory;
generating token strings based on the token list and storing the token strings in the work area of the memory;
selecting optimum token strings from the generated token strings; and
outputting the selected optimum token strings to the natural language processing application for further processing.
(Dependent claims 2 to 6 not shown.)
7. A computer system for performing morphological analysis on a natural language text string, the computer system comprising:
a memory; and
a processor configured to:
select whether or not to decompose a decomposable complex word in response to a request from a natural language processing application that utilizes a morphological analysis result;
receive the natural language text string to be processed, wherein the text string is in an agglutinative language and comprises more than one complex word, wherein each complex word comprises a linguistic unit having a semantic meaning;
decompose the received text string into tokens and store the tokens in a work area of the memory;
when it is selected not to decompose a decomposable complex word, determine whether each token is decomposable;
if a token is not decomposable, register the non-decomposable token on a token list stored in a given area of the memory;
generate token strings based on the token list and store the token strings in the work area of the memory;
select optimum token strings from the generated token strings; and
output the selected optimum token strings to the natural language processing application for further processing.
8. A computer-readable medium, having stored thereon a computer program having a plurality of code sections executable by a computer for causing the computer to perform the steps of:
selecting whether or not to decompose a decomposable complex word in response to a request from a natural language processing application that utilizes a morphological analysis result;
receiving the natural language text string to be processed, wherein the text string is in an agglutinative language and comprises more than one complex word, wherein each complex word comprises a linguistic unit having a semantic meaning;
decomposing the received text string into tokens and storing the tokens in a work area of a memory of the computer;
when it is selected not to decompose a decomposable complex word, determining whether each token is decomposable;
if a token is not decomposable, registering the non-decomposable token on a token list stored in a given area of the memory;
generating token strings based on the token list and storing the token strings in the work area of the memory;
selecting optimum token strings from the generated token strings; and
outputting the selected optimum token strings to the natural language processing application for further processing.
Specification