Language analyzer for morphemically and syntactically analyzing natural languages by using block analysis and composite morphemes

US 5,225,981 A
Filed: 06/16/1991
Issued: 07/06/1993
Est. Priority Date: 10/03/1986
Status: Expired due to Term

Abstract

A language analyzer includes a dictionary storing dictionary data, including morpheme data for words, compound words and phrases, and a parsing analyzer that conducts morphological analysis of an inputted sentence by referring to the dictionary. The dictionary contains coupling-degree data indicating how strongly the words constituting each compound word or phrase are coupled. The parsing analyzer looks up each word of the inputted sentence in the dictionary and, when a plurality of dictionary data are retrieved for one word in combination with other words, selects the combination of words with the higher coupling degree by referring to that data.
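The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Entry` type, the toy `DICTIONARY`, the integer coupling scale and the greedy left-to-right scan are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    words: Tuple[str, ...]  # words that make up the dictionary entry
    coupling: int           # hypothetical coupling degree; higher means tighter


# Toy dictionary (illustrative values only).
DICTIONARY = [
    Entry(("hot",), 0),
    Entry(("dog",), 0),
    Entry(("house",), 0),
    Entry(("hot", "dog"), 5),
    Entry(("dog", "house"), 3),
]


def candidates(tokens: List[str], i: int) -> List[Entry]:
    """Return every dictionary entry that matches the tokens starting at i."""
    return [e for e in DICTIONARY
            if tuple(tokens[i:i + len(e.words)]) == e.words]


def segment(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Greedy scan: at each position keep the candidate with the highest
    coupling degree (ties broken by length); unknown words pass through."""
    out = []
    i = 0
    while i < len(tokens):
        found = candidates(tokens, i)
        if not found:
            out.append((tokens[i],))
            i += 1
            continue
        best = max(found, key=lambda e: (e.coupling, len(e.words)))
        out.append(best.words)
        i += len(best.words)
    return out
```

With this toy dictionary, `segment(["hot", "dog", "house"])` groups "hot dog" because its coupling degree (5) exceeds that of the single-word entry for "hot". A greedy scan is only one possible strategy; the source does not say how competing combinations are searched.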

22 Claims

1. A language analyzer for morphemically and syntactically analyzing natural languages used in an automatic translator, comprising:
- memory means for containing therein morpheme data of a predetermined language and additional data representing information to be referred to in order that a plurality of morphemes are combined with each other;
  
  first analysis means for morphemically analyzing an inputted character array representing sentences of said language by referring to said morpheme data to break said inputted character array into morphemes and to define respective aspects of said morphemes, having means for distinguishing a part of said inputted character array defining a composite morpheme or a block composed of a plurality of morphemes from the other parts of said inputted character array by referring to said additional data and said morpheme data;
  
  means for treating each of said composite morpheme and said block as a single unit and such that they are regarded as a single morpheme, at the time of syntax analysis, for an inputted character array and for outputting identification as to said one unit;
  
  exclusion means for creating an excluded block containing composite morpheme data or a block which was part of said input string, in response to output from said distinguishing means; and
  
  second analysis means for syntactically analyzing said inputted character array by applying syntactic rules to an analysis result of said first analysis means to describe structures of said sentences, having means for analyzing said part of character array defining the block in preference to analysis for other parts of character array being outside of said excluded block, and means for analyzing said inputted character array after analyzing said part of character array by regarding said excluded block as a single morpheme without applying said syntax rules to a relation between a morpheme outside of said excluded block and a morpheme in said excluded block to reduce the number of improper solutions of syntax analysis.
  - 2. A language analyzer according to claim 1, wherein said memory means contains therein distinguishing indications each of which indicates that a memory reference unit with one of said distinguishing indications is a memory reference unit representing a numerical value.
  - 3. A language analyzer according to claim 2, wherein said language analyzer further comprises calculating means for calculating a numerical value for two or more successive memory reference units with one of said distinguishing indications by referring to said memory means.
  - 4. A language analyzer according to claim 3, wherein when two successive memory reference units each having said distinguishing indication are adjoined by a memory reference unit representing a currency symbol or a dimensional unit, said first analysis means seizes a combination of said calculated numerical value and said memory reference unit representing a currency symbol or a dimensional unit as one composite morpheme.
  - 5. A language analyzer according to claim 1, wherein said first analysis means has semantic nature information providing means for providing a noun phrase which comprises a proper noun with semantic nature information, and means for distinguishing said noun phrase as one composite morpheme.
  - 6. A language analyzer according to claim 5, wherein said providing means provides a noun phrase which comprises a proper noun with the same semantic information as that of a noun adjacent to said proper noun.
  - 7. A language analyzer according to claim 6, wherein said first analysis means has a table for storing said predetermined syntactic patterns, and verifies an inputted character array, with said table in order to distinguish two or more successive memory reference units forming one of said predetermined syntactic patterns from other parts of said inputted character array.
  - 8. A language analyzer according to claim 1, wherein, when two or more successive memory reference units in an inputted character array form one of predetermined syntactic patterns, said first analysis means distinguishes a part of character array corresponding to said successive memory reference units as one composite morpheme having a specific semantic information in accordance with said one of predetermined syntactic patterns.
  - 9. A language analyzer according to claim 1, wherein said first analysis means estimates a grammatical nature and a semantic nature of a derivative in an inputted character array which is not contained in said memory means by an affix of said derivative.
  - 10. A language analyzer according to claim 1, wherein said first analysis means includes means for judging whether the block exists in an inputted character array or not from morpheme contained within said inputted character array and means for estimating a syntactic attribute and a role of the block according to said morphological aspect when said judging means judges that said block exists in said inputted character array, and said second analysis means analyzes said block first in preference to portions of said inputted character array other than said block.
  - 11. A language analyzer according to claim 10, wherein said predetermined language is English, and said first analysis means distinguishes an appositional expression as a block in view of a morphological aspect of an inputted character array.
  - 12. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which begins with a conjunction which is preceded by a comma and ends with a next period as a block, estimates a syntactic attribute of said block at a clause and estimates a syntactic role of said block at a sentence or a clause.
  - 13. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which begins with a relative which is preceded by a comma and ends with another comma as a block, estimates a syntactic attribute of said block at a clause and estimates a syntactic role of said block at an adverb or an adjective.
  - 14. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which is put between a pair of quotation marks and ends with a period as a block, and estimates a syntactic attribute of said block at a clause.
  - 15. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which begins with a proper noun which is followed by a comma and a noun after said comma and ends with a next period or a comma as a block for an appositional expression, estimates a syntactic attribute of said block at a noun phrase and estimates a syntactic role of said block at a noun and an appositional noun.
  - 16. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which begins with a proper noun which is followed by a comma and a succeeded article and ends with a next period or a comma as a block for an appositional expression, estimates a syntactic attribute of said block at a noun phrase and estimates a syntactic role of said block at a noun and an appositional noun.
  - 17. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which is a sequence of capital heading words and ends with a word which is followed by a next word whose head is not a capital as a block for an appositional expression, estimates a syntactic attribute of said block at a noun phrase and estimates a syntactic role of said block at a proper noun.
  - 18. A language analyzer according to claim 11, wherein said first analysis means distinguishes a portion of an inputted character array which begins with a word let's and ends with a next comma and a portion of an inputted character array which begins with words let us which is preceded by a comma and ends with a next period or a comma as blocks, estimates syntactic attributes of said blocks at imperatives and estimates syntactic roles of said blocks at invitations.
  - 19. A language analyzer according to claim 18, wherein said second analysis means parses an inputted character array except a portion thereof distinguished as a block whose syntactic attribute and role are estimated at an imperative and an invitation respectively by said first parsing means.
  - 20. A language analyzer according to claim 10, wherein said first parsing means distinguishes a portion of an inputted character array comprising two or more words coupled by a hyphen as a block at an adjective phrase.
  - 22. A language analyzer according to claim 10, wherein said predetermined language is English, and said first analysis means distinguishes a portion of an inputted character array which begins with an auxiliary verb or a be verb preceded by a comma and followed by a pronoun and ends with an interrogation mark preceded by said pronoun or begins with a negative form of auxiliary verb or a be verb preceded by a comma and followed by a pronoun and ends with an interrogation mark preceded by said pronoun as a block in view of a morphological aspect of an inputted character array, estimates a syntactic attribute of said block at an affirmative or a negative sentence and estimates a syntactic role of said block at a tag question.
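
Claims 2 to 4 describe flagging numeric units, calculating a value over a run of them, and merging the result with an adjoining currency symbol or dimensional unit into one composite morpheme. The sketch below illustrates that idea under stated assumptions: `NUMBER_WORDS`, `UNITS`, the `"QUANTITY"` tag and the restriction to a unit that follows the number are hypothetical choices for the example, not taken from the patent.

```python
from typing import List, Tuple, Union

# Words carrying the (hypothetical) "numerical value" indication.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "ten": 10, "twenty": 20, "hundred": 100, "thousand": 1000,
}
# Currency symbols and dimensional units (illustrative subset).
UNITS = {"dollars", "yen", "meters", "kilograms"}

Quantity = Tuple[str, int, str]


def number_value(words: List[str]) -> int:
    """Calculate the value of a run of number words, e.g. two hundred -> 200."""
    total, current = 0, 0
    for w in words:
        v = NUMBER_WORDS[w]
        if v == 100:
            current = max(current, 1) * 100
        elif v == 1000:
            total += max(current, 1) * 1000
            current = 0
        else:
            current += v
    return total + current


def merge_numerics(tokens: List[str]) -> List[Union[str, Quantity]]:
    """Merge two or more successive number words followed by a unit
    into a single composite unit; leave everything else untouched."""
    out: List[Union[str, Quantity]] = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j] in NUMBER_WORDS:
            j += 1
        if j - i >= 2 and j < len(tokens) and tokens[j] in UNITS:
            out.append(("QUANTITY", number_value(tokens[i:j]), tokens[j]))
            i = j + 1
        else:
            out.append(tokens[i])
            i += 1
    return out
```

For instance, `merge_numerics(["it", "costs", "two", "hundred", "dollars"])` yields two plain words followed by a single `("QUANTITY", 200, "dollars")` unit, so a later syntactic stage sees one item instead of three.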

21. A language analyzer according to claim 86, wherein said second analysis means analyzes an inputted character array, except a portion thereof distinguished as a block whose syntactic attribute and role are estimated at an affirmative or a negative sentence and a tag question respectively by said first analysis means.
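
The block handling recited in claims 1, 10, 17 and 20 can be pictured as a first pass that marks blocks from surface form alone and hands them to the syntactic stage as opaque single units. The following is a rough sketch under assumptions: the `Block` type, whitespace tokenization, and the rule that a sentence-initial capitalized word is not merged are illustrative simplifications, not details from the claims.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Block:
    text: str                   # surface text covered by the block
    attribute: str              # estimated syntactic attribute
    role: Optional[str] = None  # estimated syntactic role, if any


Unit = Union[str, Block]


def first_analysis(words: List[str]) -> List[Unit]:
    """Mark blocks from morphological cues only; no syntax rules are applied."""
    units: List[Unit] = []
    i = 0
    while i < len(words):
        w = words[i]
        # Claim 20 style: words coupled by a hyphen form an adjective phrase.
        if "-" in w.strip("-"):
            units.append(Block(w, "adjective phrase"))
            i += 1
            continue
        # Claim 17 style: a run of capital-headed words forms a proper noun.
        j = i
        while j < len(words) and words[j][:1].isupper():
            j += 1
        if i > 0 and j - i >= 2:  # assumption: skip the sentence-initial word
            units.append(Block(" ".join(words[i:j]), "noun phrase", "proper noun"))
            i = j
            continue
        units.append(w)
        i += 1
    return units


def parser_view(units: List[Unit]) -> List[str]:
    """What a later syntactic stage would see: each block as one opaque unit."""
    return [u.attribute.upper() if isinstance(u, Block) else u for u in units]
```

Given `"We visited Ricoh Company Limited for a state-of-the-art demo".split()`, the first pass returns seven units instead of nine words, and no syntax rule would relate a word inside "Ricoh Company Limited" to a word outside it.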

Current Assignee: Ricoh Company Limited
Original Assignee: Ricoh Company Limited
Inventors: Yokogawa, Toshihiko
Primary Examiner(s): Hayes, Gail O.

Application Number: US07/714,990
Time in Patent Office: 751 Days
Field of Search: 364/419
US Class Current: 704/2
CPC Class Codes:

G06F 16/30   of unstructured textual dat...

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/268   Morphological analysis
