
Method and apparatus for efficient morphological text analysis using a high-level language for compact specification of inflectional paradigms

  • US 5,475,587 A
  • Filed: 07/12/1991
  • Issued: 12/12/1995
  • Est. Priority Date: 06/28/1991
  • Status: Expired due to Term
First Claim

1. For use in computer-based morphological text analysis of natural languages, a computer implemented method for creating a data structure for computer-based generation and recognition of word forms in a natural language, the computer implemented method comprising the steps of:

    a. providing a morphological description of a natural language, the description comprising statements in a morphological description language, the morphological description language comprising statements arranged according to a pre-determined syntax, the syntax permitting the specification of inflectional morphologic paradigms, the morphologic paradigms comprising form rules including surface form rules and intermediate form rules, the form rules comprising a left-hand-side identifier and a right-hand-side specifying a word stem and, optionally, the concatenation or removal of an affix, including a prefix or a suffix, the stems comprising the identifiers of other form rules or form sets, or a keyword, said keyword being either a keyword LEX or a keyword NIL, the affixes comprising strings of characters or the identifier or an affix variable, the syntax capable of specifying that the form rules of one morphological paradigm are inherited by another morphological paradigm, the syntax permitting the stem in a form rule to be an indicator to a string in a lexicon, the syntax permitting the stem in a form rule to be an indicator that the form rule is not used in the given paradigm via the keyword NIL, the syntax permitting a form set identifier to represent a plurality of left-hand-side form rule identifiers and the form set identifier to be used as the stem in the right-hand-side of a form rule, the syntax permitting an affix variable to identify a set of affix strings with the affix variable being used as an affix in a right-hand-side of a form rule, said morphological description stored in a memory device;

    b. disambiguating the stem components of the right-hand-sides of the form rules in each paradigm, the disambiguation process comprising the steps of;

    i. determining in each form rule whether the stem component is an identifier of another form rule;

    ii. replacing each stem component that is an identifier with a link to the identified form rule;

    iii. determining in each form rule whether the stem component is an identifier in a form set;

    c. determining for each paradigm whether there is a declaration stating that the paradigm inherits the form rules of another parent paradigm;

    d. creating form rules for the paradigms that will inherit the form rule from a parent paradigm by sharing references to the form rules of the parent paradigm;

    e. replacing, for each form rule that contains a right-hand-side reference to a form set, the form rule with a set of form rules, one for each form in the corresponding form set, each created form rule corresponding to the form set rule containing the right-hand-side reference to the form set;

    f. checking each surface form for cycles, the cycle check process comprising the steps of;

    i. creating a cycle check list initialized to empty;

    ii. locating a surface form rule;

    iii. checking stem components on the right-hand-side to determine if the stem is an identifier to another form rule;

    iv. comparing the stem that is an identifier of another form rule to the entries on the cycle check list;

    v. adding the stem that is an identifier to the cycle check list unless the identifier is included in the cycle check list;

    vi. checking the form rule referenced by the identifier for cycles;

    g. providing a set of orthographic rules; and

    h. conflating the set of orthographic rules, the process of conflation comprising the steps of;

    i. finding the set of form rules that match one of the orthographic rules in terms of an operator, an affix and an affix type;

    ii. creating an inner form rule variant, the form rule variant comprising the stem form rule from the right-hand-side of the matching form rule as the right-hand-side stem and as the affix, an affix sequence comprising character strings and string variables, indicating the correct context determined by the orthographic rule, and as the operator a minus; and

    iii. creating an outer form rule variant, the outer form rule variant comprising the newly created outer form rule as the right-hand-side stem and as the affix, an affix sequence comprising character strings and string variables, indicating the correct spelling as determined by the orthographic rule and as the operator a plus.
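
Illustrative sketch (not part of the claim): the inheritance of steps c and d and the cycle check of step f can be pictured with a small Python model. Everything named here is an assumption made for illustration, including the FormRule class, the dictionary representation of a paradigm, the helper names inherit, has_cycle and generate, and the toy English-like rules. The patent does not prescribe this representation, and the sketch handles only suffixes and omits form sets, affix variables and orthographic rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

KEYWORDS = {"LEX", "NIL"}  # stem keywords named in the claim


@dataclass
class FormRule:
    """Hypothetical model of one form rule: name = stem (+|-) affix."""
    name: str                 # left-hand-side identifier
    stem: str                 # identifier of another rule, or LEX / NIL
    op: Optional[str] = None  # "+" concatenates a suffix, "-" removes one
    affix: str = ""


Paradigm = Dict[str, FormRule]


def inherit(parent: Paradigm, child: Paradigm) -> Paradigm:
    """Steps c-d (sketch): the child shares references to the parent's
    rule objects and overrides only the rules it declares itself."""
    merged = dict(parent)  # shallow copy, so FormRule objects are shared
    merged.update(child)
    return merged


def has_cycle(paradigm: Paradigm, surface: str) -> bool:
    """Step f (sketch): follow stem identifiers from a surface form rule,
    recording each one on a cycle check list that starts out empty.
    Assumes every referenced identifier is defined in the paradigm."""
    check_list: List[str] = []
    rule = paradigm[surface]
    while rule.stem not in KEYWORDS:
        if rule.stem in check_list:  # identifier seen before: a cycle
            return True
        check_list.append(rule.stem)
        rule = paradigm[rule.stem]
    return False


def generate(paradigm: Paradigm, name: str, lex: str) -> Optional[str]:
    """Build a word form for a lexicon string (suffix-only assumption).
    Call only on paradigms that passed has_cycle."""
    rule = paradigm[name]
    if rule.stem == "NIL":  # form not used in this paradigm
        return None
    base = lex if rule.stem == "LEX" else generate(paradigm, rule.stem, lex)
    if base is None:
        return None
    if rule.op == "+":
        return base + rule.affix
    if rule.op == "-":
        if rule.affix and base.endswith(rule.affix):
            return base[: -len(rule.affix)]
        return None  # nothing to remove: treat the form as undefined
    return base


# Toy parent paradigm and a child for verbs ending in "e".
regular = {
    "base": FormRule("base", "LEX"),
    "third": FormRule("third", "base", "+", "s"),
    "past": FormRule("past", "base", "+", "ed"),
}
e_final = inherit(regular, {
    "cut": FormRule("cut", "base", "-", "e"),
    "past": FormRule("past", "cut", "+", "ed"),
})
```

In this toy setup, e_final reuses the parent's "third" rule object unchanged and replaces only "past", so generate(e_final, "past", "bake") goes through the intermediate form "cut" before the suffix is attached. A pair of rules whose stems name each other would be reported by has_cycle before any generation is attempted.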
