
Method of text processing

  • US 7,409,334 B1
  • Filed: 07/22/2004
  • Issued: 08/05/2008
  • Est. Priority Date: 07/22/2004
  • Status: Active Grant
First Claim

1. A method of text processing, comprising the steps of:

    a) receiving at least one textual unit, where each textual unit includes at least one sub-textual unit;

    b) selecting a language;

    c) if a plurality of textual units are received, and said plurality of textual units do not include delimiters, then inserting delimiters between adjacent textual units;

    d) selecting one of said delimited textual units;

    e) if the selected language is selected from the group of languages consisting of Russian, Somali, and a user-definable language, then:

    i) getting a morphology of the selected textual unit from a corresponding look-up table;

    ii) outputting the corresponding input in the look-up table; and

    iii) if there are any unprocessed textual units from the textual units received in step (a), selecting one of said unprocessed textual units and returning to substep (i), otherwise stopping;

    f) setting a value n equal to the total number of sub-textual units in the selected textual unit and setting a value s equal to n;

    g) setting a test-suffix equal to the rightmost s sub-textual units in said selected textual unit and setting stem equal to the n−s leftmost sub-textual units in said selected textual unit;

    h) comparing the test-suffix to an inflected suffix field for each entry within a rules database, where each entry in said rules database further includes a base-suffix field, a model number field, a part of speech field, and a morphological feature field;

    i) if no match is made in step (h), then setting s equal to s−1 and returning to step (g);

    j) identifying all model numbers from the model number field of the rules database that correspond to the inflected suffixes in the rules database that matched the test-suffix in step (h);

    k) identifying all base suffixes from the base-suffix field of the rules database that correspond to the model numbers identified in step (j);

    l) combining stem with each base suffix identified in step (k) to create at least one test-lemma;

    m) comparing the at least one test-lemma to a lemma field for each entry in a lexicon database where each entry in said lexicon database further includes a model number field, a part of speech field, a morphological feature field, a definition field, and an exception field;

    n) if no match is found in step (m), then outputting a message to that effect and, if any unprocessed textual unit remains, selecting the next one and returning to step (f), otherwise stopping;

    o) identifying a model number for each lemma that matches the test-lemma;

    p) identifying each entry in the rules database that has a model number that matches one of the model numbers identified in step (o);

    q) combining stem with each inflected suffix field of each entry identified in step (p) to form inflected forms of the textual unit;

    r) outputting a user-definable subset of the result of step (q) and a user-definable subset of the corresponding entries in the rules database and the lexicon database; and

    s) if there are any unprocessed textual units, selecting an unprocessed textual unit and returning to step (f).
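The language test in step (e) works as a dispatch. Units in the listed languages are answered from a stored look-up table, and all other units go through the rule-based steps (f) to (s). Below is a minimal Python sketch. It assumes that textual units arrive as a list of strings, that the look-up table is a plain dictionary from surface form to a morphology string, and that the rule-based path is passed in as a function. The names `process_units`, `LOOKUP_LANGUAGES`, and `rule_based` are hypothetical and do not come from the patent. The claim also does not say what happens when a unit is missing from the table, so the "not in look-up table" message is an assumption.

```python
from typing import Callable, Dict, List

# Languages routed to the look-up table in step (e); "user-definable"
# stands in for whatever language the user registers (hypothetical label).
LOOKUP_LANGUAGES = {"Russian", "Somali", "user-definable"}


def process_units(
    units: List[str],
    language: str,
    lookup_table: Dict[str, str],
    rule_based: Callable[[str], str],
) -> List[str]:
    """Analyze each textual unit by table look-up or by the rule-based path."""
    results = []
    for unit in units:  # one pass per unprocessed textual unit
        if language in LOOKUP_LANGUAGES:
            # Steps (e)(i)-(ii): return the stored morphology; flag a miss
            # (the miss message is an assumption, not from the claim).
            results.append(lookup_table.get(unit, f"{unit}: not in look-up table"))
        else:
            # Steps (f)-(s), supplied by the caller in this sketch.
            results.append(rule_based(unit))
    return results


if __name__ == "__main__":
    table = {"книги": "lemma=книга; noun; gen.sg or nom.pl"}
    print(process_units(["книги"], "Russian", table, str.upper))
```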
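Steps (f) through (i) describe a longest-suffix-first search. The test-suffix starts as the whole textual unit, and on each pass it gives up its leftmost sub-textual unit until it matches an inflected suffix in the rules database. In the Python sketch below, a sub-textual unit is assumed to be one character, and the rules database is reduced to a list of inflected-suffix strings. `split_longest_suffix` is a hypothetical name. The claim does not say what happens if even the empty suffix (s = 0) fails to match, so the sketch returns `None` in that case.

```python
from typing import List, Optional, Tuple


def split_longest_suffix(
    word: str, inflected_suffixes: List[str]
) -> Optional[Tuple[str, str]]:
    """Return (stem, test_suffix) for the longest suffix found in the rules."""
    known = set(inflected_suffixes)
    n = len(word)                  # step (f): n sub-textual units (characters here)
    s = n                          # ... and s starts equal to n
    while s >= 0:
        test_suffix = word[n - s:]  # step (g): the rightmost s units
        stem = word[: n - s]        # ... stem is the n - s leftmost units
        if test_suffix in known:    # step (h): compare with the rules database
            return stem, test_suffix
        s -= 1                      # step (i): s = s - 1, back to step (g)
    return None                     # not covered by the claim; assumption


if __name__ == "__main__":
    print(split_longest_suffix("walked", ["ed", "s", "ing", ""]))  # ('walk', 'ed')
```

Because the search stops at the first match while s is shrinking, the longest matching suffix always wins. Every rules entry that shares that suffix is then considered in step (j).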
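Steps (j) through (q) check the stem/suffix split against the lexicon and then rebuild the full set of inflected forms from the matching model. The Python sketch below rests on several assumptions:

- The record field names are hypothetical.
- The toy English verb models are hypothetical: model 1 covers walk-type verbs, and model 2 covers verbs whose lemma ends in a silent "e", such as "bake".
- The function name `generate_paradigm` is hypothetical.
- The lexicon entry keeps only the lemma and model-number fields.
- The user-definable output filtering of step (r) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """One rules-database entry with the claimed fields (hypothetical names)."""
    inflected_suffix: str
    base_suffix: str
    model: int
    pos: str
    features: str


@dataclass(frozen=True)
class LexEntry:
    """One lexicon entry, reduced to the lemma and model-number fields."""
    lemma: str
    model: int


# Toy data: model 1 = walk-type verbs, model 2 = verbs whose lemma ends in "e".
RULES = [
    Rule("ed", "", 1, "verb", "past"),
    Rule("s", "", 1, "verb", "3sg present"),
    Rule("ing", "", 1, "verb", "present participle"),
    Rule("", "", 1, "verb", "base form"),
    Rule("ed", "e", 2, "verb", "past"),
    Rule("es", "e", 2, "verb", "3sg present"),
    Rule("ing", "e", 2, "verb", "present participle"),
    Rule("e", "e", 2, "verb", "base form"),
]
LEXICON = [LexEntry("walk", 1), LexEntry("bake", 2)]


def generate_paradigm(
    stem: str, test_suffix: str, rules: List[Rule], lexicon: List[LexEntry]
) -> Optional[Dict[str, List[Tuple[str, str]]]]:
    # Step (j): models whose inflected suffix matched the test-suffix.
    models = {r.model for r in rules if r.inflected_suffix == test_suffix}
    # Step (k): base suffixes belonging to those models.
    base_suffixes = {r.base_suffix for r in rules if r.model in models}
    # Step (l): stem + each base suffix gives the candidate test-lemmas.
    test_lemmas = {stem + b for b in base_suffixes}
    # Step (m): lexicon entries whose lemma equals a test-lemma.
    hits = [e for e in lexicon if e.lemma in test_lemmas]
    if not hits:
        return None  # step (n): the caller reports that no match was found
    paradigm: Dict[str, List[Tuple[str, str]]] = {}
    for entry in hits:
        # Steps (o)-(q): each rule of the lemma's model yields one inflected form.
        paradigm[entry.lemma] = [
            (stem + r.inflected_suffix, r.features)
            for r in rules
            if r.model == entry.model
        ]
    return paradigm


if __name__ == "__main__":
    for form, features in generate_paradigm("bak", "ed", RULES, LEXICON)["bake"]:
        print(form, features)
```

Called with the stem `bak` and the test-suffix `ed`, both models match the suffix, so the candidate test-lemmas are `bak` and `bake`. Only `bake` is in the lexicon, so the forms are built from the model 2 rules.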
