
Means for resolving ambiguities in text based upon character context

  • US 5,133,023 A
  • Filed: 05/19/1988
  • Issued: 07/21/1992
  • Est. Priority Date: 10/15/1985
  • Status: Expired due to Term
First Claim

1. A method for determining a value related to the probability of occurrence, within a reference sequence of occurrences of elements, said elements comprising characters selected from the types of characters consisting of letters, punctuation, digits, blank, and other nonletters, of an input window of N element candidates, comprising the steps of:

  • for each of a plurality of sets of predefined groups of elements, wherein said groups comprise groups selected from letter groups and nonletter groups and no group contains both letters and nonletters, and wherein for each set of groups of elements an element is assigned to at most one group belonging to said set, associating each element candidate in the input window to one group within each set of groups thereby forming, for each set of groups, an associated window of N groups;

    for each set of groups, determining a value related to the probability of occurrence of the associated window of groups in said reference sequence, and

    computing said value related to the probability of occurrence of said input window by combining the values related to the probability of occurrence of each of said windows of groups to obtain an aggregate window value, wherein

    for a first selected set of groups of characters;

    the letter groups of said set comprise a plurality of generic letter groups of characters, each of said plurality of generic letter groups containing two letters: a specific lowercase character and its associated uppercase character; and

    the nonletter groups of said set comprise;

    a generic letter independent group which contains all and only nonletters which are not substantially more contextually distinguishable from other characters based on the order of generic letter groups and nonletter characters in the reference sequence than on the order of uppercase and lowercase groups and nonletter characters in the reference sequence and which are not substantially useful in contextually distinguishing among generic letters based on the order of characters in said reference sequence, and

    a plurality of generic letter dependent groups the characters within a generic letter dependent group are contextually indistinguishable from each other based on the order of characters occurring in said reference sequence;

    for a second selected set of groups of characters;

    the letter groups of said set comprise a first group of characters having the property of being uppercase letters and a second group of characters having the property of being lowercase letters, and

    each nonletter group is a group having the property that characters contained in said group are substantially not contextually distinguishable from each other based on the order of characters occurring in said reference sequence; and

    for a third selected set of groups of characters;

    said set comprises a letter group containing all letters; and

    the nonletter groups of said set comprise a generic letter independent group and a plurality of generic letter dependent groups.
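
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Several details are assumptions made only for illustration: which nonletters fall into the generic letter dependent groups (the hypothetical `DEPENDENT` table), reuse of the same nonletter grouping for all three sets, add-one smoothed relative frequencies as the "value related to the probability", and a sum of logarithms as the way of "combining the values". All function and variable names are hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: an illustrative split of nonletters. The claim leaves group
# membership to the statistics of the reference sequence.
DEPENDENT = {" ": "<blank>", "'": "<apos>", ".": "<stop>"}
INDEPENDENT = "<other>"  # stands in for the generic letter independent group


def nonletter_group(ch):
    return DEPENDENT.get(ch, INDEPENDENT)


def generic_letter(ch):
    # First set: each letter group pairs a lowercase letter with its uppercase.
    return ch.lower() if ch.isalpha() else nonletter_group(ch)


def case_class(ch):
    # Second set: one uppercase group and one lowercase group.
    if ch.isalpha():
        return "<upper>" if ch.isupper() else "<lower>"
    return nonletter_group(ch)


def letter_class(ch):
    # Third set: a single group containing all letters.
    return "<letter>" if ch.isalpha() else nonletter_group(ch)


GROUP_SETS = (generic_letter, case_class, letter_class)


def associated_window(window, group_of):
    # Map each element candidate to its group, giving a window of N groups.
    return tuple(group_of(ch) for ch in window)


def train(reference, n):
    # Count every window of n groups in the reference, once per set of groups.
    return [
        Counter(
            associated_window(reference[i:i + n], group_of)
            for i in range(len(reference) - n + 1)
        )
        for group_of in GROUP_SETS
    ]


def window_value(window, tables):
    # ASSUMPTION: combine per-set values by summing smoothed log frequencies.
    value = 0.0
    for group_of, counts in zip(GROUP_SETS, tables):
        total = sum(counts.values())
        seen = counts[associated_window(window, group_of)]
        value += math.log((seen + 1) / (total + 1))
    return value


tables = train("The cat sat. The hat sat.", 3)
likely = window_value("cat", tables)
unlikely = window_value("c4t", tables)
```

Under these assumptions the candidate window "cat" scores higher than "c4t", because its group windows occur in the reference text under all three sets while those of "c4t" occur under none.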
