Means for resolving ambiguities in text based upon character context

US 5,133,023 A
Filed: 05/19/1988
Issued: 07/21/1992
Est. Priority Date: 10/15/1985
Status: Expired due to Term

11 Assignments

0 Petitions

Abstract

A method of identifying an object within a set of object candidates includes the steps of:

calculating the probability of occurrence of each member of a set of string candidates, wherein each string candidate contains one member of the set of object candidates, the calculating employing formulae using a method of groups and projections; and

identifying one of the objects based on the calculated probability.
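The abstract's selection step (score every string candidate, keep the object whose string scores highest) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation. The probability function is passed in, and `trigram_prob` is a deliberately crude stand-in (a product of raw character-trigram frequencies taken from a reference string), not the groups-and-projections formulae. The names `identify` and `trigram_prob` and the sample reference text are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable


def identify(left: str, candidates: Iterable[str], right: str,
             prob: Callable[[str], float]) -> str:
    """Return the candidate whose string candidate left + c + right scores highest."""
    return max(candidates, key=lambda c: prob(left + c + right))


def trigram_prob(reference: str) -> Callable[[str], float]:
    """Crude stand-in scorer: product of raw trigram frequencies in `reference`."""
    counts = Counter(reference[i:i + 3] for i in range(len(reference) - 2))
    total = sum(counts.values())

    def prob(s: str) -> float:
        p = 1.0
        for i in range(len(s) - 2):
            p *= counts[s[i:i + 3]] / total  # unseen trigram -> 0
        return p

    return prob


# A recognizer is unsure whether the third glyph of "ce?l" is "l", "1" or "I".
prob = trigram_prob("cells fill the cell wall; a cell is small.")
print(identify("ce", "l1I", "l", prob))  # prints l
```

Keeping the scorer pluggable separates the decision rule from the probability model, so a groups-and-projections estimate could replace the trigram stand-in without changing `identify`.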

177 Citations

5 Claims

1. A method for determining a value related to the probability of occurrence, within a reference sequence of occurrences of elements, said elements comprising characters selected from the types of characters consisting of letters, punctuation, digits, blank, and other nonletters, of an input window of N element candidates, comprising the steps of:
- for each of a plurality of sets of predefined groups of elements, wherein said groups comprise groups selected from letter groups and nonletter groups and no group contains both letters and nonletters, and wherein for each set of groups of elements an element is assigned to at most one group belonging to said set, associating each element candidate in the input window to one group within each set of groups thereby forming, for each set of groups, an associated window of N groups;
  
  for each set of groups, determining a value related to the probability of occurrence of the associated window of groups in said reference sequence, and
  
  computing said value related to the probability of occurrence of said input window by combining the values related to the probability of occurrence of each of said windows of groups to obtain an aggregate window value, wherein
  
  for a first selected set of groups of characters:
  
  the letter groups of said set comprise a plurality of generic letter groups of characters, each of said plurality of generic letter groups containing two letters: a specific lowercase character and its associated uppercase character; and
  
  the nonletter groups of said set comprise:
  
  a generic letter independent group which contains all and only nonletters which are not substantially more contextually distinguishable from other characters based on the order of generic letter groups and nonletter characters in the reference sequence than on the order of uppercase and lowercase groups and nonletter characters in the reference sequence and which are not substantially useful in contextually distinguishing among generic letters based on the order of characters in said reference sequence, and
  
  a plurality of generic letter dependent groups the characters within a generic letter dependent group are contextually indistinguishable from each other based on the order of characters occurring in said reference sequence;
  
  for a second selected set of groups of characters:
  
  the letter groups of said set comprise a first group of characters having the property of being uppercase letters and a second group of characters having the property of being lowercase letters, and
  
  each nonletter group is a group having the property that characters contained in said group are substantially not contextually distinguishable from each other based on the order of characters occurring in said reference sequence; and
  
  for a third selected set of groups of characters:
  
  said set comprises a letter group containing all letters; and
  
  the nonletter groups of said set comprise a generic letter independent group and a plurality of generic letter dependent groups.

2. A method as in claim 1, wherein the step of obtaining an aggregate window value comprises the steps of:
- calculating a first value which is the product of the probability of the window of groups associated with said first set and the probability of the window of groups associated with said second set; and
  
  dividing said first value by the probability of the window of groups associated with said third set.

3. A method as in claim 2 wherein said step of computing said value related to the probability of occurrence of said input window of element candidates further comprises the steps of:
- determining an aggregate unigram value related to the probability of occurrence of each nonletter in the input window; and
  
  combining said aggregate unigram value with said aggregate window value to obtain said value related to the probability of occurrence of said input window.

4. A method as in claim 3 wherein said aggregate unigram value is determined by the steps of:
- for each nonletter in said input window which is not uniquely specified by the conjunction of the properties of the groups containing said element candidate:
  
  determining a first associated value related to the probability of occurrence of said element candidate in said reference string;
  
  determining a second associated value related to the probability of occurrence in said reference string of a selected group containing said element candidate;
  
  forming a value related to the ratio of said first and second associated values; and
  
  combining said values related to obtain said aggregate unigram value.

5. A method as in claim 4, wherein said selected group has the property that its elements are substantially not distinguishable from each other based on the order of elements occurring in said reference sequence.
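The scoring rule of claims 1 to 5 can be sketched in Python as below. This is a minimal sketch, not the patentee's implementation, and it fills in choices the claims leave open. Probabilities are raw relative frequencies of N-character windows in a reference string. The groupings are assumptions: case-folded letters for the first set, UPPER/LOWER for the second, one LETTER group for the third; a hypothetical set of "generic letter dependent" nonletters (blank, apostrophe, hyphen, period) kept as singleton groups in the first and third sets; every other nonletter in one letter-independent group (INDEP); and, in the second set, all digits in one group with every other nonletter on its own. Under these choices the only nonletters not pinned down by the conjunction of their groups are digits, so only digits get the claim 4 factor P(c)/P(digit group). Combining values by multiplication is also an assumption. `GroupWindowModel`, `DEPENDENT`, and the projection functions are hypothetical names.

```python
from collections import Counter

# Hypothetical "generic letter dependent" nonletters (an assumption, not from
# the patent): each one is kept as its own group in the first and third sets.
DEPENDENT = {" ", "'", "-", "."}


def generic_letter_group(c: str) -> str:
    """First set: a letter and its uppercase form share one group."""
    if c.isalpha():
        return c.lower()
    return c if c in DEPENDENT else "INDEP"  # INDEP: letter-independent group


def case_group(c: str) -> str:
    """Second set: uppercase vs. lowercase letters; all digits in one group."""
    if c.isalpha():
        return "UPPER" if c.isupper() else "LOWER"
    return "DIGIT" if c.isdigit() else c


def letter_group(c: str) -> str:
    """Third set: one group for all letters; nonletters grouped as in set 1."""
    if c.isalpha():
        return "LETTER"
    return c if c in DEPENDENT else "INDEP"


PROJECTIONS = (generic_letter_group, case_group, letter_group)


class GroupWindowModel:
    """Hypothetical estimator of a window's probability from a reference string."""

    def __init__(self, reference: str, n: int):
        if len(reference) < n:
            raise ValueError("reference must contain at least one window")
        self.n = n
        self.total = len(reference) - n + 1  # windows in the reference
        # One table of group-window counts per set of groups.
        self.tables = []
        for project in PROJECTIONS:
            groups = [project(c) for c in reference]
            self.tables.append(
                Counter(tuple(groups[i:i + n]) for i in range(self.total)))
        self.char_counts = Counter(reference)
        self.digit_total = sum(
            k for c, k in self.char_counts.items() if c.isdigit())

    def window_prob(self, k: int, window: str) -> float:
        """Relative frequency of the window's group pattern under set k."""
        key = tuple(PROJECTIONS[k](c) for c in window)
        return self.tables[k][key] / self.total

    def aggregate_window_value(self, window: str) -> float:
        """Claim 2: P(set-1 window) * P(set-2 window) / P(set-3 window)."""
        p1, p2, p3 = (self.window_prob(k, window) for k in range(3))
        return 0.0 if p3 == 0 else p1 * p2 / p3

    def aggregate_unigram_value(self, window: str) -> float:
        """Claims 4-5: product of P(c) / P(group of c) over nonletters whose
        groups do not pin them down; with these groupings, only digits."""
        value = 1.0
        for c in window:
            if c.isdigit():
                if self.digit_total == 0:
                    return 0.0
                value *= self.char_counts[c] / self.digit_total
        return value

    def score(self, window: str) -> float:
        """Claim 3: combine the window and unigram values (here, by product)."""
        if len(window) != self.n:
            raise ValueError(f"expected a window of {self.n} characters")
        return (self.aggregate_window_value(window)
                * self.aggregate_unigram_value(window))


model = GroupWindowModel("The cat sat on the mat. The rat ate 3 figs in 1999. ", n=3)
print(model.score("Cat") > 0)                    # True
print(model.score("tbe"))                        # 0.0
print(model.score(" 99") > model.score(" 11"))   # True
```

With these particular groupings the third set is a coarsening of both the first and the second, so P1·P2/P3 can be read as assuming that the letter-identity pattern and the case pattern are independent once the letter/nonletter layout is fixed. That is why `Cat`, which never occurs verbatim in the sample reference, still scores above zero: its letters come from `cat` and its capitalization pattern from `The`. Digits, by contrast, are scored only through their unigram frequencies, so ` 19` and ` 91` receive the same value.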

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: The Palantir Corp.
Inventors: Bokser, Mindy R.
Primary Examiner(s): COUSO, JOSE L
Application Number: US07/157,399
Time in Patent Office: 1,524 Days
Field of Search: 281/41-43, 382/14, 382/15, 382/36-38, 382/40, 364/513, 364/513.5
US Class Current: 382/230
CPC Class Codes:
G06V 30/10 Character recognition
G06V 30/274 Syntactic or semantic conte...
