
Systems and methods for processing text-based electronic documents

  • US 7,106,905 B2
  • Filed: 08/23/2002
  • Issued: 09/12/2006
  • Est. Priority Date: 08/23/2002
  • Status: Active Grant
First Claim

1. A method for processing a text-based electronic document, the method comprising:

  • performing optical character recognition processing on the text-based electronic document by comparing at least one word in the text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule;

    if the at least one word does not conform to the predefined rule:

    fragmenting the at least one word into word fragments;

    combining at least two consecutive word fragments of the at least one word to form a combination of the word fragments; and

    comparing the combination of the word fragments to the native language dictionary such that, if the combination of the word fragments conforms to the predefined rule, the combination is used in the text-based electronic document; and

    determining whether the at least one word matches a combination of:

    a word entry in the native language dictionary; and

    at least one of a common prefix and a common suffix such that, if the combination conforms to the predefined rule, the combination is used in the text-based electronic document.
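
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the "predefined rule" is taken to be simple membership in the dictionary, words are fragmented at non-letter characters (as might appear when recognition inserts stray punctuation), and the names `DICTIONARY`, `PREFIXES`, `SUFFIXES`, `fragment`, `consecutive_combinations`, `matches_affixed_entry`, and `correct_word` are all hypothetical. The claim does not specify any of these choices, nor the order in which the fragment check and the prefix/suffix check are applied.

```python
import re

# Hypothetical stand-ins for the native language dictionary and affix lists.
DICTIONARY = {"document", "process", "text"}
PREFIXES = ("re", "un", "pre")
SUFFIXES = ("ing", "ed", "s")


def conforms(word, dictionary=DICTIONARY):
    """Assumed 'predefined rule': the word is an entry in the dictionary."""
    return word.lower() in dictionary


def fragment(word):
    """Assumed fragmentation: split at any run of non-letter characters."""
    return [piece for piece in re.split(r"[^A-Za-z]+", word) if piece]


def consecutive_combinations(fragments):
    """Yield every join of two or more consecutive fragments."""
    count = len(fragments)
    for start in range(count):
        for end in range(start + 2, count + 1):
            yield "".join(fragments[start:end])


def matches_affixed_entry(word, dictionary=DICTIONARY):
    """True if the word is a dictionary entry plus a common prefix and/or suffix."""
    lowered = word.lower()
    for prefix in ("",) + PREFIXES:
        if not lowered.startswith(prefix):
            continue
        for suffix in ("",) + SUFFIXES:
            if not prefix and not suffix:
                continue  # at least one affix is required
            if not lowered.endswith(suffix):
                continue
            stem = lowered[len(prefix):len(lowered) - len(suffix)]
            if stem in dictionary:
                return True
    return False


def correct_word(word):
    """Return the accepted form of the word, or None if no check succeeds."""
    if conforms(word):
        return word
    # Try combinations of consecutive fragments against the dictionary.
    for combination in consecutive_combinations(fragment(word)):
        if conforms(combination):
            return combination
    # Otherwise accept the word if it is a dictionary entry plus an affix.
    if matches_affixed_entry(word):
        return word
    return None
```

Under these assumptions, `correct_word("docu.ment")` rejoins the two fragments into a dictionary entry, while `correct_word("reprocessing")` is accepted as the entry "process" combined with a common prefix and suffix. A word that passes neither check yields `None`, which a caller could treat as a candidate for manual review.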
