Systems and methods for processing text-based electronic documents
Abstract
Systems and methods for processing text-based electronic documents are provided. Briefly described, one embodiment of a method for processing a text-based electronic document comprises the steps of: comparing at least one word in a text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule; for each of the at least one word that does not conform to the predefined rule, fragmenting the at least one word into word fragments; combining at least two consecutive word fragments; and comparing the combination of the word fragments to the native language dictionary.
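The steps summarized in the abstract can be illustrated with a short Python sketch. It is not the patented implementation. It assumes that the "predefined rule" is simple case-insensitive membership in the dictionary, and that a nonconforming word's fragments are the pieces separated by hyphens or spaces (for example, breaks introduced by OCR). The names NATIVE_DICTIONARY, conforms, fragment and repair_word are hypothetical.

```python
import re

# Hypothetical dictionary; the patent does not specify its contents.
NATIVE_DICTIONARY = {"information", "processing", "document", "communication"}


def conforms(word: str, dictionary: set) -> bool:
    """Assumed predefined rule: the lowercased word is a dictionary entry."""
    return word.lower() in dictionary


def fragment(word: str) -> list:
    """Split at hyphens or whitespace, assumed here to be OCR-introduced breaks."""
    return [f for f in re.split(r"[-\s]+", word) if f]


def repair_word(word: str, dictionary: set) -> str:
    """Replace the first dictionary-conforming run of >= 2 consecutive fragments."""
    if conforms(word, dictionary):
        return word
    fragments = fragment(word)
    # Try longer runs first so "com-mu-nication" prefers the full join.
    for length in range(len(fragments), 1, -1):
        for start in range(len(fragments) - length + 1):
            combination = "".join(fragments[start:start + length])
            if conforms(combination, dictionary):
                # The combination replaces its fragments; other fragments are kept.
                return " ".join(fragments[:start] + [combination] + fragments[start + length:])
    return word  # no conforming combination: leave the word unchanged
```

Under these assumptions, repair_word("infor-mation", NATIVE_DICTIONARY) returns "information", and "the proc-essing" becomes "the processing". Longer runs are tried first so that a word broken in several places is rejoined as a whole before a shorter, partial match is accepted.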
25 Claims
1. A method for processing a text-based electronic document, the method comprising:
performing optical character recognition processing on the text-based electronic document by comparing at least one word in the text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule;
if the at least one word does not conform to the predefined rule:
fragmenting the at least one word into word fragments;
combining at least two consecutive word fragments of the at least one word to form a combination of the word fragments; and
comparing the combination of the word fragments to the native language dictionary such that, if the combination of the word fragments conforms to the predefined rule, the combination is used in the text-based electronic document; and
determining whether the at least one word matches a combination of:
a word entry in the native language dictionary; and
at least one of a common prefix and a common suffix such that, if the combination conforms to the predefined rule, the combination is used in the text-based electronic document.
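The final element of claim 1 checks whether a word is a dictionary entry combined with a common prefix, a common suffix, or both. One way to test that is sketched below, assuming hypothetical affix lists (COMMON_PREFIXES, COMMON_SUFFIXES) and a toy dictionary; the claim does not say which affixes count as "common", and affix_match is an illustrative name only.

```python
from typing import AbstractSet, Optional, Tuple

# Hypothetical affix lists and dictionary; the patent does not enumerate them.
COMMON_PREFIXES = ("un", "re", "pre")
COMMON_SUFFIXES = ("s", "ed", "ing", "able")
NATIVE_DICTIONARY = {"read", "process", "view"}


def affix_match(word: str, dictionary: AbstractSet[str]) -> Optional[Tuple[str, str, str]]:
    """Return (prefix, root, suffix) when the word equals a common prefix and/or
    common suffix attached to a dictionary entry; otherwise return None."""
    w = word.lower()
    for prefix in ("",) + COMMON_PREFIXES:
        if not w.startswith(prefix):
            continue
        for suffix in ("",) + COMMON_SUFFIXES:
            if not prefix and not suffix:
                continue  # a bare lookup is the earlier dictionary comparison
            if not w.endswith(suffix):
                continue
            root = w[len(prefix):len(w) - len(suffix)]
            if root in dictionary:
                return prefix, root, suffix
    return None
```

With these lists, "unreadable" decomposes as ("un", "read", "able") and "reviews" as ("re", "view", "s"), while a bare entry such as "read" returns None because it would already have passed the first dictionary comparison. Trying every prefix and suffix pair costs time proportional to the product of the list sizes, which is negligible for short affix lists.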
11. A system for processing a text-based electronic document, the system comprising:
logic configured to perform optical character recognition processing, the logic being operative to:
compare at least one word in the text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule;
fragment the at least one word into word fragments if the at least one word does not conform to the predefined rule;
combine at least two consecutive word fragments of the at least one word to form a combination of the word fragments; and
compare the combination of the word fragments to the native language dictionary such that, if the combination of the word fragments conforms to the predefined rule, the combination is used in the text-based electronic document; and
determine whether the at least one word matches a combination of:
a word entry in the native language dictionary; and
at least one of a common prefix and a common suffix if the at least one word does not conform to the predefined rule such that, if the combination conforms to the predefined rule, the combination is used in the text-based electronic document.
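Claim 11 packages the same steps as system logic. A rough sketch of how such logic might be wired together and applied to a run of OCR output follows. The class OcrPostProcessor and its methods are hypothetical, and the ordering (dictionary lookup, then fragment recombination, then the prefix/suffix check, applied only to nonconforming words) is one reading of the claim rather than a documented design.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class OcrPostProcessor:
    """Hypothetical stand-in for the 'logic' of claim 11.

    Assumptions, not taken from the patent: the predefined rule is a
    case-insensitive dictionary lookup, fragments are separated by hyphens,
    and the prefix/suffix lists are illustrative only.
    """
    dictionary: FrozenSet[str]
    prefixes: Tuple[str, ...] = ("un", "re")
    suffixes: Tuple[str, ...] = ("s", "ed", "ing")

    def conforms(self, word: str) -> bool:
        return word.lower() in self.dictionary

    def recombine(self, word: str) -> Optional[str]:
        # Simplest instance of "at least two consecutive fragments": join all of them.
        fragments = [f for f in word.split("-") if f]
        if len(fragments) >= 2:
            candidate = "".join(fragments)
            if self.conforms(candidate):
                return candidate
        return None

    def affix_conforms(self, word: str) -> bool:
        # True if the word is a dictionary entry plus a common prefix and/or suffix.
        w = word.lower()
        for p in ("",) + self.prefixes:
            for s in ("",) + self.suffixes:
                if (p or s) and w.startswith(p) and w.endswith(s):
                    root = w[len(p):len(w) - len(s)]
                    if root and self.conforms(root):
                        return True
        return False

    def process_word(self, word: str) -> Tuple[str, bool]:
        """Return the text to keep and whether it satisfies the assumed rule."""
        if self.conforms(word):
            return word, True
        combined = self.recombine(word)
        if combined is not None:
            return combined, True   # the combination replaces the fragmented word
        if self.affix_conforms(word):
            return word, True       # accepted as prefix/suffix + dictionary entry
        return word, False          # unchanged and flagged as nonconforming

    def process(self, text: str) -> List[Tuple[str, bool]]:
        return [self.process_word(token) for token in text.split()]
```

For example, with a dictionary of {"the", "information", "was", "read"}, processing "The infor-mation was unread xyzzy" keeps "The" and "was", replaces "infor-mation" with "information", accepts "unread" as "un" + "read", and flags "xyzzy" as nonconforming.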
22. A system for processing a text-based electronic document, the system comprising:
a means for comparing, during optical character recognition processing, at least one word in the text-based electronic document to a native language dictionary to determine whether the at least one word conforms to a predefined rule;
a word fragmentation means for fragmenting the at least one word into word fragments if the at least one word does not conform to the predefined rule;
a word fragment integration means for combining at least two consecutive word fragments of the at least one word to form a combination of the word fragments;
a means for comparing the combination of the word fragments to the native language dictionary such that, if the combination of the word fragments conforms to the predefined rule, the combination is used in the text-based electronic document; and
a means for determining whether the at least one word matches a combination of:
a word entry in the native language dictionary; and
at least one of a common prefix and a common suffix such that, if the combination conforms to the predefined rule, the combination is used in the text-based electronic document.
Specification