Document based character ambiguity resolution
First Claim
1. A computer program product, stored on a machine-readable medium, comprising instructions operable to cause a programmable processor to:
search a document for a word that contains an end-of-line hyphen;
create a solution set for the word containing the end-of-line hyphen, wherein each solution in the solution set is obtained by identifying the end-of-line hyphen as either a soft-hyphen that has been inserted into the word for typesetting purposes or a hard-hyphen that belongs in the word;
search a dictionary for each solution in the solution set; and
use the results from the dictionary search to identify the end-of-line hyphen as either a soft-hyphen or a hard-hyphen.
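The claimed steps can be pictured with a small Python sketch. It is illustrative only and is not taken from the patent. It assumes that a split word arrives as a `prefix` (the text before the end-of-line hyphen) and a `suffix` (the start of the next line), and that the dictionary is a set of lowercase entries. The names `build_solution_set` and `classify_eol_hyphen` are hypothetical.

```python
from typing import Dict, Optional, Set


def build_solution_set(prefix: str, suffix: str) -> Dict[str, str]:
    """Map each reading of a word split as 'prefix-' / 'suffix' to a hyphen type.

    A soft hyphen was inserted only for line breaking, so it is dropped;
    a hard hyphen belongs to the word, so it is kept.
    """
    return {
        prefix + suffix: "soft",
        prefix + "-" + suffix: "hard",
    }


def classify_eol_hyphen(prefix: str, suffix: str, dictionary: Set[str]) -> Optional[str]:
    """Return 'soft', 'hard', or None when the dictionary search is inconclusive.

    Assumes the dictionary holds lowercase entries.
    """
    solutions = build_solution_set(prefix, suffix)
    # Search the dictionary for every solution in the set.
    matches = [kind for word, kind in solutions.items() if word.lower() in dictionary]
    if len(matches) == 1:
        return matches[0]
    return None  # zero or two matches: defer to another policy or to the user


if __name__ == "__main__":
    words = {"typesetting", "well-known"}
    print(classify_eol_hyphen("type", "setting", words))  # soft
    print(classify_eol_hyphen("well", "known", words))    # hard
    print(classify_eol_hyphen("foo", "bar", words))       # None
```

Returning `None` for zero or two matches is a design choice made for this sketch. The claim only requires that the search results drive the classification and does not say how an inconclusive search is handled.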
2 Assignments
Abstract
Methods and apparatus for document-based ambiguous character resolution. An application searches a document for words that do not contain ambiguous characters and adds them to a dictionary, then searches the document for words that do contain ambiguous characters. For each ambiguous word, a set of candidate solutions is created by resolving the ambiguous characters in all possible ways. The dictionary is searched for words matching members of the candidate solution set. When a single member is matched, the ambiguous characters are resolved accordingly. When no member or more than one member is matched, a user is prompted to resolve the ambiguous characters. Alternatively, when more than one member is matched, the ambiguous characters are resolved to obtain the largest word, the smallest word, the most words, or the fewest words.
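Here is a rough Python sketch of the flow described in the abstract, built on several stated assumptions. A hypothetical marker `~` stands for an ambiguous character that may resolve to nothing (a soft hyphen), to a hard hyphen, or to a space between two words. A candidate counts as matched when every word it contains is in the document-derived dictionary. The four tie-breaking policies are one reading of "largest word", "smallest word", "most words", and "fewest words". All function and policy names are illustrative and are not the patent's.

```python
import itertools
import re
from typing import Callable, Dict, List, Optional, Set

# Hypothetical encoding: "~" marks an ambiguous character that may stand for
# nothing (soft hyphen), a hard hyphen, or a space between two words.
AMBIGUOUS: Dict[str, List[str]] = {"~": ["", "-", " "]}
TOKEN = re.compile(r"[A-Za-z~-]+")


def build_dictionary(text: str) -> Set[str]:
    """Collect the document's own words that contain no ambiguous character."""
    return {t.lower() for t in TOKEN.findall(text) if not any(c in t for c in AMBIGUOUS)}


def candidates(token: str) -> List[str]:
    """Resolve every ambiguous character in all possible ways."""
    options = [AMBIGUOUS.get(c, [c]) for c in token]
    return ["".join(choice) for choice in itertools.product(*options)]


# Tie-breaking policies (one interpretation of the abstract's four options).
POLICIES: Dict[str, Callable[[List[str]], str]] = {
    "largest_word": lambda ms: max(ms, key=lambda m: max(len(w) for w in m.split())),
    "smallest_word": lambda ms: min(ms, key=lambda m: min(len(w) for w in m.split())),
    "most_words": lambda ms: max(ms, key=lambda m: len(m.split())),
    "fewest_words": lambda ms: min(ms, key=lambda m: len(m.split())),
}


def resolve(token: str, dictionary: Set[str], policy: str = "fewest_words",
            ask_user: Optional[Callable[[str, List[str]], str]] = None) -> Optional[str]:
    cands = candidates(token)
    # A candidate matches when it has at least one word and all its words are known.
    matched = [c for c in cands
               if c.split() and all(w.lower() in dictionary for w in c.split())]
    if len(matched) == 1:
        return matched[0]
    if ask_user is not None:
        # No match or several matches: let the user choose.
        return ask_user(token, matched or cands)
    if matched:
        return POLICIES[policy](matched)
    return None  # unresolved and no user callback supplied


if __name__ == "__main__":
    doc = "Typesetting the end of a well-known line. type~setting well~known the~end"
    d = build_dictionary(doc)
    for tok in ("type~setting", "well~known", "the~end"):
        print(tok, "->", resolve(tok, d))
```

In this sketch a supplied `ask_user` callback takes precedence over the automatic policies. That mirrors the abstract's wording, which presents tie-breaking as an alternative to prompting the user.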
35 Citations
18 Claims
1. A computer program product, stored on a machine-readable medium, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A method for classifying an end-of-line hyphen as either a soft-hyphen that has been inserted into the word for typesetting purposes or a hard-hyphen that belongs in the word, comprising:
searching a document for a word that contains an end-of-line hyphen;
creating a solution set for the word containing the end-of-line hyphen, wherein each solution in the solution set is obtained by identifying the end-of-line hyphen as either a soft-hyphen that has been inserted into the word for typesetting purposes or a hard-hyphen that belongs in the word;
searching a dictionary for each solution in the solution set; and
using the results from the dictionary search to identify the end-of-line hyphen as either a soft-hyphen or a hard-hyphen.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
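The method's first step, searching a document for words broken by an end-of-line hyphen, can be sketched at the level of whole lines. This Python sketch is hypothetical and rests on three assumptions. A split word is a line ending in letters plus `-`, followed by a line that begins with letters. The dictionary is a set of lowercase entries. A resolved word is rejoined onto the first line together with any punctuation that follows it. Unresolved hyphens are left untouched, and chained splits (a one-word line that is itself hyphenated) are skipped for simplicity.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical patterns: a line ending in "<letters>-" followed by a line
# whose first characters are letters is treated as a split word.
TAIL = re.compile(r"([A-Za-z]+)-$")
HEAD = re.compile(r"[A-Za-z]+")


def find_eol_hyphens(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Search the document for words broken by an end-of-line hyphen."""
    found = []
    for i in range(len(lines) - 1):
        tail = TAIL.search(lines[i].rstrip())
        head = HEAD.match(lines[i + 1].lstrip())
        if tail and head:
            found.append((i, tail.group(1), head.group(0)))
    return found


def classify(prefix: str, suffix: str, dictionary: Set[str]) -> Optional[str]:
    """'soft' or 'hard' when exactly one reading is in the dictionary, else None."""
    readings = (((prefix + suffix).lower(), "soft"),
                ((prefix + "-" + suffix).lower(), "hard"))
    hits = [kind for word, kind in readings if word in dictionary]
    return hits[0] if len(hits) == 1 else None


def dehyphenate(lines: List[str], dictionary: Set[str]) -> List[str]:
    """Rejoin split words, dropping soft hyphens and keeping hard ones."""
    out = [line.rstrip() for line in lines]
    for i, prefix, suffix in find_eol_hyphens(lines):
        if not out[i].endswith(prefix + "-"):
            continue  # line already consumed by a previous join (chained split)
        kind = classify(prefix, suffix, dictionary)
        if kind is None:
            continue  # unresolved: keep hyphen and line break for a human to check
        joiner = "" if kind == "soft" else "-"
        # Move the whole first token of the next line (word plus any punctuation).
        parts = out[i + 1].split(maxsplit=1)
        rest = parts[1] if len(parts) > 1 else ""
        out[i] = out[i][: -(len(prefix) + 1)] + prefix + joiner + parts[0]
        out[i + 1] = rest
    return out


if __name__ == "__main__":
    text = ["Modern type-", "setting is a well-", "known, widely used", "craft."]
    print(dehyphenate(text, {"typesetting", "well-known"}))
    # ['Modern typesetting', 'is a well-known,', 'widely used', 'craft.']
```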
Specification