Script-agnostic text reflow for document images
First Claim
1. A computer-implemented process for reflowing a binarized region of text from a document image into a specified-width display area on an electronic display in a script-agnostic manner, comprising:
using a computer to perform the following process actions:
segmenting the text region into candidate lines of text, said candidate lines of text comprising at least one line of text comprising substantially only accent or diacritic marks, or both;
merging candidate lines of text comprising substantially only accent or diacritic marks, or both, into the closest adjacent candidate text line;
designating each remaining candidate text line as a final text line;
segmenting each final text line into candidate text words;
identifying inter-word punctuation and diacritic marks, if any, and merging each identified mark into the closest adjacent candidate text word;
designating final text words based on the remaining candidate text words;
segmenting the final text lines into paragraphs; and
for each paragraph, reflowing the final text words found therein so as to fit into said specified-width display area while maintaining the original sequential order.
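The final reflow action can be illustrated with a minimal sketch. It assumes each final text word has already been reduced to a bounding box, and it uses a simple greedy packing rule; the `WordBox` record, the `reflow` function, and the `word_gap` parameter are hypothetical names introduced here, not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical record for one final text word (pixel dimensions only)."""
    width: int
    height: int


def reflow(words, display_width, word_gap=8):
    """Greedily pack words into rows no wider than display_width.

    Words are never reordered, so the original sequential order is kept.
    A word wider than the display still gets a row of its own.
    """
    rows, current, used = [], [], 0
    for word in words:
        # Width the row would have if this word were appended to it.
        needed = word.width if not current else used + word_gap + word.width
        if current and needed > display_width:
            rows.append(current)          # close the full row
            current, used = [word], word.width
        else:
            current.append(word)
            used = needed
    if current:
        rows.append(current)
    return rows
```

Calling `reflow` once per paragraph keeps paragraph boundaries intact, since words from different paragraphs never share a row.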
Abstract
Script-agnostic text reflow technique embodiments are presented that generally reflow text found in an image of a document in a manner that functions across multiple scripts, multiple fonts of a script and multiple languages using the same script. This generally involves segmenting regions of text in a document image into individual words and doing this without relying on any script-specific characteristics or requiring any form of character recognition. While segmenting text, the possible presence of accents, diacritics and punctuation marks is considered.
20 Claims
18. A computer-implemented process for segmenting a binarized region of text from a document image in a script-agnostic manner, comprising:
using a computer to perform the following process actions:
segmenting the text region into candidate lines of text, said candidate lines of text comprising at least one line of text comprising substantially only accent or diacritic marks, or both;
merging candidate lines of text comprising substantially only accent or diacritic marks, or both, into the closest adjacent candidate text line;
designating each remaining candidate text line as a final text line; and
segmenting each final text line into text words.
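The line-level actions of this claim can be sketched as follows. The claim does not say how candidate lines are found or how mark-only lines are recognized; this sketch assumes a horizontal projection (rows with or without black pixels) for segmentation and a height threshold relative to the median line height for spotting accent or diacritic lines. The function names and the `min_height_ratio` parameter are hypothetical.

```python
def segment_lines(image):
    """Return (top, bottom) row ranges, bottom exclusive, for each run of
    rows that contains at least one black pixel (1 = black, 0 = white)."""
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(image)))
    return lines


def merge_mark_lines(lines, min_height_ratio=0.5):
    """Merge candidate lines much shorter than the median height (assumed to
    hold only accents or diacritics) into the closest adjacent line."""
    lines = list(lines)
    if len(lines) < 2:
        return lines
    heights = sorted(bottom - top for top, bottom in lines)
    median = heights[len(heights) // 2]
    i = 0
    while i < len(lines) and len(lines) > 1:
        top, bottom = lines[i]
        if bottom - top >= min_height_ratio * median:
            i += 1                      # tall enough: keep as a text line
            continue
        gap_above = top - lines[i - 1][1] if i > 0 else float("inf")
        gap_below = lines[i + 1][0] - bottom if i + 1 < len(lines) else float("inf")
        if gap_above <= gap_below:
            lines[i - 1:i + 1] = [(lines[i - 1][0], bottom)]   # merge upward
        else:
            lines[i:i + 2] = [(top, lines[i + 1][1])]          # merge downward
    return lines
```

The lines returned by `merge_mark_lines` play the role of the final text lines. The median-based threshold is only one plausible heuristic and would misfire on regions where mark-only lines outnumber text lines.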
20. A computer-implemented process for reflowing a binarized region of text comprising black pixels representing text and white pixels representing non-text areas from a document image into a specified-width display area on an electronic display without the need for information about the script employed for the text, comprising:
using a computer to perform the following process actions:
segmenting the text region into candidate lines of text;
removing the candidate designation from each candidate text line that is attributable to inter-line black pixel noise rather than to text;
merging candidate lines of text comprising substantially only accent or diacritic marks, or both, into the closest adjacent candidate text line;
designating each remaining candidate text line as a final text line;
segmenting each final text line into candidate text words;
removing the candidate designation from each candidate text word that is attributable to inter-word black pixel noise rather than text;
identifying inter-word punctuation and diacritic marks, if any, and merging each identified mark into the closest adjacent candidate text word;
designating final text words based on the remaining candidate text words;
segmenting the final text lines into paragraphs; and
for each paragraph, reflowing the final text words found therein so as to fit into said specified-width display area while maintaining the original sequential order, wherein said reflowing comprises vertically aligning word or words being moved from one final text line to the next with the words associated with the final text line receiving the word or words being moved.
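The word-level segmentation and noise-removal actions might look like the sketch below. The claim does not define how inter-word noise is detected; here a black-pixel count threshold stands in for that test, and words are split wherever a run of blank columns is at least `min_gap` wide. The function name and both thresholds are assumptions made for illustration.

```python
def segment_words(line_rows, min_gap=3, min_ink=4):
    """Split one text line (rows of 0/1 pixels, 1 = black) into
    (left, right) column spans, right exclusive.

    Spans separated by at least min_gap blank columns become separate
    candidate words. Candidates with fewer than min_ink black pixels lose
    their candidate status (an assumed stand-in for the noise test).
    """
    width = len(line_rows[0])
    column_ink = [sum(row[x] for row in line_rows) for x in range(width)]
    spans, start, blank_run = [], None, 0
    for x, ink in enumerate(column_ink):
        if ink:
            if start is None:
                start = x
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Last inked column was x - blank_run.
                spans.append((start, x - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        spans.append((start, width - blank_run))
    return [(l, r) for l, r in spans if sum(column_ink[l:r]) >= min_ink]
```

A pure pixel-count filter would also discard genuine punctuation such as an isolated period, so a fuller implementation would need to run the punctuation and diacritic merging step on small candidates that sit close to a word before treating the remainder as noise.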
Specification