Selective display of OCR'ed text and corresponding images from publications on a client device
Abstract
Text is extracted from a source image of a publication using an Optical Character Recognition (OCR) process. A document is generated containing text segments of the extracted text. The document includes a control module that responds to user interactions with the displayed document. Responsive to a user selection of a displayed text segment, a corresponding image segment from the source image containing the text is retrieved and rendered in place of the selected text segment. The user can select again to toggle the display back to the text segment. Each text segment can be tagged with a garbage score indicating its quality. If the garbage score of a text segment exceeds a threshold value, the corresponding image segment can be automatically displayed instead.
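The toggle and automatic-fallback behavior described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `Segment` class, the `image_ref` string standing in for an image segment, the 0-to-1 garbage-score scale, and the `apply_garbage_threshold` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One OCR'ed text segment paired with its source image segment (hypothetical model)."""
    text: str                    # text extracted by OCR
    image_ref: str               # assumed stand-in for the corresponding image segment
    garbage_score: float         # assumed scale: higher means lower OCR quality
    showing_image: bool = False  # text is displayed by default

    def toggle(self) -> None:
        """Simulate a user selection: switch between the text and image views."""
        self.showing_image = not self.showing_image

    def rendered(self) -> str:
        """Return what would currently be displayed for this segment."""
        return self.image_ref if self.showing_image else self.text


def apply_garbage_threshold(segments: Iterable[Segment], threshold: float) -> None:
    """Show the image automatically for segments whose garbage score exceeds the threshold."""
    for seg in segments:
        if seg.garbage_score > threshold:
            seg.showing_image = True


segments = [
    Segment("Hello world", "img/0.png", garbage_score=0.1),
    Segment("H3l1o w0r|d", "img/1.png", garbage_score=0.9),
]
apply_garbage_threshold(segments, threshold=0.5)
```

With these assumed values, the second segment starts out as an image, and calling `toggle()` on either segment switches its view back and forth.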
20 Claims
1. A computer-implemented method for displaying a document, the method comprising:
identifying a document including at least one text segment generated responsive to an Optical Character Recognition (OCR) process performed on an image segment, wherein the text segment includes a plurality of characters;
generating a quality score for each of the plurality of characters;
generating a segment quality measure for the text segment by averaging the generated quality scores, the segment quality measure indicating a quality of the text segment; and
responsive to the segment quality measure not meeting a quality threshold, displaying the image segment instead of the text segment on a display of a client device.
Dependent claims: 2, 3, 4, 5, 6, 7
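The scoring steps recited in claim 1 can be illustrated with a minimal sketch. It assumes per-character quality scores are floats where higher is better, and it reads "meeting" the threshold as being greater than or equal to it; the claim fixes neither point. The function names `segment_quality` and `choose_display` are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def segment_quality(char_scores: Sequence[float]) -> float:
    """Average the per-character quality scores into one segment quality measure."""
    if not char_scores:
        raise ValueError("a text segment needs at least one character score")
    return fmean(char_scores)


def choose_display(text: str, image_ref: str,
                   char_scores: Sequence[float],
                   quality_threshold: float) -> str:
    """Return the text if the segment meets the threshold, otherwise the image reference.

    Assumption: "meeting" the threshold means measure >= threshold.
    """
    if segment_quality(char_scores) >= quality_threshold:
        return text
    return image_ref


good = choose_display("abc", "img/seg.png", [1.0, 0.5, 0.75], quality_threshold=0.5)
poor = choose_display("a8c", "img/seg.png", [0.25, 0.5, 0.0], quality_threshold=0.5)
```

Here `good` averages to 0.75 and keeps the text, while `poor` averages to 0.25 and falls back to the image segment.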
8. A non-transitory computer-readable storage medium encoded with executable computer program code for:
identifying a document including at least one text segment generated responsive to an Optical Character Recognition (OCR) process performed on an image segment, wherein the text segment includes a plurality of characters;
generating a quality score for each of the plurality of characters;
generating a segment quality measure for the text segment by averaging the generated quality scores, the segment quality measure indicating a quality of the text segment; and
responsive to the segment quality measure not meeting a quality threshold, displaying the image segment instead of the text segment on a display of a client device.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for displaying a publication, the system comprising:
a computer processor; and
a non-transitory computer-readable storage medium encoded with computer program code adapted to execute on the computer processor for:
identifying a document including at least one text segment generated responsive to an Optical Character Recognition (OCR) process performed on an image segment, wherein the text segment includes a plurality of characters;
generating a quality score for each of the plurality of characters;
generating a segment quality measure for the text segment by averaging the generated quality scores, the segment quality measure indicating a quality of the text segment; and
responsive to the segment quality measure not meeting a quality threshold, displaying the image segment instead of the text segment on a display of a client device.
Dependent claims: 16, 17, 18, 19, 20
Specification