Removing character from text in non-image form where location of character in image of text falls outside of valid content boundary
First Claim
1. A method comprising:
receiving, by a processor, data representing an image of text, and data representing the text in non-image form;
determining, by the processor, a valid content boundary within the image of the text, the valid content boundary dividing a portion of the image corresponding to valid text of the image from and excluding another portion of the image corresponding to one or more of stray marks, dirt, debris, and handwritten notes;
for each character of a plurality of characters within the text in the non-image form, determining, by the processor, a location of the character within the image of the text; and
where the location of the character within the image of the text falls outside the valid content boundary, removing the character from the data representing the text in the non-image form, by the processor.
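As a rough illustration only, the final filtering step of claim 1 might be sketched as below. The `Box` and `OcrChar` types, the assumption that the valid content boundary is an axis-aligned rectangle, and the choice to keep a character only when its box lies entirely inside that rectangle are all hypothetical; the claim itself does not fix any of these details.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned rectangle in image pixel coordinates (y grows downward)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class OcrChar:
    """Hypothetical pairing of one non-image character with its location in the image."""
    char: str
    box: Box


def remove_outside_chars(chars: List[OcrChar], boundary: Box) -> List[OcrChar]:
    """Keep only characters whose image location falls inside the valid content boundary."""
    return [c for c in chars if boundary.contains(c.box)]


if __name__ == "__main__":
    boundary = Box(100, 100, 900, 1200)
    chars = [
        OcrChar("H", Box(120, 150, 140, 175)),
        OcrChar("i", Box(145, 150, 152, 175)),
        OcrChar("x", Box(40, 600, 60, 625)),  # stray mark in the left margin
    ]
    kept = remove_outside_chars(chars, boundary)
    print("".join(c.char for c in kept))  # prints "Hi"
```

Requiring full containment is a conservative choice: a character that merely touches the boundary is discarded. A looser test (for example, checking only the centre of each box) would keep such characters instead.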
Abstract
Data representing an image of text is received, as is data representing the text in non-image form. A valid content boundary within the image of the text is determined. For each character within the text in the non-image form, a location of the character within the image of the text is determined. Where the location of the character within the image of the text falls outside the valid content boundary, the character is removed from the data representing the text in the non-image form.
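The text above does not say how the valid content boundary is determined. One simple heuristic, offered here purely as an assumption and not as the patented method, is a projection profile. For each row and column of a binarized image, count the ink pixels. Then take the extent of the rows and columns whose counts clear a threshold. Sparse marks such as specks of dirt then fall outside the boundary. The function name `estimate_content_boundary` and the `min_fraction` parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def estimate_content_boundary(
    image: List[List[int]], min_fraction: float = 0.2
) -> Optional[Tuple[int, int, int, int]]:
    """Return inclusive (left, top, right, bottom) bounds of the dense-ink region.

    `image` is a binary raster: 1 = ink, 0 = background. A row (or column) counts
    as content when at least `min_fraction` of its pixels are ink; sparser rows and
    columns, such as those holding only an isolated speck, are treated as noise.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if width == 0:
        return None
    row_ink = [sum(row) for row in image]
    col_ink = [sum(image[y][x] for y in range(height)) for x in range(width)]
    rows = [y for y, n in enumerate(row_ink) if n >= min_fraction * width]
    cols = [x for x, n in enumerate(col_ink) if n >= min_fraction * height]
    if not rows or not cols:
        return None  # no region dense enough to count as valid text
    return (cols[0], rows[0], cols[-1], rows[-1])


if __name__ == "__main__":
    img = [[0] * 10 for _ in range(10)]
    for y in range(2, 6):      # a solid 6x4 "text block"
        for x in range(2, 8):
            img[y][x] = 1
    img[8][9] = 1              # a stray speck outside the block
    print(estimate_content_boundary(img))  # prints (2, 2, 7, 5)
```

This heuristic would not reliably exclude dense handwritten notes in a margin; a real system would likely need layout analysis beyond this sketch.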
22 Claims
12. A non-transitory computer-readable data storage medium storing a computer program executable by a processor, execution of the computer program by the processor causing a method to be performed, the method comprising:
receiving first data representing an image of text and a marking other than the text, and second data representing the text and the marking in non-image form, the marking represented within the second data as if the marking were part of the text;
determining a valid content boundary within the image, the valid content boundary dividing a portion of the image corresponding to the text from and excluding another portion of the image corresponding to the marking;
for each character of a plurality of characters within the second data, determining a location of the character within the first data; and
where the location of an image portion within the first data corresponding to the character falls outside the valid content boundary, removing the character from the second data.
Dependent claims: 13, 14, 15, 16.
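Claim 12 phrases the test in terms of the image portion that corresponds to each character of the second data. The following minimal sketch assumes the second data is a plain string with a parallel list of rectangles, one image portion per character. It also swaps in a centre-point test: a character is kept when the centre of its image portion lies inside the boundary. The names `strip_markings` and `center_inside`, the `Rect` layout, and the centre-point rule are all illustrative assumptions.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical (left, top, right, bottom) in pixels


def center_inside(box: Rect, boundary: Rect) -> bool:
    """True when the centre of `box` lies within `boundary`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return boundary[0] <= cx <= boundary[2] and boundary[1] <= cy <= boundary[3]


def strip_markings(second_data: str, char_boxes: List[Rect], boundary: Rect) -> str:
    """Remove characters of `second_data` whose image portion lies outside `boundary`.

    `char_boxes[i]` is assumed to be the image portion (in the first data)
    that corresponds to `second_data[i]`.
    """
    if len(second_data) != len(char_boxes):
        raise ValueError("need exactly one image portion per character")
    return "".join(
        ch for ch, box in zip(second_data, char_boxes) if center_inside(box, boundary)
    )


if __name__ == "__main__":
    boundary = (100, 100, 900, 1200)
    text = "AB*"  # "*" is a marking that OCR reported as if it were text
    boxes = [
        (120, 150, 140, 175),  # "A": fully inside
        (890, 150, 905, 175),  # "B": straddles the right edge; centre x = 897.5 is inside
        (950, 400, 970, 420),  # "*": marking in the margin
    ]
    print(strip_markings(text, boxes, boundary))  # prints "AB"
```

Compared with a full-containment test, the centre-point rule keeps characters that slightly overhang the boundary, which may suit text set close to the page edge.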
17. A computing system comprising:
a processor;
a computer-readable data storage medium to store first data representing an image of text and a marking other than the text, and second data representing the text and the marking in non-image form, the marking represented within the second data as if the marking were part of the text; and
logic executable by the processor to determine a valid content boundary within the image, the valid content boundary dividing a portion of the image corresponding to the text from and excluding another portion of the image corresponding to the marking, and for each character of a plurality of characters within the second data, to remove the character from the second data where the logic has determined that a location of an image portion within the first data corresponding to the character falls outside the valid content boundary.
Dependent claims: 18, 19, 20, 21, 22.
Specification