Methods and systems for assessing the quality of automatically generated text
First Claim
1. A computer-implemented method of assessing quality of computer-generated text in a document image, the method comprising:
identifying text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
identifying a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segmenting the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters;
determining a representative text quality score for each segment of the document image; and
storing the representative text quality scores in association with the segments.
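The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes, for simplicity, that the characters are arranged in a single reading-order sequence, that "neighboring characters" means adjacent characters in that sequence, that the representative score is the arithmetic mean, and that a plain dictionary stands in for storage. The names `segment_by_quality` and `representative_scores` are hypothetical.

```python
from statistics import mean


def segment_by_quality(scores, threshold):
    """Split per-character scores into (start, end) index ranges.

    A new segment starts wherever a character's score differs from the
    previous character's score by more than `threshold` (assumption:
    neighbors are adjacent characters in reading order).
    """
    if not scores:
        return []
    segments = []
    start = 0
    for i in range(1, len(scores)):
        if abs(scores[i] - scores[i - 1]) > threshold:
            segments.append((start, i))  # end index is exclusive
            start = i
    segments.append((start, len(scores)))
    return segments


def representative_scores(scores, segments):
    """Map each segment to its mean score (assumed representative value)."""
    return {seg: mean(scores[seg[0]:seg[1]]) for seg in segments}


# Hypothetical scores: clean text, a degraded region, then clean text again.
scores = [0.9, 0.92, 0.88, 0.3, 0.25, 0.35, 0.9, 0.95]
segments = segment_by_quality(scores, threshold=0.4)
stored = representative_scores(scores, segments)  # stand-in for a database
```

In this example the two large jumps in score produce three segments, each stored with its own representative score. A two-dimensional page layout would need a spatial notion of neighborhood rather than sequence adjacency.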
Abstract
A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.
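The combination step described in the abstract can be sketched as follows. The abstract does not say how neighbors are selected or how the probabilities are combined, so this sketch makes two assumptions: neighbors are characters within a fixed Euclidean radius of the target, and the combination is a geometric mean of the per-language probabilities. The function names, the radius parameter, and the sample values are all hypothetical, and every probability is assumed to be strictly positive.

```python
import math


def find_neighbors(index, positions, radius):
    """Return indices of characters within `radius` of the target character."""
    x0, y0 = positions[index]
    return [
        j for j, (x, y) in enumerate(positions)
        if j != index and math.hypot(x - x0, y - y0) <= radius
    ]


def local_likelihood(index, positions, char_probs, radius):
    """Combine target and neighbor probabilities for each language.

    Assumption: the combination is a geometric mean, computed in log space.
    """
    group = [index] + find_neighbors(index, positions, radius)
    result = {}
    for lang in char_probs[index]:
        log_sum = sum(math.log(char_probs[j][lang]) for j in group)
        result[lang] = math.exp(log_sum / len(group))
    return result


# Hypothetical data: three characters with (x, y) locations and
# per-language probabilities from two language models.
positions = [(0, 0), (1, 0), (10, 0)]
char_probs = [
    {"en": 0.4, "fr": 0.1},
    {"en": 0.9, "fr": 0.4},
    {"en": 0.01, "fr": 0.9},
]
local = local_likelihood(0, positions, char_probs, radius=2)
```

Here the first character has one neighbor within the radius, so its local likelihood for each language is the geometric mean of two probabilities. The third character has no neighbors and keeps its own probabilities.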
21 Claims
8. A computer system for segmenting a document image, comprising:
a processor for executing computer program instructions;
a non-transitory computer-readable storage medium storing executable computer program instructions, the computer-readable storage medium comprising:
a score module configured to identify text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
a score analysis module configured to:
identify a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segment the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters; and
determine a representative text quality score for each segment of the document image; and
a text quality database configured to store the representative text quality scores in association with the segments.
15. A non-transitory computer-readable storage medium storing executable computer program instructions for performing steps comprising:
identifying text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
identifying a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segmenting the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters;
determining a representative text quality score for each segment of the document image; and
storing the representative text quality scores in association with the segments.
Specification