METHODS AND SYSTEMS FOR ASSESSING THE QUALITY OF AUTOMATICALLY GENERATED TEXT
Abstract
A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.
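The abstract combines per-character language-conditional probabilities with those of spatially neighboring characters. The sketch below illustrates one way to do that. It is not the patented method. Several pieces are assumptions: each language model is a toy character-bigram table keyed by (previous, current), the best-scoring model is taken for each character, neighbors are characters whose box centers lie within a fixed radius, and the scores are combined by geometric mean. All names (`Char`, `char_probabilities`, `neighbors`, `local_likelihood`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: float  # assumed: horizontal center of the character's bounding box
    y: float  # assumed: vertical center of the character's bounding box


def char_probabilities(chars, models, floor=1e-6):
    """Probability of each character given its predecessor in reading order.

    Assumption: each model is a dict {(prev, cur): prob}; the highest
    probability across the set of models is kept; unseen pairs get `floor`.
    """
    probs = []
    prev = "^"  # hypothetical start-of-text symbol
    for ch in chars:
        probs.append(max(m.get((prev, ch.text), floor) for m in models))
        prev = ch.text
    return probs


def neighbors(chars, target, radius):
    """Indices of characters whose centers lie within `radius` of the target."""
    t = chars[target]
    return [i for i, c in enumerate(chars)
            if i != target and math.hypot(c.x - t.x, c.y - t.y) <= radius]


def local_likelihood(chars, probs, target, radius):
    """Geometric mean of the target's and its neighbors' probabilities."""
    idx = [target] + neighbors(chars, target, radius)
    return math.exp(sum(math.log(probs[i]) for i in idx) / len(idx))


if __name__ == "__main__":
    english = {("^", "t"): 0.2, ("t", "h"): 0.5, ("h", "e"): 0.6}
    chars = [Char("t", 0, 0), Char("h", 10, 0), Char("e", 20, 0), Char("q", 100, 0)]
    probs = char_probabilities(chars, [english])
    print(neighbors(chars, 1, 15))  # → [0, 2]
    print(local_likelihood(chars, probs, 1, 15))
```

The geometric mean keeps a single implausible character from being averaged away by its neighbors. The isolated "q" above, with no neighbors in range, keeps its own very low score.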
41 Claims
1-20. (canceled)
21. A computer-implemented method of assessing quality of computer-generated text in a document image, the method comprising:
identifying text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
identifying a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segmenting the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters;
determining a representative text quality score for each segment of the document image; and
storing the representative text quality scores in association with the segments.
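The segmentation steps of claim 21 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. It assumes the characters are already in reading order. It treats a character's "neighboring character" as the one immediately before it in that order, and it uses the mean as the representative score. The function names `segment_by_quality` and `representative_scores` are hypothetical.

```python
from statistics import mean


def segment_by_quality(scores, threshold):
    """Split a reading-order list of per-character text quality scores into
    contiguous segments, starting a new segment wherever a score differs from
    the preceding character's score by more than `threshold`.

    Returns half-open (start, end) index ranges.
    """
    if not scores:
        return []
    segments = []
    start = 0
    for i in range(1, len(scores)):
        if abs(scores[i] - scores[i - 1]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(scores)))
    return segments


def representative_scores(scores, segments):
    """Map each segment to its representative score (mean: an assumed choice)."""
    return {seg: mean(scores[seg[0]:seg[1]]) for seg in segments}


if __name__ == "__main__":
    scores = [0.9, 0.92, 0.88, 0.2, 0.25, 0.91]
    segs = segment_by_quality(scores, 0.3)
    print(segs)  # → [(0, 3), (3, 5), (5, 6)]
    print(representative_scores(scores, segs))
```

Because boundaries depend only on adjacent differences, a slow drift in scores never triggers a split. A windowed comparison, or a two-dimensional neighborhood based on character locations in the image, would be needed to catch such drifts.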
28. A computer system for segmenting a document image, comprising:
a processor for executing computer program instructions;
a non-transitory computer-readable storage medium storing executable computer program instructions, the computer-readable storage medium comprising:
a score module configured to identify text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
a score analysis module configured to:
identify a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segment the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters; and
determine a representative text quality score for each segment of the document image; and
a text quality database configured to store the representative text quality scores in association with the segments.
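Claim 28 recites a text quality database that stores representative scores in association with segments. The sketch below shows that component using Python's built-in `sqlite3` module. It is illustrative only. The table layout, the representation of a segment as a document id plus a segment number and bounding box, and the class and method names (`TextQualityDatabase`, `store`, `low_quality_segments`) are all assumptions. In a full system, the score module and score analysis module would supply the values passed to `store`.

```python
import sqlite3


class TextQualityDatabase:
    """Hypothetical store of representative text quality scores per segment.

    Assumed schema: a segment is identified by (doc_id, segment_id) and
    located by an axis-aligned bounding box (x0, y0, x1, y1).
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS segment_quality (
                   doc_id TEXT NOT NULL,
                   segment_id INTEGER NOT NULL,
                   x0 REAL, y0 REAL, x1 REAL, y1 REAL,
                   score REAL NOT NULL,
                   PRIMARY KEY (doc_id, segment_id))"""
        )

    def store(self, doc_id, segment_id, bbox, score):
        """Insert or overwrite one segment's representative score."""
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR REPLACE INTO segment_quality VALUES (?, ?, ?, ?, ?, ?, ?)",
                (doc_id, segment_id, *bbox, score),
            )

    def low_quality_segments(self, doc_id, max_score):
        """(segment_id, score) pairs in a document scoring at most max_score."""
        cur = self.conn.execute(
            "SELECT segment_id, score FROM segment_quality "
            "WHERE doc_id = ? AND score <= ? ORDER BY segment_id",
            (doc_id, max_score),
        )
        return cur.fetchall()


if __name__ == "__main__":
    db = TextQualityDatabase()
    db.store("doc1", 0, (0, 0, 100, 20), 0.9)
    db.store("doc1", 1, (0, 20, 100, 40), 0.225)
    print(db.low_quality_segments("doc1", 0.5))  # → [(1, 0.225)]
```

Keying rows by document and segment lets a downstream consumer, such as a re-OCR queue, fetch only the low-scoring regions of a page.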
35. A non-transitory computer-readable storage medium storing executable computer program instructions for performing steps comprising:
identifying text quality scores associated with a plurality of digital text characters generated from the document image, a text quality score for a target character describing a likelihood of the target character being at a location of the target character within the document image;
identifying a plurality of subsets of characters having associated text quality scores that differ from text quality scores associated with neighboring characters in the document image by more than a threshold value;
segmenting the document image into a plurality of segments associated with different text quality scores responsive to the identified plurality of subsets of characters;
determining a representative text quality score for each segment of the document image; and
storing the representative text quality scores in association with the segments.
Specification