Multiple hypothesis testing for word detection
First Claim
1. A processor implemented method for determining words in a character sequence output during Optical Character Recognition (OCR), the method comprising:
determining a set of one or more bifurcation points for the character sequence, wherein each bifurcation point identifies a location to split the character sequence into two or more words and wherein the one or more bifurcation points are determined based on a separation between adjacent characters in the character sequence;
generating a plurality of hypotheses, each hypothesis comprising one or more words formed by the character sequence, at least one of the hypotheses being generated based on the one or more bifurcation points;
computing a plurality of normalized scores, each normalized score corresponding to a hypothesis, wherein the normalized score for a corresponding hypothesis is based, in part, on a length of each word in a set of the one or more words associated with the corresponding hypothesis; and
selecting a hypothesis from the plurality of hypotheses based on a corresponding normalized score associated with the selected hypothesis.
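The first element of claim 1 bases the bifurcation points on the separation between adjacent characters. Below is a minimal Python sketch of that step. Several details are assumptions made for illustration and are not the patent's stated method: the `(character, left_x, right_x)` box format, the rule that a gap is a candidate break when it exceeds a multiple of the median inter-character gap, the `ratio` and `min_gap` defaults, and the function name `bifurcation_points`.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical box format: (character, left_x, right_x) in pixels.
Box = Tuple[str, float, float]


def bifurcation_points(chars: List[Box], ratio: float = 1.8, min_gap: float = 2.0) -> List[int]:
    """Return indices i at which the sequence may be split before chars[i].

    Hypothetical rule: a gap is a candidate word break when it is wider than
    `ratio` times the median inter-character gap and wider than `min_gap` pixels.
    """
    # Horizontal separation between each pair of adjacent characters.
    gaps = [chars[i][1] - chars[i - 1][2] for i in range(1, len(chars))]
    if not gaps:
        return []
    threshold = max(ratio * median(gaps), min_gap)
    return [i for i, gap in enumerate(gaps, start=1) if gap > threshold]


if __name__ == "__main__":
    # "helloworld" with uniform 2 px gaps except a 6 px gap before "w".
    lefts = [0, 12, 24, 36, 48, 64, 76, 88, 100, 112]
    chars = [(c, x, x + 10) for c, x in zip("helloworld", lefts)]
    print(bifurcation_points(chars))  # [5]
```

In this example the split falls before index 5, which separates "hello" from "world". With closely or non-uniformly spaced text, a single threshold may propose too many or too few breaks. That is presumably why the claimed method scores several hypotheses instead of trusting one split.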
Abstract
Embodiments disclosed pertain to Optical Character Recognition (OCR) using techniques based on Multiple Hypothesis Testing, applied to images occurring in a variety of settings, including images captured by mobile stations. In some embodiments, a set of bifurcation points for a character cluster in an image may be determined. The character cluster may comprise non-uniformly spaced text or closely spaced text. A plurality of hypotheses may be determined for the character cluster, where each hypothesis is based on a subset of the bifurcation points and comprises a set of words generated from the character cluster. A plurality of scores may be determined, each corresponding to one of the hypotheses, and a hypothesis may be selected from among the plurality of hypotheses based on its associated score.
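The abstract describes forming one hypothesis per subset of the bifurcation points, then scoring and selecting among the hypotheses. The Python sketch below illustrates that idea. Three choices are assumptions made for illustration:

- every subset is enumerated exhaustively (2^k hypotheses for k points);
- a lookup table `word_scores` supplies the per-word scores and stands in for a dictionary or language model;
- the normalized score is the mean of the per-word scores, weighted by word length. Claim 1 requires only that the normalized score depend in part on each word's length.

All function names are hypothetical. A bifurcation point `i` means "split before character `i`".

```python
from itertools import combinations
from typing import Dict, List, Sequence


def hypotheses(chars: str, points: Sequence[int]) -> List[List[str]]:
    """One hypothesis per subset of the bifurcation points, including the empty subset."""
    result = []
    for k in range(len(points) + 1):
        for subset in combinations(sorted(points), k):
            cuts = [0, *subset, len(chars)]
            result.append([chars[a:b] for a, b in zip(cuts, cuts[1:])])
    return result


def normalized_score(words: List[str], word_scores: Dict[str, float]) -> float:
    """Length-weighted mean of per-word scores; words not in `word_scores` score 0."""
    total = sum(len(w) for w in words)
    if total == 0:
        return 0.0
    return sum(len(w) * word_scores.get(w, 0.0) for w in words) / total


def select(chars: str, points: Sequence[int], word_scores: Dict[str, float]) -> List[str]:
    """Pick the hypothesis with the highest normalized score (the first one wins ties)."""
    return max(hypotheses(chars, points), key=lambda h: normalized_score(h, word_scores))


if __name__ == "__main__":
    # Hypothetical per-word scores, e.g. from a dictionary or language model.
    scores = {"the": 1.0, "cat": 1.0, "sat": 1.0, "thecat": 0.2}
    print(len(hypotheses("thecatsat", [3, 6])))  # 4
    print(select("thecatsat", [3, 6], scores))  # ['the', 'cat', 'sat']
```

Every hypothesis covers the same characters. Dividing by the total character count therefore puts all hypotheses on a common 0-to-1 scale, regardless of how many words each one splits the cluster into. Because the number of hypotheses doubles with each added bifurcation point, a practical system would likely need to prune the candidates.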
20 Claims
1. A processor implemented method for determining words in a character sequence output during Optical Character Recognition (OCR), the method comprising:
determining a set of one or more bifurcation points for the character sequence, wherein each bifurcation point identifies a location to split the character sequence into two or more words and wherein the one or more bifurcation points are determined based on a separation between adjacent characters in the character sequence;
generating a plurality of hypotheses, each hypothesis comprising one or more words formed by the character sequence, at least one of the hypotheses being generated based on the one or more bifurcation points;
computing a plurality of normalized scores, each normalized score corresponding to a hypothesis, wherein the normalized score for a corresponding hypothesis is based, in part, on a length of each word in a set of the one or more words associated with the corresponding hypothesis; and
selecting a hypothesis from the plurality of hypotheses based on a corresponding normalized score associated with the selected hypothesis.
11. An apparatus comprising:
a processor configured to:
determine a set of one or more bifurcation points for a character sequence output during Optical Character Recognition (OCR), wherein each bifurcation point identifies a location to split the character sequence into two or more words and wherein the one or more bifurcation points are determined based on a separation between adjacent characters in the character sequence;
generate a plurality of hypotheses comprising one or more words formed by the character sequence, at least one of the hypotheses being generated based on the one or more bifurcation points;
compute a plurality of normalized scores, each normalized score corresponding to a hypothesis, wherein the normalized score for a corresponding hypothesis is based, in part, on a length of each word in the set of the one or more words associated with the corresponding hypothesis; and
select a hypothesis from the plurality of hypotheses based on a corresponding normalized score associated with the selected hypothesis.
20. An apparatus comprising:
processing means, the processing means further comprising:
means for determining a set of one or more bifurcation points for a character sequence output from Optical Character Recognition (OCR), wherein each bifurcation point identifies a location to split the character sequence into two or more words and wherein the one or more bifurcation points are determined based on a separation between adjacent characters in the character sequence;
generating a plurality of hypotheses comprising one or more words formed by the character sequence, at least one of the hypotheses being generated based on the one or more bifurcation points;
means for computing a plurality of normalized scores, each normalized score corresponding to a hypothesis, wherein the normalized score for a corresponding hypothesis is based, in part, on the length of each word in the set of the one or more words associated with the corresponding hypothesis; and
means for selecting a hypothesis from the plurality of hypotheses based on a corresponding normalized score associated with the selected hypothesis.
Specification