Segmentation of an Input by Cut Point Classification
Abstract
Techniques are provided for segmenting an input by cut point classification and training a cut classifier. A method may include receiving, by a computerized text recognition system, an input in a script. A heuristic may be applied to the input to insert multiple cut points. For each of the cut points, a probability may be generated and the probability may indicate a likelihood that the cut point is correct. Multiple segments of the input may be selected, and the segments may be defined by cut points having a probability over a threshold. Next, the segments of the input may be provided to a character recognizer. Additionally, a method may include training a cut classifier using a machine learning technique, based on multiple text training examples, to determine the correctness of a cut point in an input.
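The segmentation flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input is modeled as a list of per-column ink counts, the heuristic (`heuristic_cut_points`) proposes a cut at each local minimum of ink, and `cut_probability` is a hand-written stand-in for a trained cut classifier. All function names, the input representation, and the default threshold of 0.5 are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open column range [start, end)


def heuristic_cut_points(column_ink: Sequence[float]) -> List[int]:
    """Hypothetical heuristic: propose a cut at each local minimum of ink."""
    return [
        i
        for i in range(1, len(column_ink) - 1)
        if column_ink[i] <= column_ink[i - 1] and column_ink[i] < column_ink[i + 1]
    ]


def cut_probability(column_ink: Sequence[float], cut: int) -> float:
    """Stand-in for a trained cut classifier: less ink means a likelier cut."""
    peak = max(column_ink) or 1.0  # avoid dividing by zero on a blank input
    return 1.0 - column_ink[cut] / peak


def segment(column_ink: Sequence[float], threshold: float = 0.5) -> List[Segment]:
    """Keep only cuts whose probability exceeds the threshold, then split."""
    cuts = heuristic_cut_points(column_ink)
    kept = [c for c in cuts if cut_probability(column_ink, c) > threshold]
    bounds = [0] + kept + [len(column_ink)]
    return list(zip(bounds, bounds[1:]))


def recognize(
    column_ink: Sequence[float],
    recognizer: Callable[[Sequence[float]], str],
    threshold: float = 0.5,
) -> List[str]:
    """Provide each selected segment to a caller-supplied character recognizer."""
    return [recognizer(column_ink[a:b]) for a, b in segment(column_ink, threshold)]


ink = [5, 1, 6, 4, 7, 0, 8]
print(heuristic_cut_points(ink))  # [1, 3, 5]
print(segment(ink))               # [(0, 1), (1, 5), (5, 7)]
```

In this toy input the cut at column 3 scores exactly 0.5 and is dropped, so the heuristic's over-segmentation is corrected before any segment reaches the recognizer.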
36 Claims
1. A computer-implemented method comprising:
receiving, by a computerized text recognition system, an input in a script;
applying a heuristic to the input to insert a plurality of cut points;
generating a probability for each of the plurality of cut points, wherein the probability indicates a likelihood that the cut point is correct;
selecting a plurality of segments of the input defined by cut points having a probability over a threshold; and
providing the plurality of segments of the input to a character recognizer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method comprising:
receiving a plurality of text training examples, each text training example including:
  a portion of text,
  a plurality of cut points that separate the portion of text into a plurality of segments, and
  for each of the plurality of cut points in the text training example, an indication of the correctness of the cut point; and
training a cut classifier using a machine learning technique, based on the plurality of text training examples, to determine the correctness of a cut point in an input.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A system comprising:
a processor configured to:
  receive, by a computerized text recognition system, an input in a script;
  apply a heuristic to the input to insert a plurality of cut points;
  generate a probability for each of the plurality of cut points, wherein the probability indicates a likelihood that the cut point is correct;
  select a plurality of segments of the input defined by cut points having a probability over a threshold; and
  provide the plurality of segments of the input to a character recognizer.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A system comprising:
a processor configured to:
  receive a plurality of text training examples, each text training example including:
    a portion of text,
    a plurality of cut points that separate the portion of text into a plurality of segments, and
    for each of the plurality of cut points in the text training example, an indication of the correctness of the cut point; and
  train a cut classifier using a machine learning technique, based on the plurality of text training examples, to determine the correctness of a cut point in an input.
Dependent claims: 30, 31, 32, 33, 34, 35, 36
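The training method of claims 11 and 29 can be illustrated with a small sketch. The claims only require "a machine learning technique"; logistic regression trained by stochastic gradient descent is used here purely as an assumed example. The `TrainingExample` structure mirrors the three claimed parts of a training example, with a per-column ink profile standing in for the portion of text. The single feature (normalized ink at the cut) and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


@dataclass
class TrainingExample:
    column_ink: Sequence[float]  # stands in for the "portion of text"
    cut_points: Sequence[int]    # candidate cuts separating it into segments
    is_correct: Sequence[bool]   # one correctness indication per cut point


def features(column_ink: Sequence[float], cut: int) -> List[float]:
    """Hypothetical feature vector: ink at the cut, normalized by the peak."""
    peak = max(column_ink) or 1.0
    return [column_ink[cut] / peak]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _score(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_cut_classifier(
    examples: Sequence[TrainingExample], epochs: int = 1000, lr: float = 0.5
) -> Model:
    """Fit a logistic regression on every labeled cut in the examples."""
    data = [
        (features(ex.column_ink, cut), float(ok))
        for ex in examples
        for cut, ok in zip(ex.cut_points, ex.is_correct)
    ]
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = _score((weights, bias), x) - y  # gradient of the log loss
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def cut_correctness_probability(
    model: Model, column_ink: Sequence[float], cut: int
) -> float:
    """Probability that a cut in a new input is correct."""
    return _score(model, features(column_ink, cut))


examples = [
    TrainingExample([5, 1, 6, 4, 7, 0, 8], [1, 3, 5], [True, False, True]),
    TrainingExample([9, 0, 8, 6, 9], [1, 3], [True, False]),
]
model = train_cut_classifier(examples)
```

A trained model of this kind could supply the per-cut probability that the segmentation claims compare against a threshold.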
Specification