Method for aligning a text image to a transcription of the image
Abstract
A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined. The transcription is analyzed to identify words, and transcription word lengths are computed using an estimated image character width of glyphs in the text image. The sequence of observed image word lengths is then matched to the sequence of computed transcription word lengths using a dynamic programming algorithm that finds a best path through a two-dimensional lattice of nodes and transitions between nodes, where the transitions represent pairs of sequences of zero or more word lengths. An output data structure contains entries, each of which pairs a transcription word with a matching image word location.
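The word-length matching described in the abstract can be illustrated with a small dynamic program. The following is only a minimal sketch under stated assumptions, not the patented procedure: it restricts lattice transitions to three kinds (pair one image word with one transcription word, leave an image word unpaired, leave a transcription word unpaired), whereas the abstract allows transitions over sequences of zero or more word lengths. The function name align_word_lengths, the relative-difference cost, and the skip_penalty parameter are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]  # (image index, transcription index)


def align_word_lengths(
    image_lengths: List[float],
    transcription_lengths: List[float],
    skip_penalty: float = 1.0,  # assumed cost of leaving a word unpaired
) -> List[Pair]:
    """Find the cheapest order-preserving path through an (n+1) x (m+1) lattice."""
    n, m = len(image_lengths), len(transcription_lengths)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Pair image word i-1 with transcription word j-1.
                a, b = image_lengths[i - 1], transcription_lengths[j - 1]
                mismatch = abs(a - b) / max(a, b, 1e-9)  # assumed cost function
                c = cost[i - 1][j - 1] + mismatch
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j - 1)
            if i > 0:
                # Image word i-1 is paired with no (a null) transcription word.
                c = cost[i - 1][j] + skip_penalty
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:
                # Transcription word j-1 has no matching image word.
                c = cost[i][j - 1] + skip_penalty
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i, j - 1)

    # Trace the best path back from the final node to the origin.
    pairs: List[Pair] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        elif pi == i - 1:
            pairs.append((i - 1, None))
        else:
            pairs.append((None, j - 1))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Because the mismatch cost stays below the skip penalty of 1.0, the sketch prefers pairing two words of different length over skipping both, and it skips a word only when doing so lets the remaining words line up better.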
15 Claims
1. A method of operating a machine to align image words in a text image to transcription words in a transcription associated with the text image;
- the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data the processor executes to operate the machine;
the processor being connected to the memory device for accessing the data stored therein;
the method comprising;
operating the processor to obtain an image definition data structure defining a text image including a plurality of glyphs representing characters in an image character set;
operating the processor to obtain a transcription data structure associated with the text image and including a plurality of transcription labels indicating character codes representing characters in the image character set;
operating the processor to produce an ordered image word sequence of image words occurring in the text image;
each image word being an image region in the text image including at least one glyph;
operating the processor to produce an ordered transcription word sequence of transcription words occurring in the transcription data structure;
each transcription word being a sequence of at least one transcription label indicating a character code in the image character set; and
operating the processor to perform an alignment operation to align image words in the ordered image word sequence with transcription words in the ordered transcription word sequence subject to a constraint of maintaining word order in each of the ordered transcription word sequence and the ordered image word sequence during alignment;
the alignment operation producing an image-transcription alignment data structure indicating each image word in the ordered image word sequence paired with either no (a null) transcription word or with at most one transcription word in the ordered transcription word sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
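One way to picture the image-transcription alignment data structure of claim 1 is as a list with one entry per image word, where each entry holds either a null or the index of a single transcription word. The sketch below is an illustration only; the names WordPair and is_valid_alignment are hypothetical. It also assumes that the word-order constraint means transcription indices strictly increase along the image word sequence, which is a stricter reading than the claim text requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordPair:
    image_index: int                     # position in the ordered image word sequence
    transcription_index: Optional[int]   # None stands for the null transcription word


def is_valid_alignment(
    pairs: List[WordPair],
    num_image_words: int,
    num_transcription_words: int,
) -> bool:
    """Check the pairing and ordering constraints on an alignment."""
    # Every image word must appear exactly once, in image order.
    if [p.image_index for p in pairs] != list(range(num_image_words)):
        return False
    last = -1
    for p in pairs:
        t = p.transcription_index
        if t is None:
            continue  # image word paired with no transcription word
        # Assumed reading of "maintaining word order": indices strictly increase.
        if not (last < t < num_transcription_words):
            return False
        last = t
    return True
```

For example, three image words aligned to two transcription words, with the middle image word left unpaired, would be written as WordPair(0, 0), WordPair(1, None), WordPair(2, 1).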
10. A method of operating a machine to label image locations of image regions in a text image with label data available in a transcription data structure associated with the text image;
- the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data the processor executes to operate the machine;
the processor being connected to the memory device for accessing the data stored therein;
the method comprising;
operating the processor to store, in the memory device of the machine, an image definition data structure defining a text image including a plurality of glyphs;
operating the processor to store, in the memory device of the machine, a transcription data structure associated with the text image and including an ordered sequence of transcription labels indicating character code information about image regions occurring in the text image;
operating the processor to produce an ordered image region sequence of image regions occurring in the text image;
each image region being defined with respect to its location in the text image relative to at least one glyph occurring in the text image;
operating the processor to produce an ordered transcription region sequence of transcription regions occurring in the transcription data structure;
each transcription region being a sequence of at least one transcription label;
operating the processor to perform an alignment operation to align image regions in the ordered image region sequence with transcription regions in the ordered transcription region sequence;
the alignment operation computing similarity measurements measuring a similarity between image regions and transcription regions;
the alignment operation determining a best pairing between the image regions in the ordered image region sequence and transcription regions in the ordered transcription region sequence that optimizes the similarity measurements such that each one of the image regions in the ordered image region sequence is paired with either no (a null) transcription region or with at most one transcription region in the ordered transcription region sequence; and
operating the processor to produce an image transcription alignment data structure indicating transcription regions paired with image region locations of paired image regions.
Dependent claims: 11, 12.
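Claim 10 does not fix a particular similarity measurement. As a hedged illustration, the sketch below borrows the word-length idea from the abstract and scores each image region against each transcription region by the ratio of the shorter width to the longer one. The Box layout, the function names length_similarity and similarity_matrix, and the char_width parameter are assumptions introduced here, not terms from the claims.

```python
from typing import List, Sequence, Tuple

# Hypothetical image region location: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def length_similarity(image_width: float, predicted_width: float) -> float:
    """Return a score in [0, 1]; 1.0 means the two widths are identical."""
    longest = max(image_width, predicted_width)
    if longest <= 0:
        return 1.0  # two empty regions are treated as identical
    return min(image_width, predicted_width) / longest


def similarity_matrix(
    image_boxes: Sequence[Box],
    transcription_regions: Sequence[str],
    char_width: float,  # assumed estimate of glyph width in pixels
) -> List[List[float]]:
    """Score every image region against every transcription region."""
    return [
        [
            length_similarity(box[2], len(region) * char_width)
            for region in transcription_regions
        ]
        for box in image_boxes
    ]
```

A best pairing in the sense of the claim would then be an order-preserving selection of cells from this matrix that optimizes the total score, with unpaired image regions mapped to the null transcription region.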
13. A method of operating a machine to pair image words in a two-dimensional (2D) text image with transcription words in a transcription associated with the text image;
- the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data the processor executes to operate the machine;
the processor being connected to the memory device for accessing the data stored therein;
the method comprising;
operating the processor to receive and store, in the memory device of the machine, an image definition data structure defining a two-dimensional (2D) text image of pixels and including a plurality of glyphs;
the 2D text image having a vertical dimension size larger than a single line;
the plurality of glyphs representing characters in an image character set;
operating the processor to receive and store, in the memory device of the machine, a transcription data structure associated with the 2D text image and including a plurality of transcription labels indicating character codes representing characters in the image character set;
operating the processor to determine an image word location in the 2D text image for each of a plurality of image words occurring therein;
each image word being an image region including at least one glyph positioned with respect to an image baseline;
the plurality of image word locations identifying an ordered image word sequence of the image words occurring in the 2D text image;
operating the processor to determine an image word length value for each image word in the 2D text image indicating a length of the image word;
operating the processor to determine an image character width value indicating an estimated character width of a glyph in the 2D text image;
operating the processor to determine a transcription word location for each of a plurality of transcription words occurring in the transcription data structure;
each transcription word being a sequence of at least one transcription label indicating a character code in the image character set and separated from other transcription labels indicating character codes in the image character set by a transcription word boundary character code;
the plurality of transcription word locations identifying an ordered transcription word sequence of the transcription words occurring in the transcription data structure;
operating the processor to determine a transcription word length value for each transcription word using the image character width value and a count of the transcription labels included in the transcription word;
operating the processor to perform an alignment operation to align transcription word lengths in the ordered transcription word sequence with image word lengths in the ordered image word sequence;
the alignment operation maintaining the ordered image word sequence and the ordered transcription word sequence when aligning the image word lengths and the transcription word lengths;
the alignment operation producing a list of word pairs indicating transcription words in the ordered transcription word sequence paired to matching image words in the ordered image word sequence; and
operating the processor to produce an image-transcription alignment data structure using the list of word pairs, the image word locations and the transcription word locations;
the image-transcription alignment data structure indicating, for each word pair, the transcription word location of a transcription word and the image word location of a paired image word.
Dependent claims: 14, 15.
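The transcription-side steps of claim 13 (locating transcription words at word boundary character codes, then computing each word's length from a label count and an estimated character width) can be sketched as follows. This is an illustration under assumptions: the transcription is modeled as a plain string, the boundary code as a space, and the character width estimate as total image word length divided by total label count. The claim does not say how the width is estimated, so estimate_char_width is a hypothetical stand-in, as are the other function names.

```python
from typing import List, Sequence, Tuple

# (offset of the first label in the transcription, the word's labels)
TranscriptionWord = Tuple[int, str]


def transcription_words(transcription: str, boundary: str = " ") -> List[TranscriptionWord]:
    """Split on the boundary code, keeping each word's transcription location."""
    words: List[TranscriptionWord] = []
    start = None
    for pos, label in enumerate(transcription):
        if label == boundary:
            if start is not None:
                words.append((start, transcription[start:pos]))
                start = None
        elif start is None:
            start = pos
    if start is not None:
        words.append((start, transcription[start:]))
    return words


def estimate_char_width(
    image_word_lengths: Sequence[float],
    words: Sequence[TranscriptionWord],
) -> float:
    """Assumed estimator: average image pixels per transcription label."""
    total_labels = sum(len(word) for _, word in words)
    if total_labels == 0:
        raise ValueError("transcription contains no word labels")
    return sum(image_word_lengths) / total_labels


def transcription_word_lengths(
    words: Sequence[TranscriptionWord],
    char_width: float,
) -> List[float]:
    """Length of each word = label count x estimated character width."""
    return [len(word) * char_width for _, word in words]
```

The resulting list of computed lengths is what an alignment operation would compare against the observed image word lengths, and the stored offsets supply the transcription word locations for the output data structure.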
Specification