
Method and system for automatic transcription correction

  • US 5,883,986 A
  • Filed: 06/02/1995
  • Issued: 03/16/1999
  • Est. Priority Date: 06/02/1995
  • Status: Expired due to Term
First Claim

1. A method of operating a system to correct errors in a transcription of a text image;

  • the system including a processor and a memory device for storing data;

    the data stored in the memory device including instruction data the processor executes to operate the system;

    the processor being connected to the memory device for accessing the data stored therein;

    the method comprising:

    operating the processor to obtain a formal two-dimensional image source model data structure, hereafter referred to as a 2D image model, modeling as a grammar a set of two-dimensional (2D) text images;

    each 2D text image including a plurality of glyphs occurring therein;

    each glyph being an image instance of a respective one of a plurality of characters in an input image character set;

    the 2D image model including mapping data indicating a mapping between a glyph occurring in a 2D text image and a respective message string identifying a character in the input image character set;

    operating the processor to obtain an image definition data structure defining a two-dimensional text image, hereafter referred to as an input 2D image of glyphs, including a plurality of glyphs occurring therein representing characters in the input image character set;

    the input 2D image of glyphs having a vertical dimension size larger than a single line of glyphs;

    the input 2D image of glyphs being one of the set of 2D text images modeled by the 2D image model;

    operating the processor to obtain a first transcription data structure, hereafter referred to as a first transcription, associated with the input 2D image of glyphs;

    the first transcription including a first ordered arrangement of transcription labels identifying characters in the input image character set represented by the glyphs occurring in the input 2D image of glyphs;

    the first transcription including at least one transcription error;

    operating the processor to modify the mapping data included in the 2D image model using the transcription labels in the first transcription to produce modified mapping data included in a modified 2D image model; and

    operating the processor to perform a recognition operation on the input 2D image of glyphs using the modified mapping data included in the modified 2D image model;

    the modified mapping data mapping a sequence of glyphs occurring in a 2D text image to a sequence of respective message strings identifying characters in the input image character set;

    the sequence of message strings produced by the modified mapping data indicating a second transcription identifying the characters represented by the glyphs occurring in the input 2D image of glyphs and including a message string indicating a correction of the at least one transcription error in the first transcription;

    wherein the glyphs included in the input 2D image of glyphs are perceptible as appearing in a visually consistent character image design, hereafter referred to as an input image font;

    wherein the mapping data included in the 2D image model includes a first set of character templates;

    wherein operating the processor to modify the mapping data included in the 2D image model includes producing character template training data including a plurality of glyph samples and respectively paired glyph labels for each character in the input image character set;

    each glyph sample being included in the input 2D image of glyphs;

    each respectively paired glyph label being produced using the first transcription and indicating a respective one of the characters in the input image character set; and

    producing a second set of character templates using the character template training data;

    the second set of character templates indicating character images of the characters in the input image character set and being perceptible as appearing in the input image font; and

    wherein performing the recognition operation on the input 2D image of glyphs using the modified 2D image model includes mapping each glyph occurring in the input 2D image of glyphs to a respective message string identifying the character in the input image character set using the second set of character templates appearing in the input image font.
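The core loop of this claim can be illustrated with a toy sketch. The steps are: pair glyph samples with labels taken from the erroneous first transcription, train a second set of templates in the input font, then re-recognize the image to obtain a second transcription. This is not the patented method. It assumes the glyphs are already segmented and paired one-to-one with transcription labels, whereas the claim aligns them through the grammar-based 2D image model. It uses pixel-wise majority voting as a stand-in for template training and nearest-template Hamming matching as a stand-in for the recognition operation. All names (`training_data`, `train_templates`, `recognize`, the 3x3 bitmaps) are hypothetical.

```python
# Toy illustration only: glyphs are pre-segmented 3x3 bitmaps, '#' = ink, '.' = background.
Bitmap = tuple[str, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count pixels that differ between two same-size bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def training_data(glyphs, first_transcription):
    """Pair each glyph sample with the label the (possibly erroneous) transcription gives it.

    Assumption: one label per glyph, in reading order. The claimed method instead
    obtains this pairing through the 2D image model, which this sketch omits.
    """
    if len(glyphs) != len(first_transcription):
        raise ValueError("sketch assumes exactly one label per glyph")
    return list(zip(glyphs, first_transcription))


def train_templates(samples):
    """Build one template per label by pixel-wise majority vote over its samples.

    A minority of mislabeled samples is outvoted, so each template keeps the
    shape of the correctly labeled glyphs in the input font.
    """
    by_label = {}
    for glyph, label in samples:
        by_label.setdefault(label, []).append(glyph)
    templates = {}
    for label, group in by_label.items():
        rows, cols = len(group[0]), len(group[0][0])
        templates[label] = tuple(
            "".join(
                "#" if sum(g[r][c] == "#" for g in group) * 2 > len(group) else "."
                for c in range(cols)
            )
            for r in range(rows)
        )
    return templates


def recognize(glyphs, templates):
    """Map each glyph to the label of its nearest template, giving a second transcription."""
    return "".join(min(templates, key=lambda ch: hamming(g, templates[ch])) for g in glyphs)


O = ("###", "#.#", "###")
L = ("#..", "#..", "###")
I = (".#.", ".#.", ".#.")
page = [O, L, O, O, L, I]
first = "OLOLLI"  # the fourth label is wrong: that glyph is an O
second = recognize(page, train_templates(training_data(page, first)))
print(second)  # OLOOLI
```

Because the single mislabeled "O" sample is outvoted when the "L" template is trained, re-recognition with the new templates restores the correct label. The one-to-one pairing used here cannot repair inserted or deleted characters; handling those is what the alignment through the 2D image model is for.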

  • 4 Assignments