Method of text information recognition from a graphical file with use of dictionaries and other supplementary data

  • US 20080008386A1
  • Filed: 07/06/2006
  • Published: 01/10/2008
  • Est. Priority Date: 07/06/2006
  • Status: Active Grant
First Claim

1. A method of a text data recognition from an image file comprising

    obtaining an image file from scanning device or from other source,
    preliminarily assignment of the whole or a part of the following list of applied supplementary data types and an order of application thereto:

  • a line-to-graphemes parsing information and/or
  • a graphical element (grapheme) recognition quality, and/or
  • a whole words dictionary, and/or
  • a dictionary of permissible word fragments, and/or
  • rules, prescribed by applied standard data patterns or regular expressions, and/or
  • rules, prescribed by word disposition within the line or the paragraph, and/or
  • rules, prescribed by the document language peculiarities, and/or
  • rules, prescribed by the document type peculiarities, and/or
  • supplementary rules for rare occasions,

    preliminarily assignment of an accuracy estimation for each type of supplementary data,
    performance of one or more line-to-fragments parsing versions by reliably recognized spaces, said fragments presumably comprising single word images,
    building of line partition graph (hereinafter LPG) for each line fragment, said graph describing fragment-to-graphemes parsing versions, said graphemes presumably comprising character images,
    single graphemes recognition, using two or more classifiers of different types,
    assignment of each said grapheme recognition version accuracy estimation,
    interpretation of grapheme recognition version as a character version,
    performance of at least the following steps;

    the first step;

    for each LPG chain connecting initial node and final node, a set of chains are built using all obtained recognized grapheme-to-character versions,
    a total recognition accuracy level is calculated for each said chain,
    obtained results are sorted in a total recognition accuracy descending order,

    the second step;

    all obtained character group versions are analyzed using supplemental information about capital-small characters disposition,
    in a case of more than one grapheme-to-character recognition version being available, said each obtained recognition version is analyzed with the successive application of subsequent said supplemental data types in connection with the preliminarily assigned order or with a joint application thereof if necessary,
    each obtained version is assigned an accuracy estimation,
    character versions having said accuracy estimation lower, than the preliminarily assigned level are discarded,
    the remain versions are sorted in a descending order using pair wise comparison;

    the third step;

    a supplementary space recognition correction is performed with respect to a previously mistakenly recognized spaces comprising;

  • joining of previously mistakenly separated elements,
  • separation of previously mistakenly combined elements.
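
The first two steps of the claim can be illustrated with a small sketch. It is not the patented implementation. The graph layout, the product of per-grapheme accuracies as the "total recognition accuracy", the 0.5 penalty for words missing from the dictionary, and the 0.2 discard level are all assumptions chosen for illustration, and the names `LPG`, `chains`, `ranked_versions` and `apply_dictionary` are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical line partition graph (LPG) for one line fragment.
# node -> list of (next_node, [(character version, accuracy), ...]).
# Node 0 is the initial node, node 4 the final node. The image of "m"
# can also be cut into two graphemes read as "r" + "n" (or "r" + "h").
LPG = {
    0: [(1, [("a", 0.9)])],
    1: [(2, [("r", 0.9)])],
    2: [(3, [("r", 0.9)]), (4, [("m", 0.6)])],
    3: [(4, [("n", 0.8), ("h", 0.3)])],
    4: [],
}

def chains(graph, node, final):
    """Yield every chain of grapheme edges leading from node to final."""
    if node == final:
        yield []
        return
    for nxt, versions in graph[node]:
        for rest in chains(graph, nxt, final):
            yield [versions] + rest

def ranked_versions(graph, start, final):
    """First step: expand each chain into character strings, score, sort."""
    results = []
    for chain in chains(graph, start, final):
        for combo in product(*chain):
            text = "".join(ch for ch, _ in combo)
            # Assumed scoring rule: product of per-grapheme accuracies.
            score = prod(acc for _, acc in combo)
            results.append((text, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

def apply_dictionary(versions, dictionary, penalty=0.5, threshold=0.2):
    """Second step (simplified): re-estimate with a whole-words dictionary,
    discard versions below the assigned level, and sort what remains."""
    rescored = [
        (text, score if text in dictionary else score * penalty)
        for text, score in versions
    ]
    kept = [v for v in rescored if v[1] >= threshold]
    return sorted(kept, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    ranked = ranked_versions(LPG, 0, 4)
    print(ranked)
    print(apply_dictionary(ranked, {"arm"}))
```

With these assumed numbers the raw ranking prefers "arrn", while the dictionary pass moves "arm" to the top and drops "arrh". The claim's other supplementary data types, and the third step (space correction), are not modeled here.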
