
Relabelling of tokenized symbols in fontless structured document image representations

  • US 20010043349A1
  • Filed: 06/18/2001
  • Published: 11/22/2001
  • Est. Priority Date: 05/23/1996
  • Status: Active Grant
First Claim

1. A method comprising the steps of:

  • providing a processor with a first set of digital information comprising a first structured representation of a document, a plurality of image collections being obtainable from the first structured representation, each such obtainable image collection comprising at least one image, each image in each such collection being an image of at least a portion of the document;

  • with a processor, producing from the first set of digital information a second set of digital information comprising a second structured representation of the document, the second structured representation being a lossless representation of a particular image collection, the particular image collection being one of the plurality of image collections obtainable from the first structured representation, the second structured representation including a plurality of tokens and a plurality of positions, wherein at least one token in the plurality of tokens has an associated semantic label, the second set of digital information being produced by extracting the plurality of tokens from the first structured representation, each token comprising a set of pixel data representing a subimage of the particular image collection, and determining the plurality of positions from the first structured representation, each position being a position of a token subimage in the particular image collection, a token subimage being one of the subimages from one of the tokens, at least one token subimage having a plurality of pixels and occurring at more than one position in the image collection;

  • and making the second set of digital information thus produced available for further use.
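
The tokens-plus-positions representation recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that the first structured representation has already been reduced to a list of placed glyph bitmaps, and the names `Token`, `TokenizedPage`, `tokenize`, `render`, and `relabel` are hypothetical. Identical bitmaps are stored once as a token and referenced from every position at which they occur, and rendering the positions back reproduces the page image exactly.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pixels (assumed bilevel image)


@dataclass(frozen=True)
class Token:
    pixels: Bitmap               # pixel data for one subimage
    label: Optional[str] = None  # optional semantic label, e.g. a character code


@dataclass
class TokenizedPage:
    width: int
    height: int
    tokens: List[Token] = field(default_factory=list)
    # Each position is (token_index, x, y) of the subimage's top-left corner.
    positions: List[Tuple[int, int, int]] = field(default_factory=list)


def tokenize(placements, width, height):
    """Build a token dictionary and position list from (bitmap, x, y) placements."""
    page = TokenizedPage(width, height)
    index_of = {}
    for pixels, x, y in placements:
        if pixels not in index_of:  # first time this shape is seen
            index_of[pixels] = len(page.tokens)
            page.tokens.append(Token(pixels))
        page.positions.append((index_of[pixels], x, y))
    return page


def render(page):
    """Reconstruct the page image by painting each token at its positions."""
    image = [[0] * page.width for _ in range(page.height)]
    for idx, x, y in page.positions:
        for dy, row in enumerate(page.tokens[idx].pixels):
            for dx, px in enumerate(row):
                if px:
                    image[y + dy][x + dx] = 1
    return image


def relabel(page, idx, label):
    """Attach or change a token's semantic label; pixel data is untouched."""
    page.tokens[idx] = Token(page.tokens[idx].pixels, label)


# Usage: shape_a occurs twice on the page but is stored as a single token.
shape_a = ((1, 1), (1, 0))
shape_b = ((1,), (1,))
placements = [(shape_a, 0, 0), (shape_b, 3, 0), (shape_a, 5, 0)]
page = tokenize(placements, width=8, height=2)
relabel(page, 0, "a")
```

Because a label is stored alongside the pixel data and never alters it, relabelling a token in this sketch leaves the rendered image unchanged.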

  • 4 Assignments