Identification of key segments in document images

  • US 10,699,112 B1
  • Filed: 09/28/2018
  • Issued: 06/30/2020
  • Est. Priority Date: 09/28/2018
  • Status: Active Grant
First Claim

1. A computerized method for identifying keywords in a document image, comprising:

    (i) retrieving a document image from a set of document images where each document in the set of document images contains information organized in a two-dimensional structure and contains keywords, where each keyword of a set of the keywords has a value associated therewith;

    (ii) processing the document image to identify text segments contained within the document image;

    (iii) processing the text segments to identify subword embeddings associated with each of the text segments, wherein each of the subword embeddings associated with a text segment represents a character group in the document image;

    (iv) generating an n-dimensional vector for each text segment from its subword embeddings;

    (v) for each identified text segment, mapping one or more of the n-dimensional vectors to each of the identified text segments to generate for each identified text segment, a feature vector which describes a local context of the identified text segment;

    (vi) retrieving an annotated version of the document image containing a visual indication annotation associated with each visual indication of a keyword in the document;

    (vii) associating with each visual indication of a keyword in the annotated version of the document image a corresponding feature vector to generate a training document; and

    (viii) repeating steps (i) through (vii) for each document from the set of document images to generate a set of training documents.
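
Steps (iii) through (vii) can be illustrated with a minimal Python sketch. It is not the patented implementation; every name and design choice in it is an assumption made for illustration. Character n-grams stand in for the "character groups", a hash-derived vector stands in for a learned subword embedding, the segment vector is the mean of its subword embeddings, and the local-context feature vector is the segment's own vector concatenated with the vectors of its k nearest neighbours on the page. The helper names (`char_ngrams`, `subword_embedding`, `segment_vector`, `context_features`, `build_training_document`) and the segment fields `text`, `x`, `y` are hypothetical.

```python
import hashlib
import math

DIM = 8  # n, the vector dimension (assumed value for illustration)


def char_ngrams(text, n_min=2, n_max=3):
    """Split a text segment into character groups (here: 2- and 3-grams)."""
    padded = f"<{text.lower()}>"  # boundary markers, as in fastText-style n-grams
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def subword_embedding(gram, dim=DIM):
    """Stand-in for a learned embedding: a deterministic vector from a hash."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]


def segment_vector(text, dim=DIM):
    """Step (iv): combine subword embeddings into one n-dimensional vector (mean)."""
    vecs = [subword_embedding(g, dim) for g in char_ngrams(text)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def context_features(segments, k=2, dim=DIM):
    """Step (v): own vector plus the vectors of the k spatially nearest segments."""
    vectors = [segment_vector(s["text"], dim) for s in segments]
    features = []
    for i, seg in enumerate(segments):
        neighbours = sorted(
            (j for j in range(len(segments)) if j != i),
            key=lambda j: math.dist((seg["x"], seg["y"]),
                                    (segments[j]["x"], segments[j]["y"])),
        )[:k]
        feat = list(vectors[i])
        for j in neighbours:
            feat.extend(vectors[j])
        # Zero-pad when the page has fewer than k other segments.
        feat.extend([0.0] * dim * (k - len(neighbours)))
        features.append(feat)
    return features


def build_training_document(segments, keyword_indices, k=2, dim=DIM):
    """Step (vii): pair each feature vector with its keyword annotation."""
    feats = context_features(segments, k, dim)
    return [(feat, i in keyword_indices) for i, feat in enumerate(feats)]


if __name__ == "__main__":
    # Hypothetical OCR output: text plus the centre of each bounding box.
    page = [
        {"text": "Invoice No.", "x": 10, "y": 10},
        {"text": "12345", "x": 60, "y": 10},
        {"text": "Total", "x": 10, "y": 200},
    ]
    # Segments 0 and 2 are marked as keywords in the annotated image.
    training_doc = build_training_document(page, keyword_indices={0, 2})
    for feat, is_keyword in training_doc:
        print(len(feat), is_keyword)
```

Repeating `build_training_document` over every page in the set corresponds to step (viii). In practice the hash-based embedding would be replaced by a trained subword model, and the neighbourhood definition would depend on how the two-dimensional layout is represented.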

  • 3 Assignments