
Document alteration based on native text analysis and OCR

  • US 9,256,798 B2
  • Filed: 01/31/2013
  • Issued: 02/09/2016
  • Est. Priority Date: 01/31/2013
  • Status: Active Grant
First Claim

1. A system for document alteration based on native text analysis and optical character recognition (OCR), the system comprising:

  • at least one processor to:

    analyze native text obtained from a native document to identify a text entity in the native document;

    use a native application interface to convert the native document to a document image, wherein the native application interface is determined based on a document type of the native document;

    perform OCR on the document image to identify a text location of the text entity, wherein the identifying of the text location of the text entity comprises:

      recognizing a plurality of words in the document image,

      matching a given word of the plurality of words recognized in the document image with the text entity identified by the analyzing of the native text obtained from the native document, wherein the matching comprises matching variations of a root portion of the text entity to the given word of the plurality of words,

      generating a plurality of bounding coordinates for each of the plurality of words, wherein the plurality of bounding coordinates describe a bounding rectangle of a plurality of bounding rectangles that surrounds the given word of the plurality of words, and

      using the boundary rectangle that surrounds the given word to identify the text location of the text entity; and

    generate a redaction box at the text location in the document image to conceal the text entity.
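The claimed steps can be sketched in Python as follows. This is a minimal, hypothetical illustration under stated assumptions and is not the patentee's implementation. The OCR step is assumed to have already produced words with pixel bounding rectangles given as (left, top, right, bottom). The names in CONVERTERS are placeholders for a "native application interface" chosen by document type. The watch-list lookup in find_entities stands in for native-text entity analysis. The SUFFIXES set used to accept "variations of a root portion" is also an assumption.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical mapping from document type (file extension) to a native
# application interface. These names are placeholders, not real APIs.
CONVERTERS = {
    ".docx": "word_processor_renderer",
    ".xlsx": "spreadsheet_renderer",
    ".pdf": "pdf_renderer",
}

# Assumed suffixes that count as variations of an entity's root portion.
SUFFIXES = ("", "s", "es", "'s", "s'")


@dataclass(frozen=True)
class OcrWord:
    """One OCR-recognized word and its bounding rectangle in pixels (assumed format)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def select_converter(filename: str) -> str:
    """Choose the (hypothetical) converter from the document type."""
    ext = Path(filename).suffix.lower()
    if ext not in CONVERTERS:
        raise ValueError(f"no converter for document type {ext!r}")
    return CONVERTERS[ext]


def find_entities(native_text: str, watch_list: list[str]) -> list[str]:
    """Toy native-text analysis: keep watch-list roots that occur in the text."""
    return [
        term for term in watch_list
        if re.search(rf"\b{re.escape(term)}", native_text, re.IGNORECASE)
    ]


def matches_root(word: str, root: str) -> bool:
    """True if the OCR word is the root plus an allowed suffix (case-insensitive)."""
    w = word.strip('.,;:!?"()').lower()  # drop surrounding punctuation, keep apostrophes
    r = root.lower()
    return w.startswith(r) and w[len(r):] in SUFFIXES


def redaction_boxes(
    words: list[OcrWord], entities: list[str], pad: int = 2
) -> list[tuple[int, int, int, int]]:
    """Return a padded rectangle for every OCR word that matches an entity root."""
    boxes = []
    for word in words:
        if any(matches_root(word.text, e) for e in entities):
            boxes.append((word.left - pad, word.top - pad,
                          word.right + pad, word.bottom + pad))
    return boxes


if __name__ == "__main__":
    native = "Payment approved for John Smith. Smith's account is closed."
    entities = find_entities(native, ["Smith", "Acme"])  # ["Smith"]
    ocr = [
        OcrWord("Payment", 10, 10, 80, 24),
        OcrWord("Smith.", 200, 10, 250, 24),
        OcrWord("Smith's", 10, 30, 70, 44),
        OcrWord("Smithson", 80, 30, 160, 44),  # different name, not a variation
    ]
    print(select_converter("memo.docx"))    # word_processor_renderer
    print(redaction_boxes(ocr, entities))   # [(198, 8, 252, 26), (8, 28, 72, 46)]
```

Matching against a root plus a small suffix list lets possessive or plural forms such as "Smith's" be concealed while a distinct word such as "Smithson" is left alone. Drawing the filled rectangle onto the page image, for example with Pillow's ImageDraw.rectangle(..., fill="black"), is omitted here so the sketch has no third-party dependencies.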

  • 3 Assignments