Method and system for accurately detecting, extracting and representing redacted text blocks in a document
Abstract
A computer-implemented method, system, and computer program product are provided for automatically detecting redaction blocks in an image file document. The document is analyzed to identify any redaction block areas, and location information is detected for each redaction block area identified in the document. Each redaction block area may then be mapped to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document.
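As an illustration of the "identify redaction block areas and detect their location" step, the following is a minimal sketch and not the method disclosed in the patent. It assumes that redactions appear as solid dark rectangles in a grayscale page held as a list of pixel rows (0 is black, 255 is white). The function name `find_redaction_boxes` and the thresholds `dark_max`, `min_area`, and `min_fill` are hypothetical and chosen only for this example.

```python
from collections import deque


def find_redaction_boxes(image, dark_max=40, min_area=6, min_fill=0.9):
    """Return (left, top, right, bottom) pixel boxes, inclusive, of solid dark regions.

    Assumption for this sketch: a redaction is a connected dark region that
    almost completely fills its own bounding box.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] > dark_max:
                continue
            # Flood-fill one 4-connected component of dark pixels,
            # tracking its bounding box and pixel count.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] <= dark_max):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = (right - left + 1) * (bottom - top + 1)
            # Keep only regions that are large enough and nearly solid.
            if area >= min_area and count / area >= min_fill:
                boxes.append((left, top, right, bottom))
    return boxes


# Tiny synthetic page: white background, one 4x2 black bar, one stray dark pixel.
page = [[255] * 10 for _ in range(6)]
for row in (1, 2):
    for col in range(2, 6):
        page[row][col] = 0
page[4][8] = 0
boxes = find_redaction_boxes(page)
```

The fill-ratio check is what separates a solid bar from ordinary dark content such as printed characters, which rarely fill their bounding box. A real scan would likely need noise handling that this sketch omits.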
17 Claims
1. A computer-implemented method for automatically detecting redaction blocks in a document comprising:
receiving, by an information handling system comprising a processor and a memory, the document as an image file;
analyzing, by the information handling system, the document to identify any redaction block areas in the document;
detecting, by the information handling system, location information for each redaction block area identified in the document;
applying, by the information handling system, optical character recognition to the document to detect text fragments in the document;
detecting, by the information handling system, location information for each text fragment identified in the document; and
mapping, by the information handling system, each redaction block area to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document, wherein the redaction block areas are redacted block areas.
Dependent claims: 2, 3, 4, 5, 6, 7
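The mapping step in claim 1 relies only on location information, and the claim does not say how "associated" is decided. The sketch below shows one possible reading, under stated assumptions: a text fragment is associated with a redaction block when the two share a text line (their vertical extents overlap) and the horizontal gap between them is small. The `Box` type, the helper names, and the `max_gap` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def vertical_overlap(a, b):
    """Height of the vertical band shared by two boxes, or 0 if none."""
    return max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))


def horizontal_gap(a, b):
    """Horizontal distance between two boxes, or 0 if they overlap."""
    return max(0.0, max(a.left, b.left) - min(a.right, b.right))


def map_blocks_to_fragments(blocks, fragments, max_gap=50.0):
    """Map each redaction block index to nearby fragment texts, left to right.

    blocks: list of Box; fragments: list of (text, Box) pairs, e.g. from OCR.
    """
    mapping = {}
    for index, block in enumerate(blocks):
        near = [
            (frag_box.left, text)
            for text, frag_box in fragments
            if vertical_overlap(block, frag_box) > 0
            and horizontal_gap(block, frag_box) <= max_gap
        ]
        # Sort by left edge so the result follows reading order on the line.
        mapping[index] = [text for _, text in sorted(near)]
    return mapping


blocks = [Box(100, 10, 200, 30)]
fragments = [
    ("Name:", Box(40, 12, 90, 28)),          # same line, just left of the block
    ("was present", Box(210, 12, 300, 28)),  # same line, just right of the block
    ("far", Box(400, 12, 450, 28)),          # same line, but too far away
    ("Footer", Box(40, 400, 90, 420)),       # different line
]
mapping = map_blocks_to_fragments(blocks, fragments)
```

With these inputs the block is associated with the two fragments that sit immediately beside it on the same line. Other association rules, such as overlap with a table cell or the nearest preceding label, would fit the claim language equally well.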
8. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors;
a set of instructions stored in the memory and executed by at least one of the processors to automatically detect redaction blocks in a document, wherein the set of instructions are executable to perform actions of:
receiving, by the system, the document as an image file;
analyzing, by the system, the document to identify any redaction block areas in the document;
detecting, by the system, location information for each redaction block area identified in the document;
applying, by the system, optical character recognition to the document to detect text fragments in the document;
detecting, by the system, location information for each text fragment identified in the document; and
mapping, by the system, each redaction block area to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document, wherein the redaction block areas are redacted block areas.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, cause the system to automatically detect redaction blocks in a document by performing actions comprising:
receiving, by the system, the document as an image file;
analyzing, by the information handling system, the document to identify any redaction blocks in the document, wherein each redaction block is a redacted block;
detecting, by the information handling system, location information for each redaction block identified in the document;
applying, by the information handling system, optical character recognition to the document to detect text fragments in the document;
detecting, by the information handling system, location information for each text fragment identified in the document;
mapping, by the information handling system, each redaction block to any associated text fragments in the document based on the location information for each redaction block and text fragment in the document;
classifying, by the system, each identified redaction block as a redaction block type Ti selected from a group consisting of a text block, a table cell, a checkbox, and unknown; and
generating, by the system, an output file which identifies, for the document, each text fragment and associated text fragment location information, along with each redaction block and associated redaction block fragment location information and redaction block type Ti.
Dependent claims: 16, 17
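The classifying and output-generating steps of claim 15 can be pictured with a short sketch. The claim fixes the four type labels but not the classification rules or the file format, so everything else here is an assumption made for illustration: the size and aspect-ratio heuristics, the `inside_table` flag (standing in for a separate table-detection step), the `checkbox_max` threshold, and the JSON layout.

```python
import json

# The four redaction block types named in the claim.
REDACTION_TYPES = ("text block", "table cell", "checkbox", "unknown")


def classify_block(box, inside_table=False, checkbox_max=30):
    """Assign one of REDACTION_TYPES to a (left, top, right, bottom) box.

    Hypothetical heuristics: small, roughly square boxes are checkboxes;
    wide, short boxes are text blocks; boxes flagged by the caller as lying
    inside a table are table cells.
    """
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        return "unknown"
    if inside_table:
        return "table cell"
    if max(width, height) <= checkbox_max and 0.8 <= width / height <= 1.25:
        return "checkbox"
    if width >= 2 * height:
        return "text block"
    return "unknown"


def build_output(document_name, fragments, blocks):
    """Serialize fragments and classified redaction blocks as a JSON string.

    fragments: list of (text, box); blocks: list of (box, inside_table).
    """
    return json.dumps({
        "document": document_name,
        "text_fragments": [
            {"text": text, "location": list(box)} for text, box in fragments
        ],
        "redaction_blocks": [
            {"location": list(box), "type": classify_block(box, inside_table=flag)}
            for box, flag in blocks
        ],
    }, indent=2)


report = build_output(
    "page1.png",  # hypothetical file name
    [("Name:", (40, 12, 90, 28))],
    [((100, 10, 200, 30), False), ((300, 10, 320, 30), False)],
)
```

Writing `report` to disk would give one output file per document that lists every text fragment with its location and every redaction block with its location and type, which is the shape of output the claim describes.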
Specification