Document Alteration Based on Native Text Analysis and OCR
First Claim
1. A system for document alteration based on native text analysis and optical character recognition (OCR), the system comprising:
a processor to:
analyze native text obtained from a native document to identify a text entity in the native document;
use a native application interface to convert the native document to a document image, wherein the native application interface is determined based on a document type of the native document;
perform OCR on the document image to identify a text location of the text entity; and
generate a redaction box at the text location in the document image to conceal the text entity.
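The claim states that the native application interface is determined based on the document type. One way to picture that selection is a lookup table keyed by type. The sketch below is illustrative only: it assumes the document type can be read from the file extension, and the converter functions and the `NATIVE_INTERFACES` table are hypothetical stand-ins that return a label rather than calling any real application.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical converters. Each stands in for a call into the native
# application that owns the format; here they only return a description.
def convert_with_word_processor(path: str) -> str:
    return f"image of {path} rendered by word-processor interface"


def convert_with_spreadsheet(path: str) -> str:
    return f"image of {path} rendered by spreadsheet interface"


def convert_with_pdf_viewer(path: str) -> str:
    return f"image of {path} rendered by pdf-viewer interface"


# Assumption: the document type is inferred from the file extension.
NATIVE_INTERFACES: Dict[str, Callable[[str], str]] = {
    ".docx": convert_with_word_processor,
    ".xlsx": convert_with_spreadsheet,
    ".pdf": convert_with_pdf_viewer,
}


def select_native_interface(path: str) -> Callable[[str], str]:
    """Pick the converter registered for the document's type."""
    doc_type = Path(path).suffix.lower()
    try:
        return NATIVE_INTERFACES[doc_type]
    except KeyError:
        raise ValueError(
            f"no native application interface registered for {doc_type!r}"
        )


def convert_to_image(path: str) -> str:
    """Convert a native document using the interface chosen for its type."""
    return select_native_interface(path)(path)
```

Keeping the mapping in a table means a new document type can be supported by registering one more converter, without touching the selection logic.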
Abstract
Example embodiments relate to document alteration based on native text analysis and optical character recognition (OCR). In example embodiments, a system analyzes native text obtained from a native document to identify a text entity in the native document. At this stage, the system may use a native application interface to convert the native document to a document image and perform OCR on the document image to identify a text location of the text entity. The system may then generate an alteration box (e.g., redaction box, highlight box) at the text location in the document image to alter a presentation of the text entity.
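The step of finding the text location and generating an alteration box can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the OCR step yields a list of words with pixel bounding boxes (the `OcrWord` type is hypothetical), that the entity sits on a single line, and that matching is done by comparing normalized tokens.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class OcrWord:
    """Hypothetical OCR result: one recognized word and its bounding box."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def normalize(token: str) -> str:
    """Lower-case a token and drop surrounding punctuation before comparing."""
    return token.strip(".,;:()\"'").lower()


def locate_entity(entity: str, words: List[OcrWord]) -> Optional[Box]:
    """Return the box enclosing the first run of OCR words matching the entity."""
    target = [normalize(t) for t in entity.split()]
    if not target:
        return None
    n = len(target)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [normalize(w.text) for w in window] == target:
            # Union of the matched word boxes (assumes a single text line).
            return (
                min(w.left for w in window),
                min(w.top for w in window),
                max(w.right for w in window),
                max(w.bottom for w in window),
            )
    return None


def make_alteration_box(box: Box, kind: str = "redaction") -> dict:
    """Describe an alteration box of the given kind at a text location."""
    if kind not in ("redaction", "highlight"):
        raise ValueError(f"unsupported alteration kind: {kind!r}")
    return {"kind": kind, "box": box}
```

The entity string comes from the native text analysis, while the coordinates come only from OCR, which is why the two are joined by token matching. An entity that wraps across lines would need one box per line rather than a single union box.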
15 Claims
1. A system for document alteration based on native text analysis and optical character recognition (OCR), the system comprising:
a processor to:
analyze native text obtained from a native document to identify a text entity in the native document;
use a native application interface to convert the native document to a document image, wherein the native application interface is determined based on a document type of the native document;
perform OCR on the document image to identify a text location of the text entity; and
generate a redaction box at the text location in the document image to conceal the text entity.
Dependent claims: 2, 3, 4, 5, 6.
7. A method for document alteration based on native text analysis and optical character recognition (OCR) on a computing device, the method comprising:
performing, by the computing device, named-entity recognition on native text from a native document to categorize a text entity of the native text in a predefined text category, wherein the text entity is designated for redaction based on the predefined text category;
using a native application interface to convert the native document to a document image, wherein the native application interface is determined based on a document type of the native document;
performing OCR on the document image to identify a text location of the text entity; and
generating a redaction box at the text location in the document image to conceal the text entity.
Dependent claims: 8, 9, 10, 11.
12. A non-transitory machine-readable storage medium encoded with instructions executable by a processor, the machine-readable storage medium comprising:
instructions for performing named-entity recognition on native text from a native document to categorize each of a plurality of text entities in one of a plurality of predefined text categories;
instructions for using a native application interface to convert the native document to a document image, wherein the native application interface is determined based on a document type of the native document;
instructions for performing OCR on the document image to identify a plurality of text locations for the plurality of text entities; and
instructions for generating redaction boxes at the plurality of text locations in the document image to conceal the plurality of text entities.
Dependent claims: 13, 14, 15.
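The independent claims above share two ideas that a short sketch may make concrete: entities are sorted into predefined categories, with the category deciding whether an entity is redacted, and a redaction box is drawn at each located position. The code below is an assumption-laden illustration. Regular expressions stand in for a real named-entity recognizer, the category names and patterns are invented for the example, and the document image is modeled as a grid of grayscale pixel values rather than a real image type.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

# Assumed categories. Regexes are a stand-in for a trained NER model.
CATEGORY_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Assumed policy: only these predefined categories are designated for redaction.
REDACTED_CATEGORIES = {"EMAIL", "SSN"}


def categorize_entities(native_text: str) -> List[Tuple[str, str]]:
    """Return (entity, category) pairs found in the native text."""
    found = []
    for category, pattern in CATEGORY_PATTERNS.items():
        for match in pattern.finditer(native_text):
            found.append((match.group(), category))
    return found


def entities_to_redact(native_text: str) -> List[str]:
    """Keep only entities whose category is designated for redaction."""
    return [
        entity
        for entity, category in categorize_entities(native_text)
        if category in REDACTED_CATEGORIES
    ]


def apply_redaction_boxes(image: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Return a copy of the image with every box filled with black (0)."""
    redacted = [row[:] for row in image]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, len(redacted))):
            for x in range(max(left, 0), min(right, len(redacted[y]))):
                redacted[y][x] = 0
    return redacted
```

Because the boxes are painted into the image pixels, the concealed text is no longer present in the output, unlike an overlay placed on top of text that remains selectable underneath.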
Specification