System and method for automated processing of electronic documents
First Claim
1. A system for automatically processing electronic documents, the system comprises:
a memory comprising programming instructions;
a processor configured to execute the programming instructions stored in the memory and configured to:
receive an electronic document comprising at least one of a structured section or an unstructured section;
convert the electronic document into a textual equivalent;
scan the textual equivalent and demarcate those sections that correspond to one or more predetermined structural attributes;
separate the one or more demarcated sections from the textual equivalent and retrieve the one or more demarcated sections corresponding to the structured sections and a remaining textual equivalent corresponding to the unstructured sections as distinct inputs;
receive the one or more demarcated sections and the remaining textual equivalent as the distinct inputs;
identify one or more master triggers within the received distinct inputs;
generate one or more potential zones with the identified one or more master triggers, wherein the generated one or more potential zones is defined by at least one geometric shape formed by geometrically coupling the master triggers and co-triggers proximate to the master triggers into the geometric shape such that the master triggers and the co-triggers form one or more vertices of the geometric shape;
generate one or more rules of extraction to determine at least one extraction type from a plurality of extraction types, wherein each of the plurality of extraction types represent a particular method of extraction, based on the type of electronic document, wherein the type of electronic document is ascertainable based on identification of a template type of the electronic document associated with the demarcated section; and
capture the business relevant data contained in the generated one or more potential zones within the one or more demarcated sections and the remaining textual equivalent based on co-ordinates of the vertices of the geometric shape formed by the one or more master triggers and the co-triggers by applying the determined at least one extraction type.
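The zone-generation and capture steps recited above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each token of the textual equivalent carries page co-ordinates and that the geometric shape is an axis-aligned rectangle with the master trigger and one co-trigger at opposite vertices. The names `Token`, `zone_from_triggers` and `capture` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    x: float  # horizontal co-ordinate on the page (assumed available)
    y: float  # vertical co-ordinate on the page (assumed available)


def zone_from_triggers(master: Token, co_trigger: Token) -> tuple:
    """Rectangle whose opposite vertices are the master trigger and the co-trigger."""
    return (
        min(master.x, co_trigger.x),
        min(master.y, co_trigger.y),
        max(master.x, co_trigger.x),
        max(master.y, co_trigger.y),
    )


def capture(tokens, zone, triggers) -> list:
    """Return the text of every non-trigger token lying inside the zone."""
    x0, y0, x1, y1 = zone
    return [
        t.text
        for t in tokens
        if t not in triggers and x0 <= t.x <= x1 and y0 <= t.y <= y1
    ]


# Hypothetical invoice fragment: "Invoice No" is the master trigger, "Date" the co-trigger.
master = Token("Invoice No", 10, 10)
co = Token("Date", 200, 30)
tokens = [master, Token("INV-001", 100, 20), co, Token("Total", 100, 300)]

zone = zone_from_triggers(master, co)
captured = capture(tokens, zone, (master, co))
```

Here only the token lying between the two triggers falls inside the rectangle. Other shapes (for example a triangle formed by a master trigger and two co-triggers) would need a different containment test.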
Abstract
In accordance with an aspect of the present invention, a system and method for automated processing of electronic documents is provided. The system comprises a precursor module configured to receive an electronic document and convert it into a textual equivalent; a data ascertainment module configured to identify the textual snippets, more particularly the demarcated sections, corresponding to at least one structured section and to logically separate the demarcated sections from the remaining textual equivalent; and a pass-receiving module configured to receive the logically separated demarcated sections and the remaining textual equivalent and capture the business relevant data contained therein. The pass-receiving module captures the business relevant data from the demarcated sections by locating at least one master trigger and at least one proximate co-trigger in the demarcated sections, geometrically coupling the located triggers and co-triggers into at least one potential zone, and extracting the business relevant data contained within these potential zones.
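The separation performed by the data ascertainment module can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes that the predetermined structural attribute is simply "a line containing at least two `|` column separators", and the function names `is_structured` and `split_sections` are hypothetical.

```python
def is_structured(line: str) -> bool:
    # Assumed structural attribute: two or more column separators on the line.
    return line.count("|") >= 2


def split_sections(textual_equivalent: str):
    """Return (demarcated_sections, remaining_text) as two distinct inputs."""
    demarcated, remaining, block = [], [], []
    for line in textual_equivalent.splitlines():
        if is_structured(line):
            block.append(line)  # extend the current structured block
        else:
            if block:  # a structured block just ended
                demarcated.append("\n".join(block))
                block = []
            if line.strip():
                remaining.append(line)
    if block:  # flush a block that runs to the end of the text
        demarcated.append("\n".join(block))
    return demarcated, "\n".join(remaining)


text = "Dear customer,\nItem | Qty | Price\nPen | 2 | 3.00\nThank you."
sections, rest = split_sections(text)
```

The two return values correspond to the demarcated sections and the remaining textual equivalent that are passed on as distinct inputs.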
9 Citations
18 Claims
1. A system for automatically processing electronic documents (recited in full under First Claim). Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A computer implemented method for automatically processing electronic documents, the computer implemented method comprising:
receiving an electronic document comprising at least one of a structured section or unstructured section, and converting the electronic document into a textual equivalent;
scanning the textual equivalent and demarcating those sections that correspond to one or more predetermined structural attributes;
separating the one or more demarcated sections from the textual equivalent and retrieving the one or more demarcated sections corresponding to the structured sections and a remaining textual equivalent corresponding to the unstructured sections as distinct inputs;
identifying one or more master triggers within the received distinct inputs;
generating one or more potential zones within the received distinct inputs, with the identified one or more master triggers, wherein the generated one or more potential zones is defined by at least one geometric shape formed by geometrically coupling the master triggers and co-triggers proximate to the master triggers into the geometric shape such that the master triggers and the co-triggers form one or more vertices of the geometric shape;
generating one or more rules of extraction to determine at least one extraction type from a plurality of extraction types, wherein each of the plurality of extraction types represent a particular method of extraction, based on the type of electronic document, wherein the type of electronic document is ascertainable based on identification of a template type of the electronic document associated with the demarcated section; and
capturing the business relevant data contained in the generated one or more potential zones within the one or more demarcated sections and the remaining textual equivalent based on co-ordinates of the vertices of the geometric shape defined by the one or more master triggers and the co-triggers by applying the determined at least one extraction type.
Dependent claims: 12, 13, 14, 15, 16, 17.
18. A computer program product comprising a non-transitory computer readable medium having a computer readable program code stored thereon, the computer-readable program code comprising instructions that, when executed by a processor, cause the processor to:
receive an electronic document comprising at least one of a structured section or an unstructured section;
convert the electronic document into a textual equivalent;
scan the textual equivalent and demarcate those sections that correspond to one or more predetermined structural attributes; and
separate the one or more demarcated sections from the textual equivalent and retrieve the one or more demarcated sections corresponding to the structured sections and a remaining textual equivalent corresponding to the unstructured sections as distinct inputs;
receive the one or more demarcated sections and the remaining textual equivalent as the distinct inputs;
identify one or more master triggers within the received distinct inputs;
generate one or more potential zones within the received distinct inputs with the identified one or more master triggers, wherein the generated one or more potential zones is defined by at least one geometric shape formed by geometrically coupling the master triggers and co-triggers proximate to the master triggers into the geometric shape such that the master triggers and the co-triggers form one or more vertices of the geometric shape;
generate one or more rules of extraction to determine at least one extraction type from a plurality of extraction types, wherein each of the plurality of extraction types represent a particular method of extraction, based on the type of electronic document, wherein the type of electronic document is ascertainable based on identification of a template type of the electronic document associated with the demarcated section; and
capture the business relevant data contained in the generated one or more potential zones and the remaining textual equivalent based on co-ordinates of the vertices of the geometric shape defined by the one or more master triggers and the co-triggers by applying the determined at least one extraction type.
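The rule-of-extraction limitation shared by the independent claims, in which an extraction type is selected according to the template type identified from a demarcated section, can be pictured with the following sketch. Everything in it is an assumption made for illustration: the template markers, the two extraction functions and the names `identify_template` and `choose_extraction` are hypothetical and are not taken from the patent.

```python
import re


def extract_amount(zone_text: str):
    """Hypothetical extraction type: first decimal amount in the zone."""
    m = re.search(r"\d+(?:\.\d{2})?", zone_text)
    return m.group(0) if m else None


def extract_key_value(zone_text: str):
    """Hypothetical extraction type: value following a 'key:' label."""
    m = re.search(r":\s*(.+)", zone_text)
    return m.group(1).strip() if m else None


# Assumed markers that reveal the template type of a demarcated section.
TEMPLATE_MARKERS = {"invoice": "INVOICE", "purchase_order": "PURCHASE ORDER"}

# Assumed rules of extraction: template type -> extraction type.
RULES = {"invoice": extract_amount, "purchase_order": extract_key_value}


def identify_template(demarcated_section: str):
    upper = demarcated_section.upper()
    for template, marker in TEMPLATE_MARKERS.items():
        if marker in upper:
            return template
    return None


def choose_extraction(demarcated_section: str):
    """Pick the extraction type for the template, defaulting to key/value."""
    return RULES.get(identify_template(demarcated_section), extract_key_value)


invoice_value = choose_extraction("Invoice\nTotal due 125.50")("Total due 125.50")
po_value = choose_extraction("Purchase Order")("PO Number: 4471")
```

A lookup table keeps the mapping from template type to extraction type explicit, so adding a template means adding one marker and one rule rather than changing the selection logic.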
Specification