Ingestion planning for complex tables
First Claim
1. A computer-implemented method for generating a plan for document processing, the method comprising:
receiving a plurality of electronic documents from a data store, by a computer using a network;
analyzing the plurality of electronic documents, using the computer to identify a plurality of tabular data by performing a search for one or more table markers, based on the analyzed plurality of electronic documents;
identifying textual data within the identified tabular data, by performing a first natural language search of the analyzed plurality of electronic documents;
generating textual hints, based on the identified textual data within the identified tabular data by associating identified textual data into a set using a second natural language search;
mapping the generated textual hints to a lookup set;
identifying references, wherein references are based on mapped textual hints with associated identified textual data in the received plurality of electronic documents;
determining a count of identified references;
calculating a priority score based on the count of identified references, wherein the calculating further comprises multiplying the count of identified references by a predetermined scale value;
in response to receiving a priority score modifying value, wherein the priority score modifying value is a numerical value, calculating a modified priority score, wherein the calculating further comprises multiplying the priority score by the received priority score modifying value;
generating one or more ingestion plans, the one or more ingestion plans comprising an ordered list of the identified references and associated tabular data, where ordering of the ordered list is based at least in part on the modified priority score; and
communicating the one or more generated ingestion plans by the computer using the network.
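The scoring and ordering steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `TableEntry`, `SCALE_VALUE`, and `build_ingestion_plan` are hypothetical, the scale value of 10 is arbitrary, and two choices are assumptions the claim does not state (the unmodified score is kept when no modifying value is received, and the plan is sorted from highest to lowest score).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical "predetermined scale value"; the claim does not give a number.
SCALE_VALUE = 10.0


@dataclass
class TableEntry:
    """Hypothetical record: one identified table and the references found for it."""
    table_id: str
    references: List[str]


def priority_score(reference_count: int, scale: float = SCALE_VALUE) -> float:
    # Claim step: multiply the count of identified references by the scale value.
    return reference_count * scale


def modified_priority_score(score: float, modifier: Optional[float]) -> float:
    # Claim step: multiply the priority score by the received modifying value.
    # Assumption: if no modifying value was received, keep the score unchanged.
    if modifier is None:
        return score
    return score * modifier


def build_ingestion_plan(
    entries: List[TableEntry], modifiers: Dict[str, float]
) -> List[Tuple[str, List[str], float]]:
    """Return (table_id, references, score) tuples ordered by score."""
    scored = []
    for entry in entries:
        base = priority_score(len(entry.references))
        final = modified_priority_score(base, modifiers.get(entry.table_id))
        scored.append((entry.table_id, entry.references, final))
    # Assumption: highest score first. The claim only says the ordering is
    # based at least in part on the modified priority score.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


if __name__ == "__main__":
    entries = [
        TableEntry("t1", ["r1", "r2", "r3"]),
        TableEntry("t2", ["r4"]),
        TableEntry("t3", ["r5", "r6"]),
    ]
    # A modifying value was received only for table t2.
    for table_id, refs, score in build_ingestion_plan(entries, {"t2": 5.0}):
        print(table_id, len(refs), score)
```

In this example, table `t2` has a single reference, but the received modifying value moves it ahead of tables with more references. This is how a modifying value lets an operator override the count-based ordering.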
Abstract
Embodiments of the present invention disclose a method, computer program product, and system for generating a plan for document processing. A plurality of electronic documents are received from a data store. The plurality of electronic documents are analyzed. Textual data within the identified tabular data are identified, by performing a first natural language search of the analyzed plurality of electronic documents. Textual hints are generated, where the generated textual hints are mapped to a lookup set. References are identified, and a count of identified references is determined. A priority score is calculated based on the count of identified references. In response to receiving a priority score modifying value, a modified priority score is calculated. Ingestion plans are generated based on the modified priority score. Generated ingestion plans are communicated by the computer using the network.
Specification