INGESTION PLANNING FOR COMPLEX TABLES
First Claim
1. A computer program product for generating a plan for document processing, the computer program product comprising:
one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions comprising:
program instructions to receive a plurality of electronic documents from a data store, by a computer using a network;
program instructions to analyze the plurality of electronic documents, using the computer to identify a plurality of tabular data by performing a search for one or more table markers, based on the analyzed plurality of electronic documents;
program instructions to identify textual data within the identified tabular data, by performing a first natural language search of the analyzed plurality of electronic documents;
program instructions to generate textual hints, based on the identified textual data within the identified tabular data by associating identified textual data into a set using a second natural language search;
program instructions to map the generated textual hints to a lookup set;
program instructions to identify references, wherein references are based on mapped textual hints with associated identified textual data in the received plurality of electronic documents;
program instructions to determine a count of identified references;
program instructions to calculate a priority score based on the count of identified references, wherein the program instructions to calculate further comprises program instructions to multiply the count of identified references by a predetermined scale value;
in response to program instructions to receive a priority score modifying value, wherein the priority score modifying value is a numerical value, program instructions to calculate a modified priority score, wherein the program instructions to calculate further comprises program instructions to multiply the priority score by the received priority score modifying value;
program instructions to generate one or more ingestion plans based on the modified priority score; and
program instructions to communicate the one or more generated ingestion plans by the computer using the network.
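The scoring arithmetic recited in the claim can be illustrated with a minimal sketch. The function and parameter names below are hypothetical and do not appear in the claim. The claim states only that the count of identified references is multiplied by a predetermined scale value, and that the result is multiplied by a received modifying value. Returning the score unchanged when no modifying value is received is an assumption made here for illustration.

```python
from typing import Optional


def priority_score(reference_count: int, scale_value: float) -> float:
    """Multiply the count of identified references by a predetermined scale value."""
    return reference_count * scale_value


def modified_priority_score(score: float, modifying_value: Optional[float] = None) -> float:
    """Multiply the priority score by a received modifying value.

    Assumption: if no modifying value has been received, the score is
    returned unchanged. The claim does not describe that case.
    """
    if modifying_value is None:
        return score
    return score * modifying_value


# Example: 4 identified references, scale value 2.5, modifying value 1.5.
base = priority_score(4, 2.5)
adjusted = modified_priority_score(base, 1.5)
```

With these inputs the base score is 4 × 2.5 = 10.0 and the modified score is 10.0 × 1.5 = 15.0.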
Abstract
Embodiments of the present invention disclose a method, computer program product, and system for generating a plan for document processing. A plurality of electronic documents are received from a data store. The plurality of electronic documents are analyzed to identify tabular data. Textual data within the identified tabular data are identified by performing a first natural language search of the analyzed plurality of electronic documents. Textual hints are generated, and the generated textual hints are mapped to a lookup set. References are identified, and a count of identified references is determined. A priority score is calculated based on the count of identified references. In response to receiving a priority score modifying value, a modified priority score is calculated. Ingestion plans are generated based on the modified priority score. Generated ingestion plans are communicated by the computer using the network.
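The flow summarized in the abstract can be sketched end to end under heavy simplification. Everything in the block below is an assumption for illustration: the table markers (an HTML table tag or a pipe-delimited row), the use of plain keyword matching in place of the natural language searches, and the representation of an ingestion plan as a list of document names ordered by score. None of these details are taken from the patent text.

```python
import re
from typing import Dict, List, Set

# Hypothetical table markers: an HTML <table> tag or a pipe-delimited row.
TABLE_MARKER = re.compile(r"<table\b|^\s*\|.*\|\s*$", re.IGNORECASE | re.MULTILINE)


def has_table(document: str) -> bool:
    """Search a document for one or more table markers."""
    return TABLE_MARKER.search(document) is not None


def count_references(document: str, lookup_set: Set[str]) -> int:
    """Count words that match the lookup set.

    Stand-in for the natural language searches: plain keyword matching.
    """
    words = re.findall(r"[a-z]+", document.lower())
    return sum(1 for word in words if word in lookup_set)


def ingestion_plan(
    documents: Dict[str, str],
    lookup_set: Set[str],
    scale_value: float = 1.0,
    modifying_value: float = 1.0,
) -> List[str]:
    """Order documents that contain tables by modified priority score, highest first."""
    scores = {}
    for name, text in documents.items():
        if not has_table(text):
            continue  # only documents with tabular data are scored
        scores[name] = count_references(text, lookup_set) * scale_value * modifying_value
    # Ties are broken by name so the ordering is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


documents = {
    "a": "| dose | weight |\n| 5 | 70 |",
    "b": "no tables here dose",
    "c": "<table><tr><td>dose</td></tr></table> dose dose weight",
}
plan = ingestion_plan(documents, {"dose", "weight"})
```

In this example document "b" contains no table marker and is left out, while "c" is ordered ahead of "a" because it has more matches against the lookup set.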
Specification