Auto-classification of PDF forms by dynamically defining a taxonomy and vocabulary from PDF form fields
First Claim
1. A computer-implemented method comprising:
associating form fields of a Portable Document Format (PDF) file with a markup language schema, the markup language schema specifying semantic constraints on attributes of the form fields within the PDF file, the form fields for receiving data;
creating a content folder representing a specific classification, the specific classification based on attributes of form fields from the PDF file;
receiving a selection of a subset of the form fields from the PDF file;
associating the selection of the subset of the form fields with the content folder including creating metadata describing the selected form fields, the content folder configured for storing corresponding individual data entries received within the selected form fields of PDF files;
extracting data from form fields of submitted PDF files, the submitted PDF files having data input into form fields associated with the content folder, the extracted data and metadata describing the form fields stored separately from the submitted PDF files; and
automatically classifying the submitted PDF files based on attributes of the selected form fields and the extracted data.
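The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: `ContentFolder`, `matches`, `classify`, and the sample field names are hypothetical, and the data extracted from a submitted PDF form is assumed to be already available as a plain dictionary of field names to values. The matching rule (every selected field present and non-empty) is likewise only one possible reading of "classifying based on attributes of the selected form fields and the extracted data".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentFolder:
    """Hypothetical stand-in for the claimed content folder."""
    classification: str
    # Metadata describing the selected form fields: field name -> declared type.
    selected_fields: dict
    # Extracted data entries, stored separately from the submitted PDF files.
    entries: list = field(default_factory=list)

    def matches(self, form_data: dict) -> bool:
        # Assumed rule: a form matches when every selected field is filled in.
        return all(form_data.get(name) for name in self.selected_fields)


def classify(form_data: dict, folders: list) -> Optional[str]:
    """Store the selected fields in the first matching folder and return its classification."""
    for folder in folders:
        if folder.matches(form_data):
            extracted = {name: form_data[name] for name in folder.selected_fields}
            folder.entries.append(extracted)
            return folder.classification
    return None  # no folder matched; the form stays unclassified


folders = [
    ContentFolder("tax_return", {"ssn": "string", "gross_income": "decimal"}),
    ContentFolder("job_application", {"applicant_name": "string", "position": "string"}),
]

# Field values assumed to have been extracted from a submitted PDF form.
submitted = {"applicant_name": "A. Example", "position": "Editor", "phone": ""}
label = classify(submitted, folders)
```

Here `label` is `"job_application"`, and only the two selected fields are stored in that folder's `entries`; the empty `phone` field is ignored because it was not part of the selection.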
Abstract
Embodiments herein include a content manager that constructs vocabulary from the PDF form fields to classify documents. The content manager can associate a PDF form with a markup language schema (such as an XML Schema) so that PDF form fields are semantically bounded with XML schema elements. The XML schema elements can define semantics of form fields and specify other constraints on XML elements and attributes. The content manager then associates selected form fields from the PDF form with a content folder to construct a set of properties to apply to inbound PDF form data to classify documents.
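As a rough illustration of how an XML Schema can constrain form fields, the sketch below reads element declarations from a small schema and checks submitted values against the declared types. The schema, the field names, and the helpers `field_types`, `conforms`, and `violations` are assumptions made for this example, not part of the patent. The type checks are deliberately loose approximations of `xs:date` and `xs:decimal`, and the code assumes the schema uses the literal `xs:` prefix.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical schema binding three form fields to XML Schema types.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="applicant_name" type="xs:string"/>
  <xs:element name="start_date" type="xs:date"/>
  <xs:element name="salary" type="xs:decimal"/>
</xs:schema>
""".strip()

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}


def field_types(xsd_text: str) -> dict:
    """Map each top-level element name to its declared type string."""
    root = ET.fromstring(xsd_text)
    return {el.get("name"): el.get("type") for el in root.findall("xs:element", NS)}


def conforms(value: str, xsd_type: str) -> bool:
    """Loosely check a value against a declared type; unknown types pass."""
    try:
        if xsd_type == "xs:date":
            date.fromisoformat(value)
        elif xsd_type == "xs:decimal":
            Decimal(value)
    except (ValueError, InvalidOperation):
        return False
    return True


def violations(form_data: dict, types: dict) -> list:
    """Return the names of submitted fields whose values break their declared type."""
    return [name for name, value in form_data.items()
            if name in types and not conforms(value, types[name])]


types = field_types(XSD)
form = {"applicant_name": "A. Example", "start_date": "2024-02-30", "salary": "52000.00"}
bad = violations(form, types)
```

With this input, `bad` contains only `"start_date"`, since 30 February is not a valid calendar date.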
20 Claims
1. A computer-implemented method as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product including a computer-storage medium having instructions stored thereon for processing data information, such that the instructions, when carried out by a processing device, cause the processing device to perform the operations of:
associating form fields of a Portable Document Format (PDF) file with a markup language schema, the markup language schema specifying semantic constraints on attributes of the form fields within the PDF file, the form fields for receiving data via a graphical user interface;
creating a content folder representing a specific classification, the specific classification based on attributes of form fields from the PDF file;
receiving a selection of a subset of the form fields from the PDF file;
associating the selection of the subset of the form fields with the content folder including creating metadata describing the selected form fields, the content folder configured for storing corresponding individual data entries received within the selected form fields of PDF files;
extracting data from form fields of submitted PDF files, the submitted PDF files having data input into form fields associated with the content folder, the extracted data and metadata describing the form fields stored separately from the submitted PDF files; and
automatically classifying the submitted PDF files based on attributes of the selected form fields and the extracted data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. A computer system comprising:
a processor;
a memory coupled to the processor, the memory storing instructions that when executed by the processor cause the system to perform the operations of:
generating a Portable Document Format (PDF) file having form fields for receiving data via a graphical user interface;
associating the form fields with a markup language schema, the markup language schema specifying semantic constraints on attributes of the form fields within the PDF file;
creating a content folder representing a specific classification, the specific classification based on attributes of form fields from the PDF file;
receiving a selection of a subset of the form fields from the PDF file;
associating the selection of the subset of the form fields with the content folder including creating metadata describing the selected form fields, the content folder configured for storing corresponding individual data entries received within the selected form fields of PDF files;
extracting data from form fields of submitted PDF files, the submitted PDF files having data input into form fields associated with the content folder, the extracted data and metadata describing the form fields stored separately from the submitted PDF files; and
automatically classifying the submitted PDF files based on attributes of the selected form fields and the extracted data.
Dependent claims: 20
Specification