Template-free extraction of data from documents
Abstract
The disclosed embodiments provide a system that processes data. One example embodiment is a computer-implemented method for processing data. The method includes obtaining text from a document associated with a user, wherein the document was generated based on a template, and, with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term. The method further includes applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms; extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and enabling use of the term with an application.
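The two rule passes described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation: the regular expressions in `BROAD_RULES`, the `LABEL_TO_FIELD` mapping, the field names, and the "nearest preceding label on the same line" heuristic are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    line: int    # zero-based line index in the document
    column: int  # zero-based token index within the line
    broad: str = "unknown"
    refined: str = "unknown"


# First pass (hypothetical rules): broad category from the term's form alone.
BROAD_RULES = [
    ("amount", re.compile(r"^\$?\d{1,3}(,\d{3})*\.\d{2}$")),
    ("identifier", re.compile(r"^\d{2}-\d{7}$")),
    ("label", re.compile(r"^[A-Za-z]+:$")),
]

# Second pass (hypothetical mapping): label text -> refined field name.
LABEL_TO_FIELD = {"wages": "wages", "withheld": "tax_withheld", "ein": "employer_id"}


def tokenize(text):
    """Record each term with its location; the source text is not modified."""
    return [Term(tok, i, j)
            for i, line in enumerate(text.splitlines())
            for j, tok in enumerate(line.split())]


def broad_category(term):
    for name, pattern in BROAD_RULES:
        if pattern.match(term.text):
            return name
    return "word"


def refine(terms):
    """Narrow 'amount' and 'identifier' terms using the location of label terms."""
    labels = [t for t in terms if t.broad == "label"]
    for t in terms:
        if t.broad not in ("amount", "identifier"):
            continue
        preceding = [l for l in labels if l.line == t.line and l.column < t.column]
        if preceding:
            nearest = max(preceding, key=lambda l: l.column)
            key = nearest.text.rstrip(":").lower()
            t.refined = LABEL_TO_FIELD.get(key, "unknown")


def extract(text):
    """Layout-independent extraction: no per-template coordinates are used."""
    terms = tokenize(text)
    for t in terms:
        t.broad = broad_category(t)
    refine(terms)
    return {t.refined: t.text for t in terms if t.refined != "unknown"}


layout_a = "Employer EIN: 12-3456789\nWages: 52,000.00\nWithheld: 6,240.50"
layout_b = "Withheld: 6,240.50\nWages: 52,000.00 EIN: 12-3456789"
print(extract(layout_a) == extract(layout_b))
```

Both sample layouts yield the same extracted fields because the rules depend on the terms and their relative locations rather than on fixed positions tied to a specific template.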
20 Claims
1. A computer-implemented method for processing data, comprising:
- obtaining text from a document associated with a user, wherein the document was generated based on a template;
- with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
- applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
- extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
- enabling use of the term with an application.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for processing data, comprising:
- a memory;
- a processor; and
- a non-transitory computer-readable storage medium storing instructions that, when executed on the processor, cause the processor to instantiate:
  - a document-processing apparatus configured to obtain text from a document associated with a user, wherein the document was generated based on a template;
  - an extraction apparatus configured to:
    - with the obtained text intact, apply a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
    - apply an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
    - extract a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
    - enable use of the term with an application;
  - a management apparatus configured to enable use of the term with an application.
Dependent claims: 10, 11, 12, 13, 14
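As a rough illustration of how the three apparatuses recited in claim 9 might be wired together, the Python sketch below models each one as a small class. The class and method names, the UTF-8 byte input, and the per-user dictionary store are all assumptions; the extraction logic is a deliberately trivial placeholder ("label: value" lines) rather than the rule-based categorization the claim recites.

```python
from typing import Dict


class DocumentProcessingApparatus:
    """Obtains text from a user's document (assumed here to be UTF-8 bytes)."""

    def obtain_text(self, document: bytes) -> str:
        return document.decode("utf-8")


class ExtractionApparatus:
    """Placeholder extraction: treats each 'label: value' line as one term."""

    def extract(self, text: str) -> Dict[str, str]:
        terms = {}
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if sep and value.strip():
                terms[label.strip().lower()] = value.strip()
        return terms


class ManagementApparatus:
    """Makes extracted terms available to an application, keyed by user."""

    def __init__(self) -> None:
        self._terms_by_user: Dict[str, Dict[str, str]] = {}

    def enable_use(self, user: str, terms: Dict[str, str]) -> None:
        self._terms_by_user.setdefault(user, {}).update(terms)

    def terms_for(self, user: str) -> Dict[str, str]:
        return dict(self._terms_by_user.get(user, {}))


def process(user: str, document: bytes, manager: ManagementApparatus) -> Dict[str, str]:
    """Run the pipeline: obtain text, extract terms, then enable their use."""
    text = DocumentProcessingApparatus().obtain_text(document)
    terms = ExtractionApparatus().extract(text)
    manager.enable_use(user, terms)
    return terms


manager = ManagementApparatus()
process("alice", b"Wages: 52,000.00\nNotes\n", manager)
print(manager.terms_for("alice"))
```

Keeping the management step separate from extraction means an application only ever reads from the manager's store and never needs to know how a given document was laid out.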
15. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for processing data, the method comprising:
- obtaining text from a document associated with a user, wherein the document was generated based on a template;
- with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
- applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
- extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
- enabling use of the term with an application.
Dependent claims: 16, 17, 18, 19, 20
Specification