Template-free extraction of data from documents

US 10,366,123 B1
Filed: 04/30/2018
Issued: 07/30/2019
Est. Priority Date: 08/06/2013
Status: Active Grant

First Claim

1. A computer-implemented method for processing data, comprising:

obtaining text from a document associated with a user, wherein the document was generated based on a template;

with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;

applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;

extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and

enabling use of the term with an application.

1 Assignment

0 Petitions

Abstract

The disclosed embodiments provide a system that processes data. One example embodiment is a computer-implemented method for processing data. The computer-implemented method includes obtaining text from a document associated with a user, wherein the document was generated based on a template and, with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term. The computer-implemented method further includes applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms, extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates and enabling use of the term with an application.
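The two-stage categorization described in the abstract can be illustrated with a minimal Python sketch. The claims do not disclose concrete rules, so everything here is an assumption made for illustration: the `Term` record, the regular expressions in `BROAD_RULES`, the category names, and the "top half versus bottom half" location test in `refine` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    line: int  # 0-based line index of the term in the document


# Stage 1 (hypothetical rules): a pattern maps a term to a broad category,
# i.e. a set of several candidate meanings.
BROAD_RULES = [
    (re.compile(r"^\$?\d[\d,]*\.\d{2}$"), {"wages", "tax_withheld", "total_due"}),
    (re.compile(r"^\d{2}/\d{2}/\d{4}$"), {"issue_date", "due_date"}),
]

# Hypothetical assumption: these meanings only appear near the bottom of a page.
BOTTOM_ONLY = {"total_due", "due_date"}


def tokenize(text):
    """Split the text into terms, leaving the text itself untouched."""
    return [
        Term(token, line_no)
        for line_no, line in enumerate(text.splitlines())
        for token in line.split()
    ]


def broad_category(term):
    """Return the candidate meanings for a term, or an empty set."""
    for pattern, candidates in BROAD_RULES:
        if pattern.match(term.text):
            return set(candidates)
    return set()


def refine(term, candidates, total_lines):
    """Stage 2: narrow the candidates using the term's location."""
    if term.line < total_lines / 2:
        narrowed = candidates - BOTTOM_ONLY
    else:
        narrowed = candidates & BOTTOM_ONLY
    # Fall back to the broad category if the location rule removed everything.
    return narrowed or candidates
```

The refinement step only ever shrinks the candidate set, which mirrors the claim language of moving from a broad category to "a refined category of fewer terms"; it does not depend on any particular template.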

22 Citations

20 Claims

1. A computer-implemented method for processing data, comprising:
- obtaining text from a document associated with a user, wherein the document was generated based on a template;
  
  with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
  
  applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
  
  extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
  
  enabling use of the term with an application.
  - 2. The computer-implemented method of claim 1, further comprising:
    - applying a set of rules to each term in the obtained text to determine a context associated with the term;
      
      obtaining a modification to the determined context for one of the extracted terms from the user; and
      
      using the modification to update the set of rules.
  - 3. The computer-implemented method of claim 2, wherein obtaining the modification to the determined context for the one of the extracted terms from the user involves:
    - obtaining an updated location in the document of the one of the extracted terms.
  - 4. The computer-implemented method of claim 2, wherein applying the set of rules to each term in the obtained text to determine the context associated with the term involves:
    - categorizing the term based on at least one of a character type and a character sequence in the term; and
      
      determining the context based on the categorized term and a categorization of one or more terms in proximity to the term.
  - 5. The computer-implemented method of claim 4, wherein applying the set of rules to each term in the obtained text to determine the context associated with the term further involves:
    - determining the context based on a location of the term in the document.
  - 6. The computer-implemented method of claim 4, wherein the character type is at least one of:
    - a numeric character type;
      
      an alphabetic character type;
      
      an alphanumeric character type; and
      
      a special character type.
  - 7. The computer-implemented method of claim 2, further comprising:
    - storing each extracted term in one of a plurality of data elements according to the determined context; and
      
      creating, for each data element, one or more tags representing the context.
  - 8. The computer-implemented method of claim 7, wherein enabling use of each data element with the one or more applications without requiring manual input of the extracted terms into the one or more applications involves:
    - obtaining, from an application, a request for data associated with a tag from the one or more tags;
      
      matching the tag to one of the data elements; and
      
      providing the one of the data elements to the application.
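Claims 4 through 6 describe categorizing a term by its character type and character sequence, then determining a context from that category and from nearby terms. The following Python sketch shows one way this could look. The four character types come from claim 6; the `shape` encoding, the entries in `SEQUENCE_RULES` and `NEIGHBOR_RULES`, and the window size are hypothetical choices, not taken from the patent.

```python
def character_type(term):
    """Classify a term as one of the four character types named in claim 6."""
    if term.isdigit():
        return "numeric"
    if term.isalpha():
        return "alphabetic"
    if term.isalnum():
        return "alphanumeric"
    return "special"


def shape(term):
    """Encode the character sequence: digits become 9, letters become A."""
    out = []
    for ch in term:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical rules keyed on the character sequence of a term.
SEQUENCE_RULES = {"999-99-9999": "ssn_like", "99-9999999": "ein_like"}

# Hypothetical rules combining a term's category with a nearby alphabetic label.
NEIGHBOR_RULES = {
    ("ssn_like", "ssn"): "social_security_number",
    ("ein_like", "ein"): "employer_id",
}


def determine_context(terms, index, window=2):
    """Determine a context from the term's category and its neighbors."""
    term = terms[index]
    category = SEQUENCE_RULES.get(shape(term), character_type(term))
    start = max(0, index - window)
    neighbors = terms[start:index] + terms[index + 1:index + 1 + window]
    for neighbor in neighbors:
        label = neighbor.rstrip(":").lower()
        # Only alphabetic neighbors are treated as labels in this sketch.
        if character_type(label) == "alphabetic" and (category, label) in NEIGHBOR_RULES:
            return NEIGHBOR_RULES[(category, label)]
    return category
```

When no neighbor rule applies, the function falls back to the term's own category, so a term is always assigned some context.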

9. A system for processing data, comprising:
- a memory;
  
  a processor; and
  
  a non-transitory computer-readable storage medium storing instructions that, when executed on the processor, cause the processor to instantiate:
  
  a document-processing apparatus configured to obtain text from a document associated with a user, wherein the document was generated based on a template;
  
  an extraction apparatus configured to:
  
  with the obtained text intact, apply a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
  
  apply an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
  
  extract a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
  
  enable use of the term with an application;
  
  a management apparatus configured to enable use of the term with an application.
  - 10. The system of claim 9, wherein the extraction apparatus is further configured to:
    - apply a set of rules to each term in the obtained text to determine a context associated with the term;
      
      obtain a modification to the determined context for one of the extracted terms from the user; and
      
      use the modification to update the set of rules.
  - 11. The system of claim 10, wherein applying the set of rules to each term in the obtained text to determine the context associated with the term involves:
    - categorizing the term based on at least one of a character type and a character sequence in the term; and
      
      determining the context based on at least one of the categorized term, a categorization of one or more terms in proximity to the term, and a location of the term in the document.
  - 12. The system of claim 11, wherein the character type is at least one of:
    - a numeric character type;
      
      an alphabetic character type;
      
      an alphanumeric character type; and
      
      a special character type.
  - 13. The system of claim 10, wherein the extraction apparatus is further configured to:
    - store each extracted term in one of a plurality of data elements according to the determined context; and
      
      create, for each data element, one or more tags representing the context.
  - 14. The system of claim 13, wherein enabling use of each data element with the one or more applications without requiring manual input of the extracted terms into the one or more applications involves:
    - obtaining, from an application, a request for data associated with a tag from the one or more tags;
      
      matching the tag to one of the data elements; and
      
      providing the one of the data elements to the application.
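Claims 13 and 14 describe storing each extracted term in a data element, tagging the element with its context, and answering an application's request for a tag. A minimal Python sketch of that flow follows. The `DataElement` and `ElementStore` names, the dotted context strings, and the rule that derives tags by splitting the context on dots are all hypothetical; the claims only require that tags represent the context.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    value: str
    tags: set = field(default_factory=set)


class ElementStore:
    """Holds extracted terms as tagged data elements (hypothetical design)."""

    def __init__(self):
        self._elements = []

    def store(self, term, context):
        # Assumption: the tags are the full context plus each dotted part,
        # e.g. "w2.wages" yields {"w2.wages", "w2", "wages"}.
        element = DataElement(term, {context} | set(context.split(".")))
        self._elements.append(element)
        return element

    def request(self, tag):
        """Return the first data element carrying the tag, or None."""
        for element in self._elements:
            if tag in element.tags:
                return element
        return None
```

An application would call `request("wages")` instead of asking the user to retype the value, which is the "without requiring manual input" behavior the claim refers to.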

15. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for processing data, the method comprising:
- obtaining text from a document associated with a user, wherein the document was generated based on a template;
  
  with the obtained text intact, applying a set of rules to each term in the obtained text to determine a broad category of a plurality of terms associated with the term;
  
  applying an additional set of rules to refine the broad category associated with the term to a refined category of fewer terms based on a location in the document of at least one term in the broad category of the plurality of terms;
  
  extracting a term from the obtained text using template-independent code developed to process documents generated based on a plurality of templates; and
  
  enabling use of the term with an application.
  - 16. The non-transitory computer-readable storage medium of claim 15, the method further comprising:
    - applying a set of rules to each term in the obtained text to determine a context associated with the term;
      
      obtaining a modification to the determined context for one of the extracted terms from the user; and
      
      using the modification to update the set of rules.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein applying the set of rules to each term in the obtained text to determine the context associated with the term involves:
    - categorizing the term based on at least one of a character type and a character sequence in the term; and
      
      determining the context based on at least one of the categorized term, a categorization of one or more terms in proximity to the term, and a location of the term in the document.
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein the character type is at least one of:
    - a numeric character type;
      
      an alphabetic character type;
      
      an alphanumeric character type; and
      
      a special character type.
  - 19. The non-transitory computer-readable storage medium of claim 16, the method further comprising:
    - storing each extracted term in one of a plurality of data elements according to the determined context; and
      
      creating, for each data element, one or more tags representing the context.
  - 20. The non-transitory computer-readable storage medium of claim 19, wherein enabling use of each data element with the one or more applications without requiring manual input of the extracted terms into the one or more applications involves:
    - obtaining, from an application, a request for data associated with a tag from the one or more tags;
      
      matching the tag to one of the data elements; and
      
      providing the one of the data elements to the application.
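Claims 2, 10, and 16 recite obtaining a user's modification to a determined context and using it to update the set of rules, but they do not say how the update is performed. The Python sketch below is one possible reading, under the assumption that rules are keyed on a term's character sequence, so that a single correction generalizes to other terms with the same sequence. The `ContextRules` class, the `term_shape` helper, and the `"unknown"` default are hypothetical.

```python
def term_shape(term):
    """Encode the character sequence: digits become 9, letters become A."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in term
    )


class ContextRules:
    """A rule set that learns from user corrections (hypothetical design)."""

    def __init__(self):
        self._by_shape = {}  # character sequence -> context

    def determine(self, term):
        """Return the context for a term, or "unknown" if no rule matches."""
        return self._by_shape.get(term_shape(term), "unknown")

    def apply_modification(self, term, corrected_context):
        """Record the user's correction as a rule for this character sequence."""
        self._by_shape[term_shape(term)] = corrected_context
```

After the user corrects `12-3456789` to `employer_id`, a later term such as `98-7654321` resolves to the same context without further input.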

Current Assignee: Intuit, Inc.
Original Assignee: Intuit, Inc.
Inventors: Madhani, Sunil H.; Sreepathy, Anu; Kakkar, Samir Revti
Primary Examiner(s): Bibbee, Jared M
Application Number: US15/967,375
Time in Patent Office: 456 Days
CPC Class Codes:
G06F 16/313 Selection or weighting of t...
G06F 16/90 Details of database functio...
