Method and system for extraction

US 8,321,357 B2
Filed: 09/30/2009
Issued: 11/27/2012
Est. Priority Date: 09/30/2009
Status: Active Grant

11 Assignments

0 Petitions

Abstract

A system and method for extracting information from at least one document in at least one set of documents, the method comprising: generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score.

92 Citations

39 Claims

1. A method for extracting information from at least one document in at least one set of documents, the method comprising:
- generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score;
  
  determining, using at least one features processor, negative features and positive features based on N-gram statistics;
  
  determining, using at least one negative features processor, whether negative features apply to the at least one possible match;
  
  deleting, using at least one deleting processor, any possible match to which the negative feature applies from the at least one possible match list;
  
  determining, using at least one positive features processor, whether any of the possible matches are positive features; and
  
  re-ordering, using at least one re-ordering processor, the possible matches in the at least one possible match list based on the information learned from determining whether any of the possible matches are positive features.
- - 2. The method of claim 1, wherein the at least one attribute score and the at least one localization score are based on:
    - spatial feature criteria;
      
      contextual feature criteria;
      
      relational feature criteria;
      
      or derived feature criteria;
      
      or any combination thereof.
  - 3. The method of claim 2, wherein the spatial feature criteria is used to determine areas where the at least one target entry is most likely to be found.
  - 4. The method of claim 2, wherein the contextual feature criteria weighs information about at least one possible target entry in the neighborhood of the at least one target entry.
  - 5. The method of claim 2, wherein the relational feature criteria is used to determine at least one area where and within which the at least one target entry is likely to be found.
  - 6. The method of claim 2, wherein the derived feature criteria is generated by mathematical transformations between any combination of the spatial feature criteria, the contextual feature criteria, and the relational feature criteria.
  - 7. The method of claim 1, wherein at least one processor can comprise:
    - the at least one ranking and/or matching processor, the at least one features processor, the at least one negative features processor, the at least one deleting processor, the at least one positive features processor, or the at least one re-ordering processor, or any combination thereof.
  - 8. The method of claim 1, further comprising:
    - learning characteristics of the at least one set of documents from sample documents;
      
      using the learned characteristics to find similar information in the at least one set of documents.
  - 9. The method of claim 8, wherein the learned characteristics apply to at least one unknown document and/or at least one different document type.
  - 10. The method of claim 1, further comprising:
    - validating information in the at least one document to determine if the information is consistent.
  - 11. The method of claim 10, wherein the validating comprises internal validating and/or external validating.
  - 12. The method of claim 1, wherein the ranked possible match list based on the at least one attribute score and the at least one localization score takes into account information related to:
    - text features;
      
      geometric features;
      
      graphic features;
      
      feature conversion;
      
      or any combination thereof.
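
The deleting and re-ordering steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Candidate` type, the encoding of negative and positive features as sets of character bigrams, and the additive `boost` are all assumptions introduced here for illustration, and each candidate's `score` is assumed to already combine the attribute and localization scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float  # assumed: attribute and localization scores already combined


def char_ngrams(text, n=2):
    """Return the set of character n-grams contained in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def filter_and_reorder(candidates, negative, positive, boost=0.5):
    """Drop candidates hit by a negative feature, then re-rank the rest.

    negative and positive are sets of n-grams (a hypothetical encoding of
    the features); boost is an assumed per-hit bonus, not a claimed value.
    """
    # Start from the ranked possible match list (highest score first).
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # Delete any possible match to which a negative feature applies.
    kept = [c for c in ranked if not (char_ngrams(c.text) & negative)]

    # Re-order the survivors using the positive features.
    def adjusted(c):
        return c.score + boost * len(char_ngrams(c.text) & positive)

    return sorted(kept, key=adjusted, reverse=True)


candidates = [
    Candidate("Total", 0.9),
    Candidate("99.00", 0.7),
    Candidate("1234.50", 0.6),
]
result = filter_and_reorder(candidates, negative={"ot"}, positive={"4."})
print([c.text for c in result])  # ['1234.50', '99.00']
```

In this toy run, "Total" is deleted because it contains the negative bigram "ot", and "1234.50" moves ahead of "99.00" because it contains the positive bigram "4.".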

13. A method for extracting information from at least one document in at least one set of documents, the method comprising:
- generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score.
- - 14. The method of claim 13, wherein the at least one attribute score and the at least one localization score are based on:
    - spatial feature criteria;
      
      contextual feature criteria;
      
      relational feature criteria;
      
      or derived feature criteria;
      
      or any combination thereof.
  - 15. The method of claim 14, wherein the spatial feature criteria is used to determine areas where the at least one target entry is most likely to be found.
  - 16. The method of claim 14, wherein the contextual feature criteria weighs information about at least one possible target entry in the neighborhood of the at least one target entry.
  - 17. The method of claim 14, wherein the relational feature criteria is used to determine at least one area where and within which the at least one target entry is likely to be found.
  - 18. The method of claim 14, wherein the derived feature criteria is generated by mathematical transformations between any combination of the spatial feature criteria, the contextual feature criteria, and the relational feature criteria.
  - 19. The method of claim 13, wherein at least one processor can comprise:
    - the at least one ranking and/or matching processor, the at least one features processor, the at least one negative features processor, the at least one deleting processor, the at least one positive features processor, or the at least one re-ordering processor, or any combination thereof.
  - 20. The method of claim 13, further comprising:
    - learning characteristics of the at least one set of documents from sample documents;
      
      using the learned characteristics to find similar information in the at least one set of documents.
  - 21. The method of claim 20, wherein the learned characteristics apply to at least one unknown document and/or at least one different document type.
  - 22. The method of claim 13, further comprising:
    - validating information in the at least one document to determine if the information is consistent.
  - 23. The method of claim 22, wherein the validating comprises internal validating and/or external validating.
  - 24. The method of claim 13, wherein the ranked possible match list based on the at least one attribute score and the at least one localization score takes into account information related to:
    - text features;
      
      geometric features;
      
      graphic features;
      
      feature conversion;
      
      or any combination thereof.
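
Claim 13 ranks possible matches by at least one attribute score and at least one localization score. One way to picture this is the sketch below, in which every formula is an assumption: the attribute score compares a candidate's character "shape" with a learned shape, the localization score decays with distance from an expected page position (a stand-in for the spatial feature criteria of claims 14 and 15), and the two are combined by a weighted sum. The names `shape`, `attribute_score`, `localization_score`, and `rank_candidates` are hypothetical.

```python
import math
from difflib import SequenceMatcher


def shape(text):
    """Map digits to '9' and letters to 'A'; keep punctuation as-is."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in text
    )


def attribute_score(text, learned_shape):
    """Assumed attribute score: similarity of the candidate's shape to a learned one."""
    return SequenceMatcher(None, shape(text), learned_shape).ratio()


def localization_score(pos, expected_pos, scale=100.0):
    """Assumed localization score: 1.0 at the expected position, decaying with distance."""
    return math.exp(-math.dist(pos, expected_pos) / scale)


def rank_candidates(candidates, learned_shape, expected_pos, weight=0.5):
    """Return (score, text) pairs, best first. candidates holds (text, (x, y)) pairs."""
    scored = [
        (
            weight * attribute_score(text, learned_shape)
            + (1.0 - weight) * localization_score(pos, expected_pos),
            text,
        )
        for text, pos in candidates
    ]
    return sorted(scored, reverse=True)


candidates = [
    ("Invoice", (480, 60)),
    ("2009-09-30", (500, 80)),
    ("12/570", (100, 700)),
]
for score, text in rank_candidates(candidates, "9999-99-99", (500, 80)):
    print(f"{score:.3f}  {text}")
```

A weighted sum is only one option. A product of the two scores, or a learned combination, would fit the claim language equally well.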

25. A method for extracting information from at least one document in at least one set of documents, the method comprising:
- generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score;
  
  determining, using at least one features processor, positive features based on N-gram statistics; and
  
  re-ordering, using at least one re-ordering processor, the possible matches in the at least one possible match list based on the information learned from determining whether any of the possible matches are positive features.
- - 26. The method of claim 25, wherein the at least one attribute score and the at least one localization score are based on:
    - spatial feature criteria;
      
      contextual feature criteria;
      
      relational feature criteria;
      
      or derived feature criteria;
      
      or any combination thereof.
  - 27. The method of claim 26, wherein the spatial feature criteria is used to determine areas where the at least one target entry is most likely to be found.
  - 28. The method of claim 26, wherein the contextual feature criteria weighs information about at least one possible target entry in the neighborhood of the at least one target entry.
  - 29. The method of claim 26 wherein the relational feature criteria is used to determine at least one area where and within which the at least one target entry is likely to be found.
  - 30. The method of claim 26, wherein the derived feature criteria is generated by mathematical transformations between any combination of the spatial feature criteria, the contextual feature criteria, and the relational feature criteria.
  - 31. The method of claim 25, wherein at least one processor can comprise:
    - the at least one ranking and/or matching processor, the at least one features processor, or the at least one re-ordering processor, or any combination thereof.
  - 32. The method of claim 25 further comprising:
    - learning characteristics of the at least one set of documents from sample documents;
      
      using the learned characteristics to find similar information in the at least one set of documents.
  - 33. The method of claim 32, wherein the learned characteristics apply to at least one unknown document and/or at least one different document type.
  - 34. The method of claim 25, further comprising:
    - validating information in the at least one document to determine if the information is consistent.
  - 35. The method of claim 34, wherein the validating comprises internal validating and/or external validating.
  - 36. The method of claim 25, wherein the ranked possible match list based on the at least one attribute score and the at least one localization score takes into account information related to:
    - text features;
      
      geometric features;
      
      graphic features;
      
      feature conversion;
      
      or any combination thereof.
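
Claim 25 determines positive features based on N-gram statistics but does not say how the statistics are gathered. The sketch below shows one plausible reading, stated as an assumption: character trigrams that recur in known-correct sample values and never appear in known-incorrect ones are treated as positive features, and candidates are then re-ordered by how many of those trigrams they contain. The function names, the `min_count` threshold, and the sample strings are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams contained in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def learn_positive_ngrams(correct, incorrect, n=3, min_count=2):
    """Pick n-grams seen in at least min_count correct samples and in no incorrect one."""
    in_correct = Counter(g for t in correct for g in char_ngrams(t, n))
    in_incorrect = Counter(g for t in incorrect for g in char_ngrams(t, n))
    return {
        g for g, count in in_correct.items()
        if count >= min_count and in_incorrect[g] == 0
    }


def reorder(candidates, positive, n=3):
    """Stable re-ordering: candidates with more positive n-grams come first."""
    return sorted(candidates, key=lambda t: -len(char_ngrams(t, n) & positive))


positive = learn_positive_ngrams(
    correct=["INV-1001", "INV-2040", "INV-3307"],
    incorrect=["PO-1001", "2009-09-30"],
)
print(sorted(positive))                          # ['INV', 'NV-']
print(reorder(["PO-7788", "INV-5512"], positive))  # ['INV-5512', 'PO-7788']
```

Because Python's sort is stable, candidates with the same number of positive hits keep their original relative order, so an earlier ranking by attribute and localization score is preserved among ties.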

37. A computer system for extracting information from at least one document in at least one set of documents, the system comprising:
- at least one processor;
  
  wherein the at least one processor is configured to perform;
  
  generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score;
  
  determining, using at least one features processor, negative features and positive features based on N-gram statistics;
  
  determining, using at least one negative features processor, whether negative features apply to the at least one possible match;
  
  deleting, using at least one deleting processor, any possible match to which the negative feature applies from the at least one possible match list;
  
  determining, using at least one positive features processor, whether any of the possible matches are positive features; and
  
  re-ordering, using at least one re-ordering processor, the possible matches in the at least one possible match list based on the information learned from determining whether any of the possible matches are positive features.

38. A computerized system for extracting information from at least one document in at least one set of documents, the system comprising:
- at least one processor;
  
  wherein the processor is configured to perform;
  
  generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score.

39. A computerized system for extracting information from at least one document in at least one set of documents, the system comprising:
- at least one processor;
  
  wherein the processor is configured to perform;
  
  generating, using at least one ranking and/or matching processor, at least one ranked possible match list comprising at least one possible match for at least one target entry on the at least one document, the at least one ranked possible match list based on at least one attribute score and at least one localization score;
  
  determining, using at least one features processor, positive features based on N-gram statistics; and
  
  re-ordering, using at least one re-ordering processor, the possible matches in the at least one possible match list based on the information learned from determining whether any of the possible matches are positive features.

Current Assignee: Hyland Switzerland, Sarl
Original Assignee: Perceptive Software LLC (Hyland Software Incorporated)
Inventors: Urbschat, Harry; Meier, Ralph; Wanschura, Thorsten; Lapir, Gennady; Hausmann, Johannes
Primary Examiner(s): Starks, Wilbert L
Application Number: US12/570,412
Publication Number: US 20110078098A1
Time in Patent Office: 1,154 Days
Field of Search: 706/12, 706/45
US Class Current: 706/12
CPC Class Codes:
G06F 16/35 Clustering; Classification
G06F 40/20 Natural language analysis s...
