FAST IDENTIFICATION OF TEXT INTENSIVE PAGES FROM PHOTOGRAPHS

US 20190318163A1
Filed: 06/27/2019
Published: 10/17/2019
Est. Priority Date: 09/23/2015
Status: Active Grant

Abstract

Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the window is stored as a respective text snippet. A plurality of image windows are superimposed over at least one page of an image document. In accordance with a determination that the content of a respective image window meets image criteria, content of the image window is stored as a respective image snippet. The respective text snippet and the respective image snippet are provided to a classifier.
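
The snippet-collection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a page is a grid of grayscale pixel values, and the names `window_positions`, `crop`, `collect_snippets`, the `step` parameter, and the `accept` callback are all hypothetical. The `accept` callback stands in for either the text line criteria or the image criteria.

```python
from typing import Callable, Iterator, List, Tuple

# Assumption: a page is a list of rows of grayscale values (0 = black, 255 = white).
Page = List[List[int]]


def window_positions(page_h: int, page_w: int, win_h: int, win_w: int,
                     step: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corner of every window that fits fully on the page."""
    for top in range(0, page_h - win_h + 1, step):
        for left in range(0, page_w - win_w + 1, step):
            yield top, left


def crop(page: Page, top: int, left: int, win_h: int, win_w: int) -> Page:
    """Copy the pixels covered by one window."""
    return [row[left:left + win_w] for row in page[top:top + win_h]]


def collect_snippets(page: Page, win_h: int, win_w: int, step: int,
                     accept: Callable[[Page], bool]) -> List[Page]:
    """Shift a window across the page and keep the contents that pass `accept`."""
    snippets = []
    for top, left in window_positions(len(page), len(page[0]), win_h, win_w, step):
        window = crop(page, top, left, win_h, win_w)
        if accept(window):
            snippets.append(window)
    return snippets
```

Calling `collect_snippets` once per text page with a text predicate and once per image page with an image predicate yields the two snippet sets that are then handed to the classifier.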

20 Claims

1. A method of training a neural network to distinguish between text documents and image documents, comprising:
- obtaining a corpus of text and image documents;
- for at least one text document of the corpus of text and image documents:
  - scanning at least one page of the at least one text document by shifting a text window to a plurality of locations on the at least one page of the at least one text document;
  - determining whether text in the window at a respective location of the plurality of locations meets text line criteria; and
  - in accordance with a determination that the text in the window at the respective location of the plurality of locations meets text line criteria, storing the text in the window as a respective text snippet; and
- for at least one image document of the corpus of text and image documents:
  - superimposing a plurality of image windows over at least one page of the at least one image document;
  - determining whether the content of a respective image window meets image criteria; and
  - in accordance with a determination that the content of the respective image window meets the image criteria, storing the content of the respective image window as a respective image snippet; and
- providing the respective text snippet and the respective image snippet to a classifier.
  - 2. The method of claim 1, wherein the text line criteria include first criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is greater than a first predetermined number of lines of text.
  - 3. The method of claim 2, wherein the text line criteria include second criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is fewer than a second predetermined number of lines of text.
  - 4. The method of claim 3, wherein the first predetermined number of lines of text is two and the second predetermined number of lines of text is four.
  - 5. The method of claim 1, wherein the image criteria include criteria that are met in accordance with a determination that a portion of the content of the respective image window occupied by non-background content exceeds a predetermined threshold amount of non-background content.
  - 6. The method of claim 1, wherein a first image window of the plurality of image windows has a first size that is different from a second size of a second image window of the plurality of image windows.
  - 7. The method of claim 1, including normalizing the size of a plurality of text snippets that include the respective text snippet.
  - 8. The method of claim 7, wherein normalizing the size of the plurality of text snippets includes converting each of the text snippets to a 32×32 pixel resolution.
  - 9. The method of claim 7, including normalizing the size of a plurality of image snippets that include the respective image snippet.
  - 10. The method of claim 9, wherein:
    - the plurality of text snippets and the plurality of image snippets are added to a collection of training material; and
    - the classifier is trained based on the collection of training material.
  - 11. The method of claim 1, wherein the classifier is a Modified National Institute of Standards and Technology (MNIST) style Neural Network.
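
The criteria in claims 2 to 5 and the size normalization in claims 7 and 8 can be illustrated with a short sketch. Several details here are assumptions rather than anything stated in the claims: lines are counted as runs of pixel rows containing ink, the `ink_threshold` of 128 and the `min_fraction` of 0.5 are placeholder values for the "predetermined" thresholds, and resizing uses nearest-neighbour sampling. All function names are hypothetical.

```python
from typing import List

# Assumption: a window is a list of rows of grayscale values (0 = ink, 255 = background).
Window = List[List[int]]


def count_text_lines(window: Window, ink_threshold: int = 128) -> int:
    """Count runs of consecutive rows that contain at least one ink pixel."""
    lines = 0
    in_line = False
    for row in window:
        has_ink = any(px < ink_threshold for px in row)
        if has_ink and not in_line:
            lines += 1  # a new run of inked rows starts here
        in_line = has_ink
    return lines


def meets_text_line_criteria(window: Window, min_lines: int = 2,
                             max_lines: int = 4) -> bool:
    """Claims 2-4: more than `min_lines` and fewer than `max_lines` lines."""
    return min_lines < count_text_lines(window) < max_lines


def meets_image_criteria(window: Window, min_fraction: float = 0.5,
                         background: int = 255) -> bool:
    """Claim 5: the non-background share of the window exceeds a threshold."""
    total = sum(len(row) for row in window)
    non_background = sum(1 for row in window for px in row if px != background)
    return total > 0 and non_background / total > min_fraction


def normalize(window: Window, size: int = 32) -> Window:
    """Claims 7-8: resample to size x size (nearest neighbour, an assumption)."""
    h, w = len(window), len(window[0])
    return [[window[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]
```

With the values recited in claim 4 (greater than two, fewer than four), `meets_text_line_criteria` accepts only windows that contain exactly three lines of text.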

12. A computing device for training a neural network to distinguish between text documents and image documents, comprising:
- one or more processors;
- memory;
- a display; and
- one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for:
  - obtaining a corpus of text and image documents;
  - for at least one text document of the corpus of text and image documents:
    - scanning at least one page of the at least one text document by shifting a text window to a plurality of locations on the at least one page of the at least one text document;
    - determining whether text in the window at a respective location of the plurality of locations meets text line criteria; and
    - in accordance with a determination that the text in the window at the respective location of the plurality of locations meets text line criteria, storing the text in the window as a respective text snippet; and
  - for at least one image document of the corpus of text and image documents:
    - superimposing a plurality of image windows over at least one page of the at least one image document;
    - determining whether the content of a respective image window meets image criteria; and
    - in accordance with a determination that the content of the respective image window meets the image criteria, storing the content of the respective image window as a respective image snippet; and
  - providing the respective text snippet and the respective image snippet to a classifier.
  - 13. The computing device of claim 12, wherein the text line criteria include first criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is greater than a first predetermined number of lines of text.
  - 14. The computing device of claim 13, wherein the text line criteria include second criteria that are met in accordance with a determination that a number of lines of text in the text in the window at the respective location of the plurality of locations is fewer than a second predetermined number of lines of text.
  - 15. The computing device of claim 14, wherein the first predetermined number of lines of text is two and the second predetermined number of lines of text is four.
  - 16. The computing device of claim 12, wherein the image criteria include criteria that are met in accordance with a determination that a portion of the content of the respective image window occupied by non-background content exceeds a predetermined threshold amount of non-background content.
  - 17. The computing device of claim 12, wherein a first image window of the plurality of image windows has a first size that is different from a second size of a second image window of the plurality of image windows.
  - 18. The computing device of claim 12, including normalizing the size of a plurality of text snippets that include the respective text snippet.
  - 19. The computing device of claim 18, wherein normalizing the size of the plurality of text snippets includes converting each of the text snippets to a 32×32 pixel resolution.

20. A non-transitory computer readable medium containing software that trains a neural network to distinguish between text documents and image documents using a corpus of text and image documents, the software comprising executable code that:
- obtains a corpus of text and image documents;
- for at least one text document of the corpus of text and image documents:
  - scans at least one page of the at least one text document by shifting a text window to a plurality of locations on the at least one page of the at least one text document;
  - determines whether text in the window at a respective location of the plurality of locations meets text line criteria; and
  - in accordance with a determination that the text in the window at the respective location of the plurality of locations meets text line criteria, stores the text in the window as a respective text snippet; and
- for at least one image document of the corpus of text and image documents:
  - superimposes a plurality of image windows over at least one page of the at least one image document;
  - determines whether the content of a respective image window meets image criteria; and
  - in accordance with a determination that the content of the respective image window meets the image criteria, stores the content of the respective image window as a respective image snippet; and
- provides the respective text snippet and the respective image snippet to a classifier.

Current Assignee
Bending Spoons SpA
Original Assignee
Evernote Corp. (Bending Spoons SpA)
Inventors
Pashintsev, Alexander; Gorbatov, Boris; Livshitz, Eugene; Glazkov, Vitaly

Granted Patent

US 11,195,003 B2
CPC Class Codes

G06T 3/40   Scaling of whole images or ...

G06T 7/60   Analysis of geometric attri...

G06V 30/413   Classification of content, ...

G06V 30/414   Extracting the geometrical ...
