Document imaging and indexing system
Abstract
A document digitizing method digitizes and automatically indexes documents in printed form. The method includes optically scanning the document, forming and storing a digitized image file from the optically scanned document, optically recognizing characters in the optically scanned document, and forming and storing a text file of the optically recognized characters in the document. A retrieval method for retrieving the digitized image file for a document includes searching the text files to identify any having a selected text string and providing access to the digitized image files that correspond to those text files. The digital image file and the text file together represent a digitized document data structure that combines a digital image of a document with a text file of optically recognized characters in the digital image.
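The digitizing steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `DigitizedDocument` and `digitize`, the `.img`/`.txt` file naming, and the `ocr` callable are all hypothetical, and the OCR engine is passed in as a plain function rather than tied to any particular library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class DigitizedDocument:
    """Pairs a stored image file with the text file recognized from it."""
    image_path: Path
    text_path: Path


def digitize(scan: bytes, doc_id: str, store: Path,
             ocr: Callable[[bytes], str]) -> DigitizedDocument:
    """Store a scanned image and its OCR text side by side.

    `ocr` is an assumed stand-in for whatever OCR engine is available;
    the shared `doc_id` stem is what associates the two files.
    """
    store.mkdir(parents=True, exist_ok=True)
    image_path = store / f"{doc_id}.img"
    text_path = store / f"{doc_id}.txt"
    image_path.write_bytes(scan)                       # digitized image file
    text_path.write_text(ocr(scan), encoding="utf-8")  # recognized characters
    return DigitizedDocument(image_path, text_path)
```

Because the text file is produced at ingest time, later searches only need to read text files; the image file is opened only once a matching document has been selected.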
19 Citations
17 Claims
1. A method of retrieving a digitized image file from a document storage system, the document storage system comprising a plurality of digitized image files and a plurality of text files, wherein each digitized image file is associated with a text file of text characters and comprises a digitized image of a printed document comprising the text characters in printed form, the method comprising:
- searching by a computer processor the plurality of text files to identify a plurality of first text files comprising at least one of a plurality of separate first text strings, wherein the plurality of first text files was generated using optical character recognition of the plurality of digitized image files, wherein searching the plurality of text files to identify the plurality of first text files comprising the at least one of the plurality of first text strings comprises:
  - specifying the plurality of separate first text strings; and
  - searching the plurality of text files in a batch to identify which of the plurality of first text files has any of the plurality of separate first text strings;
- searching by a computer processor the plurality of first text files to identify a second text file comprising a second text string, wherein the second text file is associated with a second digitized image file, wherein the second text file was generated using optical character recognition of the second digitized image file; and
- providing access to the second digitized image file by enabling a user to select the second digitized image file for display.
Dependent claims: 2, 3, 4, 5, 8, 9
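The two-stage search recited in claim 1 can be illustrated with a small sketch. It assumes an in-memory dictionary standing in for the document storage system and plain case-sensitive substring matching; the function names (`batch_search`, `refine`, `image_for`) and the sample file names are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical store: text file name -> (OCR text, associated image file name).
Store = Dict[str, Tuple[str, str]]


def batch_search(store: Store, first_strings: Iterable[str]) -> List[str]:
    """Stage 1: one pass over every text file, keeping those that
    contain any of the separately specified first text strings."""
    terms = list(first_strings)
    return [name for name, (text, _image) in store.items()
            if any(term in text for term in terms)]


def refine(store: Store, first_hits: Iterable[str], second_string: str) -> List[str]:
    """Stage 2: search only the stage-1 results for the second text string."""
    return [name for name in first_hits if second_string in store[name][0]]


def image_for(store: Store, text_name: str) -> str:
    """Map a matching text file back to its associated image file."""
    return store[text_name][1]


store: Store = {
    "a.txt": ("lease agreement for unit 4", "a.tif"),
    "b.txt": ("invoice for unit 4 repairs", "b.tif"),
    "c.txt": ("meeting notes", "c.tif"),
}
hits = batch_search(store, ["lease", "invoice"])     # ["a.txt", "b.txt"]
second = refine(store, hits, "repairs")              # ["b.txt"]
selectable = [image_for(store, n) for n in second]   # ["b.tif"]
```

The list `selectable` corresponds to the image files a user could then pick for display; how that selection is presented is left open here.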
6. A document retrieval system, comprising:
- a plurality of digitized image files and a plurality of text files, wherein each digitized image file is associated with a text file of text characters and comprises a digitized image of a printed document comprising the text characters in printed form;
- means for searching the plurality of text files to identify a plurality of first text files comprising at least one of a plurality of separate first text strings, wherein the plurality of first text files was generated using optical character recognition of the plurality of digitized image files, wherein searching the plurality of text files to identify the plurality of first text files comprising the at least one of a plurality of separate first text strings comprises:
  - specifying the plurality of separate first text strings; and
  - searching the plurality of text files in a batch to identify which of the plurality of first text files has any of the plurality of separate first text strings;
- means for searching the plurality of first text files to identify a second text file comprising a second text string, wherein the second text file is associated with a second digitized image file, wherein the second text file was generated using optical character recognition of the second digitized image file; and
- means for providing access to the second digitized image file.
Dependent claims: 7, 15, 16, 17
10. A computer-readable medium that is not a signal or a transmission carrier wave, the computer-readable medium having computer-executable instructions for performing steps, comprising:
- searching, in a document storage system, a plurality of text files to identify at least one of a plurality of first text files that has at least one of a plurality of separate first text strings, wherein the plurality of first text files was generated using optical character recognition of a plurality of digitized image files, wherein searching the plurality of text files to identify the at least one of the plurality of first text files that has the at least one of a plurality of separate first text strings comprises:
  - specifying the plurality of separate first text strings; and
  - searching the plurality of text files in a batch to identify which of the plurality of first text files has any of the plurality of separate first text strings;
- searching the plurality of first text files to identify a second text file comprising a second text string, wherein the second text file is associated with a second digitized image file, wherein the second text file was generated using optical character recognition of the second digitized image file; and
- providing access to the second digitized image file, wherein the document storage system comprises the plurality of text files and the plurality of digitized image files, wherein each digitized image file of the plurality of digitized image files is associated with a text file of the plurality of text files, and wherein each text file of the plurality of text files comprises text characters and each digitized image file of the plurality of digitized image files comprises a digitized image of a printed document comprising the text characters in printed form.
Dependent claims: 11, 12, 13, 14
Specification