Document storage and retrieval system for storing and retrieving document image and full text data
Abstract
A document storage and retrieval system is provided with means for storing a document body in the form of an image, means for storing text information in the form of a character code string for retrieval, means for executing a retrieval with reference to the text information, and means for displaying the related document image on a retrieval terminal according to the retrieval result. This form of system allows the full contents of a document to be searched while the document body is displayed as an image, in its easy-to-read printed format. Users can therefore retrieve documents by arbitrary words, and can read on a terminal, as images just as on paper, even documents made complicated by mathematical expressions and charts. Further, the invention provides a system wherein the text information for retrieval is extracted automatically from the document image through character recognition. Because the accuracy of character recognition has not been satisfactory hitherto, operators have always had to visually check and correct the recognition results. According to the invention, however, operators need not attend to this step. Thus, the text information for retrieval can be generated at a practical cost in time and money, even for large volumes of documents.
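The division of labour described in the abstract (search the recognized text, but hand back the page image) can be illustrated with a small in-memory sketch. The class `DocumentStore` and its methods are hypothetical names; Python dictionaries stand in for the image file, text file, and bibliographic data base file, and a case-insensitive substring test stands in for whatever retrieval the system actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-in for the image file, the text file,
    and the bibliographic data base file described in the abstract."""
    images: dict[str, bytes] = field(default_factory=dict)            # doc_id -> page image
    texts: dict[str, str] = field(default_factory=dict)               # doc_id -> recognized body text
    biblio: dict[str, dict[str, str]] = field(default_factory=dict)   # doc_id -> {item name: value}

    def add(self, doc_id: str, image: bytes, body_text: str, items: dict[str, str]) -> None:
        self.images[doc_id] = image
        self.texts[doc_id] = body_text
        self.biblio[doc_id] = items

    def search(self, keyword: str) -> list[str]:
        """Return ids of documents whose body text or bibliographic items contain keyword."""
        kw = keyword.lower()
        hits = []
        for doc_id, text in self.texts.items():
            fields = [text, *self.biblio.get(doc_id, {}).values()]
            if any(kw in f.lower() for f in fields):
                hits.append(doc_id)
        return hits

    def retrieve_images(self, keyword: str) -> list[bytes]:
        """Search on the text side, but return the stored page images."""
        return [self.images[d] for d in self.search(keyword)]


store = DocumentStore()
store.add("doc-1", b"<page image 1>", "Eigenvalues of the Laplacian",
          {"title": "Spectral Methods", "author": "A. Smith"})
store.add("doc-2", b"<page image 2>", "Hash tables and open addressing",
          {"title": "Hashing", "author": "B. Jones"})
print(store.search("laplacian"))       # ['doc-1']
print(store.retrieve_images("jones"))  # [b'<page image 2>']
```

Because the image, not the recognized text, is what the reader sees, recognition errors in this sketch only affect whether a document is found, not how it is displayed.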
11 Claims
1. A document storage and retrieval system for storing and retrieving textual documents, comprising:
image file means for storing textual documents which are digital image data, said textual documents including bibliographic items providing bibliographic information of said textual documents and body text data providing data of text found in bodies of said textual documents;
document recognition means, coupled to said image file means, for recognizing said textual documents, said document recognition means includes:
(a) means for extracting pattern elements forming character patterns from said digital image data,
(b) a document knowledge file for storing regulations of a layout of said bibliographic items in said textual documents as document knowledge,
(c) character segmentation means for extracting character patterns by analyzing said pattern elements with reference to said document knowledge in said document knowledge file, and
(d) recognition means for recognizing said extracted character patterns, said recognition means outputs a recognition result including said bibliographic items and said body text data with a layout structure name corresponding to the recognition result;
data base file means, coupled to said document recognition means, for storing said bibliographic items and information as bibliographic information of said outputted recognition result with said layout structure name;
text file means, coupled to said document recognition means, for storing at least said body text data as document contents of recognized textual documents;
input means for inputting a request of a search keyword;
retrieval means, coupled to said image file means, said data base file means, said text file means and said input means, for retrieving digital image data of at least one textual document which includes said search keyword based on said stored bibliographic information and said stored body text data; and
output means, coupled to said retrieval means, for outputting said retrieved digital image data of at least one textual document.
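Claim 1 does not say how the "pattern elements" of step (a) are obtained. One common reading, assumed here rather than taken from the patent, is that they are the connected components of black pixels in a binarized page. The sketch below labels 8-connected components with a breadth-first search and returns each component's bounding box; the function name `extract_pattern_elements` and the list-of-lists image format are illustrative choices.

```python
from collections import deque

Box = tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel coordinates


def extract_pattern_elements(image: list[list[int]]) -> list[Box]:
    """Return bounding boxes of 8-connected foreground (value 1) components,
    in the order their first pixel is met during a row-major scan."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: list[Box] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                # Flood-fill one component, tracking its extent as we go.
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
print(extract_pattern_elements(page))
# [(0, 0, 1, 1), (0, 4, 1, 4), (3, 0, 3, 0), (3, 3, 3, 4)]
```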
3. A document storage and retrieval method for storing and retrieving textual documents, comprising the steps of:
storing textual documents which are digital image data, said textual documents including bibliographic items providing bibliographic information of said textual documents and body text data providing data of text found in bodies of said textual documents;
recognizing said textual documents, said recognizing step includes the steps of:
(a) extracting pattern elements forming character patterns from said digital image data,
(b) storing structural regulations of a layout of said bibliographic items in said textual documents as document knowledge,
(c) extracting character patterns by analyzing said pattern elements with reference to said document knowledge, and
(d) recognizing said extracted character patterns, and outputting a recognition result including said bibliographic items and said body text data with a layout structure name corresponding to the recognition result;
storing said bibliographic items and information as bibliographic information of said outputted recognition result with said layout structure name;
storing at least said body text data as document contents of recognized textual documents;
inputting a request of a search keyword;
retrieving digital image data of at least one textual document which includes said search keyword based on said stored bibliographic information and said stored body text data; and
outputting said retrieved digital image data of at least one document.
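Step (c) of claim 3 groups pattern elements into character patterns. A simple geometric heuristic, shown here as an assumption and not as the patent's stated procedure, merges elements on one text line whose column ranges overlap, so that a character drawn as several components (the dot and stem of an "i", for instance) becomes one box. The claim's reference to document knowledge is not modelled in this sketch, and `segment_characters` is a hypothetical name.

```python
Box = tuple[int, int, int, int]  # (top, left, bottom, right)


def segment_characters(elements: list[Box]) -> list[Box]:
    """Merge pattern elements from a single text line whose column ranges
    overlap, returning one bounding box per character, left to right."""
    chars: list[Box] = []
    for top, left, bottom, right in sorted(elements, key=lambda b: b[1]):
        if chars and left <= chars[-1][3]:
            # Overlaps the previous character horizontally: absorb it.
            t, l, b, r = chars[-1]
            chars[-1] = (min(t, top), min(l, left), max(b, bottom), max(r, right))
        else:
            chars.append((top, left, bottom, right))
    return chars


# "hi": the 'h' is one component; the 'i' is a dot plus a stem.
line = [(2, 0, 8, 3), (0, 5, 1, 5), (3, 5, 8, 5)]
print(segment_characters(line))  # [(2, 0, 8, 3), (0, 5, 8, 5)]
```

A pure overlap rule will wrongly merge kerned or touching neighbours, which is one reason the claim consults layout knowledge rather than geometry alone.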
5. A document storage and retrieval system for storing and retrieving textual documents, comprising:
an image file storing textual document image data, said textual documents including bibliographic items providing bibliographic information of said textual document image data and body text data providing data of text found in bodies of said textual document image data;
means for extracting pattern elements forming character patterns from said textual document image data;
a document knowledge file storing structural regulations of a layout of bibliographic items in said textual document image data as document knowledge, according to each kind of textual document;
means for extracting subsets of pattern elements that constitute each bibliographic item, from said extracted pattern elements with reference to said document knowledge, and adding a name of a bibliographic item corresponding to said extracted subset of pattern elements to said extracted subset of pattern elements;
means for recognizing character patterns as extracted pattern elements and generating a string of character codes corresponding to said extracted subset of pattern elements that constitutes a bibliographic item;
a text file storing said string of character codes when said string of character codes corresponds to document contents;
a data base file storing said string of character codes when said string of character codes corresponds to bibliographic information;
means for inputting a request of a search keyword; and
means for retrieving textual document image data of at least one textual document which includes a string of character codes corresponding to said search keyword based on strings of character codes stored in said text file and said data base file.
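Claim 5 stores layout regulations "according to each kind of textual document" and uses them to extract named subsets of pattern elements. A minimal sketch, assuming the knowledge is expressed as a vertical band (fractions of page height) for each bibliographic item, could look like this. The dictionary `DOCUMENT_KNOWLEDGE`, the `journal_article` kind, the band values, and `label_items` are all invented for illustration; the patent's actual layout regulations are not specified here and would likely be richer.

```python
Box = tuple[int, int, int, int]  # (top, left, bottom, right) in pixels

# Hypothetical document knowledge: for each kind of document, the vertical band
# [lo, hi) of the page, as a fraction of page height, where each item is expected.
DOCUMENT_KNOWLEDGE: dict[str, dict[str, tuple[float, float]]] = {
    "journal_article": {
        "title": (0.00, 0.10),
        "author": (0.10, 0.15),
        "abstract": (0.15, 0.30),
        "body": (0.30, 1.00),
    },
}


def label_items(elements: list[Box], page_height: int, kind: str) -> dict[str, list[Box]]:
    """Split pattern elements into named subsets, one per bibliographic item,
    by testing each element's vertical centre against the layout rules."""
    rules = DOCUMENT_KNOWLEDGE[kind]
    subsets: dict[str, list[Box]] = {name: [] for name in rules}
    for box in elements:
        centre = (box[0] + box[2]) / 2 / page_height
        for name, (lo, hi) in rules.items():
            if lo <= centre < hi:
                subsets[name].append(box)
                break
    return subsets


elements = [(20, 100, 60, 400), (120, 100, 140, 300), (500, 50, 520, 60)]
print(label_items(elements, 1000, "journal_article"))
# {'title': [(20, 100, 60, 400)], 'author': [(120, 100, 140, 300)], 'abstract': [], 'body': [(500, 50, 520, 60)]}
```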
10. In a document storage and retrieval system which holds data of textual documents in the form of an image and text, and retrieves textual document image data of at least one textual document which includes an inputted search keyword based on said data of documents in the form of text, a document storage method comprising the steps of:
reading textual document image data of textual documents in the form of an image, said textual document data including bibliographic items providing bibliographic information of said textual document image data and body text data providing data of text found in bodies of said textual document image data;
extracting pattern elements forming character patterns from said textual document image data;
extracting subsets of pattern elements that constitute each of a plurality of said bibliographic items, from said extracted pattern elements, with reference to structural regulations of a layout of said bibliographic items in said textual document image data according to each kind of textual document;
adding a name of a bibliographic item corresponding to said extracted subset of pattern elements to said extracted subset of pattern elements;
recognizing character patterns as extracted pattern elements;
generating a string of character codes corresponding to said extracted subset of pattern elements that constitute a bibliographic item; and
storing strings of character codes in a text file when said string of character codes corresponds to document contents and in a data base file when said string of character codes corresponds to bibliographic information.
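The last step of claim 10 routes each recognized character string by the name attached to its bibliographic item. In the sketch below, which is an assumption about one way to realise that step, strings whose item name is in the hypothetical set `CONTENT_ITEMS` are appended to a per-document text file, and all others are inserted into an SQLite table standing in for the data base file. The table schema, the `<doc_id>.txt` naming, and the helper names are illustrative only.

```python
import sqlite3
import tempfile
from pathlib import Path

CONTENT_ITEMS = {"body"}  # hypothetical: item names treated as "document contents"


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the stand-in bibliographic data base file."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS biblio (doc_id TEXT, item TEXT, value TEXT)")
    return db


def store_recognized(doc_id: str, items: dict[str, str], text_dir: Path,
                     db: sqlite3.Connection) -> None:
    """Send document contents to a text file and bibliographic items to the database."""
    for name, chars in items.items():
        if name in CONTENT_ITEMS:
            with open(text_dir / f"{doc_id}.txt", "a", encoding="utf-8") as fh:
                fh.write(chars + "\n")
        else:
            db.execute("INSERT INTO biblio (doc_id, item, value) VALUES (?, ?, ?)",
                       (doc_id, name, chars))
    db.commit()


with tempfile.TemporaryDirectory() as tmp:
    db = open_database()
    recognized = {"title": "Spectral Methods", "author": "A. Smith",
                  "body": "Eigenvalues of the Laplacian"}
    store_recognized("doc-1", recognized, Path(tmp), db)
    body = (Path(tmp) / "doc-1.txt").read_text(encoding="utf-8")
    rows = db.execute("SELECT item, value FROM biblio WHERE doc_id = ? ORDER BY item",
                      ("doc-1",)).fetchall()
    print(body.strip())  # Eigenvalues of the Laplacian
    print(rows)          # [('author', 'A. Smith'), ('title', 'Spectral Methods')]
```

Keeping bibliographic items in a structured table allows field-restricted queries (for example, by author), while the free-running body text suits full-text search.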
Specification