Document storage and retrieval system
Abstract
A document storage and retrieval system stores each document body as an image and stores the document's text information as a character code string for retrieval. Retrieval is executed against the text information, and the document image matching the retrieval result is then displayed on a retrieval terminal. This arrangement supports full-text retrieval of documents while still displaying the printed document body as an easy-to-read image.
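As a rough sketch of the storage split the abstract describes, the Python below models each document as a reference to its page image plus a recognized text string. Retrieval runs only on the text, and the result is the list of images to display. The class name, field names, and the plain substring test (standing in for the automaton-based matching of the claims) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class StoredDocument:
    doc_id: str
    image_path: str  # hypothetical reference to the stored document image
    text: str        # character code string kept only for retrieval


def retrieve(documents, query):
    """Search the text layer and return the images to display on the terminal."""
    # A plain substring test stands in for the claimed automaton-based matching.
    return [doc.image_path for doc in documents if query in doc.text]
```

The text and the image are kept separate, so the search never touches image data and the terminal never has to render the recognized text.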
9 Claims
1. A document storage and retrieval system comprising:
storage means for storing plural documents of data to be retrieved, said data being stored in the form of character strings, said character strings including multiple candidates of character codes for a character which is not identified during character recognition of characters indicated in said plural documents, said multiple candidates of character codes being located between predetermined special character codes;
input means for inputting a partial character string retrieval request to initiate a text search for retrieval of a desired document having a desired partial character string out of said plural documents;
first generation means connected to said input means for generating a hetero-notation and a synonym in response to said partial character string retrieval request by using a hetero-notation convention and a thesaurus and for generating an aggregation of character strings on the basis of said hetero-notation and synonym generation;
second generation means connected to said first generation means for generating a finite state automaton in the form of a state transition matrix based on said aggregation of character strings generated by said first generation means in accordance with a predetermined rule, and for generating an extended finite state automaton defining predetermined states to transform a character string aggregation when said predetermined special character codes, which indicate the location of multiple candidates of character codes in said character strings, appear during retrieval of said plural documents;
means responsive to said second generation means for reading characters one-by-one out of said storage means and for verifying whether or not a desired partial character string exists according to said finite state automaton or said extended finite state automaton; and
means responsive to said reading and verifying means for outputting the documents in which said partial character string exists.
Dependent claims: 2, 3, 4, 5.
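The storage format and the extended automaton of claim 1 can be illustrated with a small Python sketch. It assumes `{` and `}` as the special character codes around a group of OCR candidates (the claim does not name the codes). It also builds the finite state automaton as a full Aho-Corasick transition matrix, a standard multi-pattern construction chosen here only because the claim refers to "a predetermined rule" without spelling it out. The extended behaviour is modelled by tracking a set of active states: inside a candidate group, every active state advances on every candidate and the union is kept. This is one plausible reading of the claim, not the patented implementation; `build_matrix` and `search` are hypothetical names.

```python
from collections import deque

# Hypothetical special character codes that delimit a group of OCR candidates;
# the claim only says "predetermined special character codes".
CAND_START, CAND_END = "{", "}"


def build_matrix(patterns):
    """Compile patterns into an Aho-Corasick DFA stored as a state transition matrix.

    Returns (matrix, columns, accept): matrix[state][col] is the next state,
    columns maps each pattern character to a column (the extra last column covers
    every other character), and accept[state] holds the patterns recognised there.
    """
    columns = {ch: i for i, ch in enumerate(sorted({c for p in patterns for c in p}))}
    goto, accept = [{}], [set()]        # trie rows: state -> {char: next state}
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                accept.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        accept[state].add(pat)

    matrix = [[0] * (len(columns) + 1) for _ in goto]  # "other" column stays 0 (root)
    fail = [0] * len(goto)
    queue = deque()
    for ch, col in columns.items():     # root row: missing edges loop back to the root
        nxt = goto[0].get(ch, 0)
        matrix[0][col] = nxt
        if nxt:
            queue.append(nxt)
    while queue:                        # breadth-first, so failure targets are filled first
        state = queue.popleft()
        accept[state] |= accept[fail[state]]
        for ch, col in columns.items():
            if ch in goto[state]:
                nxt = goto[state][ch]
                fail[nxt] = matrix[fail[state]][col]
                matrix[state][col] = nxt
                queue.append(nxt)
            else:
                matrix[state][col] = matrix[fail[state]][col]
    return matrix, columns, accept


def search(text, patterns):
    """Return the patterns found in text, reading it one character at a time."""
    matrix, columns, accept = build_matrix(patterns)
    other = len(columns)                # column for characters outside the patterns
    active, found, group = {0}, set(), None
    for ch in text:
        if ch == CAND_START:            # entering a group of OCR candidates
            group = []
            continue
        if group is not None and ch != CAND_END:
            group.append(ch)            # collect candidates until the closing code
            continue
        if ch == CAND_END:
            cands, group = group or [None], None  # empty group: treat as unrecognised
        else:
            cands = [ch]
        # A plain character moves each active state once; a candidate group moves
        # each active state on every candidate and keeps the union of results.
        active = {matrix[s][columns.get(c, other)] for s in active for c in cands}
        for s in active:
            found |= accept[s]
    return found
```

For example, `search("ret{rn}ieval", ["retrieval"])` reports a match because one reading of the candidate group spells "retrieval". Because each pattern-matching path through the automaton is deterministic, the set of active states never grows beyond the number of automaton states, however many candidate groups the text contains.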
6. A method of document retrieval in a data storage system comprising the steps of:
(a) storing plural documents of data in the form of character strings in a storage device, said character strings including multiple candidates of character codes for a character which is not identified during character recognition of characters indicated in said plural documents, said multiple candidates of character codes being located between predetermined special character codes;
(b) generating a partial character string retrieval request to initiate a text search for retrieval of a desired document or documents having a desired partial character string out of said plural documents stored in said storage device;
(c) effecting hetero-notation and synonym processing in response to said partial character string retrieval request using a hetero-notation convention and a thesaurus stored in a storage file and generating an aggregation of character strings as a result of said hetero-notation and synonym processing;
(d) generating a finite state automaton in the form of a state transition matrix based on said aggregation of character strings generated in step (c), and generating an extended finite state automaton defining predetermined states to transform a character string aggregation when said predetermined special character codes, which indicate the location of multiple candidates of character codes in said character strings, appear during retrieval of said plural documents;
(e) reading characters one-by-one out of said storage device and verifying whether or not a desired partial character string exists in said stored documents according to said finite state automaton or said extended finite state automaton; and
(f) outputting from said storage device data for the documents in which said desired partial character string exists.
Dependent claims: 7, 8, 9.
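Step (c) can be pictured with a short Python sketch that turns one retrieval request into an aggregation of character strings. Hetero-notations are taken here to be variant written forms of the same word. The notation table and thesaurus below are hypothetical stand-ins (English spelling variants are used purely for illustration), and `notation_variants` and `expand_request` are illustrative names, not names from the patent.

```python
# Hypothetical hetero-notation convention: each set lists interchangeable
# written forms of one word (English spelling variants, for illustration only).
NOTATION_GROUPS = [{"colour", "color"}, {"organise", "organize"}]
# Hypothetical thesaurus: word -> synonyms.
THESAURUS = {"colour": ["hue", "tint"], "organize": ["arrange"]}


def notation_variants(term):
    """Return every written form of term allowed by the notation convention."""
    for group in NOTATION_GROUPS:
        if term in group:
            return set(group)
    return {term}


def expand_request(request):
    """Build the aggregation of character strings for one retrieval request."""
    aggregation = notation_variants(request)
    for form in list(aggregation):      # look up synonyms of every written form
        for synonym in THESAURUS.get(form, []):
            aggregation |= notation_variants(synonym)
    return aggregation
```

Looking up synonyms for every written form, rather than only the form typed by the user, lets a thesaurus keyed on one spelling still serve requests made in another spelling.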
Specification