Document management method and apparatus and document search method and apparatus
Abstract
A document management method includes clipping character strings of a given number of characters from document data while shifting the clipping position to generate management Grams, and determining whether each management Gram is a first Gram of low occurrence frequency or a second Gram of high occurrence frequency. First post data is stored in a first post region in association with a Gram value obtained by computing the character string of the first Gram; the first post data is a set of a document identification (ID) indicating the document data that includes the first Gram and an intra-document offset indicating the position of its character string. Second post data is stored in a second post region in association with the character string of the second Gram; the second post data is a set of a document ID indicating the document data that includes the second Gram and an intra-document offset indicating the position of its character string.
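The clipping and frequency-classification steps summarized in the abstract can be illustrated with a minimal Python sketch. The function names `clip_grams` and `classify_grams`, the gram length `n=2`, and the threshold value are all hypothetical choices made for illustration; the source does not specify them.

```python
from collections import Counter


def clip_grams(text, n=2):
    """Clip every n-character string from text, shifting one character at a time.

    Returns (gram, offset) pairs; n=2 is an assumed gram length.
    """
    return [(text[i:i + n], i) for i in range(len(text) - n + 1)]


def classify_grams(documents, n=2, threshold=3):
    """Split grams into low-frequency ("first") and high-frequency ("second") sets.

    documents maps a document ID to its text. A gram whose total occurrence
    count is less than threshold is a first Gram; otherwise it is a second Gram.
    The threshold value here is an assumption for illustration.
    """
    counts = Counter(
        gram for text in documents.values() for gram, _ in clip_grams(text, n)
    )
    first = {gram for gram, count in counts.items() if count < threshold}
    second = {gram for gram, count in counts.items() if count >= threshold}
    return first, second
```

With `documents = {1: "abab", 2: "abc"}` and the defaults above, the gram `"ab"` occurs three times and falls in the second (high-frequency) set, while `"ba"` and `"bc"` occur once each and fall in the first set.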
16 Claims
1. A document management method for managing document data stored in a document data region of a storage unit, comprising:
shifting a character string of a given number of characters from document data and clipping the character string to generate a management Gram;
determining that the management Gram is one of a first Gram of relatively low occurrence frequency less than a threshold and a second Gram of relatively high occurrence frequency not less than the threshold;
storing first post data in a first post region of a storage unit in association with a Gram value obtained by computing the character string of the first Gram, the first post data being configured with a set of a document identification (ID) indicating the document data including the character string of the first Gram and an intra-document offset indicating a position of the character string of the first Gram; and
storing second post data in a second post region of the storage unit in association with the character string of the second Gram, the second post data being configured with a set of a document identification (ID) indicating document data including the character string of the second Gram and an intra-document offset indicating a position of the character string of the second Gram.
Dependent claims: 2, 3, 4, 5
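The two storing steps of claim 1 can be sketched as a two-region index. This is an illustrative sketch only: the claim says the Gram value is "obtained by computing the character string" without naming the computation here, so the CRC32-based `gram_value` below is an assumption, as are the names `build_index`, `first_region`, `second_region`, the bucket count, the gram length, and the threshold.

```python
import zlib
from collections import Counter, defaultdict


def gram_value(gram, buckets=1024):
    """Assumed Gram value: a CRC32 hash of the gram, reduced to a bucket number."""
    return zlib.crc32(gram.encode("utf-8")) % buckets


def build_index(documents, n=2, threshold=3, buckets=1024):
    """Build the first and second post regions from {doc_id: text}.

    first_region:  Gram value  -> list of (doc_id, offset), low-frequency grams
    second_region: gram string -> list of (doc_id, offset), high-frequency grams
    """
    # Clip every n-character gram, shifting one character at a time.
    occurrences = [
        (text[i:i + n], doc_id, i)
        for doc_id, text in documents.items()
        for i in range(len(text) - n + 1)
    ]
    counts = Counter(gram for gram, _, _ in occurrences)

    first_region = defaultdict(list)
    second_region = defaultdict(list)
    for gram, doc_id, offset in occurrences:
        if counts[gram] < threshold:
            # Low-frequency gram: keyed by its computed Gram value.
            first_region[gram_value(gram, buckets)].append((doc_id, offset))
        else:
            # High-frequency gram: keyed by the character string itself.
            second_region[gram].append((doc_id, offset))
    return dict(first_region), dict(second_region)
```

One consequence of keying the first region by a computed value is that different low-frequency grams may share a key, so post data read from that region has to be checked against the document data before a match is reported.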
6. A document retrieval method for searching document data stored in a document data region according to a retrieval key word, the method comprising:
preparing a storage unit containing a document data storage area in which document data is stored, a first post region storing first post data configured with a set of a document identification (ID) indicating document data including a character string of a first Gram and an intra-document offset indicating a position of the character string, in association with a Gram value obtained by computing the character string of the first Gram, and a second post region storing second post data configured with a set of a document identification (ID) indicating document data including a character string of a second Gram and an intra-document offset indicating a position of the character string of the second Gram, in association with the character string of the second Gram;
shifting a character string of a given number of characters from a retrieval key word and clipping the character string to generate a retrieval Gram;
reading the first post data from the first post region by scanning the first post region according to a Gram value obtained by computing a character string of the retrieval Gram;
reading the second post data from the second post region by scanning the second post region according to the character string of the retrieval Gram; and
searching the document data region for document data matching the retrieval key word, using the read first post data and the read second post data.
Dependent claims: 7, 8
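The retrieval steps of claim 6 can be sketched in the same style. This is a hedged illustration, not the patented implementation: `search`, `gram_value` (an assumed CRC32-based Gram value), the bucket count, and the gram length are hypothetical, and the sketch assumes the two post regions are plain dictionaries mapping a key to a list of `(doc_id, offset)` pairs.

```python
import zlib


def gram_value(gram, buckets=1024):
    """Assumed Gram value: a CRC32 hash of the gram, reduced to a bucket number."""
    return zlib.crc32(gram.encode("utf-8")) % buckets


def search(keyword, documents, first_region, second_region, n=2, buckets=1024):
    """Return sorted (doc_id, start_offset) pairs where keyword occurs."""
    if len(keyword) < n:
        # Too short to clip a retrieval Gram: fall back to a plain scan.
        return sorted(
            (doc_id, i)
            for doc_id, text in documents.items()
            for i in range(len(text) - len(keyword) + 1)
            if text.startswith(keyword, i)
        )
    candidates = None
    for k in range(len(keyword) - n + 1):
        gram = keyword[k:k + n]  # retrieval Gram clipped at shift k
        # Second region is read by the character string itself.
        posts = second_region.get(gram)
        if posts is None:
            # First region is read by the computed Gram value.
            posts = first_region.get(gram_value(gram, buckets), [])
        # Convert each gram offset into a candidate start of the keyword.
        starts = {(doc_id, offset - k) for doc_id, offset in posts}
        candidates = starts if candidates is None else candidates & starts
    # Verify against the document data, since Gram values may collide.
    return sorted(
        (doc_id, start)
        for doc_id, start in candidates
        if start >= 0 and documents[doc_id].startswith(keyword, start)
    )
```

Intersecting candidate start positions across all retrieval Grams narrows the set before the final check against the stored text, so the document data region is only consulted for positions that every Gram agrees on.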
9. A document management apparatus comprising:
a storage unit having a document data region in which document data is stored;
a determination unit configured to determine that a management Gram corresponds to one of a first Gram of relatively low occurrence frequency less than a threshold and a second Gram of relatively high occurrence frequency not less than the threshold, the management Gram being generated by shifting a character string of a given number of characters from the document data of the storage unit and clipping the character string;
a first write-in unit configured to store first post data in a first post region of the storage unit in association with a Gram value obtained by computing the character string of the first Gram, the first post data being configured with a set of a document identification (ID) indicating the document data including the character string of the first Gram and an intra-document offset indicating a position of the character string of the first Gram;
a second write-in unit configured to store second post data in a second post region of the storage unit in association with the character string of the second Gram, the second post data being configured with a set of a document identification (ID) indicating document data including the character string of the second Gram and an intra-document offset indicating a position of the character string of the second Gram;
a generation unit configured to generate a retrieval Gram by shifting a character string of a given number of characters from a retrieval key word and clipping the character string, the retrieval key word being used for searching the document data stored in the document data region of the storage unit;
a first readout unit configured to read first post data from the first post region according to a Gram value obtained by computing the character string of the retrieval Gram;
a second readout unit configured to read second post data from the second post region according to the character string of the retrieval Gram; and
a search unit configured to search the document data region for document data matching the retrieval key word using the first post data and the second post data.
Dependent claims: 10, 11, 12, 13
14. A document search apparatus for searching document data stored in a document data region according to a retrieval key word, the apparatus comprising:
a storage unit containing a document data storage area in which document data is stored, a first post region and a second post region, wherein the first post region stores first post data configured with a set of a document identification (ID) indicating document data including a character string of a first Gram and an intra-document offset indicating a position of the character string, in association with a Gram value obtained by computing the character string of the first Gram, and the second post region stores second post data configured with a set of a document identification (ID) indicating document data including a character string of a second Gram and an intra-document offset indicating a position of the character string of the second Gram, in association with the character string of the second Gram;
a clipping unit configured to shift a character string of a given number of characters from a retrieval key word and clip the character string to generate a retrieval Gram;
a reading unit configured to read the first post data from the first post region by scanning the first post region according to a Gram value obtained by computing a character string of the retrieval Gram;
a reading unit configured to read the second post data from the second post region by scanning the second post region according to the character string of the retrieval Gram; and
a searching unit configured to search the document data region for document data matching the retrieval key word, using the read first post data and the read second post data.
Dependent claims: 15, 16
Specification