Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
First Claim
1. A document search method in a data base of stored documents comprising the steps of:
extracting a partial character string in a predetermined form from each of the stored documents;
creating a neighboring plural-character occurrence bitmap for indicating whether each of said documents contains any of said partial character strings;
extracting a search term partial character string in a predetermined form from a search term inputted for searching for a desired document from said stored documents; and
referring to said neighboring plural-character occurrence bitmap for said extracted search term partial string to search if any of said stored documents contains said search term partial character string, and to discard any of said stored documents not containing said search term partial character string, wherein;
said partial character string and said search term partial character string comprise predetermined n-character strings selected from predetermined (m+1)-th character positions from each of said stored documents and exclude intermediate elements between said predetermined (m+1)-th character positions where n is an integer of 2 or larger and m is an integer of 1 or larger.
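The extraction step in the wherein clause can be illustrated with a minimal sketch. It assumes that "selected from predetermined (m+1)-th character positions" means taking n characters spaced m+1 positions apart and skipping the m characters in between. The function name `spaced_ngrams` and the defaults n=2, m=1 are hypothetical and not taken from the patent.

```python
def spaced_ngrams(text, n=2, m=1):
    """Return every n-character string built from characters m+1 positions apart.

    The m characters lying between two selected positions are excluded.
    This is one reading of the claim language, not the patent's own code.
    """
    step = m + 1                   # distance between selected characters
    span = (n - 1) * step + 1      # window length covered by one string
    # Slicing with a step keeps positions i, i+step, i+2*step, ...
    return [text[i:i + span:step] for i in range(len(text) - span + 1)]


print(spaced_ngrams("search", n=2, m=1))  # ['sa', 'er', 'ac', 'rh']
```

Under this reading, a text shorter than the window (n=2, m=1 gives a window of three characters) yields no strings at all.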
Abstract
A neighboring plural-character occurrence bitmap of practical capacity, capable of eliminating noise by hashing, is realized, and a high-speed full-text search is effectively achieved by greatly reducing the number of documents to be searched, even if the search term is a combination of English characters and words. Text data is segmented into words, and n-character strings at every (m+1)-th character position are extracted from each word. A neighboring plural-character occurrence bitmap is created, which stores, at a given entry, data representing the presence of each neighboring plural-character string. From a search term, n-character strings at every (m+1)-th character position are extracted in the same way, and the neighboring plural-character occurrence bitmap is searched by a search control program. Because the bitmap is searched before the condensed texts are searched, documents not relevant to the search term can be discarded and a high-speed full-text search can be realized.
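The pre-filtering flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the class `OccurrenceBitmap`, the word splitter, the table size of 4096 entries, and the use of CRC32 as the hash are all hypothetical choices. Each bitmap entry is a Python integer used as a bit set with one bit per document.

```python
import re
import zlib


def spaced_ngrams(word, n=2, m=1):
    # n characters taken m+1 positions apart (assumed reading of the abstract).
    step = m + 1
    span = (n - 1) * step + 1
    return [word[i:i + span:step] for i in range(len(word) - span + 1)]


def split_words(text):
    # Hypothetical segmentation rule: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class OccurrenceBitmap:
    def __init__(self, num_docs, num_entries=4096, n=2, m=1):
        self.num_docs = num_docs
        self.num_entries = num_entries
        self.n, self.m = n, m
        self.rows = [0] * num_entries   # one document bit set per entry

    def _entry(self, gram):
        # Hash a string to a bitmap entry; collisions are possible.
        return zlib.crc32(gram.encode("utf-8")) % self.num_entries

    def _grams(self, text):
        for word in split_words(text):
            yield from spaced_ngrams(word, self.n, self.m)

    def add_document(self, doc_id, text):
        # Mark the document's bit in the entry of every extracted string.
        for gram in self._grams(text):
            self.rows[self._entry(gram)] |= 1 << doc_id

    def candidates(self, search_term):
        # Keep only documents whose bit is set for every search-term string.
        mask = (1 << self.num_docs) - 1
        for gram in self._grams(search_term):
            mask &= self.rows[self._entry(gram)]
        return [d for d in range(self.num_docs) if (mask >> d) & 1]


docs = ["full text search engine", "bitmap index of documents"]
bitmap = OccurrenceBitmap(num_docs=len(docs))
for doc_id, text in enumerate(docs):
    bitmap.add_document(doc_id, text)

print(bitmap.candidates("search"))
```

In this sketch a hash collision can only let an irrelevant document through as a candidate; it never drops a document that actually contains the term. The surviving candidates would therefore still need to be confirmed against the text itself, which matches the abstract's ordering of a bitmap search followed by a search of the condensed texts.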
18 Claims
Specification