Hierarchical presearch type text search method and apparatus and magnetic disk unit used in the apparatus
Abstract
A method and apparatus for document information search, and a magnetic disk unit used to realize the method and apparatus. In the document information search method, a search for a desired subject key word is carried out with two stages of presearch. In the first stage of presearch (step 402), a character component table (500), in which the existence of character codes is stated for every document with respect to all the character codes contained in the group of stored document text data, is generated, and the character component table is searched for all the character codes constituting the designated search subject key word, thereby extracting all the documents that each contain all the character codes constituting the search subject key word. In the second stage of presearch (step 403), contracted text data, in which adjuncts and duplicates of repeatedly stated words contained in the text data are eliminated, is generated for every document, and the documents each containing the search subject key word on a word-by-word basis are extracted from the documents extracted by the first presearch. After the second stage of presearch, text search is performed in accordance with a neighbor condition, a contextual condition, or the like (step 404). Further, as a term comparator means, dedicated hardware (1106) that performs term comparison in accordance with a finite automaton is employed.
Further, to handle different notations and synonyms, inputted terms first undergo different notation development in a different notation development processing portion (2601); each of the resulting terms then undergoes synonym development in a synonym development processing portion (2602) with reference to a synonym dictionary; and the results of synonym development undergo a further different notation development in a different notation development processing portion (2603) in accordance with a conversion rule table (2603).
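The abstract states only that term comparison follows a finite automaton and is done in dedicated hardware (1106). As a rough software illustration of comparing text against a whole key word group in one pass, the sketch below uses the Aho-Corasick construction. That choice is an assumption made for illustration, not something the source names, and the function names `build_automaton` and `find_terms` are hypothetical.

```python
from collections import deque

def build_automaton(terms):
    """Build goto/failure/output tables (Aho-Corasick; an assumed construction)."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # failure link for each state
    out = [set()]   # terms recognized on entering each state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(term)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def find_terms(text, automaton):
    """Scan text once and return sorted (start_position, term) pairs."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            hits.append((i - len(term) + 1, term))
    return sorted(hits)

automaton = build_automaton(["disk", "search", "arch"])
hits = find_terms("disk search", automaton)
# hits == [(0, 'disk'), (5, 'search'), (7, 'arch')]
```

The text is read once regardless of how many terms are in the group, which is the property that makes a collective comparison against an expanded key word group practical.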
42 Citations
56 Claims
1. A document information search method for searching for specified text data containing a given search subject key word from a group of document text data stored in advance, said method comprising the steps of:
generating a character component table in which existence of character codes for every document is stated with respect to all the character codes contained in said group of document text data;
searching said character component table for all the character codes constituting a desiredly designated search subject key word; and
performing a first presearch for extracting all documents each containing all the character codes constituting said search subject key word.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19, 20
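The three steps of claim 1 can be sketched as follows. This is a minimal illustration under assumptions: the table is held as a Python dictionary from character to a set of document ids, whereas the source does not specify a layout, and the names `build_char_table` and `first_presearch` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_char_table(docs):
    """Record, for every character, which documents contain it."""
    table = defaultdict(set)
    for doc_id, text in docs.items():
        for ch in set(text):
            table[ch].add(doc_id)
    return table

def first_presearch(table, keyword, all_ids):
    """Return ids of documents that contain every character of the key word."""
    candidates = set(all_ids)
    for ch in set(keyword):
        candidates &= table.get(ch, set())
    return candidates

# Hypothetical sample documents.
docs = {
    1: "full text search of documents",
    2: "magnetic disk unit",
    3: "a cat sat; then he ran",
}
table = build_char_table(docs)
hits = first_presearch(table, "search", docs.keys())
# hits == {1, 3}
```

Document 3 contains every character of "search" but not the word itself, so the first presearch acts as a coarse filter that narrows the candidates; later stages are still needed to confirm an actual occurrence.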
11. A document information search apparatus comprising:
text data storage means for storing a document text data group;
search expression input means for inputting a search conditional expression in which a complex condition including key words for searching said document text data, and positional and logical relationships among said key words is designated;
search expression analysis means for analyzing the inputted search conditional expression to extract a search subject key word and a complex condition descriptive portion;
synonym development means for generating synonyms of said search subject key word on the basis of said search subject key word outputted from said search expression analysis means;
different notation development means for generating different notation words of said subject key word including said synonyms;
complex condition analysis means for analyzing said complex condition descriptive portion outputted from said search expression analysis means to thereby develop said complex condition descriptive portion into said positional and logical conditions;
term comparator means for reading said text data from said text data storage means so as to collectively compare said text data with respect to the key word group given from said different notation development means;
complex condition judgment means for detecting documents adapted to the conditions designated by said complex condition analysis means on the basis of the results of the comparison outputted from said term comparator means to thereby output identifiers of the detected documents; and
search result output means for outputting identifier information of the documents adapted to said search conditional expression on the basis of the output results of said complex condition judgement means.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43
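To make the role of the complex condition judgment means more concrete, the sketch below checks one kind of positional condition: two key words occurring within a given number of characters of each other. The claim does not define how positional conditions are encoded, so the character-gap measure, the function names `find_positions`, `within`, and `judge`, and the sample documents are all assumptions for illustration.

```python
import re

def find_positions(text, term):
    """Start offsets of every non-overlapping occurrence of term in text."""
    return [m.start() for m in re.finditer(re.escape(term), text)]

def within(text, a, b, max_gap):
    """True if some occurrence of a starts within max_gap characters of some b."""
    pos_a = find_positions(text, a)
    pos_b = find_positions(text, b)
    return any(abs(x - y) <= max_gap for x in pos_a for y in pos_b)

def judge(docs, a, b, max_gap):
    """Return sorted identifiers of documents satisfying the proximity condition."""
    return sorted(d for d, text in docs.items() if within(text, a, b, max_gap))

# Hypothetical sample documents.
docs = {
    1: "text search on magnetic disk",
    2: "text is stored here; much later we search",
}
near = judge(docs, "text", "search", 10)
# near == [1]
```

Both documents contain both key words, so a purely logical AND would return both; only the positional condition separates them.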
41. A document information search system for searching document information or the like comprising:
a plurality of document information search apparatuses connected to a communication network and operated in parallel to each other in accordance with search conditions broadcast from a plurality of search terminals connected to said communication network, each of said plurality of document information search apparatuses including;
text data storage means for storing a document text data group;
search expression input means for inputting a search conditional expression in which a complex condition including key words for searching said document text data, and positional and logical relationships among said key words is designated;
search expression analysis means for analyzing the inputted search conditional expression to extract a search subject key word and a complex condition descriptive portion;
synonym development means for generating synonyms of said search subject key word on the basis of said search subject key word outputted from said search expression analysis means;
different notation development means for generating different notation words of said subject key word including said synonyms;
complex condition analysis means for analyzing said complex condition descriptive portion outputted from said search expression analysis means to thereby develop said complex condition descriptive portion into said positional and logical conditions;
term comparator means for reading said text data from said text data storage means so as to collectively compare said text data with respect to the key word group given from said different notation development means;
complex condition judgment means for detecting documents adapted to the conditions designated by said complex condition analysis means on the basis of the results of the comparison outputted from said term comparator means to thereby output identifiers of the detected documents; and
search result output means for outputting identifier information of the documents adapted to said search conditional expression on the basis of the output results of said complex condition judgment means;
communication means for connection with the communication network, so that a search conditional expression received through said communication means is supplied to said search expression input means and the search result outputted from said search result output means is sent back by use of said communication means to a search conversation terminal from which a search request corresponding to the search result is transmitted; and
means for receiving the search results transmitted from said document information search apparatuses and displaying the received results on said search terminals.
44. A document information search system for searching document information or the like comprising:
a plurality of document information search apparatuses connected to a communication network and operated in parallel to each other in accordance with search conditions broadcast from a plurality of search terminals connected to said communication network, each of said plurality of document information search apparatuses including;
text data storage means for storing a document text data group;
search expression input means for inputting a search conditional expression in which a complex condition including key words for searching said document text data, and positional and logical relationships among said key words is designated;
search expression analysis means for analyzing the inputted search conditional expression to extract a search subject key word and a complex condition descriptive portion;
synonym development means for generating synonyms of said search subject key word on the basis of said search subject key word outputted from said search expression analysis means;
different notation development means for generating different notation words of said subject key word including said synonyms;
complex condition analysis means for analyzing said complex condition descriptive portion outputted from said search expression analysis means to thereby develop said complex condition descriptive portion into said positional and logical conditions;
term comparator means for reading said text data from said text data storage means so as to collectively compare said text data with respect to the key word group given from said different notation development means;
complex condition judgment means for detecting documents adapted to the conditions designated by said complex condition analysis means on the basis of the results of the comparison outputted from said term comparator means to thereby output identifiers of the detected documents; and
search result output means for outputting identifier information of the documents adapted to said search conditional expression on the basis of the output results of said complex condition judgment means;
image data storage means for storing image information related to stored document information, so that on the basis of the identifiers of documents adapted to search conditions and outputted from said search result output means, image information related to said documents is read out from said image data storage means and outputted together with the bibliographic information and text data of the documents;
communication means for connection with the communication network, so that a search conditional expression received through said communication means is supplied to said search expression input means and the search result outputted from said search result output means is sent back by use of said communication means to a search conversation terminal from which a search request corresponding to the search result is transmitted; and
means for receiving the search results transmitted from said document information search apparatuses and displaying the received results on said search terminals.
45. A document information search apparatus comprising means for storing text data by data unit such as literature unit, means for inputting search terms, and means for searching for said search terms in said stored text data, wherein said apparatus further comprises a different notation development means for developing each of the inputted terms into terms having variations in notation system such as syllable notation system, a synonym development means having a synonym dictionary for developing the inputted terms into synonym terms while making references to said synonym dictionary, and an integral means for integrating the terms obtained by said two term development means, whereby search terms inputted by a user are once developed into a term group A in which terms are different in notation from each other by said different notation develop means, each of the terms in said term group A is developed into a synonym term group B by said synonym development means, each of terms in said synonym term group B is developed into a term group C in which terms are different in notation from each other by said different notation develop means, said term group A and said term group C are integrated by said integral means to thereby obtain a term group D, and search is made on a data unit of said text data in which any one of term in said term group D is contained.
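The A, B, C, D term groups of claim 45 can be traced with a small sketch. It rests on several assumptions: notation variation is modelled as substring substitution pairs, the synonym dictionary is a plain mapping, and notation development keeps the original term alongside its variants. The tables `NOTATION_PAIRS` and `SYNONYMS`, the function names, and the sample data are hypothetical.

```python
# Hypothetical conversion rules and synonym dictionary.
NOTATION_PAIRS = [("colour", "color"), ("grey", "gray")]
SYNONYMS = {"color": ["hue"], "colour": ["grey shade"]}

def develop_notation(terms):
    """Add notation variants of each term (the originals are kept)."""
    developed = set()
    for term in terms:
        developed.add(term)
        for a, b in NOTATION_PAIRS:
            if a in term:
                developed.add(term.replace(a, b))
            if b in term:
                developed.add(term.replace(b, a))
    return developed

def develop_synonyms(terms):
    """Look up each term in the synonym dictionary."""
    developed = set()
    for term in terms:
        developed.update(SYNONYMS.get(term, []))
    return developed

def expand(user_terms):
    group_a = develop_notation(user_terms)   # different notation development
    group_b = develop_synonyms(group_a)      # synonym development
    group_c = develop_notation(group_b)      # different notation development again
    return group_a | group_c                 # integration into term group D

group_d = expand({"colour"})
# group_d == {"colour", "color", "hue", "grey shade", "gray shade"}

docs = {1: "a gray shade of paint", 2: "magnetic disk unit"}
matches = sorted(d for d, text in docs.items() if any(t in text for t in group_d))
# matches == [1]
```

Applying notation development both before and after the synonym lookup means the dictionary needs only one spelling per entry: "gray shade" is reached even though the dictionary lists only "grey shade" under "colour".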
46. In a magnetic disk system comprising:
a collective type magnetic disk device including a plurality of data storage units respectively having magnetic disk units, input/output buffers for temporarily storing data to be inputted/outputted into/from data storage units respectively, and a multi-disk controller for controlling said data storage units and said input/output buffers; and
a higher-rank apparatus for issuing a control instruction for said multi-disk controller;
said collective type magnetic disk device being constituted by n data storage units in which on the assumption that a data transfer rate required from said higher-rank apparatus is represented by T, one-cylinder capacity of said magnetic disk units is represented by M, a data transfer rate from said data storage units to said the input/output buffers is represented by t, the minimum seek time of said magnetic disk units is represented by s, a revolution velocity of said magnetic disk units is represented by R, and the capacity of the input/output buffer is not smaller than the one-cylinder capacity M of said magnetic disk units, then the number n of said data storage units satisfies the following expression of
##EQU31##
when said minimum seek time s of said magnetic disk units is longer than the time (M/T) required for transferring the data M from said input/output buffer to said higher-rank apparatus, while the number n of said data storage satisfies the following expression of
##EQU32##
when said minimum seek time s of said magnetic disk units is not longer than the time (M/T) required for transferring the data of M from said input/output buffer to said higher-rank apparatus.
Dependent claims: 47, 48, 49, 50, 51, 52, 53
54. A document information search method for searching for specified text data containing a given search subject key word from a group of document text data stored in advance, said method comprising the steps of:
generating contracted text data for every document in which adjuncts and duplication of repeatedly stated words contained in advance in the text data are eliminated; and
performing a contracted text data search for extracting documents each containing a given search subject key word by word from the contracted text data.
Dependent claims: 55, 56
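A minimal sketch of the two steps of claim 54 follows. It assumes whitespace-delimited English text and a small hand-written adjunct list; the source does not say how adjuncts are identified or how words are segmented, so `ADJUNCTS`, `contract`, `contracted_search`, and the sample documents are hypothetical.

```python
# Hypothetical adjunct list; a real system would need a language-specific one.
ADJUNCTS = {"a", "an", "the", "of", "in", "is", "to", "and"}

def contract(text):
    """Drop adjuncts and repeated words, keeping first occurrences in order."""
    seen = set()
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word or word in ADJUNCTS or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words

def contracted_search(contracted, keyword):
    """Return sorted ids of documents whose contracted text holds the key word as a whole word."""
    kw = keyword.lower()
    return sorted(doc_id for doc_id, words in contracted.items() if kw in words)

# Hypothetical sample documents.
docs = {
    1: "The search of the text is a text search.",
    2: "A cat sat in the research lab.",
}
contracted = {doc_id: contract(text) for doc_id, text in docs.items()}
# contracted[1] == ["search", "text"]
result = contracted_search(contracted, "search")
# result == [1]
```

Because the comparison is word by word, "research" in document 2 does not match "search", and the contracted form of document 1 is far shorter than its original text.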
Specification