Data structure, index creation device, data search device, index creation method, data search method, and computer-readable recording medium
First Claim
1. In a taxonomy having a tag with respect to search subject data, an index creation device configured to create an inverted list used for taking out a set of search subject data which can be reached from each node in said taxonomy, whereina list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code, andsaid index creation device, realized by a computer, comprises:
- a processor configured to execute instructions to implement;
an ancestor node determination part configured to select one ancestor node that is a higher node of said node for every node in said taxonomy, and generate data for ancestor reference indicating selected said ancestor node;
an ancestor node search part configured to generate an ancestor node list indicating one or more ancestor nodes of a tag in said taxonomy based on said data for ancestor reference; and
an ancestor number converting part configured to, upon receiving an identifier of search subject data, with respect to the highest node among each said node in said ancestor node list, adds said identifier as an element of a corresponding inverted list, and with respect to a node other than said highest node, as an element of a corresponding inverted list, in place of said identifier, adds an integer value indicating a position in an inverted list corresponding to a node that is higher by one than said node, whereinsaid ancestor node determination part, upon receiving a frequency distribution indicating whether each node in said taxonomy has or may have to what extent of a frequency within a prescribed data set, for every node in said taxonomy, based on a frequency corresponding to each ancestor node of said node, calculates a data length of a corresponding inverted list in the case of selecting said each ancestor node, and among said each ancestor node, selects preferentially said ancestor node where said data length is small and said ancestor node of a higher order in said taxonomy.
1 Assignment
0 Petitions
Accused Products
Abstract
In an inverted list of each node in a taxonomy, among each node, an inverted list of the highest node is a list of integer values indicating an identifier of search subject data, and an inverted list of a node other than the highest node, in place of the identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than the node. Furthermore, a list of integer values in an inverted list of each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before the integer value in the block is converted into a bit string of a variable length integer code.
-
Citations
12 Claims
-
1. In a taxonomy having a tag with respect to search subject data, an index creation device configured to create an inverted list used for taking out a set of search subject data which can be reached from each node in said taxonomy, wherein
a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code, and said index creation device, realized by a computer, comprises: -
a processor configured to execute instructions to implement; an ancestor node determination part configured to select one ancestor node that is a higher node of said node for every node in said taxonomy, and generate data for ancestor reference indicating selected said ancestor node; an ancestor node search part configured to generate an ancestor node list indicating one or more ancestor nodes of a tag in said taxonomy based on said data for ancestor reference; and an ancestor number converting part configured to, upon receiving an identifier of search subject data, with respect to the highest node among each said node in said ancestor node list, adds said identifier as an element of a corresponding inverted list, and with respect to a node other than said highest node, as an element of a corresponding inverted list, in place of said identifier, adds an integer value indicating a position in an inverted list corresponding to a node that is higher by one than said node, wherein said ancestor node determination part, upon receiving a frequency distribution indicating whether each node in said taxonomy has or may have to what extent of a frequency within a prescribed data set, for every node in said taxonomy, based on a frequency corresponding to each ancestor node of said node, calculates a data length of a corresponding inverted list in the case of selecting said each ancestor node, and among said each ancestor node, selects preferentially said ancestor node where said data length is small and said ancestor node of a higher order in said taxonomy.
-
-
2. In a taxonomy having a tag with respect to search subject data, a data search device, realized by a computer, configured to take out a set of search subject data which can be reached from a specified node specified in said taxonomy comprising:
a processor configured to execute instructions to implement; an ancestor number inverted list storage part configured to store data for an inverted list where an inverted list of each node in said taxonomy is included, and among said each node, an inverted list of the highest node is a list of integer values indicating an identifier of said search subject data, and an inverted list of a node other than said highest node, in place of said identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than said node, and furthermore, a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code; and an identifier converting part configured to, upon receiving information indicating said specified node, based on said data for an inverted list, create a list of identifiers of said search subject data corresponding to said specified node by repeating processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node until taking out said identifier of said inverted list corresponding to said highest node, wherein said data search device, in the case where two or more nodes in said taxonomy are specified, takes out a set of search subject data which can be reached from any of said specified nodes, and said identifier converting part, in the case where two or more nodes in said taxonomy are specified, acquires said inverted list corresponding to each said specified node from said ancestor number inverted list storage part, and in the case of performing processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node, when among said higher nodes common in a tuple of said specified nodes, taking out an integer value in said inverted list corresponding to a common ancestor node that is a higher node of the lowest order in said taxonomy, takes out said integer value common in a tuple of said specified nodes, and using taken-out said integer value, creates a list of identifiers of said search subject data corresponding to said two or more specified nodes. - View Dependent Claims (3, 4)
-
5. In a taxonomy having a tag with respect to search subject data, an index creation method configured to create an inverted list used for taking out a set of search subject data which can be reached from each node in said taxonomy, wherein
a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code, and said index creation method comprises: -
for every node in said taxonomy, selecting one ancestor node that is a higher node of said node, and generate data for ancestor reference indicating selected said ancestor node; generating an ancestor node list indicating one or more ancestor nodes of a tag in said taxonomy based on said data for ancestor reference; and upon receiving an identifier of search subject data, with respect to the highest node among each node in said ancestor node list, adding said identifier as an element of a corresponding inverted list, and with respect to a node other than the highest node, as an element of a corresponding inverted list, in place of said identifier, add an integer value indicating a position in an inverted list corresponding to a node that is higher by one than said node, wherein in generating said data for ancestor reference, upon receiving a frequency distribution indicating whether each node in said taxonomy has or may have to what extent of a frequency within a prescribed data set, for every node in said taxonomy, based on a frequency corresponding to each ancestor node of said node, a data length of a corresponding inverted list in the case of selecting said each ancestor node is made to be calculated, and among said each ancestor node, said ancestor node where said data length is small and said ancestor node of a higher order in said taxonomy are made to be selected preferentially.
-
-
6. In a taxonomy having a tag with respect to search subject data, a data search method configured to take out a set of search subject data which can be reached from a specified node specified in said taxonomy, comprising:
-
acquiring data for an inverted list where an inverted list of each node in said taxonomy is included, and an inverted list of the highest node among said each node is a list of integer values indicating an identifier of said search subject data, and an inverted list of a node other than said highest node, in place of said identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than said node, and furthermore, a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code; and upon receiving information indicating said specified node, based on said data for an inverted list, creating a list of identifiers of said search subject data corresponding to said specified node by repeating processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node until taking out said identifier of said inverted list corresponding to said highest node, wherein said data search method, in the case where two or more nodes in said taxonomy are specified, takes out a set of search subject data which can be reached from any of said specified nodes, and in creating a list of said identifiers, in the case where two or more nodes in said taxonomy are specified, acquiring said inverted list corresponding to each said specified node, and in the case of performing processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node, when among said higher nodes common in a tuple of said specified nodes, taking out an integer value in said inverted list corresponding to a common ancestor node that is a higher node of the lowest order in said taxonomy, takes out said integer value common in a tuple of said specified nodes, and using taken-out said integer value, creates a list of identifiers of said search subject data corresponding to said two or more specified nodes. - View Dependent Claims (7, 8)
-
-
9. In a taxonomy having a tag with respect to search subject data, a non-transitory computer-readable recording medium in which recorded is an index creation program to create an inverted list used for taking out a set of search subject data which can be reached from each node in said taxonomy,
wherein a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code, and said index creation program is a program configured to make a computer execute the steps of: -
selecting, for every node in said taxonomy, one ancestor node that is a higher node of said node, and generating data for ancestor reference indicating selected said ancestor node; generating an ancestor node list indicating one or more ancestor nodes of a tag in said taxonomy based on said data for ancestor reference; and upon receiving an identifier of said search subject data, with respect to the highest node among each said node in said ancestor node list, adding said identifier as a element of a corresponding inverted list, and with respect to a node other than said highest node, as a element of a corresponding inverted list, in place of said identifier, adding an integer value indicating a position in an inverted list corresponding to a node that is higher by one than said node, wherein in generating said data for ancestor reference, upon receiving a frequency distribution indicating whether each node in said taxonomy has or may have to what extent of a frequency within a prescribed data set, for every node in said taxonomy, based on a frequency corresponding to each ancestor node of said node, a data length of a corresponding inverted list in the case of selecting said each ancestor node is made to be calculated, and among said each ancestor node, said ancestor node where said data length is small and said ancestor node of a higher order in said taxonomy are made to be selected preferentially.
-
-
10. In a taxonomy having a tag with respect to search subject data, a non-transitory computer-readable recording medium in which recorded is data search program to take out a set of search subject data which can be reached from a specified node specified in said taxonomy, said data search program being a program configured to make a computer execute the steps of:
-
acquiring data for an inverted list where an inverted list of each node in said taxonomy is included, and an inverted list of the highest node among said each node is a list of integer values indicating an identifier of said search subject data, and an inverted list of a node other than said highest node, in place of said identifier, is a list of integer values indicating a position in an inverted list corresponding to a node that is higher by one than said node, and furthermore, a list of integer values in an inverted list of said each node is divided into two or more blocks, and a differential value between an integer value and an integer value directly before said integer value in said block is converted into a bit string of a variable length integer code; and upon receiving information indicating said specified node, based on said data for an inverted list, creating a list of identifiers of said search subject data corresponding to said specified node by repeating processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node until taking out said identifier of said inverted list corresponding to said highest node, wherein said data search program, in the case where two or more nodes in said taxonomy are specified, takes out a set of search subject data which can be reached from any of said specified nodes, and in creating a list of said identifiers, in the case where two or more nodes in said taxonomy are specified, acquiring said inverted list corresponding to each said specified node, and in the case of performing processing to take out an integer value of said inverted list corresponding to a higher node of said specified node, which corresponds to said position indicated by an integer value of said inverted list corresponding to said specified node, when among said higher nodes common in a tuple of said specified nodes, taking out an integer value in said inverted list corresponding to a common ancestor node that is a higher node of the lowest order in said taxonomy, takes out said integer value common in a tuple of said specified nodes, and using taken-out said integer value, creates a list of identifiers of said search subject data corresponding to said two or more specified nodes. - View Dependent Claims (11, 12)
-
Specification