Data classifier system, data classifier method and data classifier program
Abstract
A data classifier system of the present invention selects a plurality of classifications correlated to data groups so as to output classification axes based on hierarchical classifications and data groups. The data classifier system includes a basic category accumulation means, a classification axis candidate creation means and a priority calculation means. The basic category accumulation means accumulates, in advance, classifications serving as basic categories, which are used for selecting desired classifications. The classification axis candidate creation means creates classification axis candidates based on combinations of classifications, each correlated to at least one data item, selected from among the descendant classifications of each basic category. The priority calculation means calculates priorities for the classification axis candidates created by the classification axis candidate creation means, based on hierarchical distances of classifications in the classified hierarchy.
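The candidate creation step described in the abstract can be illustrated with a minimal Python sketch. The hierarchy, the data allocation, the function names `descendants` and `axis_candidates`, and the `min_size` parameter are all hypothetical; in particular, treating only combinations of two or more classifications as candidates is an assumption, not something the abstract states.

```python
from itertools import combinations

# Hypothetical toy hierarchy (parent -> children); not taken from the patent.
CHILDREN = {
    "root": ["society", "nature"],
    "society": ["politics", "economy"],
    "nature": ["animals"],
}
# Hypothetical allocation of data items to classifications.
DATA = {"politics": {"d1", "d2"}, "economy": {"d2", "d3"}, "animals": set()}


def descendants(node):
    """Return every descendant classification of node, depth-first."""
    found = []
    for child in CHILDREN.get(node, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def axis_candidates(basic_category, min_size=2):
    """Combine descendants of a basic category that hold at least one data item."""
    usable = [c for c in descendants(basic_category) if DATA.get(c)]
    candidates = []
    # Assumption: a candidate needs at least min_size classifications.
    for size in range(min_size, len(usable) + 1):
        candidates.extend(combinations(usable, size))
    return candidates


print(axis_candidates("society"))
```

Descendants with no allocated data (here `animals`) are filtered out before combining, which mirrors the "correlated to at least one data" condition.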
29 Claims
1. A data classifier system, implemented by a computer including a processor and a storage device, which refers to a classified hierarchy including a plurality of classifications to produce a plurality of classification axes based on classifications and data groups input,
wherein the storage device stores the classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications;
wherein the processor creates classification axis candidates based on combinations of classifications belonging to basic categories;
wherein the processor inputs the classification axis candidates to calculate priorities with respect to the classification axis candidates with reference to hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with the classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with the classification axis candidates, and
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy is calculated using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Exhaustivity(X; C) is given using a formula:
$$\mathrm{Exhaustivity}(X; C) = \frac{1}{\mathrm{DataNum}} \times \left|\bigcup \mathrm{Data}(c_i)\right|$$
wherein DataNum denotes an amount of data being classified,
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications, and
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data allocated to the plurality of classifications.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 29
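The two formulas recited in claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the dictionary-of-sets data layout, and the sample values are hypothetical, and the Independence, Specifics and Uniqueness scores are simply passed in as numbers because claim 1 does not define them.

```python
def exhaustivity(classifications, data, data_num):
    """Exhaustivity = (1 / DataNum) * |union of Data(ci)|.

    data maps each classification to the set of data items allocated to it
    (hypothetical layout); data_num is the total amount of data and must be > 0.
    """
    covered = set().union(*(data[c] for c in classifications))
    return len(covered) / data_num


def priority(independence, specifics, exhaustivity_score, uniqueness,
             weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four indexes; weights stand in for W1..W4."""
    w1, w2, w3, w4 = weights
    return (w1 * independence + w2 * specifics
            + w3 * exhaustivity_score + w4 * uniqueness)


# Hypothetical sample: 4 data items in total, 3 of them covered by the axis.
data = {"politics": {"d1", "d2"}, "economy": {"d2", "d3"}}
score = exhaustivity(["politics", "economy"], data, data_num=4)
print(score, priority(0.5, 0.5, score, 0.75))
```

Taking the set union counts a data item once even when it is allocated to several classifications, so the ratio stays between 0 and 1.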
13. A data classifier system, implemented by a computer including a processor and a storage device, which refers to a classified hierarchy including a plurality of classifications to create a plurality of classification axes based on classifications and data groups input,
wherein the storage device stores a classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications;
wherein the processor creates classification axis candidates based on combinations of classifications belonging to each basic category to produce multidimensional classification axis candidates each combining a plurality of classification axis candidates;
wherein the processor inputs the classification axis candidates to calculate priorities with respect to the multidimensional classification axis candidates with reference to hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with classification axis candidates, and
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy is calculated using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Uniqueness(X; C) is given using a formula:
$$\mathrm{Uniqueness}(X; C) = \frac{1}{\dfrac{1}{\left|\bigcup \mathrm{Data}(c_i)\right|} \times \sum \mathrm{CatNum}(c_i)}$$
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications,
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data having no redundancy allocated to each of the plurality of classifications, and
wherein $\mathrm{CatNum}(c_i)$ denotes an amount of data allocated to the classification $c_i$.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
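The Uniqueness formula recited in claim 13 reduces algebraically to the number of distinct data items divided by the sum of per-classification counts. A minimal Python sketch under that reading follows; the function name, the data layout, and the choice to return 0.0 when no data is allocated are assumptions made for illustration.

```python
def uniqueness(classifications, data):
    """Uniqueness = 1 / ((1 / |union of Data(ci)|) * sum of CatNum(ci)).

    Simplified to |union| / sum(CatNum), which is algebraically equivalent.
    data maps each classification to its set of data items (hypothetical layout).
    """
    distinct = set().union(*(data[c] for c in classifications))
    total = sum(len(data[c]) for c in classifications)  # sum of CatNum(ci)
    if total == 0:
        return 0.0  # assumption: no allocated data yields the lowest score
    return len(distinct) / total


# Hypothetical sample: item d2 is allocated to both classifications.
data = {"politics": {"d1", "d2"}, "economy": {"d2", "d3"}}
print(uniqueness(["politics", "economy"], data))
```

The score equals 1 when no data item is shared between classifications and falls as overlap increases.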
23. A data classifier method, implemented by a computer including a processor and a memory, which refers to a classified hierarchy including a plurality of classifications to produce a plurality of classification axes based on classifications and data groups input,
wherein the memory stores the classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications;
wherein the processor creates classification axis candidates based on combinations of classifications belonging to each basic category;
wherein the processor inputs the classification axis candidates to calculate priorities with respect to the classification axis candidates with reference to hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with the classification axis candidates, and
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the processor calculates the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Exhaustivity(X; C) is given using a formula:
$$\mathrm{Exhaustivity}(X; C) = \frac{1}{\mathrm{DataNum}} \times \left|\bigcup \mathrm{Data}(c_i)\right|$$
wherein DataNum denotes an amount of data being classified,
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications, and
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data allocated to the plurality of classifications.
Dependent claims: 24
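The hierarchical distance recited in claim 23, a path length reaching a common ancestor, can be illustrated with a small Python sketch. The parent map, the function names, and the choice to report the nearest common ancestor together with the path length from each classification are assumptions; the claim does not say how these lengths are aggregated into the Independence index.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the patent.
PARENT = {
    "society": "root",
    "nature": "root",
    "politics": "society",
    "economy": "society",
    "animals": "nature",
}


def path_to_root(node):
    """Return the chain [node, parent, grandparent, ...] up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def distance_to_common_ancestor(a, b):
    """Return (nearest common ancestor, steps up from a, steps up from b)."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return node, steps_from_a[node], steps_from_b
    raise ValueError("classifications share no common ancestor")


print(distance_to_common_ancestor("politics", "economy"))
print(distance_to_common_ancestor("politics", "animals"))
```

Siblings such as `politics` and `economy` meet one step up, while classifications under different branches only meet at the root, giving longer path lengths.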
25. A data classifier method, implemented by a computer including a processor and a memory, which refers to a classified hierarchy including a plurality of classifications to produce a plurality of classification axes based on classifications and data groups input,
wherein the memory stores the classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications,
wherein the processor creates classification axis candidates based on combinations of classifications belonging to each basic category, to produce multidimensional classification axis candidates each combining a plurality of classification axis candidates;
wherein the processor inputs the multidimensional classification axis candidates to calculate priorities with respect to the multidimensional classification axis candidates with reference to the hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with classification axis candidates; and
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the processor calculates the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Uniqueness(X; C) is given using a formula:
$$\mathrm{Uniqueness}(X; C) = \frac{1}{\dfrac{1}{\left|\bigcup \mathrm{Data}(c_i)\right|} \times \sum \mathrm{CatNum}(c_i)}$$
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications,
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data having no redundancy allocated to each of the plurality of classifications, and
wherein $\mathrm{CatNum}(c_i)$ denotes an amount of data allocated to the classification $c_i$.
Dependent claims: 26
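The production of multidimensional classification axis candidates recited in claim 25 can be sketched as combining single-axis candidates and ranking the results. The function names, the default of two dimensions, and the `score` callback standing in for the priority calculation are hypothetical choices for illustration; the claim only states that a plurality of candidates is combined.

```python
from itertools import combinations


def multidimensional_candidates(axis_candidates, dimensions=2):
    """Combine single-axis candidates into tuples of `dimensions` axes each."""
    return list(combinations(axis_candidates, dimensions))


def rank_by_priority(candidates, score):
    """Sort candidates from highest to lowest priority.

    score is a caller-supplied function (hypothetical) returning a number
    for each multidimensional candidate.
    """
    return sorted(candidates, key=score, reverse=True)


# Hypothetical single-axis candidates, each a tuple of classifications.
axes = [("politics", "economy"), ("asia", "europe"), ("2019", "2020", "2021")]
multi = multidimensional_candidates(axes)

# Placeholder score: total number of classifications across the combined axes.
ranked = rank_by_priority(multi, score=lambda cand: sum(len(a) for a in cand))
print(ranked[0])
```

The placeholder score only demonstrates the ranking step; a fuller sketch would plug in the weighted Priority sum from the claim.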
27. A non-transitory computer-readable storage medium storing a data classifier program, executable by a computer, which refers to a classified hierarchy including a plurality of classifications to produce a plurality of classification axes based on classifications and data groups input, said data classifier program implementing:
a basic category accumulation process which stores the classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications;
a classification axis candidate creation process which creates classification axis candidates based on combinations of classifications belonging to each basic category; and
a priority calculation process which inputs the classification axis candidates to calculate priorities with respect to the classification axis candidates with reference to hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with the classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with the classification axis candidates,
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the priority calculation process calculates the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Exhaustivity(X; C) is given using a formula:
$$\mathrm{Exhaustivity}(X; C) = \frac{1}{\mathrm{DataNum}} \times \left|\bigcup \mathrm{Data}(c_i)\right|$$
wherein DataNum denotes an amount of data being classified,
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications, and
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data allocated to the plurality of classifications.
28. A non-transitory computer-readable storage medium storing a data classifier program, executable by a computer, which refers to a classified hierarchy including a plurality of classifications to produce a plurality of classification axes based on classifications and data groups input, said data classifier program implementing:
a basic category accumulation process which stores the classified hierarchy establishing a parent-child relationship between the plurality of classifications and classifications serving as basic categories which are used to select desired classifications;
a multidimensional classification axis candidate creation process which creates classification axis candidates based on combinations of classifications belonging to each basic category to produce multidimensional classification axis candidates each combining a plurality of classification axis candidates; and
a multidimensional priority calculation process which inputs the multidimensional classification axis candidates to calculate priorities with respect to the multidimensional classification axis candidates with reference to hierarchical distances, each representing a length of a path reaching a common ancestor among classifications associated with the classification axis candidates, and hierarchical depths, each representing a length of a path between classifications associated with the classification axis candidates,
wherein semantic independence of each of the plurality of classification axes is based on length of hierarchical distances of each of the plurality of classification axes,
wherein the multidimensional priority calculation process calculates the priority concerning a basic category X related to a plurality of classifications C in the classified hierarchy using a formula:
$$\mathrm{Priority}(X; C) = W_1 \times \mathrm{Independence}(X; C) + W_2 \times \mathrm{Specifics}(X; C) + W_3 \times \mathrm{Exhaustivity}(X; C) + W_4 \times \mathrm{Uniqueness}(X; C)$$
wherein $W_1$, $W_2$, $W_3$ and $W_4$ denote weight coefficients to indexes which are determined in advance,
wherein the Uniqueness(X; C) is given using a formula:
$$\mathrm{Uniqueness}(X; C) = \frac{1}{\dfrac{1}{\left|\bigcup \mathrm{Data}(c_i)\right|} \times \sum \mathrm{CatNum}(c_i)}$$
wherein $\mathrm{Data}(c_i)$ denotes a set of data allocated to a classification $c_i$ selected from among the plurality of classifications,
wherein $\left|\bigcup \mathrm{Data}(c_i)\right|$ denotes an amount of data having no redundancy allocated to each of the plurality of classifications, and
wherein $\mathrm{CatNum}(c_i)$ denotes an amount of data allocated to the classification $c_i$.
Specification