System and method for organizing, compressing and structuring data for data mining readiness
3 Assignments
0 Petitions
Abstract
Systems and methods for performing data mining in a set of binary data arranged as a plurality of data items in which each data item has a plurality of bits, each bit in a corresponding one of a plurality of bit positions. The set of binary data is arranged in the data storage such that the binary data is in bit position groups. Each bit position group corresponds to a different one of the plurality of bit positions and includes bits of the binary data having that bit position. The binary data of each bit position group is compressed to produce data structures representing the set of binary data. A data mining technique is performed using the plurality of compressed data structures.
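For illustration only, the three stages summarized in the abstract (arrange by bit position, compress each group, work on the compressed form) can be sketched in Python. The function names and sample data below are hypothetical. Run-length encoding is used purely as a simple stand-in for the compression step, since the abstract does not name a specific scheme, and the counting at the end is only a minimal example of operating on compressed data, not the patented data mining technique.

```python
from itertools import groupby


def bit_position_groups(items, width):
    """Return one list of bits per bit position, most significant position first."""
    return [[(item >> (width - 1 - pos)) & 1 for item in items]
            for pos in range(width)]


def run_length_encode(bits):
    """Compress a bit list into (bit, run_length) pairs (stand-in compression)."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def ones_count(runs):
    """Count 1-bits directly from the run-length form."""
    return sum(length for bit, length in runs if bit == 1)


def and_runs(a, b):
    """AND two run-length encodings without expanding them.

    Assumes both encodings describe the same number of bits.
    """
    out = []
    i = j = 0
    used_a = used_b = 0  # bits already consumed from the current run of a and b
    while i < len(a) and j < len(b):
        step = min(a[i][1] - used_a, b[j][1] - used_b)
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        used_a += step
        used_b += step
        if used_a == a[i][1]:
            i, used_a = i + 1, 0
        if used_b == b[j][1]:
            j, used_b = j + 1, 0
    return out


# Four hypothetical 3-bit data items.
items = [0b101, 0b100, 0b111, 0b110]
groups = bit_position_groups(items, width=3)
compressed = [run_length_encode(group) for group in groups]

# How many items have a 1 in each bit position: [4, 2, 2]
per_position = [ones_count(runs) for runs in compressed]

# How many items have a 1 in both of the two lowest bit positions: 1 (only 0b111)
both_low = ones_count(and_runs(compressed[1], compressed[2]))
```

In this sketch the counts are obtained from the run-length pairs alone, so the original items are never rebuilt.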
58 Citations
25 Claims
1. A system for performing data mining in a set of binary data arranged as a plurality of data items, wherein each data item has a plurality of bits, each bit in a corresponding one of a plurality of bit positions, the system comprising:
a computer system including a processor and data storage, wherein the computer system is programmed to:
arrange the set of binary data in the data storage such that the binary data is in bit position groups, wherein each bit position group corresponds to a different one of the plurality of bit positions and includes bits of the binary data having that bit position;
compress the binary data of each bit position group such that each bit position group is represented by a compressed data structure, wherein the set of binary data is represented by a plurality of compressed data structures; and
perform a data mining technique using the plurality of compressed data structures to effect a tangible result of detecting at least one of a non-preselected pattern and a non-preselected relationship among the data items of the set of binary data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for performing data mining in a set of binary data arranged as a plurality of data items, wherein each data item has a plurality of bits, each bit in a corresponding one of a plurality of bit positions, the system comprising:
means for arranging the set of binary data in the data storage such that the binary data is in bit position groups;
means for compressing the binary data of each bit position group such that each bit position group is represented by a compressed data structure, wherein the set of binary data is represented by a plurality of compressed data structures; and
means for performing a data mining technique using the plurality of compressed data structures to effect a useful, concrete, and tangible result of detecting at least one of a non-preselected pattern and a non-preselected relationship among the data items of the set of binary data.
12. A method of performing data mining on data that is initially arranged as a plurality of data items, each data item having a plurality of bits in a plurality of bit positions, each bit being in one of the plurality of bit positions, the method comprising:
arranging the data into bit position-based groups, wherein each bit position-based group corresponds to a different one of the bit positions and includes data bits of the data having that bit position;
compressing the data of each of the bit position-based groups into a compressed representation to produce a plurality of compressed representations of the data; and
performing data mining using the plurality of compressed representations to effect a useful, concrete, and tangible result of detecting at least one of a non-preselected pattern and a non-preselected relationship among the data items of the set of binary data.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A method of forming a data structure that facilitates data mining of a set of binary data having a plurality of multi-bit data items, said method comprising:
(a) grouping the binary data into a plurality of groups, each group being defined based on at least one bit position of the multi-bit data items;
(b) dividing each of said plurality of groups into subsets;
(c) recording a representation of each of the subsets in a computer-readable medium such that the recorded representation of the subsets is associated with a first organizational level;
(d) dividing each of said subsets into further subsets;
(e) recording a representation of each of said further subsets in a computer-readable medium such that the recorded representation of each of the further subsets is associated with a corresponding further organizational level; and
(f) repeating steps (d) and (e) until each of said further subsets comprises a pure-1 subset or a pure-0 subset.
Dependent claims: 23, 24, 25
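The recursive subdivision in steps (b) through (f) of claim 22 can be pictured with a small Python sketch. It is a minimal illustration under stated assumptions, not the claimed method itself: each subset is split into two halves (the claim does not say how subsets are formed), and each subset's recorded representation is a hypothetical dictionary holding its size and its count of 1-bits. The names `build_tree`, `expand` and `depth` are likewise invented for the example.

```python
def build_tree(bits):
    """Record a subset, then keep dividing it until it is pure-1 or pure-0."""
    ones = sum(bits)
    node = {"size": len(bits), "ones": ones, "children": []}
    if 0 < ones < len(bits):  # mixed subset: divide into further subsets
        mid = len(bits) // 2  # assumption: split each subset into two halves
        node["children"] = [build_tree(bits[:mid]), build_tree(bits[mid:])]
    return node


def expand(node):
    """Rebuild the original bit list from the recorded tree."""
    if not node["children"]:  # pure leaf: all ones or all zeros
        return [1 if node["ones"] else 0] * node["size"]
    return [bit for child in node["children"] for bit in expand(child)]


def depth(node):
    """Number of organizational levels below this node."""
    if not node["children"]:
        return 0
    return 1 + max(depth(child) for child in node["children"])


# One hypothetical bit position group: the same bit position taken from eight items.
group = [1, 1, 1, 1, 0, 0, 1, 0]
tree = build_tree(group)

first_half, second_half = tree["children"]
# first_half is a pure-1 subset and is not divided further;
# second_half is mixed and is divided until its leaves are pure.
```

Because every leaf is pure, `expand(tree)` reproduces `group` exactly, and the 1-bit count of any recorded subset can be read from its node without visiting the individual bits.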
Specification