Compressed prefix trees and estDec+ method for finding frequent itemsets over data streams
First Claim
1. A method for finding frequent itemsets using a compressed prefix tree structure that manages information on an indefinite data set composed of transactions generated continuously in an application domain and information on itemsets generated in the transactions of the indefinite data set with a single node in a restricted memory space.
1 Assignment
0 Petitions
Accused Products
Abstract
Disclosed a relates to a method for finding specific information by analyzing a large amount of data set and a method for finding frequent itemsets in a data mining system realized using the same. The mining in a data stream defined as an indefinite set of data continuously generated is directed to a method for finding valuable knowledge effectively from such data and, recently, various mining methods have been proposed. When considering the characteristics of the data stream in which data elements are indefinitely generated at high speed, these mining methods has an important requirement in that the memory usage required for the performance of the mining process be restricted within an available range. The present invention provides an effective data structure in finding frequent itemsets over data streams and finds necessary information using the data structure. The data structure proposed in the present invention is defined as a compressed prefix tree structure, and the compressed prefix tree merges or splits nodes during the mining operation by comparing the prefix tree structure applied to the conventional data mining to manage a plurality of items in a single node, thus dynamically and flexibly adjusting the tree size. Such dynamic adjustment function dynamically merges and splits nodes in the prefix tree, if the variation of itemsets that are most likely to be frequent itemsets due to the variation of the data stream, thus maximizing the accuracy of the mining result in a restricted memory space, i.e., the accuracy of frequent itemsets found. Moreover, the present invention provides a method for optimizing the memory usage that ensures an optimum mining result within an available memory range using the compressed prefix tree structure.
34 Citations
10 Claims
- 1. A method for finding frequent itemsets using a compressed prefix tree structure that manages information on an indefinite data set composed of transactions generated continuously in an application domain and information on itemsets generated in the transactions of the indefinite data set with a single node in a restricted memory space.
Specification