Generating a Venn diagram using a columnar database management system

0Associated
Cases 
0Associated
Defendants 
0Accused
Products 
0Forward
Citations 
0
Petitions 
2
Assignments
First Claim
1. A method comprising:
 determining, by a computer, a plurality of intersection sets for all combinations of a plurality of input sets of data retrieved from a columnar format database, each intersection set of the plurality of intersections sets determined by the computer using a combination of input sets, the determining comprising;
loading a first combination of input sets of the plurality of input sets of data from the columnar format database,determining a first intersection set from the first combination of input sets,selecting a second intersection set based on a second combination of input sets, the second intersection set selected such that the input sets of the second combination include the input sets of the first combination,loading input sets of the second combination that are not included in the first combination, anddetermining the second intersection set from the second combination of input sets which include data that has been previously loaded with the first combination of input sets;
determining, by the computer, one or more subsets of a Venn diagram based on the plurality of intersection sets, the determining comprising computing a cardinality of each subset of the Venn diagram as a difference of a cardinality of a first intersection set and cardinality of previous computed subsets of the Venn diagram;
storing the one or more subsets of the Venn diagram on a nontransitory computerreadable medium; and
communicating a data mining result including the one or more subsets of the Venn diagram to a client device over a network for display on the client device.
2 Assignments
0 Petitions
Accused Products
Abstract
Venn diagrams are computed for a given plurality of input sets. The process of computing the Venn diagrams is executed on columnar database systems for efficient execution. The computation of various subsets of the Venn diagrams is performed by determining subsets of various combinations of the input sets and computing set differences of the intersection sets. The process orders the execution of various steps of computing the subsets for the Venn diagram in an order that reduces the number of times an input set is loaded. Information describing various subsets of a Venn diagram is used to render the Venn diagram for display, for example, on a client device.
37 Citations
No References
Data mining and reporting  
Patent #
US 7,945,850 B2
Filed 02/27/2007

Current Assignee
Mastermine Software Incorporated

Sponsoring Entity
Mastermine Software Incorporated

INFORMATION PRESENTING DEVICE, INFORMATION PRESENTING METHOD, INFORMATION PRESENTING PROGRAM, AND INTEGRATED CIRCUIT  
Patent #
US 20100010986A1
Filed 08/23/2007

Current Assignee
Panasonic Intellectual Property Corporation of America

Sponsoring Entity
Panasonic Corporation

EFFICIENT LARGESCALE JOINING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES  
Patent #
US 20100088309A1
Filed 12/15/2008

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Technology Licensing LLC

METHOD AND SYSTEM FOR PERFORMING A SCAN OPERATION ON A TABLE OF A COLUMNORIENTED DATABASE  
Patent #
US 20090019029A1
Filed 07/11/2007

Current Assignee
InfiniDB Inc.

Sponsoring Entity
InfiniDB Inc.

Efficient evaluation of queries with mining predicates  
Patent #
US 7,346,601 B2
Filed 06/03/2002

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Corporation

Method and apparatus to visually present discussions for data mining purposes  
Patent #
US 7,421,660 B2
Filed 02/04/2003

Current Assignee
Adobe Inc.

Sponsoring Entity
Cataphora Incorporated

Method and apparatus for improved processing and analysis of complex hierarchic data  
Patent #
US 20070088731A1
Filed 10/21/2005

Current Assignee
Middlemarch Holdings Pty Limited

Sponsoring Entity
Middlemarch Holdings Pty Limited

Analyzing Administrative Healthcare Claims Data and Other Data Sources  
Patent #
US 20070174252A1
Filed 12/06/2006

Current Assignee
Ingenix Incorporated

Sponsoring Entity
Ingenix Incorporated

Progress notification supporting data mining  
Patent #
US 7,031,978 B1
Filed 05/17/2002

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Universal tree interpreter for data mining models  
Patent #
US 6,941,318 B1
Filed 05/10/2002

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Method and apparatus to visually present discussions for data mining purposes  
Patent #
US 20040153456A1
Filed 02/04/2003

Current Assignee
Adobe Inc.

Sponsoring Entity
Cataphora Incorporated

Adaptive acceleration of retrieval queries  
Patent #
US 20030158842A1
Filed 01/17/2003

Current Assignee
INFOCYCLONE LTD.

Sponsoring Entity
INFOCYCLONE LTD.

Data mining and reporting  
Patent #
US 20020013786A1
Filed 12/15/2000

Current Assignee
Robert Machalek

Sponsoring Entity
Robert Machalek

Technique for efficiently classifying packets using a trieindexed hierarchy forest that accommodates wildcards  
Patent #
US 6,041,053 A
Filed 09/18/1997

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Corporation

Apparatus and accompanying methods, using a trieindexed hierarchy forest, for storing wildcardbased patterns and, given an input key, retrieving, from the forest, a stored pattern that is identical to or more general than the key  
Patent #
US 5,995,971 A
Filed 09/18/1997

Current Assignee
Microsoft Technology Licensing LLC

Sponsoring Entity
Microsoft Corporation

SCAN SHARING FOR QUERY PREDICATE EVALUATIONS IN COLUMNBASED INMEMORY DATABASE SYSTEMS  
Patent #
US 20120084278A1
Filed 09/30/2010

Current Assignee
ServiceNow Incorporated

Sponsoring Entity
International Business Machines Corporation

Determination of Rules by Providing Data Records in Columnar Data Structures  
Patent #
US 20120310874A1
Filed 05/07/2012

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Hybrid OLTP and OLAP High Performance Database System  
Patent #
US 20130073513A1
Filed 04/04/2011

Current Assignee
Technische UniversitT MNchen

Sponsoring Entity
Technische UniversitT MNchen

Online Reorganization of Hybrid InMemory Databases  
Patent #
US 20130232176A1
Filed 03/05/2013

Current Assignee
HassoPlattnerInstitut fr Softwaresystemtechnik GmbH

Sponsoring Entity
HassoPlattnerInstitut fr Softwaresystemtechnik GmbH

Method and System To Manipulate Multiple Selections Against a Population of Elements  
Patent #
US 20130342542A1
Filed 06/24/2013

Current Assignee
Quintiles Transnational Corporation

Sponsoring Entity
Quintiles Transnational Corporation

Storage Advisor for HybridStore Databases  
Patent #
US 20140012881A1
Filed 07/09/2012

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

BUSINESS INTELLIGENCE PERFORMANCE ANALYSIS SYSTEM  
Patent #
US 20140040306A1
Filed 08/01/2012

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

PERFORMING DATA MINING OPERATIONS WITHIN A COLUMNAR DATABASE MANAGEMENT SYSTEM  
Patent #
US 20140372482A1
Filed 06/12/2014

Current Assignee
Open Text Corporation

Sponsoring Entity
Actuate Corporation

Performing CrossTabulation Using a Columnar Database Management System  
Patent #
US 20140379697A1
Filed 06/18/2014

Current Assignee
Open Text Corporation

Sponsoring Entity
Open Text Corporation

Generating a Venn Diagram Using a Columnar Database Management System  
Patent #
US 20140379703A1
Filed 06/19/2014

Current Assignee
Open Text Corporation

Sponsoring Entity
Open Text Corporation

System and method of multidimensional query results processing  
Patent #
US 9,081,849 B2
Filed 05/27/2004

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Method and apparatus for a multiplexed active data window in a near realtime business intelligence system  
Patent #
US 9,094,258 B2
Filed 08/09/2012

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Business intelligence performance analysis system  
Patent #
US 9,183,529 B2
Filed 08/01/2012

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Inmemory data profiling  
Patent #
US 9,218,373 B2
Filed 10/10/2012

Current Assignee
Business Objects Incorporated

Sponsoring Entity
Business Objects Incorporated

Systems and methods for data mining automation  
Patent #
US 9,405,821 B1
Filed 08/03/2012

Current Assignee
Tinyclues SAS

Sponsoring Entity
Tinyclues SAS

Performing crosstabulation using a columnar database management system  
Patent #
US 9,600,539 B2
Filed 06/18/2014

Current Assignee
Open Text Corporation

Sponsoring Entity
Open Text Corporation

PERFORMING CROSSTABULATION USING A COLUMNAR DATABASE MANAGEMENT SYSTEM  
Patent #
US 20170154079A1
Filed 02/10/2017

Current Assignee
Open Text Holdings Inc.

Sponsoring Entity
Open Text Holdings Inc.

Generating a venn diagram using a columnar database management system  
Patent #
US 9,679,000 B2
Filed 06/19/2014

Current Assignee
Open Text Corporation

Sponsoring Entity
Actuate Corporation

Performing data mining operations within a columnar database management system  
Patent #
US 9,798,783 B2
Filed 06/12/2014

Current Assignee
Open Text Holdings Inc.

Sponsoring Entity
Open Text Corporation

PERFORMING DATA MINING OPERATIONS WITHIN A COLUMNAR DATABASE MANAGEMENT SYSTEM  
Patent #
US 20180011909A1
Filed 09/25/2017

Current Assignee
Open Text Holdings Inc.

Sponsoring Entity
Open Text Holdings Inc.

Performing crosstabulation using a columnar database management system  
Patent #
US 10,282,355 B2
Filed 02/10/2017

Current Assignee
Open Text Holdings Inc.

Sponsoring Entity
Open Text Holdings Inc.

PERFORMING CROSSTABULATION USING A COLUMNAR DATABASE MANAGEMENT SYSTEM  
Patent #
US 20190228011A1
Filed 03/29/2019

Current Assignee
Open Text Holdings Inc.

Sponsoring Entity
Open Text Holdings Inc.

20 Claims
 1. A method comprising:
determining, by a computer, a plurality of intersection sets for all combinations of a plurality of input sets of data retrieved from a columnar format database, each intersection set of the plurality of intersections sets determined by the computer using a combination of input sets, the determining comprising; loading a first combination of input sets of the plurality of input sets of data from the columnar format database, determining a first intersection set from the first combination of input sets, selecting a second intersection set based on a second combination of input sets, the second intersection set selected such that the input sets of the second combination include the input sets of the first combination, loading input sets of the second combination that are not included in the first combination, and determining the second intersection set from the second combination of input sets which include data that has been previously loaded with the first combination of input sets; determining, by the computer, one or more subsets of a Venn diagram based on the plurality of intersection sets, the determining comprising computing a cardinality of each subset of the Venn diagram as a difference of a cardinality of a first intersection set and cardinality of previous computed subsets of the Venn diagram; storing the one or more subsets of the Venn diagram on a nontransitory computerreadable medium; and communicating a data mining result including the one or more subsets of the Venn diagram to a client device over a network for display on the client device.  View Dependent Claims (2, 3, 4, 5, 6, 7)
 8. A system comprising:
a processor; a nontransitory computerreadable medium; and stored instructions translatable by the processor for; determining a plurality of intersection sets for all combinations of a plurality of input sets of data retrieved from a columnar format database, each intersection set of the plurality of intersections sets determined by using a combination of input sets, the determining comprising; loading a first combination of input sets of the plurality of input sets of data from the columnar format database, determining a first intersection set from the first combination of input sets, selecting a second intersection set based on a second combination of input sets, the second intersection set selected such that the input sets of the second combination include the input sets of the first combination, loading input sets of the second combination that are not included in the first combination, and determining the second intersection set from the second combination of input sets which include data that has been previously loaded with the first combination of input sets; determining one or more subsets of a Venn diagram based on the plurality of intersection sets, the determining comprising computing a cardinality of each subset of the Venn diagram as a difference of a cardinality of a first intersection set and cardinality of previous computed subsets of the Venn diagram; storing the one or more subsets of the Venn diagram; and communicating a data mining result including the one or more subsets of the Venn diagram to a client device over a network for display on the client device.  View Dependent Claims (9, 10, 11, 12, 13, 14)
 15. A computer program product comprising a nontransitory computerreadable medium storing instructions translatable by a processor for:
determining a plurality of intersection sets for all combinations of a plurality of input sets of data retrieved from a columnar format database, each intersection set of the plurality of intersections sets determined by using a combination of input sets, the determining comprising; loading a first combination of input sets of the plurality of input sets of data from the columnar format database, determining a first intersection set from the first combination of input sets, selecting a second intersection set based on a second combination of input sets, the second intersection set selected such that the input sets of the second combination include the input sets of the first combination, loading input sets of the second combination that are not included in the first combination, and determining the second intersection set from the second combination of input sets which include data that has been previously loaded with the first combination of input sets; determining one or more subsets of a Venn diagram based on the plurality of intersection sets, the determining comprising computing a cardinality of each subset of the Venn diagram as a difference of a cardinality of a first intersection set and cardinality of previous computed subsets of the Venn diagram; storing the one or more subsets of the Venn diagram; and communicating a data mining result including the one or more subsets of the Venn diagram to a client device over a network for display on the client device.  View Dependent Claims (16, 17, 18, 19, 20)
1 Specification
This application is a continuation of, and claims a benefit of priority under 35 U.S.C. 120 of the filing date of U.S. patent application Ser. No. 14/308,971, filed Jun. 19, 2014, now U.S. Pat. No. 9,679,000, entitled “GENERATING A VENN DIAGRAM USING A COLUMNAR DATABASE MANAGEMENT SYSTEM,” which is a conversion of, and claims a benefit of priority from U.S. Provisional Application No. 61/837,272, filed Jun. 20, 2013, both of which are incorporated by reference in their entireties.
Field of Disclosure
This invention relates generally to data mining, and particularly to generating a Venn diagram using a columnar database management system.
Description of the Related Art
Data mining algorithms are often employed by various systems that process data. Data is often represented as sets of various types of entities, for example, products, employees, users of a system, transactions performed by an online system and so on. Data mining systems perform operations on these sets of data, for example, set union, set intersection, set difference, and so on. One operation performed by data mining systems is generation of a Venn diagram of two or more sets of data.
Generating a Venn diagram in turn requires performing various other set operations, for example, set intersection, set difference and so on. Conventional techniques for generating Venn diagrams perform these operations inefficiently. This is so because conventional techniques load the same data multiple times for performing various steps of the Venn diagram generation. As a consequence, generating a Venn diagram is often inefficient and consumes more computing resources than needed.
Embodiments of the invention generate Venn diagrams. A Venn diagram shows information describing subsets of data of two or more input sets. A data mining system receives a request for determining subsets of the Venn diagram. The request identifies the plurality of input sets for the Venn diagram. The data mining system generates intersection sets, each intersection set based on a combination of input sets. The data mining system determines the intersection sets in an order that efficiently utilizes data of input sets that has been previously loaded.
The data mining system loads a first combination of input sets for determining a first intersection set. The data mining system selects a second intersection set for processing next such that the combination of input sets for the second intersection set includes the first combination of input sets. For example, the second combination can be a superset of the first combination of input sets. The data mining system loads the input sets of the second combination that are not included in the first combination. The data mining system uses the loaded input sets to determine the second intersection set. The data mining system determines the subsets for the Venn diagram based on the intersection sets.
In an embodiment, the data mining system builds a truth table representing combinations of input sets as binary values. The positions of bits in a binary value from the truth table correspond to input sets and the value of each bit determines whether the input set corresponding to that position is included in the combination. The data mining system uses the truth table to determine the order in which steps for generating the Venn diagram are executed. For example, the data mining system ranks the combinations of sets based on the number of ones in each binary value of the truth table and uses the ranking to select the second intersection set after the first intersection set is determined.
The features and advantages described in this summary and the following detailed description are not allinclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof.
The Figures (FIGS.) and the following description describe certain embodiments by way of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality.
As shown in
The columnar database management system 112 is a system configured to store data in column oriented fashion in contrast with databases that store data in row oriented fashion. The columnar database management system 112 stores data in column oriented fashion so as to perform computations of aggregates over columns of data efficiently compared to databases that store data in row oriented fashion. For example, if data is stored in row oriented fashion, loading a column of data requires loading of data belonging to other columns as well. However, the columnar database management system 112 stores in column oriented fashion and loads data of a column for processing without loading data belonging to other columns. As a result the columnar database management system 112 performs processing of a column of data more efficiently than databases that represent data in row oriented fashion. As shown in
The database engine 116 is a logical entity configured to create, read, update and delete data stored by the columnar data management system 112. In one embodiment, the database engine 116 is configured to perform data mining using the data mining algorithms 115 and the data 114. In one embodiment, the data mining algorithms 115 include a Venn diagram process for generating Venn diagrams. Generating a Venn diagram comprises calculation of various subsets of data displayed by a Venn diagram.
In the embodiment, responsive to receiving a data mining request to generate a Venn diagram from the client 105 or forwarded by the application frontend 110, the data mining system 108 performs various steps to process the request and generate a Venn diagram. In particular, the data mining system 108 generates a truth table responsive to a request to generate a Venn diagram. Furthermore, the data mining system 108 performs various arithmetic and logical operations using the truth table to generate the Venn diagram. Specifically, the database engine 116 performs calculations for those sets of data involved in the Venn diagram without performing additional operations, such as negation, union, or exclusion when the operations are not needed.
The interactions between the client devices 105 and the data mining system 108 are typically performed via a network 102, for example, via the internet. The network 102 enables communications between the client device 105 and the data mining system 108. In one embodiment, the network 108 uses standard communications technologies and/or protocols. The data exchanged over the network 108 can be represented using technologies and/or formats including the hypertext markup language (HTML), the extensible markup language (XML), etc. In another embodiment, the entities can use custom and/or dedicated data communications technologies instead of, or in addition to, the ones described above.
The Venn diagram shows various relations between sets. The relations may be shown pictorially as overlapping geometric shapes, for example, overlapping circles, overlapping rectangles, or other types of shapes. For example, as shown in
For example, region 120a corresponds to all elements of set 130a that do not overlap with any other set. Region 120b corresponds to all elements of set 130b that do not overlap with any other set and region 120c corresponds to all elements of set 130c that do not overlap with any other set. Region 120g corresponds to intersection of all three sets 130a, 130b, and 130c. Region 120d corresponds to intersection of sets 130a and 130b, minus the elements of region 120g. Region 120e corresponds to intersection of sets 130a and 130c, minus the elements of region 120g. Region 120f corresponds to intersection of sets 130b and 130c, minus the elements of region 120g.
Given the various subsets corresponding to a Venn diagram, the data mining system 108 can determine other relations between the sets. For example, the union of 120d and 120g can be determined to compute the intersection of sets 130a and 130b. Similarly, the union of 120e and 120g can be determined to compute the intersection of sets 130a and 130c. Similarly, the union of 120f and 120g can be determined to compute the intersection of sets 130b and 130c.
Computer Architecture
The storage device 208 is a nontransitory computerreadable storage medium, such as a hard drive, compact disk readonly memory (CDROM), DVD, or a solidstate memory device. In an embodiment, instructions for performing various steps of the process of generating Venn diagram are stored on a storage device 208. The memory 206 holds instructions and data used by the processor 202. The pointing device 214 may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard 210 to input data into the computer system 200. The graphics adapter 212 displays images and other information on the display 218. The network adapter 216 couples the computer system 200 to the network 102.
A computer 200 can have different and/or other components than those shown in
The computer 200 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic utilized to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 208, loaded into the memory 206, and executed by the processor 202.
Embodiments of the entities described herein can include other and/or different modules than the ones described here. In addition, the functionality attributed to the modules can be performed by other or different modules in other embodiments. Moreover, this description occasionally omits the term “module” for purposes of clarity and convenience.
Overall Process
The process of generating Venn diagrams using columnar databases orders the various steps of the computation such that data loaded for performing a step is reused by subsequent steps if possible. The data mining system 108 receives information identifying a plurality of input sets of data for generating a Venn diagram. In an embodiment, the columnar database management system 112 stores the input sets in a columnar format that stores data of a column on a storage device. The data mining system 108 determines intersections of various combinations of the input sets. For example, if the input data sets are 130a, 130b, and 130c, the various combinations of intersections are (130a∩130b), (130b∩130c), (130a∩130c), and (130a∩130b∩130c).
The data mining system 108 orders the computations of the intersections of sets such that the data used for determining an intersection of a combination is used for determining the intersection of the next combination. For example, if data for set 130a is loaded, the intersection (130a∩130b) may be determined next since this intersection uses the data of set 130a that is already loaded. Once the intersection of data set (130a∩130b) is determined, the data mining system 108 may determine the intersection of (130a∩130b∩130c) since performing this operation requires intersection set (130a∩130b) that is already loaded.
The data mining system 108 determines the various subsets of the Venn diagram by computing appropriate set differences of the intersection sets or subsets of Venn diagram previously computed. For example, the subset 120d of the Venn diagram is determined by computing the set difference of intersection set (130a∩130b) and the intersection set (130a∩130b∩130c). Similarly, the subset 120e of the Venn diagram is determined by computing the set difference of intersection set (130a∩130c) and the intersection set (130a∩130b∩130c). Similarly, the subset 120f of the Venn diagram is determined by computing the set difference of intersection set (130b∩130c) and the intersection set (130a∩130b∩130c).
Furthermore, subset 120a is computed by computing the set difference of set 130a and a union of subsets 120d, 120e, and 120g. Subset 120b is computed by computing the set difference of set 130b and a union of subsets 120d, 120f, and 120g. Subset 120c is computed by computing the set difference of set 130c and a union of subsets 120e, 120f, and 120g.
In the one embodiment, the columnar database management system 112 directly receives 305 a data mining request from the client 105 or receives a forwarded data mining request from the application frontend 110. The data mining request received 305 requests generation of a Venn diagram. The data mining request identifies the input sets of data for the Venn diagram. For example, the data mining request may identify the sets of data A, B, and C. For illustration purposes, the following description assumes that the set of data A includes 8 elements, the set of data B includes 10 elements, and the set of data C includes 8 elements.
In one aspect, the data mining request can be encapsulated in a suitable message format. For example the data mining request may be encapsulated in a suitable text format, such as XML, or in any other format. In some embodiments, the data mining request is received by the columnar database management system 112 via one or more suitable network transfer protocols. In other embodiments, the data mining request is received via an application programming interface (API) call, through receipt of a file containing the data mining request, or via an interactive console. It will be appreciated, however, that other ways of receiving the data mining request may be used.
After receiving the data mining request, the database engine 116 generates 315 a truth table based on the sets of data indicated by the data mining request. In one embodiment, the generated truth table includes multiple entries, where each entry corresponds to a combination of the input sets of data. Each entry of the truth table is associated with a binary value. The positions of bits in a binary value correspond to input sets and the value of a bit determines whether the input set corresponding to the position of the bit is included in the combination represented by the binary value.
For example, counting from the least significant bit to the most significant bit, the first position of bits may correspond to set C, the second position may correspond to set B, and the third position may correspond to set A. Furthermore, if a bit value is 1, the corresponding set is included in the combination and if the bit value is 0, the corresponding set is not included in the combination. A value 1 in the binary representation corresponds to value “true” and value 0 corresponds to value “false.” However, a different encoding scheme can be used, for example, an encoding scheme in which value 1 in the binary representation corresponds to value “false” and value 0 corresponds to value “true.”
The data mining system 108 ranks the combinations of sets based on the number of ones in each binary value. This ranking is distinct from a ranking based on the numeric value of the binary number. For example, even though binary number 1000 is greater that binary number 011, the number of ones in 011 is more than the number of ones in 1000. Accordingly, in the ranking based on the number of ones, 011 is ranked after the value 1000. The data mining system 108 uses this ranking to select the next intersection set to process after a particular intersection set is processed.
As another example, entry for the combination 3 corresponds to an intersection of the sets of data B and C. This is so because the binary representation of value 3 is 011. The first position corresponding to set C and the second position corresponding to set B are both 1 indicating that sets C and B are included in this combination. Furthermore, the third bit position is 0, indicating that the set A is not included in this combination.
Following generation of the truth table, the entries of the truth table are sorted 320 in ascending order based on the number of “trues” (or ones) in the combinations corresponding to the entries. For example,
Thereafter, for each noncalculated entry in the truth table and in ascending order by number of “trues,” the database engine 116 determines 325 an intersection set based on those sets of data marked as true in the combination of the noncalculated entry of the truth table. As used herein, a noncalculated entry may refer to an entry for which the intersection set or a cardinality (number of elements) of the intersection set has not yet been determined. For example, with reference to
After determining the intersection set for the combination of the noncalculated entry, the database engine 116 determines 330 the cardinality of the intersection set, and records the determined cardinality as a partial result in the noncalculated entry. Referring again to
After recording the cardinality as the partial result, the database engine 116 marks the entry as having been calculated. The entry is additionally marked as including the “current combination.” Thereafter, in ascending order and for each noncalculated entry including a combination higher than the current combination, the database engine 116 determines 335 whether the higher combination indicated by the entry includes at least the same sets as the current combination. For example, referring to
If it is determined that the higher combination contains at least one set that overlaps with the current combination, the database engine 116 repeats steps 325, 330, and 335 for the noncalculated entry including the higher combination. For example, referring to
With continued reference to the example, since there are no remaining noncalculated entries for combinations that include the sets of data A, B, and C, the database engine 116 processes the next noncalculated entry in ascending order in the truth table. For example, referring to
Once the intersection sets for all combinations are determined, the data mining system 108 determines the subsets of the Venn diagram based on the intersection sets. The data mining system 108 determines a subset of the Venn diagram as a set difference of an intersection set and one or more subsets of the Venn diagram that were previously computed. In an embodiment, the data mining system 108 determines a cardinality of each subset of the Venn diagram as a difference of a cardinality of a first intersection set and cardinality of subsets of the Venn diagram that were previously computed. The data mining system 108 uses the ranking based on the truth table to determine which subset of the Venn diagram to determine next. As described herein, the truth table ranks the combinations of sets based on the number of input sets in each combination.
The data mining system 108 processes the combinations of sets in decreasing order of the number of input sets of each combination to determine subsets of the Venn diagram. In an embodiment, the data mining system 108 determines a subset of the Venn diagram as the set difference of intersection sets and previously computed subsets of Venn diagram. In another embodiment, the data mining system 108 determines the cardinality of each subset of the Venn diagram as the difference of cardinality of each intersection set and cardinality of subsets of the Venn diagram that were previously computed.
Accordingly, for each entry in descending order from highest to lowest, final results are calculated 340 based on the partial results for the entry. To calculate the final result for a particular entry, the database engine 116 subtracts the partial value for the particular entry by the final result for each other entry that includes (1) a higher combination than the combination of the particular entry and (2) where the higher combination includes the sets of data of the combination of the particular entry. For example, referring to
Thereafter, the database engine 116 generates 345 a Venn diagram based on the final results and provides the Venn diagram to the client 105 for presentation to a user.
Alternative Applications
The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computerreadable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a generalpurpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a tangible nontransitory computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability. Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.