Method and apparatus for retail data mining using pair-wise co-occurrence consistency
First Claim
1. A method for semi-supervised insight discovery, the method being implemented by one or more data processors and comprising:
seeking, by at least one data processor, pair-wise relationships between large numbers of entities, in a variety of domain specific contexts, from appropriately filtered and customized transaction data;
representing, by at least one data processor, the pair-wise relationships between the entities in a graph structure containing a set of nodes representing entities, and a set of edges representing strength of relationships between pairs of nodes;
discovering, by at least one data processor, insights in the form of relationship patterns of interest that may be projected or scored on individual or groups of transactions or customers; and
using, by at least one data processor, said insights to make data-driven-decisions for a variety of business goals;
said graph structure comprising any of the following types of structures:
a sub-graph comprising a subset of a graph, created by picking a subset of nodes and edges from an original graph, a sub-graph comprising any of:
node based sub-graphs which are created by selecting a subset of the nodes and by keeping only those edges between selected nodes; and
edge based sub-graphs which are created by pruning a set of edges from the graph and removing all nodes that are rendered disconnected from the graph;
a neighborhood of a target product comprising a sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a predefined threshold to show the top most affiliated products for a given target product;
a bundle structure comprising a sub-set of products wherein each product in the bundle has a high consistency connection with all the other products in the bundle, wherein each product in a bundle is assigned a product density with respect to the bundle which is high if the product has high consistency connection with other products in the bundle and low otherwise; and
a bridge structure comprising a collection of two or more, otherwise disconnected, product groups that are bridged by one or more bridge product(s).
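The sub-graph and neighborhood structures recited above can be illustrated with a minimal Python sketch. It is not taken from the patent: the dictionary-of-edges representation, the function names (`edge`, `node_subgraph`, `edge_subgraph`, `neighborhood`), and the example weights are all assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]
Graph = Dict[Edge, float]  # undirected; each key is a sorted pair of product names


def edge(a: str, b: str) -> Edge:
    """Canonical key for an undirected edge (hypothetical helper)."""
    return (a, b) if a <= b else (b, a)


def node_subgraph(graph: Graph, nodes: Set[str]) -> Graph:
    """Node based sub-graph: keep only edges whose endpoints are both selected.

    Because only edges are stored, a selected node with no surviving edge
    is not represented in the result.
    """
    return {e: w for e, w in graph.items() if e[0] in nodes and e[1] in nodes}


def edge_subgraph(graph: Graph, min_weight: float) -> Graph:
    """Edge based sub-graph: prune weak edges; disconnected nodes drop out implicitly."""
    return {e: w for e, w in graph.items() if w >= min_weight}


def neighborhood(graph: Graph, target: str, threshold: float) -> Graph:
    """Target product plus every product tied to it with consistency above threshold."""
    members = {target}
    for (a, b), w in graph.items():
        if target in (a, b) and w > threshold:
            members.update((a, b))
    return node_subgraph(graph, members)


# Illustrative consistency values, not real data.
g: Graph = {
    edge("milk", "bread"): 0.8,
    edge("milk", "eggs"): 0.6,
    edge("bread", "eggs"): 0.3,
    edge("milk", "batteries"): 0.05,
}

print(node_subgraph(g, {"milk", "bread"}))
print(edge_subgraph(g, 0.5))
print(neighborhood(g, "milk", 0.5))
```

In this sketch the neighborhood keeps edges between the neighbors themselves, whatever their weight; whether those edges belong in a neighborhood is a design choice the claim language leaves open.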
Abstract
The invention, referred to herein as PeaCoCk, uses a unique blend of technologies from statistics, information theory, and graph theory to quantify and discover patterns in relationships between entities, such as products and customers, as evidenced by purchase behavior. In contrast to traditional purchase-frequency based market basket analysis techniques, such as association rules, which mostly generate obvious and spurious associations, PeaCoCk employs information-theoretic notions of consistency and similarity, which allow robust statistical analysis of the true, statistically significant, and logical associations between products. Therefore, PeaCoCk lends itself to reliable, robust predictive analytics based on purchase behavior.
34 Claims
1. A method for semi-supervised insight discovery, the method being implemented by one or more data processors and comprising:
seeking, by at least one data processor, pair-wise relationships between large numbers of entities, in a variety of domain specific contexts, from appropriately filtered and customized transaction data;
representing, by at least one data processor, the pair-wise relationships between the entities in a graph structure containing a set of nodes representing entities, and a set of edges representing strength of relationships between pairs of nodes;
discovering, by at least one data processor, insights in the form of relationship patterns of interest that may be projected or scored on individual or groups of transactions or customers; and
using, by at least one data processor, said insights to make data-driven-decisions for a variety of business goals;
said graph structure comprising any of the following types of structures:
a sub-graph comprising a subset of a graph, created by picking a subset of nodes and edges from an original graph, a sub-graph comprising any of:
node based sub-graphs which are created by selecting a subset of the nodes and by keeping only those edges between selected nodes; and
edge based sub-graphs which are created by pruning a set of edges from the graph and removing all nodes that are rendered disconnected from the graph;
a neighborhood of a target product comprising a sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a predefined threshold to show the top most affiliated products for a given target product;
a bundle structure comprising a sub-set of products wherein each product in the bundle has a high consistency connection with all the other products in the bundle, wherein each product in a bundle is assigned a product density with respect to the bundle which is high if the product has high consistency connection with other products in the bundle and low otherwise; and
a bridge structure comprising a collection of two or more, otherwise disconnected, product groups that are bridged by one or more bridge product(s).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A retail mining method for extracting actionable insights and data-driven decisions from transaction data, the method being implemented by one or more data processors and comprising:
data pre-processing by at least one data processor, wherein raw transaction data are filtered and customized;
said filtering cleaning said data by removing data elements that are to be excluded from analysis;
said customization creating different slices of said filtered transaction data that may be analyzed separately and whose results may be compared for further insight generation;
graph generation, by at least one data processor, to create graphs that capture all pair-wise relationships between entities in a variety of contexts, said graph generation step comprising:
context-instance creation wherein a number of context instances are created from said transaction data slice;
co-occurrence counting wherein, for each pair of products, a co-occurrence count is computed as the number of context instances in which two products co-occurred; and
co-occurrence consistency, wherein, once all co-occurrence counting is done, information theoretic consistency measures are computed for each pair of products, resulting in a graph; and
insight discovery and decisioning, by at least one data processor, from said graphs, wherein said graphs serve as a model or internal representation of knowledge extracted from transaction data, said insight discovery and decisioning step further comprising any of:
product related insight discovery, wherein graph theory and machine learning algorithms are applied to said graphs to discover patterns of interest, including product bundles, bridge products, product phrases, and product neighborhoods;
wherein said patterns may be used to make decisions; and
customer related decisioning, wherein a graph is used as a model to decisions;
said graphs comprising any of the following types of structures:
a sub-graph comprising a subset of a graph, created by picking a subset of nodes and edges from an original graph, a sub-graph comprising any of:
node based sub-graphs which are created by selecting a subset of the nodes and by keeping only those edges between selected nodes; and
edge based sub-graphs which are created by pruning a set of edges from the graph and removing all nodes that are rendered disconnected from the graph;
a neighborhood of a target product comprising a sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a predefined threshold to show the top most affiliated products for a given target product;
a bundle structure comprising a sub-set of products wherein each product in the bundle has a high consistency connection with all the other products in the bundle, wherein each product in a bundle is assigned a product density with respect to the bundle which is high if the product has high consistency connection with other products in the bundle and low otherwise; and
a bridge structure comprising a collection of two or more, otherwise disconnected, product groups that are bridged by one or more bridge product(s).
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
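The graph generation step recited in this claim (context-instance creation, co-occurrence counting, co-occurrence consistency) can be sketched in Python as follows. The claim only requires "information theoretic consistency measures" and does not name one here, so the use of point-wise mutual information (PMI) is an assumption, as is treating each basket as one context instance. The function names and sample baskets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count, per product and per product pair, the context instances containing them."""
    margin = Counter()  # product -> number of context instances containing it
    pair = Counter()    # sorted (a, b) -> number of context instances containing both
    for ctx in contexts:
        items = sorted(set(ctx))  # a product counts once per context instance
        margin.update(items)
        pair.update(combinations(items, 2))
    return margin, pair


def consistency_graph(contexts):
    """Edge weight = PMI = log( P(a, b) / (P(a) * P(b)) ); PMI is an assumed measure."""
    n = len(contexts)
    margin, pair = cooccurrence_counts(contexts)
    return {
        (a, b): math.log((count * n) / (margin[a] * margin[b]))
        for (a, b), count in pair.items()
    }


# Assumption: each basket is one context instance. Illustrative data only.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
    ["batteries", "milk"],
]

for products, weight in sorted(consistency_graph(baskets).items()):
    print(products, round(weight, 3))
```

Here bread and milk co-occur most often, yet their PMI is negative because both are individually frequent, which shows how a consistency measure can rank pairs differently from raw purchase frequency. Raw PMI is also known to favor rare products, so a practical system would likely need smoothing or a minimum-count filter.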
33. An article comprising a tangible computer-readable storage medium embodying instructions that when performed by one or more computer processors result in operations for semi-supervised insight discovery comprising:
seeking pair-wise relationships between large numbers of entities, in a variety of domain specific contexts, from appropriately filtered and customized transaction data;
representing the pair-wise relationships between the entities in a graph structure containing a set of nodes representing entities, and a set of edges representing strength of relationships between pairs of nodes;
discovering insights in the form of relationship patterns of interest that may be projected or scored on individual or groups of transactions or customers; and
using said insights to make data-driven-decisions for a variety of business goals;
said graph structure comprising any of the following types of structures:
a sub-graph comprising a subset of a graph, created by picking a subset of nodes and edges from an original graph, a sub-graph comprising any of:
node based sub-graphs which are created by selecting a subset of the nodes and by keeping only those edges between selected nodes; and
edge based sub-graphs which are created by pruning a set of edges from the graph and removing all nodes that are rendered disconnected from the graph;
a neighborhood of a target product comprising a sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a predefined threshold to show the top most affiliated products for a given target product;
a bundle structure comprising a sub-set of products wherein each product in the bundle has a high consistency connection with all the other products in the bundle, wherein each product in a bundle is assigned a product density with respect to the bundle which is high if the product has high consistency connection with other products in the bundle and low otherwise; and
a bridge structure comprising a collection of two or more, otherwise disconnected, product groups that are bridged by one or more bridge product(s).
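The bundle structure's product density can be illustrated with a short Python sketch. The claim states only that density is high when a product has high-consistency connections with the other bundle members and low otherwise; defining it as the mean consistency to the other members is an assumption made here, and the function name and sample values are hypothetical.

```python
def product_density(consistency, bundle, product):
    """Mean consistency between `product` and every other product in `bundle`.

    `consistency` maps a sorted (a, b) pair to an edge weight; a missing pair
    is treated as 0.0. The averaging rule is an assumed definition.
    """
    others = sorted(p for p in bundle if p != product)
    if not others:
        return 0.0
    total = sum(consistency.get(tuple(sorted((product, p))), 0.0) for p in others)
    return total / len(others)


# Illustrative consistency values, not real data.
consistency = {
    ("bread", "milk"): 0.8,
    ("eggs", "milk"): 0.6,
    ("bread", "eggs"): 0.4,
    ("batteries", "milk"): 0.1,
}
bundle = {"milk", "bread", "eggs", "batteries"}

for p in sorted(bundle):
    print(p, round(product_density(consistency, bundle, p), 3))
```

Under this definition, batteries receive a much lower density than milk, bread, or eggs, which marks them as a weak member of the candidate bundle.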
34. An article comprising a tangible computer-readable storage medium embodying instructions that when performed by one or more computer processors result in operations for extracting actionable insights and data-driven decisions from transaction data comprising:
data pre-processing, wherein raw transaction data are filtered and customized;
said filtering cleaning said data by removing data elements that are to be excluded from analysis;
said customization creating different slices of said filtered transaction data that may be analyzed separately and whose results may be compared for further insight generation;
graph generation to create graphs that capture all pair-wise relationships between entities in a variety of contexts, said graph generation step comprising:
context-instance creation wherein a number of context instances are created from said transaction data slice;
co-occurrence counting wherein, for each pair of products, a co-occurrence count is computed as the number of context instances in which two products co-occurred; and
co-occurrence consistency, wherein, once all co-occurrence counting is done, information theoretic consistency measures are computed for each pair of products, resulting in a graph; and
insight discovery and decisioning from said graphs, wherein said graphs serve as a model or internal representation of knowledge extracted from transaction data, said insight discovery and decisioning step further comprising any of:
product related insight discovery, wherein graph theory and machine learning algorithms are applied to said graphs to discover patterns of interest, including product bundles, bridge products, product phrases, and product neighborhoods;
wherein said patterns may be used to make decisions; and
customer related decisioning, wherein a graph is used as a model to decisions;
said graphs comprising any of the following types of structures:
a sub-graph comprising a subset of a graph, created by picking a subset of nodes and edges from an original graph, a sub-graph comprising any of:
node based sub-graphs which are created by selecting a subset of the nodes and by keeping only those edges between selected nodes; and
edge based sub-graphs which are created by pruning a set of edges from the graph and removing all nodes that are rendered disconnected from the graph;
a neighborhood of a target product comprising a sub-graph that contains the target product and all the products that are connected to the target product with consistency strength above a predefined threshold to show the top most affiliated products for a given target product;
a bundle structure comprising a sub-set of products wherein each product in the bundle has a high consistency connection with all the other products in the bundle, wherein each product in a bundle is assigned a product density with respect to the bundle which is high if the product has high consistency connection with other products in the bundle and low otherwise; and
a bridge structure comprising a collection of two or more, otherwise disconnected, product groups that are bridged by one or more bridge product(s).
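The bridge structure can be sketched in Python under one possible reading of the claim: a product is a bridge candidate if, once it is set aside, its strongly connected neighbors fall into two or more groups with no strong edge between them. The threshold, the function names, and the sample weights are assumptions made for illustration, not the patent's procedure.

```python
from collections import defaultdict


def adjacency(edges, threshold):
    """Adjacency sets built from edges whose weight is at least `threshold`."""
    adj = defaultdict(set)
    for (a, b), w in edges.items():
        if w >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def groups_bridged_by(edges, product, threshold):
    """Product groups reachable from `product` that connect only through it."""
    adj = adjacency(edges, threshold)
    seen = {product}  # never traverse through the candidate bridge itself
    groups = []
    for start in sorted(adj.get(product, ())):
        if start in seen:
            continue
        seen.add(start)
        group, stack = set(), [start]
        while stack:  # depth-first walk of one group
            node = stack.pop()
            group.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


# Illustrative consistency values, not real data.
edges = {
    ("bread", "butter"): 0.7,
    ("chips", "salsa"): 0.8,
    ("bread", "cheese"): 0.6,
    ("cheese", "chips"): 0.6,
    ("butter", "salsa"): 0.05,
}

print(groups_bridged_by(edges, "cheese", 0.5))
print(groups_bridged_by(edges, "cheese", 0.01))
```

With a threshold of 0.5, cheese connects two otherwise separate groups and so qualifies as a bridge candidate; lowering the threshold to 0.01 admits the weak butter and salsa edge, the groups merge into one, and cheese no longer bridges anything.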
Specification