Market basket analysis
Market basket analysis is a data mining technique for finding items that are frequently purchased together and the associations between them. It can be used to gain insight into customer behavior and to inform marketing and sales strategies.
The two main techniques used in market basket analysis are frequent itemset mining and association rule learning.
Frequent itemset mining
Frequent itemset mining extracts sets of items that frequently co-occur in a dataset. The Apriori algorithm is one of the most common algorithms for this task. In Python, we can use the apriori function from the mlxtend library to generate a set of frequent itemsets.
For this example, we will use a dataset of transactions from a grocery store, where each transaction is represented by a list of items.
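import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
# Load the transaction data (one transaction per row, one item per cell)
data = pd.read_csv("groceries.csv", header=None)
# Drop the NaN padding that read_csv adds to shorter rows
transactions = [row.dropna().tolist() for _, row in data.iterrows()]
# apriori expects a one-hot encoded DataFrame, not a list of lists
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)
# Generate the frequent itemsets, reporting item names instead of column indices
frequent_itemsets = apriori(onehot, min_support=0.05, use_colnames=True)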
The min_support parameter specifies the minimum support threshold, which is the proportion of transactions that must contain a particular itemset for it to be considered frequent. In this example, we have set the threshold to 0.05, meaning we only consider itemsets that occur in at least 5% of the transactions.
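To make the definition of support concrete, here is a minimal sketch in plain Python that computes it directly. The toy transactions and the helper name support are assumptions made for illustration; they are not part of mlxtend.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "cheese"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

min_support = 0.5
bread_milk = support({"bread", "milk"}, transactions)
# An itemset is "frequent" when its support reaches the threshold.
print(bread_milk, bread_milk >= min_support)
```

Here bread and milk appear together in two of the four transactions, so the pair has a support of 0.5 and just meets the threshold.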
The output of the apriori function is a DataFrame that lists all the frequent itemsets and their corresponding support values.
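For readers curious about what the Apriori algorithm does internally, the level-wise search can be sketched in plain Python roughly as follows. This is a simplified illustration, not mlxtend's implementation; the function name apriori_sketch and the toy data are assumptions.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep only the candidates that reach min_support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Toy data, invented for illustration only.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori_sketch(toy, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The key idea is the prune step: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.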
Association rule learning
Association rule learning extracts rules of the form antecedent -> consequent that describe the relationships between items in a dataset. The most common metric used to evaluate the strength of an association rule is confidence: the conditional probability of the consequent given the antecedent, estimated as support(antecedent and consequent) / support(antecedent). In Python, we can use the association_rules function from the mlxtend library to generate a set of association rules from the frequent itemsets.
from mlxtend.frequent_patterns import association_rules
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
The metric parameter specifies the metric used to filter the association rules, and the min_threshold parameter sets the minimum value of that metric a rule must reach to be kept. In this example, we keep only the rules with a confidence of at least 0.2.
The output of the association_rules function is a DataFrame that lists all the association rules and their corresponding metrics, such as support, confidence, and lift.
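The three metrics named above can be computed by hand for a single rule. The following is a minimal sketch in plain Python; the helper rule_metrics and the toy baskets are hypothetical, and it assumes the antecedent and the consequent each occur in at least one transaction (otherwise the divisions fail).

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    support_a = sum(1 for t in transactions if a <= t) / n
    support_c = sum(1 for t in transactions if c <= t) / n
    support_ac = sum(1 for t in transactions if (a | c) <= t) / n
    confidence = support_ac / support_a   # estimate of P(consequent | antecedent)
    lift = confidence / support_c         # above 1 suggests a positive association
    return {"support": support_ac, "confidence": confidence, "lift": lift}

# Toy transactions, invented for illustration only.
baskets = [
    {"tea", "sugar"},
    {"tea", "sugar", "lemon"},
    {"tea"},
    {"sugar"},
    {"lemon"},
]
metrics = rule_metrics({"tea"}, {"sugar"}, baskets)
print(metrics)
```

Lift divides the confidence by the consequent's overall support, so it shows how much more often the consequent appears alongside the antecedent than it does across all transactions.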
Example
Let's use an example to illustrate the process of market basket analysis using frequent itemset mining and association rule learning.
Suppose we have a dataset of transactions from a grocery store, where each transaction is represented by a list of items.
transactions = [
    ['bread', 'milk', 'eggs'],
    ['bread', 'milk', 'cheese'],
    ['milk', 'cheese'],
    ['bread', 'butter'],
    ['butter', 'jam'],
]
We can use the Apriori algorithm to generate a set of frequent itemsets.
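import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
# One-hot encode the transactions, then generate the frequent itemsets
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)
frequent_itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
print(frequent_itemsets)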
This will output the following frequent itemsets (the order of items within an itemset may vary):
   support        itemsets
0      0.6         (bread)
1      0.4        (butter)
2      0.4        (cheese)
3      0.6          (milk)
4      0.4   (bread, milk)
5      0.4  (cheese, milk)
We can see that there are six frequent itemsets at a minimum support of 0.4: four single items and two pairs. The first itemset, (bread), has a support of 0.6, which means that it appears in 60% of the transactions.
Next, we can use association rule learning to generate a set of association rules.
from mlxtend.frequent_patterns import association_rules
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules)
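This will output the association rules. The main columns are shown here; the row order and the exact set of columns depend on the mlxtend version:
antecedents consequents  support  confidence      lift
    (bread)      (milk)      0.4    0.666667  1.111111
     (milk)     (bread)      0.4    0.666667  1.111111
   (cheese)      (milk)      0.4    1.000000  1.666667
     (milk)    (cheese)      0.4    0.666667  1.666667
We can see that four association rules reach the minimum confidence of 0.6. The rule (cheese) -> (milk) has a confidence of 1.0 and a lift of 1.67: every transaction that contains cheese also contains milk, and a customer who buys cheese is 1.67 times more likely to buy milk than customers overall. Lift is symmetric, so the reverse rule (milk) -> (cheese) has the same lift, although its confidence is lower (0.67). The two rules between bread and milk have a lift of only 1.11, which indicates a much weaker association. Butter and bread do not appear in any rule, because they occur together in only one of the five transactions (support 0.2), below the minimum support of 0.4.
These association rules can be used to inform marketing and sales strategies. For example, the store could place cheese and milk next to each other to encourage customers to buy both items.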
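As a cross-check on the library output, the single-item rules for the five example transactions can be recomputed in plain Python. This sketch only looks at pairs of items, which is sufficient here because no larger itemset reaches the minimum support; the helper support and the tuple layout of rules are choices made for this illustration.

```python
from itertools import permutations

# The five example transactions from this section.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk", "cheese"},
    {"milk", "cheese"},
    {"bread", "butter"},
    {"butter", "jam"},
]
n = len(transactions)

def support(*items):
    """Fraction of transactions containing all the given items."""
    return sum(1 for t in transactions if set(items) <= t) / n

items = sorted(set().union(*transactions))
rules = []
for a, c in permutations(items, 2):
    pair_support = support(a, c)
    if pair_support < 0.4:          # same min_support as the example
        continue
    confidence = pair_support / support(a)
    if confidence >= 0.6:           # same min_threshold as the example
        lift = confidence / support(c)
        rules.append((a, c, round(confidence, 3), round(lift, 3)))

# Each tuple is (antecedent, consequent, confidence, lift).
for rule in rules:
    print(rule)
```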
In summary, market basket analysis is a powerful technique that can be used to gain insights into customer behavior and inform marketing and sales strategies. Frequent itemset mining and association rule learning are two common techniques used in market basket analysis, and Python libraries like mlxtend provide easy-to-use functions for implementing these techniques.