Market basket analysis

Market basket analysis is a data mining technique for finding items that are frequently purchased together in a set of transactions, and for describing the associations between them. It can be used to gain insights into customer behavior and to inform marketing and sales strategies.

The two main techniques used in market basket analysis are frequent itemset mining and association rule learning.

Frequent itemset mining

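Frequent itemset mining is a method used to extract sets of items that frequently co-occur in a dataset. The Apriori algorithm is one of the most common algorithms used for frequent itemset mining. In Python, we can use the apriori function from the mlxtend library to generate a set of frequent itemsets. The function expects the transactions as a one-hot encoded DataFrame (one boolean column per item), which mlxtend's TransactionEncoder can build from lists of items.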
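To make the idea behind Apriori concrete, here is a minimal pure-Python sketch of its level-wise search: count single items, then repeatedly join frequent itemsets into larger candidates and discard any candidate that has an infrequent subset. The function name apriori_sketch and the tiny dataset are made up for illustration; this is not how mlxtend implements it, and it is far too slow for real data.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    current = {}
    for item in {item for basket in baskets for item in basket}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that reach the support threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical baskets, used only for illustration
example = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]
for itemset, s in apriori_sketch(example, min_support=0.5).items():
    print(sorted(itemset), s)
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are, so most candidates are rejected before their support is ever counted.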

For this example, we will use a dataset of transactions from a grocery store, where each transaction is represented by a list of items.

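```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# Load the transaction data (assumes one transaction per row, padded to equal width)
data = pd.read_csv("groceries.csv", header=None)
# Drop the NaN cells that pad the shorter transactions
transactions = [[item for item in row if pd.notna(item)] for row in data.values.tolist()]

# One-hot encode the transactions; apriori expects a boolean DataFrame
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions), columns=encoder.columns_)

# Generate the frequent itemsets, labelled with item names
frequent_itemsets = apriori(onehot, min_support=0.05, use_colnames=True)
```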

The min_support parameter specifies the minimum support threshold, which is the proportion of transactions that must contain a particular itemset for it to be considered frequent. In this example, we have set the threshold to 0.05, meaning we only consider itemsets that occur in at least 5% of the transactions.
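Written as a formula, the support of an itemset is the share of transactions that contain it. The symbols T (the set of transactions) and X (an itemset) are notation introduced here for illustration, and the figure of 1000 transactions is a hypothetical example.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
X \text{ is frequent} \iff \operatorname{supp}(X) \ge \texttt{min\_support}

% Hypothetical example: with |T| = 1000 and min_support = 0.05,
% an itemset must appear in at least 0.05 \times 1000 = 50 transactions.
```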

The output of the apriori function is a DataFrame that lists all the frequent itemsets and their corresponding support values.

Association rule learning

Association rule learning is a method used to extract rules of the form "antecedent → consequent" that describe the relationships between different items in a dataset. A common metric used to evaluate the strength of association rules is confidence. Confidence is the conditional probability of the consequent given the antecedent, that is, the share of transactions containing the antecedent that also contain the consequent. In Python, we can use the association_rules function from the mlxtend library to generate a set of association rules.
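As a small illustration of confidence as a conditional probability, the sketch below counts it directly from raw transactions. The helper name confidence and the tea/sugar baskets are hypothetical and are not part of mlxtend.

```python
def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from a list of transactions."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions that contain every antecedent item
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the transactions")
    # Of those, the ones that also contain every consequent item
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)


# Hypothetical baskets, used only for illustration
baskets = [["tea", "sugar"], ["tea", "sugar", "lemon"], ["tea"], ["tea", "lemon"]]
print(confidence(["tea"], ["sugar"], baskets))   # 2 of the 4 tea baskets contain sugar
print(confidence(["sugar"], ["tea"], baskets))   # both sugar baskets contain tea
```

The two calls give different values, which shows that confidence is not symmetric: a rule and its reverse have to be evaluated separately.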

```python
from mlxtend.frequent_patterns import association_rules

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
```

The metric parameter specifies the metric used to evaluate the strength of the association rules, and the min_threshold parameter sets the minimum value of that metric a rule must reach to be kept. In this example, we have set the metric to confidence with a threshold of 0.2, so only rules with a confidence of at least 20% are returned.

The output of the association_rules function is a DataFrame that lists all the association rules and their corresponding metrics, such as support, confidence, and lift.
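The metrics in that table can all be derived from three support values. The sketch below uses the standard textbook definitions of confidence, lift, leverage and conviction; the function name rule_metrics and the input numbers are hypothetical, chosen only for illustration.

```python
import math


def rule_metrics(support_a, support_c, support_ac):
    """Derive metrics for the rule A -> C from three support values.

    support_a:  support of the antecedent A
    support_c:  support of the consequent C
    support_ac: support of A and C together
    """
    confidence = support_ac / support_a
    # Lift compares the confidence with how often C occurs anyway
    lift = confidence / support_c
    # Leverage is the gap between observed and independent co-occurrence
    leverage = support_ac - support_a * support_c
    # Conviction is unbounded when the rule never fails (confidence of 1)
    conviction = math.inf if confidence == 1 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Hypothetical supports: A and C are statistically independent here
print(rule_metrics(support_a=0.5, support_c=0.5, support_ac=0.25))
```

A lift of 1 (with leverage 0 and conviction 1) means the antecedent tells us nothing about the consequent; values above 1 indicate a positive association.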

Example

Let's use an example to illustrate the process of market basket analysis using frequent itemset mining and association rule learning.

Suppose we have a dataset of transactions from a grocery store, where each transaction is represented by a list of items.

```python
transactions = [
    ['bread', 'milk', 'eggs'],
    ['bread', 'milk', 'cheese'],
    ['milk', 'cheese'],
    ['bread', 'butter'],
    ['butter', 'jam'],
]
```

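After one-hot encoding the transactions, we can use the Apriori algorithm to generate a set of frequent itemsets.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder

# One-hot encode the transactions, then generate the frequent itemsets
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions), columns=encoder.columns_)
frequent_itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
print(frequent_itemsets)
```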

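This will output the following frequent itemsets (the order of items within a set may vary):

```text
   support        itemsets
0      0.6         (bread)
1      0.4        (butter)
2      0.4        (cheese)
3      0.6          (milk)
4      0.4   (bread, milk)
5      0.4  (cheese, milk)
```

We can see that six itemsets meet the minimum support of 0.4: four single items and two pairs. The first itemset (bread) has a support of 0.6, which means that it appears in 60% of the transactions (3 of 5). Eggs and jam each appear in only one transaction (support 0.2) and are dropped, as is every pair other than (bread, milk) and (cheese, milk).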

Next, we can use association rule learning to generate a set of association rules.

```python
from mlxtend.frequent_patterns import association_rules

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules)
```

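This will output the following association rules (values rounded; the exact set of columns and the row order depend on the mlxtend version):

```text
  antecedents consequents  antecedent support  consequent support  support  confidence      lift  leverage  conviction
0     (bread)      (milk)                 0.6                 0.6      0.4    0.666667  1.111111      0.04         1.2
1      (milk)     (bread)                 0.6                 0.6      0.4    0.666667  1.111111      0.04         1.2
2    (cheese)      (milk)                 0.4                 0.6      0.4    1.000000  1.666667      0.16         inf
3      (milk)    (cheese)                 0.6                 0.4      0.4    0.666667  1.666667      0.16         1.8
```

We can see that four association rules meet the minimum confidence of 0.6. The rule cheese → milk has a confidence of 1.0 and a lift of 1.67: every transaction that contains cheese also contains milk, so a customer who buys cheese is 1.67 times more likely to buy milk than a customer picked at random. The reverse rule, milk → cheese, has the same lift but a lower confidence of 0.67. The two rules between bread and milk have a lift of only 1.11, a much weaker association. Butter and bread produce no rule at all, because they appear together in only one of the five transactions (support 0.2), which is below the 0.4 support threshold.

These association rules can be used to inform marketing and sales strategies. For example, the store could place cheese and milk next to each other to encourage customers to buy both items.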

In summary, market basket analysis is a powerful technique that can be used to gain insights into customer behavior and inform marketing and sales strategies. Frequent itemset mining and association rule learning are two common techniques used in market basket analysis, and Python libraries like mlxtend provide easy-to-use functions for implementing these techniques.

