MARKET BASKET ANALYSIS

Author

BALAKRISHNAN C

1. Introduction

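Market Basket Analysis is used to understand associations between items that customers buy together, based on how frequently those items appear in the same basket. Association rules are identified using support (the proportion of all transactions that contain an itemset) and confidence (the joint support of the antecedent and consequent divided by the support of the antecedent). The rules are then evaluated using the lift ratio: the confidence of a rule divided by the support of its consequent. A lift above 1 means the items appear together more often than they would if they were bought independently.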
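To make these definitions concrete, here is a minimal base R sketch on four hypothetical baskets (not the Kaggle data). The helper functions support(), confidence() and lift() are illustrative names written for this example; they are not the arules API.

```r
# Four hypothetical baskets (NOT from the Groceries data)
baskets <- list(
  c("bread", "milk"),
  c("bread", "milk"),
  c("bread", "eggs"),
  c("eggs")
)

# Support: share of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Lift: confidence divided by the baseline support of the rhs
lift <- function(lhs, rhs, baskets) {
  confidence(lhs, rhs, baskets) / support(rhs, baskets)
}

s    <- support(c("bread", "milk"), baskets)  # 2 of 4 baskets
conf <- confidence("bread", "milk", baskets)  # 0.5 / 0.75
lft  <- lift("bread", "milk", baskets)        # (0.5 / 0.75) / 0.5

cat(sprintf("support = %.4f, confidence = %.4f, lift = %.4f\n", s, conf, lft))
# prints: support = 0.5000, confidence = 0.6667, lift = 1.3333
```

Here two of the three bread baskets also contain milk, and milk appears in only half of all baskets, so the rule {bread} => {milk} has a lift above 1.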

2. Market Basket Analysis using Groceries_data from Kaggle

  Member_number       Date  itemDescription
1          1808 21-07-2015   tropical fruit
2          2552 05-01-2015       whole milk
3          2300 19-09-2015        pip fruit
4          1187 12-12-2015 other vegetables
5          3037 01-02-2015       whole milk
6          4941 14-02-2015       rolls/buns
      Member_number       Date       itemDescription
38760          3364 06-05-2014                   oil
38761          4471 08-10-2014         sliced cheese
38762          2022 23-02-2014                 candy
38763          1097 16-04-2014              cake bar
38764          1510 03-12-2014 fruit/vegetable juice
38765          1521 26-12-2014              cat food
[1] 3898

2.1 Data Exploration

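The groceries_data set has 38765 observations and 3 features: member ID (Member_number), purchase date (Date) and item description (itemDescription). It contains 3898 unique members. Member_number identifies a customer rather than a single transaction: the apriori log in the next section reports 14963 transactions, which suggests that each basket combines the items one member bought on one date.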
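The step that turns these rows into baskets is not shown in the post. The base R sketch below illustrates one way to do it, assuming a basket is defined as one member's purchases on one date. The data frame groceries and its rows are hypothetical, although the column names match the output above.

```r
# Hypothetical rows in the same long format as groceries_data
# (one row per member, date and item); NOT taken from the Kaggle file
groceries <- data.frame(
  Member_number   = c(1, 1, 1, 2, 2, 2),
  Date            = c("21-07-2015", "21-07-2015", "05-01-2015",
                      "05-01-2015", "05-01-2015", "05-01-2015"),
  itemDescription = c("tropical fruit", "whole milk", "whole milk",
                      "whole milk", "rolls/buns", "whole milk"),
  stringsAsFactors = FALSE
)

# Assumption: one basket = everything one member bought on one date
basket_id <- paste(groceries$Member_number, groceries$Date, sep = "_")
baskets   <- split(groceries$itemDescription, basket_id)

# A transaction records whether an item is present, not how many times
baskets <- lapply(baskets, unique)

cat("members:", length(unique(groceries$Member_number)), "\n")
cat("baskets:", length(baskets), "\n")
print(baskets)
```

The two hypothetical members produce three baskets, just as 3898 members produce 14963 transactions in the real data. If the arules package is loaded, a named list like baskets can be coerced with as(baskets, "transactions"). This is one plausible way transaction_list could have been built, but the original code for that step is not shown.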

2.2 Creating the rules

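# creating association rules with the arules package
library(arules)
# keep rules with support >= 0.0005, confidence >= 0.2 and at least 2 items (lhs + rhs)
rules <- apriori(transaction_list, parameter = list(supp = 0.0005, conf = 0.2, minlen = 2))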
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5   5e-04      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[167 item(s), 14963 transaction(s)] done [0.00s].
sorting and recoding items ... [158 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.01s].
writing ... [19 rule(s)] done [0.00s].
creating S4 object  ... done [0.09s].
#inspecting the first 10 rules
inspect(head(rules,10))
     lhs                                rhs          support      confidence coverage    lift     count
[1]  {artif. sweetener}              => {whole milk} 0.0005346521 0.2758621  0.001938114 1.746815  8
[2]  {brandy}                        => {whole milk} 0.0008688097 0.3421053  0.002539598 2.166281 13
[3]  {spices}                        => {soda}       0.0006014837 0.2250000  0.002673261 2.317051  9
[4]  {softener}                      => {whole milk} 0.0008019782 0.2926829  0.002740092 1.853328 12
[5]  {house keeping products}        => {whole milk} 0.0007351467 0.2444444  0.003007418 1.547872 11
[6]  {finished products}             => {whole milk} 0.0008688097 0.2031250  0.004277217 1.286229 13
[7]  {rolls/buns, white bread}       => {whole milk} 0.0006014837 0.2812500  0.002138609 1.780933  9
[8]  {other vegetables, white bread} => {whole milk} 0.0005346521 0.2051282  0.002606429 1.298914  8
[9]  {margarine, soda}               => {whole milk} 0.0005346521 0.2051282  0.002606429 1.298914  8
[10] {curd, rolls/buns}              => {whole milk} 0.0006014837 0.2195122  0.002740092 1.389996  9

We created the rules with the Apriori algorithm from the “arules” package, using a minimum support of 0.0005, a minimum confidence of 0.2 and a minimum rule length of 2. Among the resulting rules, 27.59% of the baskets that contain artificial sweetener (“artif. sweetener”) also contain whole milk (confidence = 0.2759), and 34.21% of the baskets that contain brandy also contain whole milk (confidence = 0.3421). Both rules have a lift above 1 (1.75 and 2.17), so whole milk appears in these baskets more often than its overall frequency would predict. However, their support is extremely low (0.00053 and 0.00087 respectively, i.e. 8 and 13 baskets out of 14963), so these associations occur very rarely across the dataset.
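The columns of the inspect() output are linked by simple arithmetic. As a hedged check, the base R sketch below recomputes the {brandy} => {whole milk} rule from its count and coverage, using the 14963 transactions reported in the apriori log. It also backs out the overall support of whole milk implied by the lift. The variable names are illustrative and are not part of arules.

```r
# Figures reported above for the rule {brandy} => {whole milk}
n_baskets  <- 14963        # transactions in the apriori log
rule_count <- 13           # baskets containing both brandy and whole milk
coverage   <- 0.002539598  # support of the lhs {brandy}
lift       <- 2.166281

# Number of baskets that contain brandy (the lhs)
lhs_count <- round(coverage * n_baskets)   # 38

support    <- rule_count / n_baskets       # matches the support column
confidence <- rule_count / lhs_count       # matches the confidence column

# Since lift = confidence / support(rhs), the table also implies
# the overall support of whole milk
whole_milk_support <- confidence / lift

cat(sprintf("support = %.10f\nconfidence = %.7f\nimplied support(whole milk) = %.4f\n",
            support, confidence, whole_milk_support))
```

The implied support of about 0.158 means whole milk appears in roughly one basket in six. This helps explain why it is the consequent of most rules, and why lift, not confidence alone, is needed to judge whether an association is stronger than whole milk's general popularity.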

Conclusion

Market Basket Analysis of the Groceries dataset revealed meaningful, though infrequent, associations between certain products. Using the Apriori algorithm with a support threshold of 0.0005 and a confidence threshold of 0.2, we identified rules that highlight customer purchasing behavior:

  • Product Associations: The rules with artificial sweetener or brandy as the antecedent had moderate confidence (27.59% and 34.21%, respectively) for whole milk as the consequent. Whole milk is the consequent in 9 of the 10 rules inspected, which suggests it is a common complementary purchase alongside these niche items.

  • Support vs. Confidence: While the confidence values indicate a reasonable likelihood of joint purchase, the low support values (below 0.001) imply that these combinations are not widespread across the entire customer base. These are niche patterns, not dominant trends.

  • Strategic Implications:

    • Retailers could use such insights for targeted promotions (e.g., bundling whole milk with artificial sweeteners or brandy).

    • These rules can also inform store layout decisions, placing associated items closer together to encourage impulse buys.
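When shortlisting rules for promotions or layout changes, the ranking metric changes which rules look most promising. The base R sketch below re-enters the ten rules from the inspect() output by hand and sorts them by confidence and by lift. The data frame name rules_df is hypothetical.

```r
# The ten rules reported by inspect(), re-entered by hand
rules_df <- data.frame(
  lhs = c("artif. sweetener", "brandy", "spices", "softener",
          "house keeping products", "finished products",
          "rolls/buns, white bread", "other vegetables, white bread",
          "margarine, soda", "curd, rolls/buns"),
  rhs = c("whole milk", "whole milk", "soda", rep("whole milk", 7)),
  confidence = c(0.2758621, 0.3421053, 0.2250000, 0.2926829, 0.2444444,
                 0.2031250, 0.2812500, 0.2051282, 0.2051282, 0.2195122),
  lift = c(1.746815, 2.166281, 2.317051, 1.853328, 1.547872,
           1.286229, 1.780933, 1.298914, 1.298914, 1.389996),
  count = c(8, 13, 9, 12, 11, 13, 9, 8, 8, 9),
  stringsAsFactors = FALSE
)

# Highest confidence first vs. highest lift first
by_confidence <- rules_df[order(-rules_df$confidence), ]
by_lift       <- rules_df[order(-rules_df$lift), ]

print(head(by_confidence[, c("lhs", "rhs", "confidence", "lift")], 3))
print(head(by_lift[, c("lhs", "rhs", "confidence", "lift")], 3))
```

Sorting by confidence puts {brandy} => {whole milk} first, while sorting by lift puts {spices} => {soda} first, because soda is a less common consequent than whole milk. On the arules rules object itself, sort(rules, by = "lift") performs the same reordering.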

About the Author

I’m a student in the MBA Program (2025–2027), currently navigating Trimester I at Amrita School of Business, Amrita Vishwa Vidyapeetham, Coimbatore. This assignment/blog was written as part of our coursework for Introduction to Business Analytics.
