MARKET BASKET ANALYSIS
1. Introduction
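Market Basket Analysis identifies associations between items that customers buy together, based on how often those items co-occur in the same basket. An association rule has the form {antecedent} => {consequent}. Rules are identified using support (the share of all transactions that contain an itemset) and confidence (the joint support of the antecedent and consequent divided by the support of the antecedent). The rules are then evaluated using the lift ratio, which is the confidence divided by the support of the consequent. A lift above 1 means the consequent is bought more often together with the antecedent than in baskets overall.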
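As a worked illustration of these definitions, the base-R sketch below scores the rule {bread} => {milk} on five made-up baskets. The items and baskets are hypothetical, chosen only so the arithmetic is easy to follow; they are not taken from the Groceries data.

```r
# Five hypothetical baskets (illustrative data, not the Groceries dataset)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk", "eggs"),
  c("bread", "milk", "eggs"),
  c("butter")
)

# Support of an itemset: share of baskets that contain every item in it
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"  # antecedent
rhs <- "milk"   # consequent

supp_rule  <- support(c(lhs, rhs), baskets)      # joint support of bread and milk
confidence <- supp_rule / support(lhs, baskets)  # joint support / antecedent support
lift       <- confidence / support(rhs, baskets) # confidence / consequent support

cat(sprintf("support = %.2f, confidence = %.3f, lift = %.3f\n",
            supp_rule, confidence, lift))
# prints: support = 0.40, confidence = 0.667, lift = 1.111
```

Bread appears in 3 of the 5 baskets, milk in 3 of 5, and both together in 2 of 5, so confidence is 0.4 / 0.6 ≈ 0.667 and lift is 0.667 / 0.6 ≈ 1.11. A lift slightly above 1 means milk is a little more common in bread baskets than in baskets overall; a lift of exactly 1 would indicate no association.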
2. Market Basket Analysis Using the Groceries Dataset from Kaggle
First six rows:

  Member_number       Date  itemDescription
1          1808 21-07-2015   tropical fruit
2          2552 05-01-2015       whole milk
3          2300 19-09-2015        pip fruit
4          1187 12-12-2015 other vegetables
5          3037 01-02-2015       whole milk
6          4941 14-02-2015       rolls/buns

Last six rows:

      Member_number       Date       itemDescription
38760          3364 06-05-2014                   oil
38761          4471 08-10-2014         sliced cheese
38762          2022 23-02-2014                 candy
38763          1097 16-04-2014              cake bar
38764          1510 03-12-2014 fruit/vegetable juice
38765          1521 26-12-2014              cat food

Number of unique members:

[1] 3898

2.1 Data Exploration
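The groceries_data data set has 38,765 rows and 3 columns: Member_number (the customer ID), Date, and itemDescription. It contains 3,898 unique members. Because the apriori output below reports 14,963 transactions rather than 3,898, each transaction appears to be the set of items one member bought on one date, not a member's entire purchase history.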
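The original post does not show how transaction_list was built. The sketch below shows one plausible way to group the one-item-per-row data into member-and-date baskets in base R. It uses a handful of invented rows (hypothetical values) with the same column names as groceries_data. With the arules package installed, a list of character vectors like baskets can then be converted with as(baskets, "transactions") before calling apriori().

```r
# Hypothetical rows with the same columns as groceries_data (invented values)
groceries_demo <- data.frame(
  Member_number   = c(1808, 1808, 1808, 2552, 2552),
  Date            = c("21-07-2015", "21-07-2015", "05-01-2015",
                      "05-01-2015", "05-01-2015"),
  itemDescription = c("tropical fruit", "whole milk", "rolls/buns",
                      "whole milk", "whole milk"),
  stringsAsFactors = FALSE
)

# One basket per member per date: build a "member_date" key for every row
basket_key <- paste(groceries_demo$Member_number, groceries_demo$Date, sep = "_")

# split() groups the item descriptions by key; unique() drops repeated items,
# because a transaction only records whether an item is present
baskets <- lapply(split(groceries_demo$itemDescription, basket_key), unique)

print(length(unique(groceries_demo$Member_number)))  # number of members: 2
print(length(baskets))                               # number of baskets: 3
```

Member 1808 shops on two dates and therefore contributes two baskets, which is why the number of transactions can exceed the number of members.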
2.2 Creating the rules
# creating association rules with apriori() from the arules package
library(arules)
rules <- apriori(transaction_list, parameter = list(supp = 0.0005, conf = 0.2, minlen = 2))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5   5e-04      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[167 item(s), 14963 transaction(s)] done [0.00s].
sorting and recoding items ... [158 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.01s].
writing ... [19 rule(s)] done [0.00s].
creating S4 object ... done [0.09s].

# inspecting the first 10 rules
inspect(head(rules, 10))

     lhs                                rhs          support      confidence coverage    lift     count
[1]  {artif. sweetener}              => {whole milk} 0.0005346521 0.2758621  0.001938114 1.746815     8
[2]  {brandy}                        => {whole milk} 0.0008688097 0.3421053  0.002539598 2.166281    13
[3]  {spices}                        => {soda}       0.0006014837 0.2250000  0.002673261 2.317051     9
[4]  {softener}                      => {whole milk} 0.0008019782 0.2926829  0.002740092 1.853328    12
[5]  {house keeping products}        => {whole milk} 0.0007351467 0.2444444  0.003007418 1.547872    11
[6]  {finished products}             => {whole milk} 0.0008688097 0.2031250  0.004277217 1.286229    13
[7]  {rolls/buns, white bread}       => {whole milk} 0.0006014837 0.2812500  0.002138609 1.780933     9
[8]  {other vegetables, white bread} => {whole milk} 0.0005346521 0.2051282  0.002606429 1.298914     8
[9]  {margarine, soda}               => {whole milk} 0.0005346521 0.2051282  0.002606429 1.298914     8
[10] {curd, rolls/buns}              => {whole milk} 0.0006014837 0.2195122  0.002740092 1.389996     9

We created the rules using the Apriori algorithm from the “arules” package, with a minimum support of 0.0005, a minimum confidence of 0.2, and a minimum rule length of 2 items. This produced 19 rules, the first 10 of which are shown above. According to rule [1], 27.59% of the transactions that contain artificial sweetener also contain whole milk (confidence = 0.2759). Similarly, 34.21% of the transactions that contain brandy also contain whole milk (confidence = 0.3421). Their lift values of 1.75 and 2.17 mean that whole milk is 1.75 and 2.17 times as likely to appear in these baskets as in an average basket. However, the support of these combinations is extremely low (0.00053 and 0.00087, i.e. 8 and 13 of the 14,963 transactions), so these associations occur very rarely across the entire dataset.
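As a quick arithmetic check of how the printed columns relate, the sketch below recomputes confidence, count, and the implied support of whole milk for rules [1] and [2] from the values shown by inspect(), using the 14,963 transactions reported by apriori. It also reproduces the "Absolute minimum support count: 7" line, under the assumption that arules truncates 0.0005 × 14,963 = 7.48 to an integer, which matches the printed value.

```r
# Values for rules [1] and [2], copied from the inspect() output
n_transactions <- 14963
rule_metrics <- data.frame(
  lhs        = c("artif. sweetener", "brandy"),
  support    = c(0.0005346521, 0.0008688097),
  confidence = c(0.2758621, 0.3421053),
  coverage   = c(0.001938114, 0.002539598),  # coverage = support of the lhs
  lift       = c(1.746815, 2.166281),
  stringsAsFactors = FALSE
)

# confidence = joint support / support of the antecedent (coverage)
conf_check <- rule_metrics$support / rule_metrics$coverage

# count = support x number of transactions
count_check <- round(rule_metrics$support * n_transactions)

# lift = confidence / support(rhs), so support(whole milk) = confidence / lift
milk_support <- rule_metrics$confidence / rule_metrics$lift

# absolute minimum support count, assuming truncation of 0.0005 x 14963
min_count <- as.integer(0.0005 * n_transactions)

print(data.frame(lhs = rule_metrics$lhs, conf_check, count_check, milk_support))
print(min_count)
```

Both rules imply a whole-milk support of about 0.158, so whole milk appears in roughly 16% of all transactions. That base popularity is one reason whole milk is the consequent of nine of the ten rules shown, and why lift, rather than confidence alone, is the better guide to how strong each association is.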
3. Conclusion
The Market Basket Analysis of the Groceries dataset revealed meaningful but infrequent associations between certain products. Using the Apriori algorithm with a support threshold of 0.0005 and a confidence threshold of 0.2, we generated 19 rules that highlight customer purchasing behavior:
Product Associations: Artificial sweetener and brandy showed moderate confidence (27.59% and 34.21%, respectively) for being bought together with whole milk, with lifts of 1.75 and 2.17. This suggests that when these niche items are bought, whole milk is a common complementary product.
Support vs. Confidence: While the confidence values indicate a reasonable likelihood of joint purchase, the low support values (below 0.001, i.e. fewer than 15 of the 14,963 transactions) show that these combinations are not widespread across the customer base. These are niche patterns, not dominant trends.
Strategic Implications:
Retailers could use such insights for targeted promotions (e.g., bundling whole milk with artificial sweeteners or brandy).
These rules can also inform store layout decisions, placing associated items closer together to encourage impulse buys.
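To see which of the ten inspected rules are strongest once the base popularity of the consequent is accounted for, the rows printed by inspect() can be ranked by lift. The sketch below does this in base R on values copied from that output; the 1.5 lift cut-off is an arbitrary illustrative threshold, not one used in this analysis. Within arules itself, sort(rules, by = "lift") should give a similar ranking over the full rule set.

```r
# The ten rules printed by inspect(head(rules, 10)), copied from the output
inspected <- data.frame(
  lhs  = c("artif. sweetener", "brandy", "spices", "softener",
           "house keeping products", "finished products",
           "rolls/buns, white bread", "other vegetables, white bread",
           "margarine, soda", "curd, rolls/buns"),
  rhs  = c("whole milk", "whole milk", "soda", rep("whole milk", 7)),
  confidence = c(0.2758621, 0.3421053, 0.2250000, 0.2926829, 0.2444444,
                 0.2031250, 0.2812500, 0.2051282, 0.2051282, 0.2195122),
  lift  = c(1.746815, 2.166281, 2.317051, 1.853328, 1.547872,
            1.286229, 1.780933, 1.298914, 1.298914, 1.389996),
  count = c(8, 13, 9, 12, 11, 13, 9, 8, 8, 9),
  stringsAsFactors = FALSE
)

# Sort by lift (strongest association first), then keep rules above an
# illustrative lift cut-off of 1.5 (hypothetical threshold, not from the post)
ranked <- inspected[order(-inspected$lift), ]
strong <- ranked[ranked$lift > 1.5, c("lhs", "rhs", "confidence", "lift", "count")]
print(strong, row.names = FALSE)
```

On these ten rules, {spices} => {soda} has the highest lift (2.32), ahead of {brandy} => {whole milk} (2.17), even though its confidence (0.225) is lower than that of several whole-milk rules. Six of the ten rules pass the illustrative cut-off.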
About the Author
I’m a student in the MBA Program (2025–2027), currently navigating Trimester I at Amrita School of Business, Amrita Vishwa Vidyapeetham, Coimbatore. This assignment/blog was written as part of our coursework for Introduction to Business Analytics.