[[1]]
[1] "milk" "bread" "butter"
[[2]]
[1] "bread" "butter"
[[3]]
[1] "milk" "bread"
[[4]]
[1] "milk" "butter" "eggs"
[[5]]
[1] "bread" "butter" "jam"

also called Market Basket Analysis
Frequent itemset mining helps in discovering associations and correlations among items in large datasets. These associations can then be used for tasks like:
Association Rule Mining (e.g., “If a customer buys bread, they are likely to buy butter”)
Recommendation systems
Inventory management
Cross-selling strategies
{bread, butter})the frequency (often expressed as percentage) with which an itemset appears in the data set
{bread, butter} appears in \(30\) out of \(100\) transactions, its support is \(30\%\)an itemset whose support is greater than or equal to a predefined, minimum threshhold
{bread, butter} is a frequent itemset| Transaction ID | Items Purchased |
|---|---|
| T1 | milk, bread, butter |
| T2 | bread, butter |
| T3 | milk, bread |
| T4 | milk, butter, eggs |
| T5 | bread, butter, jam |
| Transaction ID | Items Purchased |
|---|---|
| T1 | milk, bread, butter |
| T2 | bread, butter |
| T3 | milk, bread |
| T4 | milk, butter, eggs |
| T5 | bread, butter, jam |
milk, bread, butter, eggs, jam
| Itemset | Support Count | Transaction |
|---|---|---|
| {bread} | 4 | T1, T2, T3, T5 |
| {milk} | 3 | T1, T3, T4 |
| {butter} | 4 | T1, T2, T4, T5 |
| {milk, bread} | 2 | T1, T3 |
| {bread, butter} | 3 | T1, T2, T5 |
| {milk, butter] | 2 | T1, T4 |
| {milk, bread, butter} | 1 | T1 |
The Apriori algorithm finds frequent itemsets in a dataset by building larger sets from smaller ones, using the principle that all subsets of a frequent itemset must also be frequent.
| Transaction ID | Items Purchased |
|---|---|
| T1 | milk, bread, butter |
| T2 | bread, butter |
| T3 | milk, bread |
| T4 | milk, butter, eggs |
| T5 | bread, butter, jam |
Support Threshold: 2 (itemset must appear two times to be considered frequent)
| 1-Itemset | Support Count |
|---|---|
| milk | 3 (T1, T3, T4) |
| bread | 4 (T1, T2, T3, T5) |
| butter | 4 (T1, T2, T4, T5) |
| eggs | 1 (T4) |
| jam | 1 (T5) |
frequent itemsets (L1): milk, bread, butter
Generate all combinations of L1:
{milk, bread}{milk, butter}{bread, butter}Count the support
| 2-Itemset | Support Count |
|---|---|
| {milk,bread} | 2 (T1, T3) |
| {milk, butter} | 2 (T1, T4) |
| {bread, butter} | 3 (T1, T2, T5) |
frequent itemsets (L1): {milk, bread}, {milk, butter}, {bread, butter}
Generate combinations of L2 that share a common subset
{milk, bread, butter}Check the support:
| 3-Itemset | Support Count |
|---|---|
| {milk, bread, butter} | 1 (T1) |
{milk}, {bread}, {butter}{milk,bread}, {milk,butter}, {bread,butter}Association rule mining finds patterns or relationships between items in large datasets, such as products often bought together, focussing on those that meet a minimum confidence threshold (\(70\%\) in this example).
{milk}, {bread}, {butter}{milk,bread}, {milk,butter}, {bread,butter}| From | Rule | L2 | Support of L2 | L1 | Support of L1 | Confidence |
|---|---|---|---|---|---|---|
{milk,bread}
|
\(milk \rightarrow bread\) |
{milk,bread}
|
2 | milk | 3 | \(Confidence = \frac{2}{3} \approx 66.7\%\) |
| \(bread \rightarrow milk\) |
{milk,bread}
|
2 | bread | 4 | \(Confidence = \frac{2}{4} \approx 50\%\) | |
{milk,butter}
|
\(milk \rightarrow butter\) |
{milk,butter}
|
2 | milk | 3 | \(Confidence = \frac{2}{3} \approx 66.7\%\) |
| \(butter \rightarrow milk\) |
{milk,butter}
|
2 | butter | 4 | \(Confidence = \frac{2}{4} \approx 50\%\) | |
{bread,butter}
|
\(bread \rightarrow butter\) |
{bread,butter}
|
3 | bread | 4 | \(Confidence = \frac{3}{4} \approx 75\%\) |
| \(butter \rightarrow bread\) |
{bread,butter}
|
3 | butter | 4 | \(Confidence = \frac{3}{4} \approx 75\%\) |
| Rule | Confidence | Interpretation |
|---|---|---|
| \(bread \rightarrow butter\) | \(75\%\) | \(75\%\) of people who buy bread also buy butter |
| \(butter \rightarrow bread\) | \(75\%\) | \(75\%\) of people who buy butter also buy bread |
\[Lift = \frac{Confidence (bread \rightarrow butter)}{Support(butter)} = \frac{0.75}{0.8} = 0.9375 < 1\] \(Lift < 1\) means bread and butter co-occur less than expected if they were independent (only slightly).
\[Lift = \frac{Confidence (butter \rightarrow bread)}{Support(bread)} = \frac{0.75}{0.8} = 0.9375 < 1\]
\(Lift < 1\) means butter and bread co-occur less than expected if they were independent (only slightly).
[[1]]
[1] "milk" "bread" "butter"
[[2]]
[1] "bread" "butter"
[[3]]
[1] "milk" "bread"
[[4]]
[1] "milk" "butter" "eggs"
[[5]]
[1] "bread" "butter" "jam"
items
[1] {bread, butter, milk}
[2] {bread, butter}
[3] {bread, milk}
[4] {butter, eggs, milk}
[5] {bread, butter, jam}
# Mine frequent itemsets
frequent_items <- apriori(transactions,
parameter = list(supp = 0.4, target = "frequent itemsets"))Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
NA 0.1 1 none FALSE TRUE 5 0.4 1
maxlen target ext
10 frequent itemsets TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 2
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5 item(s), 5 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [6 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
items support count
[1] {milk} 0.6 3
[2] {bread} 0.8 4
[3] {butter} 0.8 4
[4] {bread, milk} 0.4 2
[5] {butter, milk} 0.4 2
[6] {bread, butter} 0.6 3
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.7 0.1 1 none FALSE TRUE 5 0.4 1
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 2
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[5 item(s), 5 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [4 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
lhs rhs support confidence coverage lift count
[1] {} => {bread} 0.8 0.80 1.0 1.0000 4
[2] {} => {butter} 0.8 0.80 1.0 1.0000 4
[3] {bread} => {butter} 0.6 0.75 0.8 0.9375 3
[4] {butter} => {bread} 0.6 0.75 0.8 0.9375 3
lhs rhs support confidence coverage lift count
[1] {} => {bread} 0.8 0.80 1.0 1.0000 4
[2] {} => {butter} 0.8 0.80 1.0 1.0000 4
[3] {bread} => {butter} 0.6 0.75 0.8 0.9375 3
[4] {butter} => {bread} 0.6 0.75 0.8 0.9375 3
Master Thesis: Van Thuy Cuc Dang
Supervisor: Prof. Dr. Tim Weber
Cooperation Partner: SK Technology
| Aspect | Apriori | Eclat | FP-Growth |
|---|---|---|---|
| Core Strategy | Candidate generation + pruning | TID-list intersections | Prefix-tree with recursive mining |
| Data Format | Horizontal | Vertical (TID sets) | Tree-based (compressed horizontal) |
| Candidate Gen | Yes | No | No |
| Speed | Slower on large/dense data | Faster than Apriori | Fastest on large/dense data |
| Memory Use | High | Moderate | Low |
| DB Scans | Multiple | Fewer | Typically 2 |
| Rule Support | Yes | Yes | Yes (with post-processing) |
| Best For | Small/medium sparse datasets | Medium-size frequent data | Large and dense datasets |
Copyright Prof. Dr. Tim Weber, 2024