Association rule mining is one of the most popular data mining methods. This kind of analysis is also called frequent itemset analysis, association analysis or association rule learning.

To perform the analysis in R, we use the arules and arulesViz packages.

Introduction

In association analysis, we are usually interested in the absolute number of customer transactions (also called baskets) that contain a particular set of items (usually products). A typical application of association analysis is the analysis of consumer buying behavior in supermarkets and chain stores where they record the contents of shopping carts brought to the register for checkout. These transaction data are normally recorded by point-of-sale scanners and often consist of tuples of the form: {transaction ID, item ID, item ID, ...}. By finding frequent itemsets, a retailer can learn what is commonly bought together and use this information to increase sales in several ways.

Imagine there is a pair of different products (which we call items), X and Y, that are frequently bought together in a store (Ng & Soo, 2017):

Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other.
Promotional discounts could be applied to just one out of the two items.
Advertisements on X could be targeted at buyers who purchase Y.
X and Y could be combined into a new product, such as having Y in flavors of X.

Note that online retailers like Amazon.com or online platforms like Spotify have little need for this kind of analysis, since it is designed to search for itemsets that appear frequently. If the online retailer were limited to frequent itemsets, they would miss all the opportunities that are present in the “long tail” to select advertisements for each customer individually (for example to recommend certain products or songs). Instead of searching for frequent itemsets, they use similarity search algorithms (like collaborative filtering) to detect similar customers that have a large fraction of their baskets in common, even if the absolute number of baskets is small (Leskovec, Rajaraman, & Ullman, 2020).

The market-basket model

Association rule mining is based on the so-called “market-basket” model of data. This is essentially a many-to-many relationship between two kinds of elements, called items and baskets (also called transactions), with some assumptions about the shape of the data (Leskovec, Rajaraman, & Ullman, 2020):

Each basket (i.e. transaction) consists of a set of items (usually products).
Usually we assume that the number of items in a basket is small (much smaller than the total number of all items).
The number of all baskets (transactions) is usually assumed to be very large.
The data is assumed to be represented in a file consisting of a sequence of baskets (transactions).

To illustrate the logic of association rule mining, let’s create a sequence of baskets (transactions) with a small number of items from different customers in a grocery store. Note that because we use a very simple example with only a few baskets and items, the results of the analysis will differ from the results we may obtain from a real-world example. We save the data as a list of transactions with the name market_basket:

library(tidyverse)
library(knitr)

# create a list of baskets
market_basket <-  
  list(  
  c("apple", "beer", "rice", "meat"),
  c("apple", "beer", "rice"),
  c("apple", "beer"), 
  c("apple", "pear"),
  c("milk", "beer", "rice", "meat"), 
  c("milk", "beer", "rice"), 
  c("milk", "beer"),
  c("milk", "pear")
  )

# set transaction names (T1 to T8)
names(market_basket) <- paste("T", c(1:8), sep = "")

Each basket includes so-called itemsets (like {apple, beer}). For example, “apple” is bought together with “beer” in three transactions (T1, T2 and T3).

The frequent-itemsets problem is that of finding sets of items that appear in many of the baskets. Hence, a set of items that appears in many baskets is said to be “frequent”.
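To make the counting concrete, here is a minimal base R sketch that checks which baskets contain a given itemset. The helper name contains_itemset is an assumption made for this illustration and is not part of any package; the baskets are repeated so the snippet runs on its own.

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

# Hypothetical helper: TRUE for every basket that contains all items of `itemset`
contains_itemset <- function(baskets, itemset) {
  vapply(baskets, function(basket) all(itemset %in% basket), logical(1))
}

hits <- contains_itemset(market_basket, c("apple", "beer"))
names(hits)[hits]  # IDs of the baskets that contain both items
sum(hits)          # number of baskets that contain both items
```

Whether this count makes {apple, beer} “frequent” depends on the threshold we choose, which is discussed in the following sections.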

Association rules

While we are interested in extracting frequent sets of items, this information is often presented as a collection of if–then rules, called association rules.

The form of an association rule is {X -> Y}, where {X} is a set of items and {Y} is an item. The implication of this association rule is that if all of the items in {X} appear in some basket, then {Y} is “likely” to appear in that basket as well.

{X} is also called antecedent or left-hand-side (LHS) and
{Y} is called consequent or right-hand-side (RHS).

An example association rule for products from Apple could be {Apple iPad, Apple iPad Cover} -> {Apple Pencil}, meaning that if Apple’s iPad and iPad Cover {X} are bought, customers are also likely to buy Apple’s Pencil {Y}. Notice that the logical implication symbol “->” does not indicate a causal relationship between {X} and {Y}. The strength of a rule is merely an estimate of the conditional probability of {Y} given {X}.

Now imagine a grocery store with tens of thousands of different products. We wouldn’t want to calculate all associations between every possible combination of products. Instead, we would want to select only potentially “relevant” rules from the set of all possible rules. Therefore, we use the measures support, confidence and lift to reduce the number of relationships we need to analyze:

Support is an indication of how frequently a set of items appears in baskets.
Confidence is an indication of how often the rule has been found to be true.
Lift is a measure of association using both support and confidence.

If we are looking for association rules {X -> Y} that apply to a reasonable fraction of the baskets, then the support of X must be reasonably high. In practice, such as for marketing in brick-and-mortar stores, “reasonably high” is often around 1% to 10% of the baskets. We also want the confidence of the rule to be reasonably high, perhaps 50%, or else the rule has little practical effect (Leskovec, Rajaraman, & Ullman, 2020).

Furthermore, it must be assumed that there are not too many frequent itemsets and thus not too many candidates for high-support, high-confidence association rules. The reason for this is that if we give companies too many association rules that meet our thresholds for support and confidence, they cannot even read them, let alone act on them. Thus, it is normal to adjust the support and confidence thresholds so that we do not get too many frequent itemsets (Leskovec, Rajaraman, & Ullman, 2020).

Next, we take a closer look at the measures support, confidence and lift.

Association measures

Support

The metric support tells us how popular a set of items is, as measured by the proportion of transactions in which the itemset appears.

In our data, the support of {apple} is 4 out of 8, or 50%. The support of {apple, beer, rice} is 2 out of 8, or 25%.

\[Support(apple) = \frac{4}{8} = 0.5\]

Or in general, for a set of items X:

\[ Support(X) = \frac{frequency(X)}{n} \]

with n = number of all transactions (baskets).

Usually, a specific support-threshold is used to reduce the number of itemsets we need to analyze. At the beginning of the analysis, we could set our support-threshold to 10%.
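The support calculation can be sketched in a few lines of base R. The function name support is a hypothetical helper introduced for this illustration (it is not the arules implementation); it returns the share of baskets that contain every item of an itemset.

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

# Hypothetical helper: share of baskets that contain every item of `itemset`
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(basket) all(itemset %in% basket), logical(1)))
}

supp_apple  <- support(market_basket, "apple")                     # 4 of 8 baskets
supp_triple <- support(market_basket, c("apple", "beer", "rice"))  # 2 of 8 baskets

# Support of every single item, followed by a 10% support threshold
items <- sort(unique(unlist(market_basket)))
item_support <- vapply(items, function(item) support(market_basket, item), numeric(1))
item_support[item_support >= 0.10]
```

In this tiny data set every single item passes a 10% threshold, since each item appears in at least 2 of the 8 baskets.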

Confidence

Confidence tells us how likely it is that item Y is purchased given that item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears. The confidence of a rule is defined as:

\[ Confidence(X -> Y) = \frac{support(X \cup Y)}{support(X)} = \frac {P(X \cap Y)}{P(X)} = P(Y|X) \]

Hence, the confidence can be interpreted as an estimate of the probability P(Y|X). In other words, this is the probability of finding the RHS (Y) of the rule in transactions under the condition that these transactions also contain the LHS (X) (Hornik, Grün, & Hahsler, 2005). Confidence is directed and therefore usually gives different values for the rules X -> Y and Y -> X.

Note that \(support(X \cup Y)\) means the support of the union of the items in X and Y, i.e. the share of transactions that contain all items of both X and Y. Since we usually state probabilities of events and not sets of items, we can rewrite \(support(X \cup Y)\) as the probability \(P(E_X \cap E_Y)\), where \(E_{X}\) and \(E_{Y}\) are the events that a transaction contains itemset X and Y, respectively (review this site from Michael Hahsler for a detailed explanation).

In our example, the confidence that beer is purchased given that apple is purchased ({apple -> beer}) is 3 out of 4, or 75%. This means the conditional probability P(beer|apple) = 75%. Apple is the antecedent or left-hand-side (LHS) and beer is the consequent or right-hand-side (RHS).

\[Confidence(apple -> beer) = \frac{\frac{3}{8}}{\frac{4}{8}} = \frac{3}{4} = 0.75\]
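The same calculation can be sketched in base R. The helpers support and confidence are hypothetical names chosen for this illustration. Computing both directions also shows that confidence is directed:

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

# Hypothetical helper: share of baskets that contain every item of `itemset`
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(basket) all(itemset %in% basket), logical(1)))
}

# Hypothetical helper: support of LHS and RHS together, divided by the support of the LHS
confidence <- function(baskets, lhs, rhs) {
  support(baskets, c(lhs, rhs)) / support(baskets, lhs)
}

conf_apple_beer <- confidence(market_basket, "apple", "beer")  # 3 of the 4 apple baskets
conf_beer_apple <- confidence(market_basket, "beer", "apple")  # 3 of the 6 beer baskets
c(conf_apple_beer, conf_beer_apple)
```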

Note that the confidence measure might misrepresent the importance of an association. This is because it only accounts for how popular item X is (in our case apple) but not Y (in our case beer).

If beer is also very popular in general, there will be a higher chance that a transaction containing apple will also contain beer, thus inflating the confidence measure. To account for the base popularity of both items, we use a third measure called lift.

Lift

Lift tells us how likely it is that item Y is purchased when item X is purchased, while controlling for how popular items Y and X are. It measures how many times more often X and Y occur together than expected if they were statistically independent.

\[Lift(X -> Y ) = \frac{P(X \cap Y)}{P(X) \times P(Y)}\]

In our example, lift is calculated as:

\[Lift(apple -> beer) = \frac{\frac{3}{8}}{\frac{4}{8} \times \frac{6}{8}} = \frac{\frac{3}{8}}{\frac{24}{64}} = \frac{\frac{3}{8}}{\frac{3}{8}} = 1\]


A lift value of:

lift = 1: implies no association between the items (they occur together exactly as often as expected under independence).
lift > 1: means that item Y is more likely to be bought if item X is bought.
lift < 1: means that item Y is less likely to be bought if item X is bought.

The lift of {apple -> beer} is 1, which implies no association between the two items.
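As a rough sketch, lift can be computed in base R with two hypothetical helpers (support and lift are names chosen for this illustration). The three rules below were picked from our example data to show one case for each range of lift values:

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

# Hypothetical helper: share of baskets that contain every item of `itemset`
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(basket) all(itemset %in% basket), logical(1)))
}

# Hypothetical helper: joint support divided by the product of the individual supports
lift <- function(baskets, lhs, rhs) {
  support(baskets, c(lhs, rhs)) / (support(baskets, lhs) * support(baskets, rhs))
}

lift_apple_beer <- lift(market_basket, "apple", "beer")  # equal to 1: no association
lift_rice_beer  <- lift(market_basket, "rice", "beer")   # above 1: bought together more often than expected
lift_apple_milk <- lift(market_basket, "apple", "milk")  # below 1: never bought together in our data
c(lift_apple_beer, lift_rice_beer, lift_apple_milk)
```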

A-Priori Algorithm

There are different algorithms for finding frequent itemsets. In this tutorial we cover the main idea behind the A-Priori Algorithm, which reduces the number of itemsets we need to examine. It works by looking first at smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. Put simply, if an itemset is infrequent, then all its supersets (the larger itemsets that contain it) must also be infrequent.

This means that if item {beer} was found to be infrequent, we can expect the itemset {beer, pizza} to be equally or even more infrequent. So in consolidating the list of popular itemsets, we need not consider {beer, pizza}, nor any other itemset configuration that contains {beer}.

The A-Priori Algorithm uses a so-called breadth-first search strategy, which can be viewed in this decision tree:

Example of breadth-first search (source: Matheny, 2007)

Using this principle, the number of itemsets that have to be examined can be pruned (i.e. removing sections of the decision tree).

The list of popular itemsets can be obtained in these steps (Ng & Soo, 2017):

Step 0. Start with itemsets containing just a single item, such as {apple} and {pear}.
Step 1. Determine the support-threshold for itemsets. Keep the itemsets that meet your minimum support threshold, and remove itemsets that do not.
Step 2. Using the itemsets you have kept from Step 1, generate all the possible itemset configurations that are one item larger.
Step 3. Repeat Steps 1 & 2 until there are no more new itemsets.
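The steps above can be sketched in base R. This is a simplified illustration under the assumption of a small in-memory list of baskets, not the optimized implementation used by the arules package; the function name apriori_sketch and its internal helpers are hypothetical.

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

# Hypothetical helper, not the arules implementation: level-wise search for frequent itemsets
apriori_sketch <- function(baskets, min_support) {
  support <- function(itemset) {
    mean(vapply(baskets, function(basket) all(itemset %in% basket), logical(1)))
  }
  key <- function(itemset) paste(itemset, collapse = ",")

  # Step 0: start with all single items as candidates
  candidates <- as.list(sort(unique(unlist(baskets))))
  frequent <- list()

  while (length(candidates) > 0) {
    # Step 1: keep the candidates that meet the support threshold
    supp <- vapply(candidates, support, numeric(1))
    kept <- candidates[supp >= min_support]
    if (length(kept) == 0) break
    frequent <- c(frequent, kept)

    # Step 2: build candidates that are one item larger from the kept items
    k <- length(kept[[1]]) + 1
    items <- sort(unique(unlist(kept)))
    if (length(items) < k) break
    larger <- combn(items, k, simplify = FALSE)

    # Pruning: drop a candidate as soon as one of its smaller subsets was not kept
    kept_keys <- vapply(kept, key, character(1))
    candidates <- Filter(function(candidate) {
      subsets <- combn(candidate, k - 1, simplify = FALSE)
      all(vapply(subsets, key, character(1)) %in% kept_keys)
    }, larger)
    # Step 3: the loop repeats Steps 1 and 2 with the larger candidates
  }
  frequent
}

frequent_itemsets <- apriori_sketch(market_basket, min_support = 0.3)
vapply(frequent_itemsets, paste, character(1), collapse = ", ")
```

With a support threshold of 0.3, the search stops after the itemsets of size 2, because every candidate of size 3 contains at least one pair that was already removed.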

This iterative process is illustrated in the animation below:

As seen in the animation, {apple} was determined to have low support, hence it was removed and all other itemset configurations that contain apple need not be considered. This reduced the number of itemsets to consider by more than half.

Note that the support threshold that you pick in Step 1 could be based on a formal analysis or past experience. If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold (otherwise you may use 1%-10% as a starting value).

We have seen how the A-Priori Algorithm can be used to identify itemsets with high support. The same principle can also be used to identify item associations with high confidence or lift. Finding rules with high confidence or lift is less computationally taxing once high-support itemsets have been identified, because confidence and lift values are calculated using support values (Ng & Soo, 2017).

Take for example the task of finding high-confidence rules. If the rule

{beer, chips -> apple}

has low confidence, all other rules with the same constituent items and with apple on the right hand side (RHS) would have low confidence too. Specifically, the rules

{beer -> apple, chips}
{chips -> apple, beer}

would have low confidence as well. As before, lower level candidate item rules can be pruned using the A-Priori Algorithm, so that fewer candidate rules need to be examined (Ng & Soo, 2017).
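The reason is that moving an item from the LHS to the RHS leaves the numerator of the confidence formula unchanged, while the denominator can only stay the same or grow, because \(support(\{beer\}) \geq support(\{beer, chips\})\). As a short sketch, using the definition of confidence from above:

\[Confidence(\{beer\} -> \{apple, chips\}) = \frac{support(\{beer, chips, apple\})}{support(\{beer\})} \leq \frac{support(\{beer, chips, apple\})}{support(\{beer, chips\})} = Confidence(\{beer, chips\} -> \{apple\})\]

The same argument holds for {chips -> apple, beer}.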

In summary, when you apply the A-Priori Algorithm on a given set of transactions, your goal will be to find all rules with support greater than or equal to your support threshold and confidence greater than or equal to your confidence threshold.

Implementation in R

install.packages("arules")
install.packages("arulesViz")

To perform the association analysis in R, we use the arules and arulesViz packages. Review Hornik et al. (2005) for a detailed description of the packages or visit the arules documentation site.

Transform data

First of all, you have to convert the transaction data into an object of class “transactions” to be able to analyze the data. This is done with the function as(), using the coercion method provided by the arules package:

library(arules)

trans <- as(market_basket, "transactions")

Inspect data

Take a look at the dimensions of this object:

dim(trans)

[1] 8 6

This means we have 8 transactions and 6 distinct items.

Obtain a list of the distinct items in the data:

itemLabels(trans)

[1] "apple" "beer"  "meat"  "milk"  "pear"  "rice"

View the summary of the transaction data:

summary(trans)

transactions as itemMatrix in sparse format with
 8 rows (elements/itemsets/transactions) and
 6 columns (items) and a density of 0.4583333 

most frequent items:
   beer   apple    milk    rice    meat (Other) 
      6       4       4       4       2       2 

element (itemset/transaction) length distribution:
sizes
2 3 4 
4 2 2 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.00    2.50    2.75    3.25    4.00 

includes extended item information - examples:
  labels
1  apple
2   beer
3   meat

includes extended transaction information - examples:
  transactionID
1            T1
2            T2
3            T3

The summary() gives us information about our transaction object:

There are 8 transactions (rows) and 6 items (columns) and we can view the most frequent items.
Density tells us the proportion of non-zero cells in this 8x6-matrix.
Element length distribution: 4 transactions contain 2 items, 2 transactions contain 3 items and 2 transactions contain 4 items.

Note that a matrix is called a sparse matrix if most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The number of zero-valued elements divided by the total number of elements is called the sparsity of the matrix (which is equal to 1 minus the density of the matrix).
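To see where the density value comes from, the following base R sketch rebuilds the 8x6 incidence matrix by hand. It is only meant as an illustration (the object names are assumptions); arules stores this information more efficiently in a sparse format.

```r
# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)

items <- sort(unique(unlist(market_basket)))

# One row per basket, one column per item; TRUE marks a purchased item
incidence <- t(vapply(market_basket,
                      function(basket) items %in% basket,
                      logical(length(items))))
colnames(incidence) <- items

dim(incidence)               # number of baskets and number of items
colSums(incidence)           # absolute frequency of each item
density  <- mean(incidence)  # share of TRUE (non-zero) cells: 22 of 48
sparsity <- 1 - density
c(density = density, sparsity = sparsity)
```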

Take a look at all transactions and items in a matrix like fashion:

image(trans)

You can observe that almost half of the “cells” (45.83%) are non-zero values.

Display the relative item frequency:

itemFrequencyPlot(trans, topN=10,  cex.names=1)

The items {apple}, {milk} and {rice} all have a relative item frequency (i.e. support) of 50%.

A-Priori Algorithm

The next step is to analyze the rules using the A-Priori Algorithm with the function apriori(). This function requires both a minimum support and a minimum confidence constraint at the same time. The argument parameter allows you to set the support-threshold, the confidence-threshold and the maximum number of items per rule (maxlen). If you do not provide threshold values, the function will perform the analysis with these default values: support-threshold of 0.1 and confidence-threshold of 0.8.

#Min Support 0.3, confidence as 0.5.
rules <- apriori(trans, 
                 parameter = list(supp=0.3, conf=0.5, 
                                  maxlen=10, 
                                  target= "rules"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.3      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [10 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

In our simple example, we already know that by using a support-threshold of 0.3, we will eliminate {meat} and {pear} from our analysis, since they have support values below 0.3.

The summary shows the following:

summary(rules)

set of 10 rules

rule length distribution (lhs + rhs):sizes
1 2 
4 6 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.0     2.0     1.6     2.0     2.0 

summary of quality measures:
    support        confidence        coverage           lift      
 Min.   :0.375   Min.   :0.5000   Min.   :0.5000   Min.   :1.000  
 1st Qu.:0.375   1st Qu.:0.5000   1st Qu.:0.5625   1st Qu.:1.000  
 Median :0.500   Median :0.5833   Median :0.7500   Median :1.000  
 Mean   :0.475   Mean   :0.6417   Mean   :0.7750   Mean   :1.067  
 3rd Qu.:0.500   3rd Qu.:0.7500   3rd Qu.:1.0000   3rd Qu.:1.000  
 Max.   :0.750   Max.   :1.0000   Max.   :1.0000   Max.   :1.333  
     count    
 Min.   :3.0  
 1st Qu.:3.0  
 Median :4.0  
 Mean   :3.8  
 3rd Qu.:4.0  
 Max.   :6.0  

mining info:
  data ntransactions support confidence
 trans             8     0.3        0.5
                                                                                           call
 apriori(data = trans, parameter = list(supp = 0.3, conf = 0.5, maxlen = 10, target = "rules"))

Set of rules: 10.
Rule length distribution (LHS + RHS): 4 rules with a length of 1 item; 6 rules with a length of 2 items.
Summary of quality measures: min, max, median, mean and quantile values for support, confidence, coverage, lift and count.
Mining info: number of transactions, support-threshold and confidence-threshold.

Inspect the 10 rules we obtained:

inspect(rules)

     lhs        rhs     support confidence coverage lift     count
[1]  {}      => {apple} 0.500   0.5000000  1.00     1.000000 4    
[2]  {}      => {milk}  0.500   0.5000000  1.00     1.000000 4    
[3]  {}      => {rice}  0.500   0.5000000  1.00     1.000000 4    
[4]  {}      => {beer}  0.750   0.7500000  1.00     1.000000 6    
[5]  {apple} => {beer}  0.375   0.7500000  0.50     1.000000 3    
[6]  {beer}  => {apple} 0.375   0.5000000  0.75     1.000000 3    
[7]  {milk}  => {beer}  0.375   0.7500000  0.50     1.000000 3    
[8]  {beer}  => {milk}  0.375   0.5000000  0.75     1.000000 3    
[9]  {rice}  => {beer}  0.500   1.0000000  0.50     1.333333 4    
[10] {beer}  => {rice}  0.500   0.6666667  0.75     1.333333 4

The rules 1 to 4 with an empty LHS mean that no matter what other items are involved the item in the RHS will appear with the probability given by the rule’s confidence (which equals the support). If you want to avoid these rules then use the argument parameter=list(minlen=2) (stackoverflow).

# minimum support 0.3, minimum confidence 0.5, no rules with an empty LHS
rules <- apriori(trans, 
                 parameter = list(supp=0.3, conf=0.5, 
                                  maxlen=10, 
                                  minlen=2,
                                  target= "rules"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.3      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [6 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

inspect(rules)

    lhs        rhs     support confidence coverage lift     count
[1] {apple} => {beer}  0.375   0.7500000  0.50     1.000000 3    
[2] {beer}  => {apple} 0.375   0.5000000  0.75     1.000000 3    
[3] {milk}  => {beer}  0.375   0.7500000  0.50     1.000000 3    
[4] {beer}  => {milk}  0.375   0.5000000  0.75     1.000000 3    
[5] {rice}  => {beer}  0.500   1.0000000  0.50     1.333333 4    
[6] {beer}  => {rice}  0.500   0.6666667  0.75     1.333333 4

We can observe that rule 6, {beer -> rice}, has a support of 50% and a confidence of 67%. This means that beer and rice appear together in 50% of all transactions. The confidence that rice (RHS) is purchased given that beer (LHS) is purchased, P(rice|beer), is 67%. In other words, in 67% of the transactions in which a customer buys beer, rice is bought as well.
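As a quick check, the values reported for rule 6 can be reproduced by hand from the eight baskets, in which beer appears 6 times, rice 4 times, and beer together with rice 4 times:

\[Confidence(beer -> rice) = \frac{\frac{4}{8}}{\frac{6}{8}} = \frac{4}{6} \approx 0.67\]

\[Lift(beer -> rice) = \frac{\frac{4}{8}}{\frac{6}{8} \times \frac{4}{8}} = \frac{\frac{4}{8}}{\frac{3}{8}} = \frac{4}{3} \approx 1.33\]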

Set LHS and RHS

If you want to analyze a specific rule, you can use the option appearance to set a LHS (if part) or RHS (then part) of the rule.

For example, to find out which items make a purchase of {beer} likely, we restrict the RHS to beer by setting rhs = "beer" and default = "lhs":

beer_rules_rhs <- apriori(trans, 
                          parameter = list(supp=0.3, conf=0.5, 
                                           maxlen=10, 
                                           minlen=2),
                          appearance = list(default="lhs", rhs="beer"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.3      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [3 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Inspect the result:

inspect(beer_rules_rhs)

    lhs        rhs    support confidence coverage lift     count
[1] {apple} => {beer} 0.375   0.75       0.5      1.000000 3    
[2] {milk}  => {beer} 0.375   0.75       0.5      1.000000 3    
[3] {rice}  => {beer} 0.500   1.00       0.5      1.333333 4

It is also possible to analyze which items customers are likely to buy when they buy {beer}, by setting lhs = "beer" and default = "rhs":

beer_rules_lhs <- apriori(trans, 
                          parameter = list(supp=0.3, conf=0.5, 
                                           maxlen=10, 
                                           minlen=2),
                          appearance = list(lhs="beer", default="rhs"))

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.3      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 2 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [3 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Inspect the result:

inspect(beer_rules_lhs)

    lhs       rhs     support confidence coverage lift     count
[1] {beer} => {apple} 0.375   0.5000000  0.75     1.000000 3    
[2] {beer} => {milk}  0.375   0.5000000  0.75     1.000000 3    
[3] {beer} => {rice}  0.500   0.6666667  0.75     1.333333 4
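As an alternative sketch, an existing rule set can also be filtered after mining with the subset() method for rules from arules, instead of restricting the search with appearance. The snippet assumes a reasonably recent version of arules. The object names beer_rhs and strong_rules are chosen for this illustration, and verbose = FALSE is only used to suppress the progress log.

```r
library(arules)

# The eight baskets from the example above (repeated so the snippet runs on its own)
market_basket <- list(
  T1 = c("apple", "beer", "rice", "meat"),
  T2 = c("apple", "beer", "rice"),
  T3 = c("apple", "beer"),
  T4 = c("apple", "pear"),
  T5 = c("milk", "beer", "rice", "meat"),
  T6 = c("milk", "beer", "rice"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "pear")
)
trans <- as(market_basket, "transactions")

# Mine all rules once, without rules that have an empty LHS
rules <- apriori(trans,
                 parameter = list(supp = 0.3, conf = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only the rules with beer on the RHS
beer_rhs <- subset(rules, subset = rhs %in% "beer")
inspect(beer_rhs)

# Keep only the rules with a lift above 1
strong_rules <- subset(rules, subset = lift > 1)
inspect(strong_rules)
```

Filtering after mining is convenient for small rule sets like ours. For large data sets, restricting the search with appearance can avoid generating rules that are discarded later.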

Visualizing association rules

Mining association rules often results in a very large number of found rules, leaving the analyst with the task to go through all the rules and discover interesting ones. Sifting manually through large sets of rules is time consuming and strenuous. Therefore, in addition to our calculations of associations, we can use the package arulesViz to visualize our results as:

scatter plots,
interactive scatter plots and
individual rule representations.

For a detailed discussion of the different visualization techniques, review Hahsler & Karpienko (2017).

Scatter-Plot

A scatter plot for association rules uses two interest measures, one on each of the axes. The default plot for association rules in arulesViz is a scatter plot using support and confidence on the axes. The measure defined by shading (default: lift) is visualized by the color of the points. A color key is provided to the right of the plot.

To visualize our association rules in a scatter plot, we use the function plot() of the arulesViz package. You can use the function as follows:

plot(x, method, measure, shading, control, data, engine).

For a detailed description, review the vignette of the package:

x: an object of class “rules” or “itemsets”.
method: a string with value “scatterplot”, “two-key plot”, “matrix”, “matrix3D”, “mosaic”, “doubledecker”, “graph”, “paracoord”, “grouped” or “iplots” selecting the visualization method.
measure: measure(s) of interestingness (e.g., “support”, “confidence”, “lift”, “order”) used in the visualization.
shading: measure of interestingness used for the color of the points/arrows/nodes (e.g., “support”, “confidence”, “lift”). The default is “lift”.
control: a list of control parameters for the plot. The available control parameters depend on the used visualization method.
data: the dataset (class “transactions”) used to generate the rules/itemsets. Only “mosaic” and “doubledecker” require the original data.
engine: a string indicating the plotting engine used to render the plot. The “default” engine uses (mostly) grid, but some plots can produce interactive grid visualizations using engine “interactive”, or HTML widgets using engine “htmlwidget”.

For a basic plot with default settings, just insert the object x (in our case rules). This visualization method draws a two dimensional scatter plot with different measures of interestingness (parameter “measure”) on the axes and a third measure (parameter “shading”) is represented by the color of the points.

library(arulesViz)

plot(rules)

The plot shows support on the x-axis and confidence on the y-axis. Lift is shown as a color with different levels ranging from grey to red.

We could also use only “confidence” as a specific measure of interest:

plot(rules, measure = "confidence")

Scatter plot with confidence as measure of interest

There is a special value for shading called “order”, which colors the points by the length (order) of the rule. You get this kind of plot, called a two-key plot, if you select method = "two-key plot". This is basically a scatter plot with shading = "order":

plot(rules, method = "two-key plot")

Interactive scatter-plot

Plot an interactive scatter plot for association rules using plotly:

plot(rules, engine = "plotly")

Interactive scatter-plot

Graph-based visualization

Graph-based techniques concentrate on the relationship between individual items in the rule set. They represent the rules (or itemsets) as a graph with items as labeled vertices, and rules (or itemsets) represented as vertices connected to items using arrows.

For rules, the LHS items are connected with arrows pointing to the vertex representing the rule and the RHS has an arrow pointing to the item.

Several engines are available. The default engine uses igraph (plot.igraph and tkplot for the interactive visualization). … arguments are passed on to the respective plotting function (use for color, etc.).

The network graph below shows associations between selected items. Larger circles imply higher support, while red circles imply higher lift. Graphs only work well with very few rules, which is why we only use a subset of at most 10 rules from our data (our example contains only 6 rules, so all of them are shown):

subrules <- head(rules, n = 10, by = "confidence")

plot(subrules, method = "graph",  engine = "htmlwidget")

Graph-based visualization

Parallel coordinate plot

Represents the rules (or itemsets) as a parallel coordinate plot (from LHS to RHS).

plot(subrules, method="paracoord")

The plot indicates that if a customer buys rice, they are likely to buy beer as well: {rice -> beer}. The same is true for the opposite direction: {beer -> rice}.

References

Hahsler, M., & Karpienko, R. (2017). Visualizing association rules in hierarchical groups. Journal of Business Economics, 87(3), 317–335. https://doi.org/10.1007/s11573-016-0822-8

Hornik, K., Grün, B., & Hahsler, M. (2005). arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15), 1–25.

Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of massive data sets. Cambridge University Press.

Ng, A., & Soo, K. (2017). Numsense! Data Science for the Layman: No Math Added. Leanpub.