Association rules are among the most straightforward data mining techniques. Almost everything about them is intuitive and easy to understand. However, people learning them often get bogged down in new equations and terminology. The intuition disappears when you focus more on memorizing the correct equations than on understanding what is going on. In this post, we will explain association rules in the simplest terms possible.

The easiest way to understand what association rules are is to consider a simple example. Imagine you run a small grocery store. You want to gain a better understanding of what products your customers buy. You want to know not only which individual products they buy but also which products they buy together. To do this, you stand at the checkout lane and write down what each customer buys. For simplicity's sake, say there were five customers, and each one bought three items. You end up with the following lists:

Customer 1: Apples, Oranges, Grapes
Customer 2: Oranges, Apples, Gum
Customer 3: Oranges, Bananas, Grapes
Customer 4: Gum, Bananas, Bread
Customer 5: Apples, Oranges, Bread

It is important to note that in the following discussion, the order in which the items are listed for any customer is irrelevant. All that matters is that each customer purchased the three items listed.

Take a look at the data and make a few notes about any patterns or trends that you see. First off, it looks like a lot of customers buy oranges. In fact, 4 out of the 5 customers (80%) bought oranges. Apples show up three times (60%), and the rest of the items show up two times each (40%).
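If you would rather let a computer do the tallying, here is a minimal Python sketch of the same count. The variable names (transactions, item_counts) are just illustrative choices, and each customer's list is stored as a set because order does not matter.

```python
from collections import Counter

# The five checkout lists from the example above, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]

# Count how many customers bought each item.
item_counts = Counter(item for basket in transactions for item in basket)

# Show each item with its count and its share of all customers.
for item, count in item_counts.most_common():
    share = count / len(transactions)
    print(f"{item}: {count} of {len(transactions)} customers ({share:.0%})")
```

Oranges come out on top with 4 of 5 customers, apples follow with 3 of 5, and every other item appears twice.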

Support

BAM!! You just calculated the Support of each of these items! Wait, what? That was it? Yup. Support is just a fancy way of saying how often something happened. It is a value between 0 and 1 because you count how many times it happened and divide by the total number of instances (here, the total number of customers).

Again, to calculate the support of the item Gum, you just count how many customers purchased gum and divide by the total number of customers. In this case, 2 out of 5 customers bought gum, so the support is 2 / 5 = 0.40.

You can also calculate the support of more than one item. For example, you can calculate the support of the set {Oranges, Apples}. To do this, you look at your data and see that 3 out of the 5 customers purchased both oranges and apples. Hence, the support of this item set is 3 / 5 = 0.60 (the order in which the items appear inside the set does not matter). Going forward, when we refer to items, we will put them inside curly brackets to represent what are called Itemsets. An Itemset can be a single item or a list of items, as we have done in this example with oranges and apples.
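As a rough sketch, the support calculation can be written as a small Python function. The function name support and the set-based data layout are assumptions made for illustration, not part of any particular library.

```python
# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def support(itemset, transactions):
    """Fraction of customers whose basket contains every item in itemset."""
    itemset = set(itemset)
    # `itemset <= basket` is True when itemset is a subset of the basket.
    matches = sum(1 for basket in transactions if itemset <= basket)
    return matches / len(transactions)


print(support({"Gum"}, transactions))                # 0.4
print(support({"Oranges", "Apples"}, transactions))  # 0.6
```

Because the Itemset is a set, {Oranges, Apples} and {Apples, Oranges} give the same answer.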

Association Rules

Now we want to come up with some Association Rules. A simple association rule is something like this: when customers buy Product A, they are likely to also buy Product B. So the rule has two parts: the Itemset on the left-hand side of the rule (LHS) and the Itemset on the right-hand side of the rule (RHS).

The shorthand notation of this is LHS -> RHS. Interpretation: when the items on the left are purchased, the items on the right are also likely to be purchased.

To identify these rules, you look for items that often appear together. In this case, it looks like apples and oranges appear together often. So we could make a rule that says when customers buy apples, they are likely to also buy oranges. This rule would look like this: {Apples} -> {Oranges}.

Rule 1: {Apples} -> {Oranges}
Interpretation: Customers that buy apples are likely to also buy oranges

But wait, you say: how do we know it is {Apples} -> {Oranges} and NOT {Oranges} -> {Apples}?

Let’s call this reversed rule Rule 2.

Rule 2: {Oranges} -> {Apples}
Interpretation: Customers that buy oranges are likely to also buy apples

Which one of these rules is better? Does it even matter? Are they saying the exact same thing? Take a look at the data again and see what you think.

Calculating Confidence

To answer the above question, we are going to look at the confidence of each rule. Rule 1 says customers that purchase apples are likely to also purchase oranges. Confidence answers the question: how likely? If the customer bought apples, how likely are they to have bought oranges?

Take a look at the data. Three customers bought apples. How many of them also bought oranges? All three. Three out of three. That is 100%. The Confidence of this rule is 3 / 3 = 1.00.

Let’s take a look at Rule 2. If the customer bought oranges, how likely are they to have bought apples? Four customers bought oranges. How many of them also bought apples? Three. Three out of four. That is 75%. The Confidence of this rule is 3 / 4 = 0.75.

We just calculated the confidence of each rule. The confidence of Rule 1 is 1.00, while the confidence of Rule 2 is 0.75. Besides seeing how easy it is to calculate confidence, we also just confirmed that {Apples} -> {Oranges} is NOT the same thing as {Oranges} -> {Apples}.
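Here is a minimal Python sketch of the confidence calculation for both rules. The function name confidence and its arguments are illustrative assumptions, and the sketch assumes the LHS appears in at least one basket (otherwise it would divide by zero).

```python
# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def confidence(lhs, rhs, transactions):
    """Of the customers who bought everything in lhs, the fraction who also bought rhs."""
    lhs, rhs = set(lhs), set(rhs)
    lhs_count = sum(1 for basket in transactions if lhs <= basket)
    both_count = sum(1 for basket in transactions if (lhs | rhs) <= basket)
    return both_count / lhs_count  # assumes lhs_count > 0


print(confidence({"Apples"}, {"Oranges"}, transactions))  # Rule 1: 1.0
print(confidence({"Oranges"}, {"Apples"}, transactions))  # Rule 2: 0.75
```

The numerator is the same for both rules (three customers bought both items). Only the denominator changes, which is why swapping the sides changes the confidence.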

Revisiting Support

Earlier, we calculated the support of an Itemset containing a single item as well as the support of an Itemset containing multiple items (we did the support of {Oranges, Apples}). You can also calculate the support of an association rule. Remember, support just measures how often something happens. To find the support for a rule, you count how many customers bought every item on both sides of the rule and divide by the total number of customers. So if we want to find the support of Rule 1, we count how many times apples and oranges were purchased together and divide by 5. We know that they were purchased together 3 times, so the support of Rule 1 is 3 / 5 = 0.60.

Consider this: What is going to have a higher support value, the Itemset {Apples, Oranges}, the rule {Apples} -> {Oranges}, or the rule {Oranges} -> {Apples}? The answer: they will all have the same support. In each instance, you are simply counting how many customers bought both apples and oranges and dividing it by the total number of customers.
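To see this in code, here is a small Python sketch. The helper names support and rule_support are assumptions for illustration; the point is only that a rule's support is the support of the LHS and RHS items combined.

```python
# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def support(itemset):
    """Fraction of customers whose basket contains every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def rule_support(lhs, rhs):
    """Support of the rule lhs -> rhs: the basket must contain both sides."""
    return support(lhs | rhs)


print(support({"Apples", "Oranges"}))         # 0.6
print(rule_support({"Apples"}, {"Oranges"}))  # 0.6
print(rule_support({"Oranges"}, {"Apples"}))  # 0.6
```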

Calculating Lift

The last metric to discuss is Lift. Of the three metrics, Lift is the most complicated, but it is still pretty intuitive.

Consider Rule 2: {Oranges} -> {Apples}. Lift answers the question: Are customers that buy oranges more likely to buy apples than the average customer? We already know that 75% of the customers who bought oranges also bought apples (this is the confidence of the rule). We also know that, in general, 60% of all of the customers bought apples (this is the support of {Apples}). So the answer is, yes… customers that buy oranges are more likely to buy apples than the average customer. How much more likely? To figure that out, you divide 0.75 / 0.60 = 1.25. This gives you the lift value of the rule: 1.25.

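To calculate lift, we took the confidence of the rule and divided it by the support of the RHS. A lift above 1 means customers who buy the LHS are more likely than the average customer to buy the RHS, so the rule may be useful. A lift of exactly 1 means buying the LHS makes no difference, and a lift below 1 means those customers are actually less likely than average to buy the RHS. In either of those cases, the rule is not very useful.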
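The lift calculation for Rule 2 can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch assumes both sides of the rule appear in at least one basket.

```python
# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def count(itemset):
    """Number of customers whose basket contains every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def lift(lhs, rhs):
    """Confidence of lhs -> rhs divided by the support of rhs."""
    confidence = count(lhs | rhs) / count(lhs)
    rhs_support = count(rhs) / len(transactions)
    return confidence / rhs_support


# Rule 2: {Oranges} -> {Apples}
print(f"{lift({'Oranges'}, {'Apples'}):.2f}")  # prints 1.25
```

One detail worth noticing: lift comes out the same in both directions, so {Apples} -> {Oranges} also has a lift of 1.25 even though its confidence is different.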

Review

Let’s do one more example. Consider the rule, {Gum} -> {Oranges}.

What is the support of this rule? To figure this out, we count how many customers bought both gum and oranges. Only 1 out of 5 customers (Customer 2) bought both. Therefore, support is 1 / 5 = 0.20.

What is the confidence of this rule? If a customer buys gum, how likely is it that they also buy oranges? Two customers bought gum; of those two, only one also bought oranges. So 1 out of 2 customers that bought gum also bought oranges. The confidence of the rule is 1 / 2 = 0.50.

What is the lift of this rule? Are customers that buy gum more likely to buy oranges than the average customer? We already know that 50% of the customers that bought gum also bought oranges. What about the average customer? If we look at the data we see 4 out of 5 customers buy oranges. So 80% of customers buy oranges. We calculate lift by saying 0.50 / 0.80 = 0.625. The lift value is less than 1, which means that this rule is not very useful.
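To double-check the arithmetic, here is a short Python sketch that computes all three metrics for {Gum} -> {Oranges} in one go. The function name rule_metrics is a hypothetical helper, not a library call.

```python
# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def rule_metrics(lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    n = len(transactions)
    lhs_count = sum(1 for basket in transactions if lhs <= basket)
    rhs_count = sum(1 for basket in transactions if rhs <= basket)
    both_count = sum(1 for basket in transactions if (lhs | rhs) <= basket)

    support = both_count / n               # bought both, out of everyone
    confidence = both_count / lhs_count    # bought both, out of LHS buyers
    lift = confidence / (rhs_count / n)    # confidence vs. the average customer
    return support, confidence, lift


s, c, lift_value = rule_metrics({"Gum"}, {"Oranges"})
print(f"support={s:.2f} confidence={c:.2f} lift={lift_value:.3f}")
# prints: support=0.20 confidence=0.50 lift=0.625
```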

Final Thoughts

The example we just did only involved five different customers purchasing three items each. With such a small dataset, identifying rules and calculating the three metrics is pretty easy to do manually. In practice, datasets contain hundreds, thousands, maybe millions of rows and involve perhaps thousands of different potential items. This is why we use computers to generate a list of rules with their confidence, support, and lift for us. It may be tempting to look only at a rule’s confidence to determine if it is a good rule, but you need to also look at the support and lift. Rules that have a lift ≤ 1.0 are not useful, regardless of how high the confidence is. Rules that have a lift > 1.0 may have value, regardless of how low the confidence seems to be.
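To give a feel for what "letting the computer generate the rules" looks like, here is a brute-force Python sketch that scores every one-item-to-one-item rule in our tiny dataset. It is only an illustration under simplifying assumptions: it checks every pair directly, which real tools avoid on large datasets by using more efficient algorithms, and all names in it are made up for this example.

```python
from itertools import permutations

# The five checkout lists from the example, one set per customer.
transactions = [
    {"Apples", "Oranges", "Grapes"},
    {"Oranges", "Apples", "Gum"},
    {"Oranges", "Bananas", "Grapes"},
    {"Gum", "Bananas", "Bread"},
    {"Apples", "Oranges", "Bread"},
]


def count(itemset):
    """Number of customers whose basket contains every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


n = len(transactions)
items = sorted(set().union(*transactions))

rules = []
# permutations yields ordered pairs, so both A -> B and B -> A are scored.
for lhs, rhs in permutations(items, 2):
    both = count({lhs, rhs})
    if both == 0:
        continue  # the two items never appear together, so skip the rule
    support = both / n
    confidence = both / count({lhs})
    lift = confidence / (count({rhs}) / n)
    rules.append((lhs, rhs, support, confidence, lift))

# List the rules from highest lift to lowest.
rules.sort(key=lambda rule: rule[4], reverse=True)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{{{lhs}}} -> {{{rhs}}}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.3f}")
```

Even with six items there are 30 possible one-to-one rules to check, and the number grows quickly once Itemsets with several items are allowed.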

Lastly, support can be somewhat tricky to evaluate. It simply depends on your dataset. If you are Walmart and you are analyzing the shopping patterns of millions of customers, a rule with a support value of 0.01 means the items are purchased together tens of thousands of times, so it may be quite useful to know that the products are related. Rules based on such large and diverse datasets often have seemingly low support (below 0.03), but can still be quite useful.

