What is clustering? - Smartprofile

Clustering [1] is an unsupervised machine learning method that aims to group objects or observations without prior knowledge of the classes or categories to which they belong. The objective is to discover hidden structures in the data.

To explore the concepts behind this term, along with concrete examples, we cover the key notions in a series of three articles, of which this is the first!

Analogy
Imagine you have a large bag of mixed fruit: apples, bananas, oranges, kiwis, etc., but you don't know how many varieties you have, nor how many of each type there are. Your mission? Sort the fruits into homogeneous groups according to their shape, colour, size or weight.

This is exactly what a clustering algorithm does. It analyses objects (here, fruits) based on their attributes and groups them according to their similarity, without anyone telling it in advance what each group corresponds to. This is called unsupervised learning: there is no predefined label. The model doesn't need to know what an “apple” or a “banana” is. It only knows the available characteristics, or features (weight, colour, size, etc.), like an eye discovering fruit for the first time. Its role is to automatically detect the underlying structures in the data and group similar elements together.
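To make the analogy concrete, here is a minimal sketch of one common clustering algorithm, k-means, written in plain Python. It is only an illustration: the fruit measurements are invented, the number of groups (k=2) is fixed in advance even though the analogy leaves it unknown, and the deterministic initialisation is a simplification chosen for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: returns (centroids, labels) for a list of numeric tuples."""
    # Deterministic farthest-first initialisation (an assumption of this sketch):
    # start from the first point, then add the point farthest from the
    # centroids chosen so far until k centroids exist.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical fruits described by (weight in g, length in cm); no labels given.
fruits = [(150, 7.0), (160, 7.5), (155, 7.2), (120, 19.0), (125, 20.0), (118, 18.5)]
centroids, labels = kmeans(fruits, k=2)
print(labels)
```

The algorithm never sees the words “apple” or “banana”: it only returns group numbers, and it is up to the analyst to interpret what each group contains.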

Clustering is used in various industries

Clustering is now used in a wide range of sectors that face massive volumes of data, often heterogeneous, unlabelled and complex. It allows this raw data to be structured by bringing out coherent groups (clusters) that reflect underlying similarities which are difficult to distinguish by simple observation.

In the medical field, clustering is used, for example, to distinguish patient phenotypes from clinical data. Ahlqvist et al. [2] applied clustering to a type 2 diabetes dataset to reveal five distinct groups/clusters, with different complication and treatment response profiles.

In finance, clustering is used in particular to detect fraud by spotting abnormal transactions that do not correspond to typical behaviour. Min et al. [3] propose a clustering method for detecting fraud by analysing sequences of clicks on digital interfaces, for example the pages visited and the order of actions performed while navigating a banking app or e-commerce website.

Clustering for Digital Marketing

Clustering is a powerful lever for optimising marketing strategies. It enables companies to better understand their customers' interactions. Punj and Stewart [4] have produced an in-depth review of clustering methods applied to marketing.

Customer segmentation

Traditionally, customer segmentation is based on demographic or transactional criteria, but these methods can be time-consuming and inflexible. Clustering, on the other hand, automates and enriches this segmentation. It is often used to identify audiences similar to a reference group (existing customers, subscribers, active users, etc.). This is based on the idea that “people who look alike consume in the same way”. For example, an e-commerce site can detect “promotion hunters”, “repeat customers” or “seasonal customers”.

The clusters then become an analysis grid that can be projected onto new audiences to anticipate their conversion potential.
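One simple way to picture this projection is to assign each new profile to the segment whose centre is closest. The sketch below assumes that segment centroids have already been computed on existing customers; the centroid values, the two features and the prospect identifiers are all hypothetical.

```python
import math

# Hypothetical segment centroids learned on existing customers:
# (orders per month, average basket in tens of euros).
SEGMENT_CENTROIDS = {
    "promotion hunters": (0.5, 2.0),
    "repeat customers": (4.0, 5.0),
    "seasonal customers": (1.0, 9.0),
}

def nearest_segment(profile, centroids=SEGMENT_CENTROIDS):
    """Return the name of the segment whose centroid is closest to the profile."""
    return min(centroids, key=lambda name: math.dist(profile, centroids[name]))

# Hypothetical new audience described with the same two features.
new_prospects = {"prospect_a": (3.5, 4.5), "prospect_b": (0.4, 2.5)}
projected = {pid: nearest_segment(profile) for pid, profile in new_prospects.items()}
print(projected)
```

Each prospect inherits the label of its closest segment, which can then be used to estimate how it is likely to behave.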

Segments can be created based on behaviour, such as purchase history, visit or order frequency, campaign responsiveness, etc. Alternatively, they can be based on sociodemographic data like age, gender, location, etc.

But it is often the combination of these two types of data that yields the most relevant results. The more data you have, the more effective segmentation becomes.
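When behavioural and sociodemographic variables are combined, they usually live on very different scales (a click rate between 0 and 1, an age in years, an order count). A common precaution, sketched below on invented data, is to standardise each column before computing distances so that no single variable dominates the grouping. The feature names are assumptions made for illustration.

```python
import statistics

def standardise(rows):
    """Z-score each column so every feature has mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns
    # to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical customers: (orders per year, campaign click rate, age in years).
customers = [(2, 0.05, 23), (40, 0.30, 51), (12, 0.10, 35), (30, 0.25, 44)]
scaled = standardise(customers)
for row in scaled:
    print(tuple(round(value, 2) for value in row))
```

Without this step, the order count and the age would outweigh the click rate in any distance-based clustering simply because their raw values are larger.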

Content customisation

These automatically generated segments let you set up more targeted marketing campaigns. This includes dynamically adapting offers (products, services), visuals or message content, the communication channel (email, push, SMS, social networks), etc.

Advertising campaign optimisation

Clustering can also be used to optimise media campaigns by identifying the best-performing creative for each cluster. It can also be coupled with A/B tests to further refine performance. For example, several variations of an advertising message can be tested within the same cluster, or reactions can be compared between different clusters, in order to better understand what works for each type of profile.
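As a rough illustration of combining clusters with A/B tests, the sketch below computes the conversion rate of each message variant within each cluster and keeps the best one. The event log, cluster names and variant names are hypothetical, and the comparison uses raw rates only; a real analysis would also check statistical significance.

```python
from collections import defaultdict

def best_variant_per_cluster(events):
    """events: iterable of (cluster, variant, converted) with converted in {0, 1}."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for cluster, variant, did_convert in events:
        shown[(cluster, variant)] += 1
        converted[(cluster, variant)] += did_convert
    # Conversion rate for every (cluster, variant) pair that was shown.
    rates = {key: converted[key] / shown[key] for key in shown}
    best = {}
    for (cluster, variant), rate in rates.items():
        if cluster not in best or rate > rates[(cluster, best[cluster])]:
            best[cluster] = variant
    return best, rates

# Hypothetical campaign log: which variant each user saw and whether they converted.
events = [
    ("hunters", "A", 1), ("hunters", "A", 1), ("hunters", "B", 0), ("hunters", "B", 1),
    ("loyal", "A", 0), ("loyal", "A", 0), ("loyal", "B", 1), ("loyal", "B", 0),
]
best, rates = best_variant_per_cluster(events)
print(best)
```

Here the same two variants lead to different winners depending on the cluster, which is the kind of insight a single global A/B test would hide.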

Churn detection

By analysing user trajectories, such as a decrease in purchase frequency, interactions and visits, clustering makes it possible to identify patterns of disengagement and to spot weak signals before the user leaves, so that corrective actions such as a personalised offer or a proactive email follow-up can be triggered.
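Before trajectories can be clustered, they have to be summarised as numeric features. One possible feature is the trend of an activity series, sketched below as a least-squares slope. The user data is invented, and the fixed threshold used at the end stands in for the clustering step purely to keep the example short.

```python
def trend_slope(series):
    """Least-squares slope of a series observed at times 0, 1, 2, ... (needs 2+ values)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator

# Hypothetical number of purchases per month over six months.
monthly_purchases = {
    "user_1": [5, 5, 6, 5, 6, 5],
    "user_2": [6, 5, 4, 3, 2, 1],
    "user_3": [1, 2, 2, 3, 3, 4],
}

# Feature vector per user: (trend, most recent activity level).
features = {user: (trend_slope(s), s[-1]) for user, s in monthly_purchases.items()}

# Illustrative shortcut: flag strongly declining users with an arbitrary threshold.
at_risk = [user for user, (slope, _) in features.items() if slope < -0.5]
print(at_risk)
```

In a clustering setting, vectors like these would be grouped automatically, and the cluster with declining trends and low recent activity would be the one to monitor.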

Would you like to go further?

If you wish to take action and implement clustering, you will find the mathematical formulation below!

Consider a dataset $X = \{x_1, x_2, \dots, x_n\}$, where each point $x_i \in \mathbb{R}^d$ is an observation in a d-dimensional vector space. The goal of clustering is to partition this set into K groups (or clusters) $C = \{C_1, C_2, \dots, C_K\}$, such that:

each observation belongs to a single cluster: $C_i \cap C_j = \emptyset$ for all $i \neq j$;
the union of all clusters covers the entire set of points: $\bigcup_{k=1}^{K} C_k = X$;
the points within a cluster are closer to each other than to those in other clusters, according to a distance or similarity function $d(\cdot,\cdot)$.
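The three conditions above can be checked on a toy example. The sketch below assumes distinct points and the Euclidean distance as d; the dataset and the candidate clustering are invented, and comparing average distances is only one simple way to read the third condition.

```python
import math
from itertools import combinations

def is_partition(dataset, clusters):
    """True if the clusters are pairwise disjoint and their union is the dataset.

    Assumes all points in the dataset are distinct.
    """
    disjoint = all(not (set(a) & set(b)) for a, b in combinations(clusters, 2))
    covering = set().union(*clusters) == set(dataset)
    return disjoint and covering

def mean_distance(pairs):
    """Average Euclidean distance over an iterable of point pairs."""
    pairs = list(pairs)
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Toy dataset X with n = 4 points in d = 2 dimensions, and K = 2 clusters.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
C = [[X[0], X[1]], [X[2], X[3]]]

intra = mean_distance(pair for cluster in C for pair in combinations(cluster, 2))
inter = mean_distance((a, b) for a in C[0] for b in C[1])
print(is_partition(X, C), intra < inter)
```

A clustering that reuses a point in two groups, or leaves a point out, fails the first check; a clustering that mixes distant points fails the second comparison.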

If this seems a little complex, don't hesitate to call on a team of experts to support you in your project. Smartprofile's teams deploy bespoke clustering algorithms for their users!

Things to remember

Clustering is an unsupervised learning method.
It is a key tool for discovering hidden patterns in data.
Its applications in digital marketing are numerous and growing rapidly.

In our next article, we'll look at how to implement a clustering algorithm and integrate it into a production pipeline. Do you want to segment your audiences precisely? Contact us for a demo of our Smartprofile solution.

Article written by Ibtihal El Mimouni – Data Scientist at Smartprofile, currently undertaking a CIFRE thesis on the challenges of AI for more responsible marketing

References

[1] Rokach, L., & Maimon, O. (2005). “Clustering methods.” Data Mining and Knowledge Discovery Handbook, 321-352.

[2] Ahlqvist, E. et al. (2018). “Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables.” The Lancet Diabetes & Endocrinology, 6(5), 361-369.

[3] Min, W., Liang, W., Yin, H., Wang, Z., Li, M., & Lal, A. (2021). “Explainable Deep Behavioural Sequence Clustering for Transaction Fraud Detection.” CoRR, abs/2101.04285.

[4] Punj, G., & Stewart, D. W. (1983). “Cluster analysis in marketing research: Review and suggestions for application.” Journal of Marketing Research, 20(2), 134-148.
