Clustering [1] is an unsupervised machine learning method that aims to group objects or observations without prior knowledge of the classes or categories to which they belong. The objective is to discover hidden structures in the data.

To explore the concepts behind this term, along with concrete examples, we present the key notions in a series of three articles, of which this is the first!

Analogy
Imagine you have a large bag of mixed fruit: apples, bananas, oranges, kiwis, etc., but you don't know how many varieties you have, nor how many of each type there are. Your mission? Sort the fruits into homogeneous groups according to their shape, colour, size or weight.

This is exactly what a clustering algorithm does. It analyses objects (here, fruits) based on their attributes and groups them according to their similarity, “without anyone telling it” in advance what each group corresponds to. This is called unsupervised learning: there is no predefined label. The model doesn't need to know what an «apple» or a «banana» is. It only knows the available characteristics, or features (weight, colour, size, etc.), like an eye discovering fruit for the first time. Its role is to automatically detect the underlying structures in the data and group similar elements together.
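To make the analogy concrete, here is a minimal sketch in Python, assuming a hypothetical bag of fruits described only by weight and length (the values are invented for illustration). It uses k-means, one common clustering algorithm among many and not necessarily the one used later in this series, with a simple deterministic farthest-first initialisation:

```python
import math

# Hypothetical unlabelled fruits described only by two features:
# (weight in grams, length in cm). The algorithm never sees the names.
fruits = [
    (150, 7.0), (160, 7.5), (155, 7.2),     # apple-like
    (120, 18.0), (118, 19.0), (125, 17.5),  # banana-like
    (75, 5.0), (80, 5.5), (70, 4.8),        # kiwi-like
]

def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=20):
    """Minimal k-means (Lloyd's algorithm): returns one cluster index per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

print(kmeans(fruits, k=3))  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Each fruit type ends up sharing a label even though the algorithm never saw the names. Note that weight, measured in grams, dominates the Euclidean distance here; in practice, features are usually rescaled first.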

Clustering is used in various industries

Clustering is now used in a wide range of sectors that deal with massive volumes of data that are often heterogeneous, unlabelled and complex. It structures this raw data by revealing coherent groups (clusters), which reflect underlying similarities that are difficult to distinguish by simple observation.

In the medical field, for example, clustering is used to distinguish patient phenotypes from clinical data. Ahlqvist et al. [2] applied clustering to data from patients with adult-onset diabetes and revealed five distinct groups (clusters) with different profiles of complications and treatment response.

In finance in particular, clustering is used to detect fraud by spotting abnormal transactions that do not correspond to typical behaviour. Min et al. [3] propose a clustering method for detecting fraud by analysing sequences of clicks on digital interfaces, such as the pages visited or the order of actions performed while navigating a banking app or e-commerce website.
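The principle of spotting behaviour that does not match typical patterns can be sketched with a much simpler approach than the deep sequence clustering of Min et al. [3]: flag any session that lies too far from every centroid of “normal” behaviour. The two features, the centroids and the threshold below are hypothetical, assumed to come from an earlier clustering of past sessions:

```python
import math

# Hypothetical centroids of typical behaviour, assumed to come from clustering past sessions.
# Assumed features: (pages visited per session, seconds between actions).
normal_centroids = [(5.0, 12.0), (15.0, 4.0)]

def distance_to_nearest_centroid(x, centroids):
    """Euclidean distance from x to the closest centroid."""
    return min(math.dist(x, c) for c in centroids)

def flag_anomalies(sessions, centroids, threshold):
    """Return the indices of sessions farther than `threshold` from every centroid."""
    return [i for i, s in enumerate(sessions)
            if distance_to_nearest_centroid(s, centroids) > threshold]

sessions = [(6.0, 11.0), (14.0, 5.0), (40.0, 0.5)]
print(flag_anomalies(sessions, normal_centroids, threshold=5.0))  # [2]
```

The third session, with many pages visited at very high speed, matches neither typical profile and is flagged for review.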

Clustering for Digital Marketing

Clustering is a powerful lever for optimising marketing strategies. It enables companies to better understand their customers' interactions. Punj and Stewart [4] have produced an in-depth review of clustering methods applied to marketing.

Customer segmentation

Traditionally, customer segmentation is based on demographic or transactional criteria, but these methods can be time-consuming and inflexible. Clustering, on the other hand, automates and enriches this segmentation. It is often used to identify audiences similar to a reference group (existing customers, subscribers, active users, etc.). This is based on the idea that «people who look alike consume in the same way». For example, an e-commerce site can detect «promotion hunters», «repeat customers» or «seasonal customers».

The clusters then become an analysis grid that can be projected onto new audiences to anticipate their conversion potential.
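A minimal sketch of this projection, assuming hypothetical segment centroids (orders per month, average basket in EUR) and conversion rates already learned from existing customers: a new prospect is assigned to the nearest segment and inherits that segment's historical conversion rate as a rough estimate of its potential.

```python
import math

# Hypothetical analysis grid learned from existing customers. Each segment keeps its
# centroid (orders per month, average basket in EUR) and its observed conversion rate.
segments = {
    "promotion hunters": {"centroid": (4.0, 25.0), "conversion_rate": 0.08},
    "repeat customers": {"centroid": (6.0, 80.0), "conversion_rate": 0.20},
    "seasonal customers": {"centroid": (0.5, 120.0), "conversion_rate": 0.05},
}

def assign_segment(profile):
    """Project a new profile onto the grid: return the name of the nearest segment."""
    return min(segments, key=lambda name: math.dist(profile, segments[name]["centroid"]))

def conversion_potential(profile):
    """Estimate a prospect's conversion potential from its segment's historical rate."""
    return segments[assign_segment(profile)]["conversion_rate"]

prospect = (5.0, 30.0)
print(assign_segment(prospect), conversion_potential(prospect))  # promotion hunters 0.08
```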

Segments can be created based on behaviour, such as purchase history, visit or order frequency, campaign responsiveness, etc. Alternatively, they can be based on sociodemographic data like age, gender, location, etc.

But it is often the combination of these two types of data that yields the most relevant results. The more data you have, the more effective segmentation becomes.
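Combining behavioural and sociodemographic data requires turning them into comparable numbers before a distance-based algorithm can use them. One common approach, sketched here on hypothetical records and column names, is to standardise numeric columns (z-scores) and one-hot encode categorical ones:

```python
import statistics

# Hypothetical records combining behavioural and sociodemographic attributes.
customers = [
    {"orders_per_month": 1, "age": 25, "region": "north"},
    {"orders_per_month": 8, "age": 41, "region": "south"},
    {"orders_per_month": 3, "age": 63, "region": "north"},
]
NUMERIC = ["orders_per_month", "age"]
CATEGORICAL = ["region"]

def build_feature_vectors(rows):
    """Turn mixed records into numeric vectors usable by a distance-based clustering."""
    # Standardise numeric columns (z-scores) so that no unit dominates the distance.
    stats = {}
    for col in NUMERIC:
        values = [r[col] for r in rows]
        stats[col] = (statistics.mean(values), statistics.pstdev(values))
    # One-hot encode categorical columns, with a stable (sorted) category order.
    categories = {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}
    vectors = []
    for r in rows:
        vec = []
        for col in NUMERIC:
            mu, sd = stats[col]
            vec.append((r[col] - mu) / sd if sd else 0.0)  # constant column -> 0.0
        for col in CATEGORICAL:
            vec.extend(1.0 if r[col] == value else 0.0 for value in categories[col])
        vectors.append(vec)
    return vectors

for vec in build_feature_vectors(customers):
    print([round(x, 2) for x in vec])
```

Without standardisation, age (tens of years) would outweigh order frequency in any Euclidean distance. Other encodings, and distances designed for mixed data, also exist; this is only one option.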

Content customisation

These automatically generated segments make it possible to run more targeted marketing campaigns, for example by dynamically adapting the offers (products, services), the visuals or message content, and the communication channel (email, push, SMS, social networks).

Advertising campaign optimisation

Clustering can also be used to optimise media campaigns by identifying the best-performing creative for each cluster. It can also be coupled with A/B tests to further refine performance. For example, several variations of an advertising message can be tested within the same cluster, or the reactions of different clusters can be compared, in order to better understand what works for each type of profile.
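The per-cluster comparison described above can be sketched as a simple aggregation, assuming a hypothetical campaign log of (cluster, variant, converted) records: compute the conversion rate of each variant within each cluster, then keep the best-performing variant per cluster.

```python
from collections import defaultdict

# Hypothetical campaign log: (cluster, message variant, did the user convert?)
events = [
    ("promotion hunters", "A", True), ("promotion hunters", "A", False),
    ("promotion hunters", "B", False), ("promotion hunters", "B", False),
    ("repeat customers", "A", False), ("repeat customers", "A", False),
    ("repeat customers", "B", True), ("repeat customers", "B", True),
]

def conversion_rates(log):
    """Conversion rate for each (cluster, variant) pair."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for cluster, variant, did_convert in log:
        shown[(cluster, variant)] += 1
        converted[(cluster, variant)] += int(did_convert)
    return {key: converted[key] / shown[key] for key in shown}

def best_variant_per_cluster(rates):
    """Keep, for each cluster, the variant with the highest conversion rate."""
    best = {}
    for (cluster, variant), rate in rates.items():
        if cluster not in best or rate > rates[(cluster, best[cluster])]:
            best[cluster] = variant
    return best

print(best_variant_per_cluster(conversion_rates(events)))
# {'promotion hunters': 'A', 'repeat customers': 'B'}
```

With so few events these rates are purely illustrative; a real A/B test would also check that the differences are statistically significant before acting on them.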

Churn detection

By analysing user trajectories (for example, a decrease in purchase frequency, interactions or visits), clustering makes it possible to identify patterns of disengagement and to spot weak signals before the user leaves, triggering corrective actions such as a personalised offer or a proactive follow-up email.
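A minimal sketch of this idea, assuming hypothetical monthly purchase counts per user: each trajectory is summarised by two trend features (a least-squares slope and the relative change between the first and second half of the period), the users are grouped with a small 2-means clustering, and the cluster whose centroid has the most negative slope is treated as the disengaging one.

```python
import math
from statistics import mean

# Hypothetical monthly purchase counts per user over the last six months.
histories = {
    "u1": [4, 4, 5, 4, 4, 5],
    "u2": [6, 5, 4, 3, 2, 1],
    "u3": [3, 3, 3, 4, 3, 3],
    "u4": [5, 4, 4, 2, 1, 0],
}

def trend_features(values):
    """(least-squares slope over time, relative change of the last half vs the first half)."""
    n = len(values)
    t_mean, v_mean = (n - 1) / 2, mean(values)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    first, last = mean(values[: n // 2]), mean(values[n // 2 :])
    change = (last - first) / first if first else 0.0
    return (slope, change)

def two_means(points, iterations=10):
    """Minimal k-means with k=2, initialised with the first point and the point farthest from it."""
    centroids = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = []
    for _ in range(iterations):
        labels = [min((0, 1), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in (0, 1):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

users = list(histories)
features = [trend_features(histories[u]) for u in users]
labels, centroids = two_means(features)
# The cluster whose centroid has the lowest slope gathers the disengaging users.
at_risk_cluster = min((0, 1), key=lambda c: centroids[c][0])
at_risk = [u for u, label in zip(users, labels) if label == at_risk_cluster]
print(at_risk)  # ['u2', 'u4']
```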

Would you like to go further?

If you wish to take action and implement clustering, you will find its mathematical formulation below!

Consider a dataset $X = \{x_1, x_2, \dots, x_n\}$, where each point $x_i \in \mathbb{R}^d$
is an observation in a $d$-dimensional vector space. The goal of clustering
is to partition this set into $K$ groups (or clusters) $C = \{C_1, C_2, \dots, C_K\}$, such that:

  • each observation belongs to a single cluster: $C_k \cap C_l = \emptyset$ for all $k \neq l$;
  • the union of all clusters covers the entire set of points: $\bigcup_{k=1}^{K} C_k = X$;
  • the points within a cluster are closer to each other than to those in other clusters,
    according to a distance or similarity function $d(\cdot,\cdot)$.
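These properties can be checked on a toy example. The sketch below assumes distinct points stored as tuples and takes the Euclidean distance as $d(\cdot,\cdot)$. It checks the third property only on average (mean intra-cluster distance below mean inter-cluster distance), which is weaker than the pointwise condition stated above:

```python
import math
from itertools import combinations

def is_partition(X, clusters):
    """True if the clusters are pairwise disjoint and their union is exactly X."""
    seen = set()
    for C in clusters:
        if seen & C:  # a point already belongs to another cluster
            return False
        seen |= C
    return seen == set(X)

def mean_distance(pairs):
    """Average Euclidean distance over (a, b) pairs; 0.0 if there are no pairs."""
    pairs = list(pairs)
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def intra_vs_inter(clusters):
    """Mean distance between points of the same cluster vs of different clusters."""
    intra = mean_distance(pair for C in clusters for pair in combinations(C, 2))
    inter = mean_distance((a, b) for C1, C2 in combinations(clusters, 2) for a in C1 for b in C2)
    return intra, inter

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters = [{(0.0, 0.0), (0.0, 1.0)}, {(5.0, 5.0), (5.0, 6.0)}]
print(is_partition(X, clusters))  # True
intra, inter = intra_vs_inter(clusters)
print(intra < inter)              # True
```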

If this seems a little complex, don't hesitate to call on a team of experts to support you
in your project: Smartprofile's teams deploy bespoke clustering algorithms for
their users!

Things to remember

  • Clustering is an unsupervised learning method
  • It is a key tool for discovering hidden patterns in data.
  • Its applications in digital marketing are numerous and growing rapidly

In our next article, we'll look at how to implement a clustering algorithm and integrate it into a production pipeline. Do you want to segment your audiences precisely? Contact us for a demo of the
Smartprofile solution.

Article written by Ibtihal El Mimouni – Data Scientist at Smartprofile, currently undertaking a CIFRE thesis on the challenges of AI for more responsible marketing

References

[1] Rokach, L., & Maimon, O. (2005). “Clustering methods.” Data Mining and Knowledge Discovery Handbook, 321-352.

[2] Ahlqvist, E. et al. (2018). “Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables.” The Lancet Diabetes & Endocrinology, 6(5), 361-369.

[3] Min, W., Liang, W., Yin, H., Wang, Z., Li, M., & Lal, A. (2021). “Explainable Deep Behavioral Sequence Clustering for Transaction Fraud Detection.” CoRR, abs/2101.04285.

[4] Punj, G., & Stewart, D. W. (1983). “Cluster analysis in marketing research: Review and suggestions for application.” Journal of Marketing Research, 20(2), 134-148.
