Implement a clustering algorithm

In our previous article, we discovered what clustering is and how it can help segment data more effectively, particularly in marketing.

In this second part, we detail the implementation of a clustering approach: choice of algorithms, key steps of the process and applications to customer segmentation.

Reminder: what is clustering?

Clustering is an unsupervised learning method that groups similar objects into sets called clusters. In marketing, this allows for a better understanding of your customers, tailoring of campaigns, and personalisation of the experience.

For example, segmenting newsletter subscribers according to their engagement level (clicks, opens, conversions) allows for tailored messaging for each group.

Types of clustering

Here we present three types of clustering widely used in the field of marketing. For each type, you will find a technical presentation and a practical section with application examples.

Partitioning clustering

Partitioning clustering divides a dataset into $k$ distinct clusters based on the similarity of the observations. Each observation is assigned to exactly one cluster so as to minimise a distance measure, for example the Euclidean distance, which measures the proximity between two points.

The best-known algorithm in this category is K-Means [1].
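
The assign-then-update loop behind K-Means can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the visitor data and the choice of two features are assumptions made for the example, and a real project would more likely rely on an established library.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initialise the centroids with k observations drawn at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical visitors: (visits per month, average session duration in minutes).
visitors = [(1, 2.0), (2, 1.5), (1, 3.0), (12, 9.0), (15, 11.0), (14, 10.0)]
labels, centroids = kmeans(visitors, k=2)
```

On this toy data the three occasional visitors and the three frequent visitors should end up in two separate clusters. Note that $k$ has to be chosen before the algorithm runs.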

Limits: This approach is particularly suited to clusters that are roughly spherical and of comparable size. However, it performs poorly on noisy data or on data that is not linearly separable.

Use case

- Get an initial overview of users and a macro-level reading of their engagement. For example, segmenting website visitors by online behaviour (frequency of visits, session duration, interactions with ads) can reveal a difference in intent: informational or transactional.
- Distinguish occasional visitors (those who visit once, explore little, and don't convert) from loyal customers (those who return often and make purchases).

Examples of algorithms: K-Means [1], K-Medoids [2], CLARANS [3].

Partitioning clustering. Left: partition into two clusters; right: partition into three clusters.

Hierarchical clustering

Hierarchical clustering builds a hierarchy of clusters in the form of a tree, called a dendrogram. Unlike partitioning clustering, it doesn't require specifying the number of clusters in advance.
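
To make the tree-building idea concrete, here is a minimal bottom-up (agglomerative) sketch in plain Python. It is an illustration only: it uses average linkage, which is one of several possible merge criteria and is an assumption of this example, not the method of the algorithms cited in this section. The function name `agglomerative` and the session data are hypothetical.

```python
import math


def agglomerative(points, n_clusters=1):
    """Naive bottom-up hierarchical clustering with average linkage."""
    # Start with one cluster per observation (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history: the information a dendrogram displays

    def linkage(a, b):
        # Average linkage: mean pairwise distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters...
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # ...record the merge and replace the pair with their union.
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sessions: (minutes on site, product pages viewed).
sessions = [(8.0, 1), (9.0, 2), (7.5, 1), (1.0, 5), (1.5, 6), (2.0, 5)]
clusters, merges = agglomerative(sessions, n_clusters=2)
```

The `merges` list records which groups were joined and at what distance. Calling the function with `n_clusters=1` builds the whole tree, so the number of segments can be decided afterwards by choosing where to cut.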

Limits: This method lets you explore the structure of the data at multiple levels, but it can become computationally expensive on large datasets.

Use case

- Analyse customer journeys: grouping can reveal segments according to acquisition channel (email, social media, SEO). For example, visitors arriving via SEO may spend more time on the website, which can indicate information seeking, while those arriving from emails mainly visit product pages and leave the site quickly, which reflects a more direct purchase intention.
- Identify similar customer behaviours and create sub-segments based on their behavioural proximity. For example, visitors from social networks may, after several visits, adopt behaviours similar to those of SEO visitors. Hierarchical clustering can highlight this proximity, so that content can be tailored to their exploration phase, even if they initially came from a promotional acquisition channel.

Examples of algorithms: BIRCH [4], DIANA [5].

Dendrogram illustrating hierarchical clustering. The points (labelled A to Z) are grouped according to their similarity.

Density-based clustering

Density-based clustering groups data points based on areas of high density, separated by areas of low density. Outlier points, located in areas of low density, are considered noise. This approach does not require specifying the number of clusters in advance.

The best-known algorithm in this category is DBSCAN.
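
The density idea can be sketched as follows in plain Python. This is a simplified, quadratic-time illustration of the DBSCAN principle, not a reference implementation: the parameter names `eps` (neighbourhood radius) and `min_samples` (minimum number of neighbours for a dense point), the function name and the session data are assumptions of this example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    n = len(points)
    # Neighbourhood of each point: all points within distance eps (itself included).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # i lies in a dense area: grow a new cluster from it.
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is dense too: keep expanding
        cluster_id += 1
    return labels


# Hypothetical sessions: (pages per session, session duration in minutes).
sessions = [
    (3, 4.0), (4, 5.0), (3, 5.0), (4, 4.0),          # ordinary browsing
    (10, 12.0), (11, 12.0), (10, 13.0), (11, 13.0),  # long, engaged visits
    (60, 0.5),                                       # bot-like session
]
labels = dbscan(sessions, eps=1.5, min_samples=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two dense groups are found without stating how many clusters to expect, and the isolated session is flagged as noise. Changing `eps` or `min_samples` can change the result noticeably.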

Limits: This method is well suited to noisy data or data that is not linearly separable. On the other hand, the results can be sensitive to the choice of parameters.

Use case

- Automatically detect groups of atypical behaviours on a website (such as unusual visits or bots).
- Identify micro-communities of users with similar trajectories, even if they are in the minority.

Examples of algorithms: DBSCAN [6], HDBSCAN [7], OPTICS [8].

Density-based clustering. Points are grouped according to their local density. Points in low-density areas are considered noise (unclassified).

Clustering pipeline: implementation method

Having defined the main types of clustering algorithms, let's now look at how to implement them in a project.

A clustering pipeline is a structured sequence of interdependent steps that transforms raw data into actionable clusters.

Here is a step-by-step process for implementing clustering in a marketing strategy:

Data preprocessing: This step cleans your CRM or web data:
- Remove duplicates and handle missing values.
- Encode categories (e.g. “acquisition source”) as numerical variables.
Scaling of variables: This step puts all variables on a comparable scale so that no single variable dominates the distance calculation. For example, if one variable is measured in seconds (time spent on the site, ranging from 300 to 600 seconds) and another in number of clicks (ranging from 1 to 10), the clustering algorithm may give more weight to the former because its values are numerically much larger (hundreds versus units). Normalising the data brings all variables onto the same scale (between 0 and 1), so that each variable contributes comparably and the resulting clusters are more reliable.
Algorithm choice: The choice of clustering algorithm depends on the nature of your marketing data and your analysis objectives. For example:
- K-Means is ideal if you want to segment your customers into distinct groups, for example by purchase frequency or average basket size. It works well when the groups are compact and well separated.
- DBSCAN is better suited if your data contains noise, for example to identify customers with unusual purchase journeys or atypical behaviours.
- Hierarchical clustering allows for a more exploratory analysis. It is useful if you want to visualise different levels of segmentation. It produces a tree that helps you decide at which level to cut to define your segments. For example, it can reveal that a group of regular customers splits into big buyers and small buyers, depending on their average basket size.
Validation of results: Once the clusters have been obtained, it is essential to evaluate their coherence using indicators such as the silhouette score, which indicates how well each customer fits their own segment compared with the other segments. For example, if your marketing segments are poorly defined (customers from different segments lie too close together), this score will be low. By analysing these results, you can adjust certain parameters (fine-tuning), such as the number of clusters $k$, to refine the segmentation and better reflect your customers' actual behaviour.
Cluster interpretation: We analyse the characteristics specific to each group to understand their meaning: Who are the individuals in the clusters? What do they do? What marketing actions can be triggered for each?
Go live: Finally, when the model is deemed satisfactory, it can be integrated into an operational environment to automate segmentation (CRM, automation). This step will be discussed in more detail in our next article.

Steps in the clustering process.
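
The scaling and validation steps can be sketched in plain Python as follows. This is a hedged illustration, not a full pipeline: the function names `min_max_scale` and `silhouette_score`, the customer data (time on site in seconds, number of clicks) and the two candidate segmentations are assumptions made for the example.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1] so that no variable dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # A constant column has a span of 0; use 1.0 to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in rows
    ]


def silhouette_score(points, labels):
    """Mean silhouette coefficient (needs at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a single-member cluster
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


# Hypothetical customers: (time on site in seconds, number of clicks).
customers = [(300, 1), (330, 2), (320, 1), (580, 9), (600, 10), (590, 8)]
scaled = min_max_scale(customers)

# Compare a sensible segmentation with an arbitrary one.
good = silhouette_score(scaled, [0, 0, 0, 1, 1, 1])
poor = silhouette_score(scaled, [0, 1, 0, 1, 0, 1])
```

The sensible segmentation should score close to 1 and the arbitrary one much lower. The same comparison can be repeated for several values of $k$ to guide the fine-tuning described above.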

Things to remember

There are several types of clustering algorithms, each suited to different use cases.
A well-structured clustering pipeline, from data preparation to deployment, ensures actionable results.
Business interpretation of clusters allows for the translation of results into concrete actions (differentiated targeting, personalised recommendations, commercial prioritisation, etc.).

Do you need a segmentation engine? Contact us to discover a solution tailored to your data and your business objectives.

Article written by Ibtihal El Mimouni – Data Scientist at Smartprofile, currently undertaking a CIFRE thesis on the challenges of AI for more responsible marketing

References

[1] MacQueen, J. (1967). “Some methods for classification and analysis of multivariate observations”. In *Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics* (Vol. 5, pp. 281-298). University of California Press.
[2] Kaufman, L., & Rousseeuw, P. J. (1987). “Clustering by means of medoids”. In *Proceedings of the statistical data analysis based on the L1 norm conference, Neuchâtel, Switzerland* (Vol. 31, p. 28).
[3] Ng, R. T., & Han, J. (2002). “CLARANS: A method for clustering objects for spatial data mining”. *IEEE transactions on knowledge and data engineering*, 14(5), 1003-1016.
[4] Zhang, T., Ramakrishnan, R., & Livny, M. (1996). “BIRCH: an efficient data clustering method for very large databases”. *ACM SIGMOD Record*, 25(2), 103-114.
[5] Kaufman, L., & Rousseeuw, P. J. (2009). *Finding groups in data: an introduction to cluster analysis*. John Wiley & Sons.
[6] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise”. In *KDD* (Vol. 96, No. 34, pp. 226-231).
[7] Campello, R. J., Moulavi, D., & Sander, J. (2013). “Density-based clustering based on hierarchical density estimates”. In *Pacific-Asia conference on knowledge discovery and data mining* (pp. 160-172). Berlin, Heidelberg: Springer Berlin Heidelberg.
[8] Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). “OPTICS: Ordering points to identify the clustering structure”. *ACM SIGMOD Record*, 28(2), 49-60.
