In our previous article, we discovered what clustering is and how it can be useful for better segmenting one's data, particularly in marketing.
In this second part, we detail the implementation of a clustering approach: choice of algorithms, key steps of the process and applications to customer segmentation.
Reminder: what is clustering?
Clustering is an unsupervised learning method that groups similar objects into sets called clusters. In marketing, this supports a better understanding of your customers, better-tailored campaigns, and a more personalised experience.
For example, segmenting newsletter subscribers according to their engagement level (clicks, opens, conversions) allows for tailored messaging for each group.
Types of clustering
Here we present three types of clustering widely used in the field of marketing. For each type, you will find a technical presentation and a practical section with application examples.
Partitioning clustering
Partitioning clustering divides a dataset into $k$ distinct clusters based on the similarity of the observations. Each observation is assigned to a single cluster by minimising a distance measure, for example the Euclidean distance, which measures the proximity between two points.
The best-known algorithm in this category is K-Means [1].
Limits: this approach is particularly well suited to clusters that are roughly spherical and of comparable size. However, it performs poorly on noisy or non-linearly separable data.
Use case
- Get an initial overview of users and a macro-level reading of their engagement level. For example, segmenting website visitors by their online behaviour (frequency of visits, session duration, interactions with ads) can reveal a difference in intent: informational or transactional.
- Identify occasional visitors (those who visit once, explore little, and don't convert) and loyal customers (those who return often and make purchases).
Examples of algorithms: K-Means [1], K-Medoids [2], CLARANS [3].
Partition clustering. Left: partition into two clusters; right: partition into three clusters.
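To make the assign-then-update logic of partitioning clustering concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. The `kmeans` function, its parameters and the visitor data are illustrative assumptions, not taken from a specific library or from real data; in a real project, a tested library implementation would normally be preferred.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain-Python K-Means (Lloyd's algorithm) on a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct observations picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids no longer move: the partition is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical visitors: (visits per month, average session duration in minutes).
visitors = [(1, 2.0), (2, 1.5), (1, 3.0), (12, 9.0), (15, 11.0), (14, 10.0)]
labels, centroids = kmeans(visitors, k=2)
print(labels)     # one cluster index per visitor
print(centroids)  # the "average visitor" of each cluster
```

On this toy data, the occasional visitors and the loyal ones end up in two separate clusters. Note that the variables are left unscaled here for brevity; with real data they would first be brought to a comparable scale.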
Hierarchical clustering
Hierarchical clustering builds a hierarchy of clusters in the form of a tree, called a dendrogram. Unlike partitioning clustering, it doesn't require specifying the number of clusters in advance.
Limits: this method can explore the data structure at multiple levels, but it can become computationally expensive on large datasets.
Use case
- Analysing customer journeys: grouping can reveal segments according to acquisition channel (email, social media, SEO). For example, those arriving via SEO spend more time on the website, which can indicate information seeking, while those from emails primarily visit product pages and leave the site quickly, which reflects a more direct purchase intention.
- Identify similar customer behaviours and create sub-segments based on their behavioural proximity. For example, visitors from social networks may, after several visits, adopt similar behaviours to those from SEO. Hierarchical clustering can highlight this behavioural proximity, allowing for more tailored content to be delivered based on their exploration phase, even if they initially come from a promotional acquisition channel.
Examples of algorithms: BIRCH [4], DIANA [5].

Dendrogram illustrating hierarchical clustering. The points (labelled A to Z) are grouped according to their similarity.
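As an illustration of how such a tree can be built and then cut, here is a minimal sketch in plain Python. It uses a bottom-up (agglomerative) strategy with single linkage, which is one possible choice and an assumption of this example; it is not BIRCH or DIANA. The function name, the threshold and the visitor data are hypothetical.

```python
import math


def agglomerative(points, linkage_threshold):
    """Bottom-up hierarchical clustering with single linkage.

    Returns the merge history (the information a dendrogram displays) and the
    clusters obtained by refusing any merge above `linkage_threshold`.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []

    def single_link(a, b):
        # Distance between two clusters = distance between their closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, ia, ib = min(
            (single_link(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > linkage_threshold:
            break  # "cut" the tree at this height
        merges.append((sorted(clusters[ia]), sorted(clusters[ib]), d))
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return merges, [sorted(c) for c in clusters]


# Hypothetical visitors: (minutes on site, product pages viewed).
visitors = [(8.0, 1), (9.0, 2), (7.5, 1), (1.0, 5), (1.5, 6), (1.2, 5)]
merges, segments = agglomerative(visitors, linkage_threshold=3.0)
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.2f}")
print(segments)
```

Raising or lowering `linkage_threshold` corresponds to cutting the dendrogram higher or lower, which yields fewer or more segments without rerunning a different algorithm.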
Density-based clustering
Density-based clustering groups data points based on areas of high density, separated by areas of low density. Outlier points, located in areas of low density, are considered noise. This approach does not require specifying the number of clusters in advance.
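The best-known algorithm in this category is DBSCAN [6].
Limits: this method is well suited to noisy or non-linearly separable data. On the other hand, results can be sensitive to the choice of parameters (for DBSCAN, the neighbourhood radius and the minimum number of points required to form a dense area).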
Use case
- Automatically detect groups of atypical behaviours on a website (like unusual visits or bots).
- Identify micro-communities of users with similar trajectories, even if they are in the minority.
Examples of algorithms: DBSCAN [6], HDBSCAN [7], OPTICS [8].

Density-based clustering. Points are grouped according to their local density. Points in low-density areas are considered noise (unclassified).
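The idea of dense areas and noise can be sketched with a compact, plain-Python version of DBSCAN. The parameter names `eps` (neighbourhood radius) and `min_samples` (minimum number of points in a neighbourhood, the point itself included) are naming choices for this example, and the session data is hypothetical. This quadratic-time version is meant for reading, not for large datasets.

```python
import math


def dbscan(points, eps, min_samples):
    """Plain-Python DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within `eps` of point i (including i itself).
        return [
            j for j in range(len(points))
            if math.dist(points[i], points[j]) <= eps
        ]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster_id
            nb = neighbours(j)
            if len(nb) >= min_samples:
                queue.extend(nb)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical sessions: (pages viewed, minutes on site).
# The last one mimics a bot: many pages in very little time.
sessions = [
    (3, 2.0), (4, 2.5), (3, 3.0), (4, 2.0),
    (10, 12.0), (11, 12.5), (10, 13.0), (11, 12.0),
    (200, 1.0),
]
labels = dbscan(sessions, eps=1.5, min_samples=3)
print(labels)
```

The number of clusters is not passed in: it emerges from the density of the data, and the isolated session receives the noise label `-1`.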
Clustering pipeline: implementation method
Having defined the main types of clustering algorithms, let's now look at how to implement them in a project.
A clustering pipeline is a structured sequence of interdependent steps that transforms raw data into actionable clusters.
Here is a step-by-step process for implementing clustering in a marketing strategy:
- Data preprocessing: This step cleans your CRM or web data:
  - Remove duplicates and handle missing values.
  - Encode categorical variables (e.g. "acquisition source") as numerical variables.
- Scaling of variables: This involves putting all variables on a comparable scale, so that no single variable dominates the others in the distance calculation. For example, if one of your variables is measured in seconds (time spent on the site, ranging from 300 to 600 seconds) and another in number of clicks (ranging from 1 to 10), the clustering algorithm might give more weight to the former because its values are expressed on a larger scale (hundreds versus units). Normalising the data, for instance with min-max scaling, brings all variables back to the same scale (between 0 and 1), allowing the algorithm to treat each variable fairly and produce more reliable clusters.
- Algorithm choice: The choice of clustering algorithm depends on the nature of your marketing data and your analysis objectives. For example:
  - K-Means is ideal if you're looking to segment your customers into distinct groups, for example by purchase frequency or average basket size. It works well when the groups are compact and well separated.
  - DBSCAN is better suited if your data contains noise, for example to identify customers with unusual purchase journeys or atypical behaviours.
  - Hierarchical clustering allows for a more exploratory analysis. It is useful if you want to visualise different levels of segmentation: it produces a tree that helps you decide at which level to cut to define your segments. For example, it can reveal that a group of regular customers splits into big buyers and small buyers, depending on their average basket size.
- Validation of results: Once the clusters have been obtained, it is essential to evaluate their coherence using indicators such as the silhouette score, which measures how close each customer is to their own segment compared with the nearest other segment. For example, if your marketing segments are poorly defined (customers from different segments too close to each other), this score will be low. By analysing these results, you can adjust certain parameters (fine-tuning), such as the number of clusters $k$, to refine the segmentation and better reflect your customers' actual behaviour.
- Cluster interpretation: We analyse the characteristics specific to each group to understand their meaning: Who are the individuals in the clusters? What do they do? What marketing actions can be triggered for each?
- Go live: Finally, when the model is deemed satisfactory, it can be integrated into an operational environment to automate segmentation (CRM, marketing automation). This step will be discussed in more detail in our next article.
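The preprocessing and scaling steps above can be sketched as follows, reusing the orders of magnitude from the example (time on site between 300 and 600 seconds, clicks between 1 and 10). The helper names `min_max_scale` and `one_hot` and the sample records are hypothetical; one-hot encoding is only one possible way of turning a category such as the acquisition source into numbers.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(x - lo) / (hi - lo) for x in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(column))
    rows = [[1.0 if value == c else 0.0 for c in categories] for value in column]
    return categories, rows


# Hypothetical records: time on site (seconds), clicks, acquisition source.
time_on_site = [300, 450, 600, 375]
clicks = [1, 10, 4, 7]
source = ["email", "seo", "social", "seo"]

scaled_time = min_max_scale(time_on_site)
scaled_clicks = min_max_scale(clicks)
categories, encoded_source = one_hot(source)

# One feature vector per customer, every value now between 0 and 1.
features = [
    [t, c, *s] for t, c, s in zip(scaled_time, scaled_clicks, encoded_source)
]
for row in features:
    print(row)
```

Before scaling, a 150-second difference in time on site would outweigh any difference in clicks; after scaling, both variables span the same range and contribute comparably to the distance.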

Steps in the clustering process.
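For the validation step, the silhouette score can be computed by hand on a small example. The sketch below is a plain-Python version written for illustration; the function name, the toy points and the two candidate labellings are assumptions. For each point, it compares the mean distance to its own cluster ($a$) with the mean distance to the nearest other cluster ($b$) and averages $(b - a) / \max(a, b)$ over all points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; values near 1 indicate well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, already-scaled customers and two candidate segmentations.
customers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
well_defined = [0, 0, 0, 1, 1, 1]
poorly_defined = [0, 1, 0, 1, 0, 1]

good = silhouette_score(customers, well_defined)
poor = silhouette_score(customers, poorly_defined)
print(f"well-defined segments:   {good:.2f}")
print(f"poorly defined segments: {poor:.2f}")
```

The same function can be called once per candidate value of $k$ to compare segmentations and keep the one with the highest score, alongside a business reading of the resulting clusters.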
Things to remember
- There are several types of clustering algorithms, each suited to different use cases.
- A well-structured clustering pipeline, from data preparation to deployment, ensures actionable results.
- Business interpretation of clusters allows for the translation of results into concrete actions (differentiated targeting, personalised recommendations, commercial prioritisation, etc.).
Do you need a segmentation engine? Contact us to discover a solution tailored to your data and your business objectives.
Article written by Ibtihal El Mimouni – Data Scientist at Smartprofile, currently undertaking a CIFRE thesis on the challenges of AI for more responsible marketing
References
- [1] MacQueen, J. (1967). “Some methods for classification and analysis of multivariate observations”. In *Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics* (Vol. 5, pp. 281-298). University of California Press.
- [2] Kaufman, L., & Rousseeuw, P. J. (1987). “Clustering by means of medoids”. In *Proceedings of the statistical data analysis based on the L1 norm conference, Neuchâtel, Switzerland* (Vol. 31, p. 28).
- [3] Ng, R. T., & Han, J. (2002). “CLARANS: A method for clustering objects for spatial data mining”. *IEEE transactions on knowledge and data engineering*, 14(5), 1003-1016.
- [4] Zhang, T., Ramakrishnan, R., & Livny, M. (1996). “BIRCH: an efficient data clustering method for very large databases”. ACM sigmod record, 25(2), 103-114.
- [5] Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.
- [6] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise”. In KDD (Vol. 96, No. 34, pp. 226-231).
- [7] Campello, R. J., Moulavi, D., & Sander, J. (2013). “Density-based clustering based on hierarchical density estimates”. In *Pacific-Asia conference on knowledge discovery and data mining* (pp. 160-172). Berlin, Heidelberg: Springer Berlin Heidelberg.
- [8] Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). “OPTICS: Ordering points to identify the clustering structure”. *ACM Sigmod record*, 28(2), 49-60.


