In our first article, we discovered what clustering is and how it can help segment data more effectively, particularly in marketing.

In the second article, we detailed the clustering pipeline. In this third part, we present a concrete use case in the tourism sector, explain the visualisations produced from our model, and discuss some operational aspects.

Reminder: what is clustering?

Clustering is an unsupervised learning method that groups similar objects into sets called clusters. In marketing, this allows you to understand your customers better, tailor campaigns, and personalise the experience.

Segmenting purchasing behaviour in tourism

In a competitive market like travel, understanding when and how customers book their stays is a major challenge for adapting communication plans, optimising recommendations, and anticipating busy periods.

Using booking data from a tourism operator, including purchase date, departure date, duration, price, etc., as well as data from online behaviour via our Web Analytics module, we sought to group customers according to their purchasing behaviour:

  • Do they book far in advance or at the last minute?
  • Do they prefer to go in the summer or off-season?
  • Do they shop on weekdays or at the weekend?
  • Do they prefer short or long stays?

To identify typical customer profiles, we applied a clustering method to discover segments that are homogeneous in their purchasing behaviour.

The approach followed replicates the pipeline detailed in our previous article: data cleaning, creation of key variables, algorithm selection, and interpretation of the obtained clusters.
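As an illustration of the "creation of key variables" step, here is a minimal sketch of how the four questions above could be turned into numeric features. The `Booking` record layout, the field names, and the example values are assumptions made for this sketch; the operator's actual schema is not described in this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Booking:
    # Hypothetical record layout; the real schema is not given in the article.
    customer_id: str
    purchase_date: date
    departure_date: date
    return_date: date
    price: float


def booking_features(b: Booking) -> dict:
    """Turn one raw booking into behavioural variables usable for clustering."""
    return {
        # Far in advance or last minute?
        "days_in_advance": (b.departure_date - b.purchase_date).days,
        # Summer or off-season?
        "purchase_month": b.purchase_date.month,
        "departure_month": b.departure_date.month,
        # Weekday or weekend purchase? weekday() is 5 for Saturday, 6 for Sunday.
        "weekend_purchase": b.purchase_date.weekday() >= 5,
        # Short or long stay?
        "duration_nights": (b.return_date - b.departure_date).days,
        "price": b.price,
    }


# Invented example: a stay bought in January for a two-week trip in July.
example = Booking("c-001", date(2024, 1, 13), date(2024, 7, 20), date(2024, 8, 3), 2400.0)
features = booking_features(example)
print(features)
```

In practice these per-booking values would typically be aggregated per customer and standardised before being passed to the clustering algorithm.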

Seasonal purchasing behaviours?

In the example we are going to present (partly inspired by a client case), we have identified four clusters reflecting differentiated booking behaviours.

Seasonality is a determining factor in travel purchases. We therefore began by observing when, during the year, the different identified groups book their stays.

The heatmap below allows us to visualise these trends.

It shows very contrasting seasonal dynamics:

  • Cluster 0: purchases concentrated in January and February, typical of customers who plan their holidays far in advance.
  • Cluster 1: purchases peak in July, indicating last-minute purchasing behaviour.
  • Cluster 2: strong activity in spring and early summer (April to June), likely corresponding to summer stays.
  • Cluster 3: purchases concentrated in September and October, indicative of off-season travel.
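The matrix behind such a heatmap can be sketched as follows: for each cluster, the share of its purchases made in each month. The `purchases` list is invented for illustration and only loosely mimics the pattern described above; it is not the client's data.

```python
from collections import Counter

# Hypothetical (cluster, purchase_month) pairs; invented purely for illustration.
purchases = [
    (0, 1), (0, 1), (0, 2), (0, 11),
    (1, 7), (1, 7), (1, 7), (1, 8),
    (2, 4), (2, 5), (2, 6), (2, 6),
    (3, 9), (3, 10), (3, 10), (3, 3),
]


def monthly_share(pairs, n_clusters):
    """Return one row per cluster: share of its purchases in each month (Jan..Dec)."""
    counts = Counter(pairs)
    matrix = []
    for cluster in range(n_clusters):
        total = sum(counts[(cluster, m)] for m in range(1, 13))
        row = [counts[(cluster, m)] / total if total else 0.0 for m in range(1, 13)]
        matrix.append(row)
    return matrix


heatmap = monthly_share(purchases, n_clusters=4)
for cluster, row in enumerate(heatmap):
    peak_month = row.index(max(row)) + 1
    print(f"Cluster {cluster}: peak month {peak_month}, share {max(row):.2f}")
```

Normalising each row by the cluster's own total keeps clusters of very different sizes comparable on the same colour scale.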

 

Explanatory variables: what differentiates profiles

Before assigning a label to each cluster, it is necessary to understand what distinguishes them behaviourally. To do this, we used a radar chart, which allows us to compare the groups on key variables such as the number of days booked in advance, purchase season, duration, price, type of stay, or chosen region.
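Because these variables are measured in different units (days, nights, currency), a radar chart is only readable once they are brought onto a common scale. The sketch below uses min-max scaling across clusters, which is one common choice and an assumption here, not necessarily what was used for the chart in this article. The `cluster_means` values are invented.

```python
# Hypothetical per-cluster averages; the values are invented for illustration.
cluster_means = {
    0: {"days_in_advance": 180, "duration_nights": 14, "price": 2400},
    1: {"days_in_advance": 5, "duration_nights": 2, "price": 300},
    2: {"days_in_advance": 60, "duration_nights": 7, "price": 1200},
    3: {"days_in_advance": 40, "duration_nights": 8, "price": 900},
}


def min_max_scale(means):
    """Rescale each variable to [0, 1] across clusters so radar axes are comparable."""
    variables = next(iter(means.values())).keys()
    scaled = {cluster: {} for cluster in means}
    for var in variables:
        values = [means[c][var] for c in means]
        low, high = min(values), max(values)
        span = high - low
        for c in means:
            # A variable that is constant across clusters carries no contrast.
            scaled[c][var] = (means[c][var] - low) / span if span else 0.0
    return scaled


radar_values = min_max_scale(cluster_means)
for cluster, axes in radar_values.items():
    print(cluster, {name: round(value, 2) for name, value in axes.items()})
```

Each cluster's scaled values can then be plotted as one polygon on the radar chart, with 1 marking the cluster that scores highest on a given axis.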

 

Here are the four typologies that we have identified:

  • Cluster 0: Very high number of days booked in advance. Bookings in autumn and winter, for long and expensive stays. These travellers plan their summer stay early.
  • Cluster 1: Late bookings, often just before departure. Short stays, over a weekend. Behaviour sensitive to availability.
  • Cluster 2: Moderate bookings during the summer season. Medium duration, intermediate prices. This segment often reflects families or working individuals taking their holidays in the summer.
  • Cluster 3: Booking in autumn for a moderate duration and price. These travellers are looking to avoid crowds, take advantage of better rates, or travel outside of peak periods.

The analysis can also be enriched by integrating other dimensions, depending on their availability, such as: the purchase channel (website, mobile phone), the frequency of travel in the year, the average basket history, etc.

Adding relevant variables enriches the analysis and makes segments more representative and useful for activating targeted marketing actions.

Cluster Interpretation

The radar chart of behavioural variables highlights the differences on key criteria such as anticipation, duration, price, or season. The heatmap of bookings by month shows when each segment makes its purchases. By cross-referencing these two visualisations, we refined the business interpretation of each cluster:

  • Cluster 0 / Early planning: very high score on days_in_advance, purchases in January/February, long stays often in summer or winter.
  • Cluster 1 / Last-minute purchase: bookings concentrated in July, short stays over weekends.
  • Cluster 2 / Summer travel: moderate bookings in May and June, average duration, summer seasonality.
  • Cluster 3 / Off-season travel: purchases in September and October, longer duration, moderate prices.

 

Each cluster becomes an actionable segment, around which campaigns can be activated and offers personalised. For example:

  • Early planning: booking often at the start of the year, for long stays.
    • Recommended actions:
      • Highlighting extended availability and customisation options (direct flights, room with a view, etc.)
      • Communication campaigns from January
      • Integration into a premium loyalty strategy
  • Last-minute purchase: quick decision, often in July, for immediate departure on a weekend.
    • Recommended actions:
      • Highlighting short, accessible, and immediately available stays
      • Push notification or email with 48-hour flash offers or departures this weekend

These visualisations (heatmap, radar) are just examples of potential tools for exploring purchasing behaviours. Other graphics can also reveal more subtle patterns and further refine the interpretation of clusters.

We discuss this in more detail in the video from our webinar dedicated to clustering applied to marketing. Stay tuned!

From analysis to production: model deployment and monitoring

Once the clusters have been identified, the next step is to deploy the model into an operational environment so that it can be used in marketing campaigns and CRM dashboards.

Deployment in the SmartProfile platform

At Smartprofile, clusters are integrated directly into our distributed marketing platform. Specifically, these segments are:

  • Synchronised with the platform's segmentation engine and made available to users to help them target their campaigns effectively.
  • Activatable via planned campaigns: for example, a user could choose to send a special offer to "last-minute shoppers" or schedule a campaign for "planners".
  • Visualised in dashboards: the user can view the breakdown of their contacts by behavioural segment, track its evolution, and adjust their messages accordingly.
  • Usable in automation scenarios: for example, as soon as a customer exhibits behaviour typical of a segment, they can be automatically enrolled in an email or SMS sequence tailored to that profile.
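To make the automation scenario concrete, here is a minimal sketch of how a new customer could be matched to a segment, assuming a centroid-based model such as k-means. The article does not state which algorithm or platform API is used, so the `CENTROIDS` values, the feature order, and the `assign_segment` function are all hypothetical.

```python
import math

# Hypothetical centroids in standardised feature space
# (days_in_advance, duration_nights, price); values are invented.
CENTROIDS = {
    "early planner": (1.5, 1.2, 1.3),
    "last-minute buyer": (-1.1, -1.2, -1.0),
    "summer traveller": (0.1, 0.0, 0.2),
    "off-season traveller": (-0.2, 0.3, -0.3),
}


def assign_segment(features, centroids=CENTROIDS):
    """Return the label of the centroid closest to the customer (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Invented customer whose standardised features show a late, short, cheap booking.
new_customer = (-0.9, -1.0, -0.8)
segment = assign_segment(new_customer)
print(segment)
```

The returned label could then serve as the trigger for the matching email or SMS sequence. The features must be standardised with the same parameters as at training time, otherwise the distances are not comparable.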

Behavioural drift tracking

Over time, certain customer behaviours can evolve. For example, customers initially identified as "last-minute buyers" may start booking further in advance. New behaviours can also emerge that no longer fit within the existing segments. This is known as drift: the current data no longer corresponds to the data on which the model was trained, so the segmentation could become less reliable or less representative. This is why it is essential to monitor the relevance of the model regularly. To detect these changes, certain indicators are particularly useful:

  • Does a segment become too dominant or disappear?
  • Do the clusters remain well separated?
  • Do the distributions of key variables change? etc.
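The first indicator, a segment becoming dominant or disappearing, can be tracked by comparing segment shares at training time with recent shares. The sketch below uses the Population Stability Index (PSI), a standard drift measure chosen here as an assumption; the article does not name a specific metric, and the label lists are invented.

```python
import math
from collections import Counter


def segment_shares(labels, segments):
    """Share of contacts in each segment; a small floor avoids log(0) for empty segments."""
    counts = Counter(labels)
    total = len(labels)
    return {s: max(counts[s] / total, 1e-6) for s in segments}


def population_stability_index(reference, current, segments):
    """PSI = sum over segments of (current - reference) * ln(current / reference)."""
    ref = segment_shares(reference, segments)
    cur = segment_shares(current, segments)
    return sum((cur[s] - ref[s]) * math.log(cur[s] / ref[s]) for s in segments)


segments = [0, 1, 2, 3]
# Invented assignments: balanced at training time, then cluster 1 doubles its share.
training_labels = [0] * 25 + [1] * 25 + [2] * 25 + [3] * 25
recent_labels = [0] * 10 + [1] * 50 + [2] * 25 + [3] * 15

psi = population_stability_index(training_labels, recent_labels, segments)
print(f"PSI = {psi:.3f}")
```

A PSI of 0 means the shares are unchanged. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the alert threshold remains a judgment call that depends on the business context.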

To keep the model robust, it is important to retrain it regularly on new booking data. We recommend retraining every 3 to 6 months, depending on purchasing frequency and seasonal variations in the industry.

Retraining incorporates recently observed behaviours, such as the emergence of a new type of traveller or a shift in booking habits, and corrects model drift.

Things to remember

  • Understanding customer behaviour allows for more relevant campaigns at the right time, through the right channel.
  • Visualisations (heatmaps, radar charts, etc.) are powerful tools that offer an intuitive reading of customer segments, helping to better understand their behaviours and adapt marketing actions.
  • To ensure performance, it is essential to monitor the evolution of behaviours and retrain the model regularly.

Article written by Ibtihal El Mimouni – Data Scientist at Smartprofile, currently undertaking a CIFRE thesis on the challenges of AI for more responsible marketing

Your data has potential! Contact our teams to support you in structuring an approach tailored to your marketing challenges.

