AzureML model for customer segmentation

As mentioned previously, we are approaching the customer segmentation problem holistically, with a view to providing an end-to-end solution. This solution comprises three components.

  1. Data preparation and enrichment. Any complex enterprise landscape comprises multiple systems, each performing a specific function. More than one system may even perform the same function, perhaps because of a merger or an acquisition. For example, each business unit within the company could have a different CRM system. We leverage industry-leading technologies that specialize in integration and ETL (Extract, Transform and Load) to bring this data together.
  2. Predictive modeling. We don’t believe in reinventing the wheel here. Decades of research have produced several techniques that are proven to work in the field. We used this research as our foundation and have developed a library of models that we can apply to each customer use case. These models combine the publicly available foundational research with our own expertise and methodologies, applying sound statistical principles to clean the data, fit the right model, train and test it, and finally evaluate it. AzureML provides a great platform to get models up and running quickly. See this post for our initial experiences with AzureML.
  3. Visualization. We rely on the awesome visualization features of Salesforce Wave to display information in a form that a user can consume easily.

We now have an initial version of a working model published on AzureML’s Studio Gallery. It implements the high-level flow discussed in our previous article.

Step 1: Calculate the R, F and M (recency, frequency and monetary value) parameters. Here’s how the R, F and M parameters were calculated in this experiment. It was done primarily in the R language.
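The experiment's own R code is not reproduced in this text. As a rough illustration only, the calculation can be sketched in Python (not R) under a few assumptions: the input is a transaction log of (customer id, purchase date, amount), recency is the number of days since the last purchase as of a chosen reference date, frequency is the number of purchases, and monetary value is total spend. The sample data and the `compute_rfm` helper are hypothetical and are not taken from the published experiment.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2015, 1, 5), 120.0),
    ("C1", date(2015, 3, 20), 80.0),
    ("C2", date(2014, 11, 2), 40.0),
    ("C3", date(2015, 2, 14), 300.0),
    ("C3", date(2015, 3, 1), 150.0),
    ("C3", date(2015, 3, 28), 50.0),
]


def compute_rfm(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Assumed definitions: recency = days since the last purchase,
    frequency = purchase count, monetary = total spend.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


rfm = compute_rfm(transactions, as_of=date(2015, 4, 1))
for cid, (recency, freq, spend) in sorted(rfm.items()):
    print(cid, recency, freq, spend)
```

With this sample data, customer C1 ends up with a recency of 12 days, a frequency of 2 and a total spend of 200.0. Other definitions (for example, average spend instead of total spend) are equally common, so the choice here is only one option.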


Here’s the output of this step.


Step 2: Apply k-means clustering algorithm on these parameters to group similar customers.

  • Note that the input values to this algorithm have to be continuous variables.
  • K-means is a non-hierarchical method and a very popular approach for clustering because it is simple to implement and fast to execute. It has been widely used in market segmentation.
  • The number of clusters can be determined using the elbow method: plot the within-cluster sum of squares against the number of clusters and pick the point where the curve flattens.
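To make the clustering step and the elbow method concrete, here is a minimal pure-Python sketch. It is not the AzureML clustering component and makes several assumptions: the RFM rows are invented, the columns are z-scored first because k-means is sensitive to scale, and the centroids are seeded with a simple deterministic farthest-point rule rather than whatever initialization AzureML uses. All function names are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column so R, F and M contribute on a comparable scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, for repeatability)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[label])
                  for p, label in zip(points, labels))
    return labels, inertia


# Hypothetical (recency_days, frequency, monetary) rows, three rough groups.
rfm_rows = [
    (5, 20, 2000), (7, 22, 2200), (6, 18, 1900),
    (60, 8, 600), (65, 7, 550), (58, 9, 650),
    (200, 1, 50), (210, 2, 80), (190, 1, 40),
]
points = standardize(rfm_rows)

# Elbow method: inspect how the within-cluster sum of squares falls with k.
inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))

labels, _ = kmeans(points, 3)
```

On this toy data the within-cluster sum of squares drops sharply up to k = 3 and changes little afterwards, which is the "elbow" one would look for. Real customer data is rarely this clean, so the elbow is usually a judgment call.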

AzureML has an out-of-the-box clustering component. Here’s how it was configured.



Step 3: Apply classification algorithms such as Logistic Regression and Decision Trees to predict future customer behavior.

  • This is a multi-class classification problem, with the number of classes equal to the number of clusters from the previous step.
  • Use any customer attributes, such as age, gender, region, etc., as independent variables in the model.
  • Finally, here’s how the multi-class logistic regression algorithm was applied. This was done after merging the customer data with the clusters. The resulting model can predict a customer’s segment from that customer’s demographic data.
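The merge-and-classify step can be sketched as follows. This is a minimal pure-Python illustration of multi-class (softmax) logistic regression trained by batch gradient descent, not the AzureML module used in the experiment. The demographic attributes (age and a hypothetical `is_urban` flag), the segment assignments, the scaling constants and the learning rate are all assumptions chosen for the example.

```python
import math

# Hypothetical demographic data: customer_id -> (age, is_urban).
demographics = {
    "C1": (24, 1), "C2": (28, 1), "C3": (22, 0),
    "C4": (44, 0), "C5": (40, 1), "C6": (48, 0),
    "C7": (66, 1), "C8": (62, 0), "C9": (70, 1),
}
# Hypothetical cluster labels from the clustering step: customer_id -> segment.
segments = {"C1": 0, "C2": 0, "C3": 0, "C4": 1, "C5": 1, "C6": 1,
            "C7": 2, "C8": 2, "C9": 2}


def features(age, is_urban):
    """Bias term, roughly scaled age, and the binary flag."""
    return [1.0, (age - 45) / 20.0, float(is_urban)]


# Merge customer data with the clusters on customer id.
merged = [(features(*demographics[cid]), segments[cid])
          for cid in sorted(demographics) if cid in segments]
X = [x for x, _ in merged]
y = [label for _, label in merged]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def predict(weights, x):
    probs = predict_proba(weights, x)
    return probs.index(max(probs))


def train(X, y, n_classes, lr=0.5, epochs=3000):
    """Batch gradient descent on the softmax cross-entropy loss."""
    n_features = len(X[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            for c in range(n_classes):
                error = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x):
                    grads[c][j] += error * v
        for c in range(n_classes):
            for j in range(n_features):
                weights[c][j] -= lr * grads[c][j] / len(X)
    return weights


weights = train(X, y, n_classes=3)
accuracy = sum(predict(weights, x) == label for x, label in zip(X, y)) / len(X)
new_customer_segment = predict(weights, features(25, 1))
print(accuracy, new_customer_segment)
```

Accuracy here is measured on the training rows only, which is enough to show the mechanics but says nothing about how the model generalizes. In practice the merged data would be split into training and test sets before evaluation.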