The Google team is known for innovation, be it new products or improvements to existing ones. When the team introduced machine learning capabilities in BigQuery at Next’18, everyone was excited to try them.
We all know that innovation drives the industry, so we at SpringML were very excited when Google approached us with the news that it was extending the models supported by BigQuery ML (BQML) to include K-Means alongside Linear and Logistic Regression. SpringML played a key role in the announcement by developing a full case study and implementation guide covering usage of the models as well as feature selection for the trained models.
Three demos were developed for common use cases using BQML. Mock data was generated according to the use case and schema described below:
- The customer is a B2B office supplier: OS Inc
- OS Inc’s customers are stored in the CRM in the Account object; one such customer is DIY Inc
- DIY’s employees can log in to OS’s website to order supplies
- Each customer’s employee information is stored in the User object in the CRM
- The website activity of each user is tracked in GA360
- A user in GA360 can be tied back to a CRM User, which in turn can be traced back to the Account information
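The linkage described in the list above can be sketched in a few lines of Python. This is only an illustration of the relationships, not the actual schema: the class names, field names and sample IDs are all assumptions, and in particular it assumes the GA360 user identifier matches the CRM user ID directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical stand-in for the CRM Account object
    account_id: str
    name: str


@dataclass(frozen=True)
class CrmUser:
    # Hypothetical stand-in for the CRM User object
    user_id: str
    account_id: str


@dataclass(frozen=True)
class Ga360Activity:
    # Hypothetical stand-in for one row of GA360 website activity
    ga_user_id: str  # assumed to equal CrmUser.user_id
    time_on_screen_s: int


def account_for_activity(activity, users, accounts):
    """Trace a GA360 activity row back to its CRM user, then to the account."""
    user = users[activity.ga_user_id]
    return accounts[user.account_id]


# Made-up sample records for illustration only
accounts = {"A1": Account("A1", "DIY Inc")}
users = {"U1": CrmUser("U1", "A1")}
activity = Ga360Activity("U1", 120)

print(account_for_activity(activity, users, accounts).name)
```

In BigQuery the same idea would be two joins, GA360 activity to User and User to Account, so that account-level and user-level features end up in one training table.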
The three demos that were developed are:
- Customer Segmentation using K-Means clustering, with the aim of understanding the underlying correlations among a customer’s users
- Lifetime Value Classification using Logistic Regression across 3 classes (depending on the priority of a customer’s user), in order to predict the future value of a customer
- Conversion Prediction using binary classification, with the aim of predicting whether a customer would sign up for membership within the first 3 months of usage
All the cases implemented were not only interesting but also closely related to real-life scenarios. The process helped us realize that BQML not only reduced model delivery time but, complemented by tools like Dataprep and Data Studio, also sped up data cleaning and interpretation.
Customer Segmentation
Customer segmentation is an important process for marketing and strategy teams, allowing them to create targeted, more relevant programs based on the similarities between customers.
This use case was implemented using the K-Means algorithm, which was rolled out at Next’19. The features we used to train the model were the annual revenue of the account and, for each user, time on screen, unique screen views, lifetime value and age. All of these features were selected from either the GA360 or the CRM dataset. The flowchart below shows the process followed.
Going through various values of k, we found that the best clustering was observed at k = 9, determined using the elbow method on the graph below.
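As a rough sketch of how such a search over k could look, the Python below builds a BQML `CREATE MODEL` statement for a given k and then picks the elbow from a table of per-k loss values. The model, table and column names are hypothetical, and the loss values are synthetic numbers chosen only to exercise the function; they are not the results from this case study. The elbow rule used here (the point farthest from the straight line joining the first and last points of the curve) is one common heuristic, not necessarily the one applied to the graph above.

```python
import math


def kmeans_create_model_sql(model_name, source_table, feature_cols, k):
    """Build a BQML statement that trains a K-Means model with k clusters."""
    cols = ",\n  ".join(feature_cols)
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='kmeans', num_clusters={k}) AS\n"
        f"SELECT\n  {cols}\nFROM `{source_table}`"
    )


def elbow_k(losses):
    """Return the k whose (k, loss) point lies farthest from the chord
    joining the first and last points, after scaling both axes to [0, 1]."""
    ks = sorted(losses)
    if len(ks) < 3:
        raise ValueError("need at least three k values")
    lo, hi = min(losses.values()), max(losses.values())
    if hi == lo:
        raise ValueError("loss curve is flat; no elbow to find")
    pts = [((k - ks[0]) / (ks[-1] - ks[0]), (losses[k] - lo) / (hi - lo)) for k in ks]
    (x1, y1), (x2, y2) = pts[0], pts[-1]
    norm = math.hypot(x2 - x1, y2 - y1)

    def dist(p):
        # Perpendicular distance from point p to the chord
        return abs((y2 - y1) * p[0] - (x2 - x1) * p[1] + x2 * y1 - y2 * x1) / norm

    best = max(range(len(ks)), key=lambda i: dist(pts[i]))
    return ks[best]


# Hypothetical names; the feature columns mirror the features listed above.
features = ["annual_revenue", "time_on_screen", "unique_screen_views",
            "lifetime_value", "age"]
sql = kmeans_create_model_sql("demo.segments_k4", "demo.user_features", features, 4)

# Synthetic loss values per k, for illustration only.
synthetic_losses = {2: 100.0, 3: 60.0, 4: 30.0, 5: 25.0, 6: 22.0, 7: 20.0, 8: 19.0}
chosen_k = elbow_k(synthetic_losses)
```

In practice the loss for each k would come from evaluating each trained model, for example the mean squared distance reported by `ML.EVALUATE`, rather than from a hard-coded dictionary.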
Customer Lifetime Value
Customer Lifetime Value (LTV) is an important metric used to forecast the total future worth of a customer. It helps an organization value a customer based not just on current standing but also on future worth, leading to better customer targeting and revenue generation.
BQML’s Logistic Regression model can also perform multiclass classification. We used it to classify users into 3 classes: Small, Medium and Large. The features used in this case are the user’s time on screen, unique screen views, usual screen resolution, loyalty program value, country code and age. The model was created using the following steps.
The final model achieved an accuracy of about 71%, which we consider a very good result given the ease and speed of model creation. BQML generated the model from the data in a matter of seconds and, with Google being an AI-first company, no extra time was needed for hyperparameter tuning.
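To give a feel for how little code such a model needs, here is a minimal sketch that assembles a BQML `CREATE MODEL` statement for a logistic regression classifier. The model name, table name, label column and feature column names are assumptions made for illustration; the actual statement used in the demo is not shown in this post.

```python
def logistic_reg_create_model_sql(model_name, source_table, label_col, feature_cols):
    """Build a BQML statement that trains a logistic regression classifier.

    BQML infers binary versus multiclass from the distinct values found in
    the label column, so the same statement shape covers both cases.
    """
    select_list = ",\n  ".join([*feature_cols, label_col])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='logistic_reg', "
        f"input_label_cols=['{label_col}']) AS\n"
        f"SELECT\n  {select_list}\nFROM `{source_table}`"
    )


# Hypothetical names; ltv_class is assumed to hold 'Small', 'Medium' or 'Large'.
ltv_features = ["time_on_screen", "unique_screen_views", "screen_resolution",
                "loyalty_program_value", "country_code", "age"]
ltv_sql = logistic_reg_create_model_sql(
    "demo.ltv_classifier", "demo.user_features", "ltv_class", ltv_features
)
print(ltv_sql)
```

The generated string can be pasted into the BigQuery console or submitted through any BigQuery client.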
Conversion/Purchase Prediction
Conversion/Purchase Prediction predicts whether a user is likely to sign up for a membership program that the account might offer. Membership has a cost for the customer but provides additional benefits. It is also in the company’s interest to offer membership options, since they help streamline operations and secure recurring revenue.
In this use case we used the Logistic Regression model to create a binary classifier that separates users into those with high and low potential to convert into signed-up customers within 3 months of their first usage, based on changes in their monthly usage patterns. The features used in this case were the user’s total time on screen for the first three months, unique screen views, lifetime value and age. The model was generated using the same steps as in the use case discussed earlier.
Using BQML, we were able to create a model with 75% accuracy in a matter of a few minutes.
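Once a classifier like this is trained, checking its accuracy and scoring new users are each a single query. The sketch below builds both statements; the model and table names are hypothetical and stand in for whatever the real project uses.

```python
def evaluate_sql(model_name, eval_table):
    """Build a query that returns evaluation metrics for a trained model."""
    return (
        f"SELECT *\n"
        f"FROM ML.EVALUATE(MODEL `{model_name}`,\n"
        f"  (SELECT * FROM `{eval_table}`))"
    )


def predict_sql(model_name, input_table):
    """Build a query that scores new rows with a trained model."""
    return (
        f"SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


# Hypothetical model and table names
conv_eval = evaluate_sql("demo.conversion_classifier", "demo.conversion_holdout")
conv_pred = predict_sql("demo.conversion_classifier", "demo.new_users")
print(conv_eval)
print(conv_pred)
```

For a logistic regression model, the `ML.EVALUATE` result includes an accuracy column alongside precision and recall, and `ML.PREDICT` adds a predicted-label column to each input row.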
Finally, based on our experience with BQML, we are thrilled about the potential it brings to the field: it puts data and ML on the same ground, and it bridges the gap between them by letting the same query language handle both pillars of data analytics. We at SpringML are ready for these use cases and look forward to using this tool to help organizations realize the value BQML brings to the table.