Predictive modeling approach

Several companies are focusing on predictive analytics. Some offer a platform, while others offer an out-of-the-box solution. Platform companies tend to stress that machine learning (ML) is easy and fast, whereas off-the-shelf software providers tend to stress how complex their models are. Of course, there may be more than one right way to achieve good predictive results.

Our approach to ML is to use the tool and algorithm best suited to the customer’s business problem: one that produces accurate results and is also easy to understand. Unless models are interpretable, they will not be trusted or used. We don’t have to work bottom-up and implement an algorithm from scratch every time; R and Python offer building blocks we can take advantage of.

We don’t like black boxes. We’ve learned that business users trust a model when we can explain its results and those results make intuitive sense. By now many of you probably know the familiar statistical caveat that correlation does not imply causation. There are plenty of amusing examples of this on the internet; one hilarious graphic correlates the US housing index and the 2008 crash with the number of babies named Ava in the US. The general point is that you need to know how models and algorithms work in order to interpret their results correctly.
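To make the correlation point concrete, here is a minimal Python sketch using synthetic data (not the housing or baby-name figures from that graphic). Two series are generated independently and share nothing except an upward trend over time, yet their Pearson correlation comes out high. Correlating the period-over-period changes instead strips out the shared linear trend, and the apparent relationship typically shrinks. The trends, noise levels, and random seeds are arbitrary assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def diffs(series):
    """Period-over-period changes; a shared linear trend becomes a constant."""
    return [b - a for a, b in zip(series, series[1:])]


# Synthetic, hypothetical data: each series has its own random generator,
# so the only thing they share is an upward trend over time.
rng_a = random.Random(1)
rng_b = random.Random(2)
years = range(20)
series_a = [100 + 5 * t + rng_a.gauss(0, 3) for t in years]
series_b = [50 + 2 * t + rng_b.gauss(0, 2) for t in years]

# Correlation of the raw levels is driven almost entirely by the common trend.
level_r = pearson(series_a, series_b)
# Correlation of the changes reflects only the independent noise.
change_r = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of levels:  {level_r:.2f}")
print(f"correlation of changes: {change_r:.2f}")
```

Differencing is only one way to remove a common trend; the broader lesson is that a strong correlation should prompt the question of what else both series depend on before anyone reads it as cause and effect.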

I remember my college computer science professor cautioning us that just because you know linked lists doesn’t mean you need to use them on every programming project; in many cases a simple array does the trick. Similarly, in ML, basic techniques are sometimes surprisingly effective. Algorithms are abundant, but every one of them is subject to “garbage in, garbage out.” The critical factor is understanding the business domain and the data. It’s also important to know the intricacies of data cleaning, feature engineering, model testing and validation, and interpreting the results. More on this process in a future blog post.