Building Successful AI/ML Teams

Welcome back to another episode in SpringML’s Podcast Vision 2020.

Today, we are continuing our conversation with SpringML’s Chief Technology Officer and co-founder Girish Reddy. In this episode, Girish shares his perspective on building development teams for AI/ML and advanced analytics use cases.

Episode Transcript

Megan- Good morning, Girish, delighted to take this time to discuss and get your point of view on building development teams for AI & ML and advanced analytics use cases. So I’m curious, how have you gone about building the development team at SpringML?

Girish- Over the last five years, we’ve learned a lot, and I believe we’ve been successful in putting together an incredible team that can learn new technologies and meet customer requirements in this fast-changing machine learning and data analytics world.

If we look back at our journey and how customers have evolved, what we realized is that, along with the technology around machine learning, data analytics, and data warehousing, customers have evolved as well. They have become more sophisticated and understand machine learning a lot better. Not all of them, but several of them have built internal teams, so customers come to the table with their own point of view, and they certainly expect us to bring that expertise.

So to meet those changing demands, what we have done is look back at what a successful project looks like for SpringML, leveraging Google technologies and implementing solutions for customers.

The number one thing is understanding the customer’s domain. What kind of business do they run? We have heard from customers that we should be aligned with their business objectives. Understanding their data would be the next thing. Before jumping into any solution or any machine learning model development: what kinds of data sources do they have, what type of analysis can be done, and what kind of models can eventually be developed?

So given those learnings, the team that we usually put together and the candidates we tend to hire need to have those kinds of skills: being able to understand a customer’s business, being able to understand their data, and then diving into machine learning development.

Megan- When you’re looking for profiles to help build out your team to explore AI & ML use cases within your organization, what are some of the backgrounds or key components on the CV that you look at?

Girish- The team starts with what we call a technical project manager. This person wears multiple hats: they manage the project and the customer relationship, but they also help define what the business needs and goals are. This is a person with at least 10 to 12 years of experience in the business of data, not necessarily machine learning, but in the business of data and analytics, so they can understand what the customer is after, help define that roadmap, and understand the customer’s requirements.

The second important role is that of the data scientist. So now, this is the person that understands the technical capabilities, technical tools from Google stack that includes AutoML, TensorFlow Etc. So this is a person that when we look to hire a data scientist. We look for a person that has very strong fundamentals in statistics, has done a lot of recent model development using a diverse toolset that is very important to us, having a diverse background and experience in many different tools.

The toolset in this area is constantly evolving, so being able to pick up new tools, learn them, and then settle on the particular toolset that best fits a customer’s problem is crucial. Having a strong foundation as a data scientist is very important, and having worked with many tools is always a plus.

And the final role on the team is machine learning operations (MLOps). This is the person who helps operationalize the machine learning model. Once the data scientist builds and tests the model, and the customer is happy with it, we want to put it into production. We then have to operationalize that model, and the ML engineer fills that role.

Megan- So when you look at a managed services team to help with what we were calling MLOps, or maintaining those ML models in production, are there other profiles within SpringML to support that?

Girish- Yeah, managed services is a slightly different team. This is one where the models are already developed, and we want a team that can help support these models in production and ensure that the customer’s applications and users can consume these models daily.

So in terms of skill set, you might still require somebody who has data science skills, but it may not need to be a very experienced or rockstar data scientist. You need someone who understands enough data science that when a model is retrained, they know what to look for, they know whether the model is continuing to perform well, and if not, they know enough to tweak the model so that it continues to meet those SLAs and requirements around accuracy.

On the other hand, data engineering, ML engineering, and data integration skills are also vital for a successful managed services team. Because the model is now integrated and being executed by various clients and applications, being able to monitor those pipelines and proactively address any issues, whether at the performance level or just error handling, becomes more critical in a managed services environment.
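As a rough illustration of the kind of pipeline monitoring Girish describes, the sketch below shows a batch scoring run that logs individual failures and flags latency breaches. It is a minimal, hypothetical example; the function names, the SLA value, and the record structure are assumptions, not SpringML’s actual tooling.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("prediction-pipeline-monitor")

# Hypothetical latency SLA for one scoring batch, in seconds.
LATENCY_SLA_SECONDS = 60


def run_scoring_job(records, predict_fn):
    """Score a batch of records, logging failures and latency breaches."""
    start = time.monotonic()
    failures = []

    for record in records:
        try:
            predict_fn(record)
        except Exception as exc:  # surface any scoring error instead of silently dropping it
            failures.append((record.get("id"), str(exc)))
            logger.error("Scoring failed for record %s: %s", record.get("id"), exc)

    elapsed = time.monotonic() - start
    if elapsed > LATENCY_SLA_SECONDS:
        logger.warning("Batch took %.1fs, exceeding the %ss SLA", elapsed, LATENCY_SLA_SECONDS)

    return failures
```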

Megan- I was wondering if you could share an example for our audience, maybe touching on MLOps in particular, as you talk about the talent and building out teams to take analytics to the next level by adopting AI & ML technologies.

Girish- So we have several examples. The most recent one is a mining company where we developed a few different models. These are models we developed to identify anomalies in the customer’s data so that we can proactively inform people about potential defects and failures in their machines, trucks, and equipment. These models were built by consuming a large amount of data. Once the models were put into production, we put together a managed services team. This team comes on board and understands the models built by the original data science team: how the models were built and how they were put into production. The responsibility of the managed services team now becomes ensuring the models continue to leverage the new data that becomes available, helping retrain the models so that they continue to improve in overall accuracy, and ensuring that whatever the models are predicting daily, the failures and such, is in fact accurate. Sometimes, between how the models were developed and how they are eventually put into production, there may be a slight difference in overall accuracy.

So the managed services team, especially when a model is newly put into production, should take care to see that the model is performing according to the customer’s expectations: not performing poorly, not producing too many false positives, and so on, so that the customer can continue to rely on the model’s predictions. If for whatever reason it does start performing poorly, maybe emitting too many false positives, the team should immediately jump in to analyze the root cause and tweak the model, whether that is tuning the model or feature engineering, whatever is at their disposal, to quickly put in the fixes so that the model is back on track.
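To make that monitoring step concrete, here is a minimal sketch of checking an anomaly model’s live predictions against confirmed outcomes and flagging when accuracy or the false positive rate drifts past a threshold. The thresholds and function name are hypothetical; real SLAs would come from the customer agreement.

```python
# Hypothetical thresholds; actual values would come from the customer's SLA.
MAX_FALSE_POSITIVE_RATE = 0.10
MIN_ACCURACY = 0.90


def evaluate_production_model(predictions, actuals):
    """Compare anomaly predictions (1 = anomaly) to confirmed outcomes and flag SLA breaches."""
    total = len(predictions)
    correct = sum(p == a for p, a in zip(predictions, actuals))
    false_positives = sum(p and not a for p, a in zip(predictions, actuals))
    flagged = sum(predictions)

    accuracy = correct / total
    fp_rate = false_positives / flagged if flagged else 0.0

    needs_attention = accuracy < MIN_ACCURACY or fp_rate > MAX_FALSE_POSITIVE_RATE
    return {"accuracy": accuracy, "false_positive_rate": fp_rate,
            "needs_attention": needs_attention}


# Example: the third item was flagged as an anomaly but never confirmed.
report = evaluate_production_model([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
# report["needs_attention"] is True: accuracy is 0.80 and 1 of 3 flagged items was a false positive.
```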

Megan- What is the role of this new concept, MLOps, in supporting the team to bridge the gap between model development and operations, that is, production, the way DevOps does? Is there anything new, or does it borrow from classic DevOps processes?

Girish- There are a few differences. The machine learning ecosystem is developing its own tools and techniques in general, and Google technology offers specific examples. There are things like AI Hub from Google, which encompasses tools like Kubeflow that allow us to deploy machine learning models in production. These tools provide the benefit of years and years of research and actual experience deploying models in production within Google’s internal product teams.

So it takes into account several things: how a model should be deployed, how a model should be retrained, how a new model should be promoted to production, model versioning, and so on. These very machine-learning-specific pipelines are what differ from a typical deployment life cycle, and we tend to leverage those tools as and when we can.
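One of those ML-specific steps, promoting a retrained model to production, can be sketched as a simple champion/challenger gate: the new version only replaces the current one if it does not regress on the held-out evaluation metrics. This is an illustrative sketch only; the function, metric names, and tolerance are assumptions rather than part of any specific Kubeflow or AI Hub API.

```python
# Hypothetical gate: promote a retrained model only if it does not regress
# beyond a small tolerance on the held-out evaluation set.
TOLERANCE = 0.01


def should_promote(current_metrics, candidate_metrics):
    """Return True if the retrained (candidate) model may replace the current one."""
    return (
        candidate_metrics["accuracy"] >= current_metrics["accuracy"] - TOLERANCE
        and candidate_metrics["false_positive_rate"]
        <= current_metrics["false_positive_rate"] + TOLERANCE
    )


current = {"version": "v12", "accuracy": 0.93, "false_positive_rate": 0.06}
candidate = {"version": "v13", "accuracy": 0.94, "false_positive_rate": 0.05}

if should_promote(current, candidate):
    print(f"Promote {candidate['version']} to production, retire {current['version']}")
else:
    print(f"Keep {current['version']}; investigate {candidate['version']} before promotion")
```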

Megan-  That’s really fascinating. Thank you so much for your time today and for sharing your expertise in building out development and deployment teams for AI & ML and also this exciting new topic, MLOps.