Reddit explores Google Cloud for Efficiency and Scale

Imagine being the forebearer of Net Neutrality and in solidarity, all your user base stands by your stance to keep the internet open. It was hard to miss this picture when the “front page to the internet” turned RED recently to keep the fight going despite setbacks.

After all, this was not the first time Reddit users have stood up for a cause. Here’s a famous clip from Reddit co-founder Alexis Ohanian’s old TED Talk speaking about “Mr. Splashy Pants” and how they helped save the whales.

Reddit is home to thousands of communities organized around people’s interests. Registered members (known a “redditors”) submit content such as links, text posts, images and videos, which are then upvoted or downvoted by other redditors. Valued at $1.8B in July 2017 while raising a fresh $200M funding round, Reddit has 330 million monthly active users, ranking as the #4 most visited website i the U.S. and #8 in the world. Now imagine how reliable, flexible and agile a data infrastructure that supports this needs to be to keep over 330 million users engaged monthly!

Reddit is rightfully the  “front page to the internet” and as its popularity has spiked in the recent years, its  need to continuously innovate and improve on its data usage  has been the focus of its engineering team. While  cloud hosting costs play a role, nothing beats their requirement for performance,  desire to grow daily active users and remaining sticky with redditors.

To evaluate options for the increasing volume and performance needs, SpringML and Reddit partnered to prototype different solutions on Google Cloud. This allowed a decent benchmarking test and ultimately pieces of the prototype were integrated into the data pipeline at Reddit

Using an inbuilt visualization engine we analyzed terabytes of user data to determine user retention rates based on predefined cohorts. In doing so, we measured user retention by specific event types that triggered based on user interaction within Reddit website. Taking it a step further we created a messaging flow leveraging other components of the Google Cloud Platform  (such as Big Query) to look at retention data, identify places it changed and track down potential reasons or features or launches that correlate with them to provide feedback to the engineering team. All in all, these analyses allowed the team to measure and improve user retention so that they can keep more users on the platform over time.

“SpringML is a fantastic partner, delivering on a complete data pipeline allowing us to evaluate Google Cloud components at scale. They executed the project with expertise, passion, and agility, completing the project early and under budget!”

— Luis Bitencourt-Emilio , Principal Director of Engineering – Machine Learning, Reddit