As in many other businesses, knowledge of the customers’ needs is crucial for success. Future customers are one of the main factors, if not the most important, for the growth of the business; current customers sustain the business revenue; and prior customers once drove the business model. Learning the behaviour that makes a customer remain as such, and figuring out the signs that made a customer leave, are much more important tasks than devising techniques to acquire new customers: it is cheaper, from a business cost perspective, to focus on retention rather than acquisition.
In particular, the proportion of customers who discontinue the service within a given period is known as the churn rate (or attrition rate). Any business that wants to grow must keep a growth rate higher than the churn rate, since the net rate is the difference between the two. Therefore, when addressing the potential directions a business can take, the objectives are twofold: keep the current customers while also increasing the growth rate, and work to reduce the attrition rate.
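The arithmetic above can be made concrete with a small sketch (all numbers here are hypothetical, purely for illustration):

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers who discontinued during the period."""
    return customers_lost / customers_at_start

def net_growth_rate(growth_rate, churn):
    """The net rate is the difference between growth and churn."""
    return growth_rate - churn

# Hypothetical period: 100,000 subscribers, 4,000 cancellations,
# and 6% new-customer growth.
churn = churn_rate(100_000, 4_000)   # 0.04
net = net_growth_rate(0.06, churn)   # ~0.02: the customer base still grows
```

As long as the net rate stays positive the business grows; the two levers named above correspond to raising `growth_rate` and lowering `churn`.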
Identify the reasons, across the different business units, that are leading to the current attrition rate and devise actions to address them. For example, if the policy of the business is to cancel a customer’s subscription when a card payment fails (e.g., the card has expired), a plausible solution would be to keep retrying for a few days before actually cancelling the subscription (this will incur a cost for the business, but it is likely to be lower than the cost of losing the customer).
Design and develop solutions that identify the likelihood of each customer (or a subgroup) leaving or cancelling the service, in an automated manner (otherwise it is probably not cost effective anyway). For example, by examining the viewing patterns of customers on different types of devices, we can identify which customers deviate from similar ones who have left the service before, and signal the retention business unit so that these customers receive an e-mail suggesting new content to watch.
We decided to leverage machine learning and built a predictive classification model whose output would be such a score (probability) per customer: one probability for the likelihood of the customer leaving the service (churning) and another, its complement to one, for the customer remaining. Our first task was to define churn in our context, which we did as follows: “any customer who has no subscription in the next 30 consecutive days from the current day has officially churned”. With that definition and the prior data we had (about 2 years), our problem became one of those under the umbrella of supervised learning.
Our predictive model would have to learn, by means of some algorithm, the differences between a customer already identified as churning and one identified as non-churning as per our definition, based on that prior data. In machine learning, a supervised learning model aims to infer a mathematical function from prior, labelled sample data so that it can later be used to calculate (predict) the label for unseen samples.
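The 30-day churn definition above can be expressed as a small labelling function. This is a minimal sketch in our own notation (the representation of subscriptions as a set of covered days is an assumption for illustration, not our production data model):

```python
from datetime import date, timedelta

def is_churned(subscribed_days, as_of):
    """Label a customer as churned if no subscription covers any of the
    30 consecutive days following `as_of` (our working definition)."""
    window = {as_of + timedelta(days=d) for d in range(1, 31)}
    return window.isdisjoint(subscribed_days)

# A hypothetical customer subscribed for 40 days from 2018-01-01:
active = {date(2018, 1, 1) + timedelta(days=d) for d in range(40)}
is_churned(active, date(2018, 1, 20))  # False: coverage extends into the window
is_churned(active, date(2018, 3, 1))   # True: no subscribed day in the next 30
```

Applying such a function to every customer on every historical day is what turns the raw subscription records into the labelled samples that supervised learning requires.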
The model we built turned out to be a neural network, whose details will come in a later post. As with any other machine learning model, we trained and tuned our neural network with about 70% of the data we had (standard proportions in the machine learning community) and evaluated it with about 20% of the remaining data (leaving 10% for final testing). In the next part of this post we will also describe the steps we took during training, as we had to figure out a way to counter the bias introduced by the imbalanced proportion of churning vs. non-churning customers in our datasets, which is a well-known problem in its own right.
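The 70/20/10 split, and one common remedy for class imbalance (inverse-frequency class weights), can be sketched as follows. This is a generic illustration of the technique, not our exact pipeline:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split into roughly 70% train, 20% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.7)
    n_val = int(len(shuffled) * 0.2)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def class_weights(labels):
    """Inverse-frequency weights: rare classes (e.g. churners) weigh more,
    so the loss is not dominated by the majority class."""
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: len(labels) / (len(counts) * k) for c, k in counts.items()}

train, val, test = split_dataset(list(range(100)))   # sizes 70 / 20 / 10
weights = class_weights([0] * 90 + [1] * 10)         # {0: ~0.56, 1: 5.0}
```

With 10% churners, the churning class gets a weight of 5.0 versus ~0.56 for the majority, so each churner contributes about nine times more to the training loss.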
In the following illustration we show an architectural overview of the different elements and steps involved. The highlighting depicts the data source of the model that is in production: there can be multiple models, and each one will require a set of features, not necessarily the same across models nor sourced from the same data.
All the training (and prediction) happened on the Google Cloud Platform, which is used at Bonnier Broadcasting for data insights. We mostly worked with its core data warehousing and analysis tool, BigQuery, a popular multi-tenant serverless database that allowed us not only to store the data itself but also to run pretty much any complex analysis over our large-scale datasets. We also used the next-generation data processing tool, Dataflow, which is based on Apache Beam, an open-source unified programming model to define and execute large-scale data processing pipelines.
Finally, for the most important part, we took advantage of the distributed environment for machine learning, Cloud Machine Learning Engine, based on TensorFlow, a powerful open-source computational framework of stateful data-flow graphs specialised in building and running neural networks, among other complex tasks.
In the next part of this article we will describe the technical parts in detail: how we selected the relevant data used to successfully predict the churn rate, how we engineered some of the features in our model, further details on the algorithm behind the predictive model, and the automation we had to build in order to provide the business unit with a daily report of customers and their likelihood of churning (or not :).
Guillermo Rodríguez-Cano & Maryam Olyaei
We would like to thank our colleagues Kristoffer Adwent, Erik Ferm and Marcus Olsson for their joint work on this project, Josef Eriksson for helping us understand the quirks of some of the features we derived for the predictive model, and the many people from other teams who have helped with their comments and efforts.