Figuring Out Which Customers May Leave

CASE STUDY:

Predicting whether customers will churn based on demographic and cost data from a CRM

The aim is to use various features from a customer CRM list to accurately predict churn behaviour, so that customers can be retained. The dataset comes from a Kaggle dataset here. The code used to explore the data is here and here. Churn rate is defined as the rate at which customers stop subscribing to a service.

Three types of model were tried (logistic regression, random forest and a neural network). The model chosen achieved an accuracy of 78%, with a precision of 60% and a recall of 56% for churned customers. We were also able to identify key features that determine whether a customer is likely to leave. Month-to-month contracts and having no online security were important features for both the logistic regression and random forest models, and the random forest also ranked monthly charges highly. No profitability analysis was done here.

Defn (Accuracy): The percentage of correctly classified predictions of the model

(Correctly Classified Predictions / Total Predictions)

Defn (Precision): The ratio of predictions correctly classified as X to all predictions classified as X (both correct and incorrect)

(Correctly Classified as X / Total Predicted to be X)

Defn (Recall): The ratio of predictions correctly classified as X to the actual number of outcomes that are X

(Correctly Classified as X / Total No. of X outcomes)
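The three definitions can be checked with a few lines of Python. This is a minimal sketch that takes hypothetical confusion-matrix counts for a class X (here "Churned"); the function name `classification_metrics` is illustrative, and the F1-score (the harmonic mean of precision and recall, shown as f1-score in the results tables) is included for completeness.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy for the whole model, plus precision, recall and F1 for class X."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                       # correctly classified / total predictions
    precision = tp / (tp + fp) if tp + fp else 0.0     # correctly classified as X / total predicted X
    recall = tp / (tp + fn) if tp + fn else 0.0        # correctly classified as X / total actual X
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)              # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts with X = "Churned": 30 true positives, 20 false positives,
# 10 false negatives and 140 true negatives.
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=140)
print(metrics)
```

With these made-up counts, accuracy is 170/200 = 0.85, precision is 30/50 = 0.6 and recall is 30/40 = 0.75. Accuracy can look healthy even when recall on the smaller class is weaker, which is why churned recall gets the most attention below.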

Data Analysis:

The dataset contains the following fields for a total of 7024 different users, of whom 1869 are recorded as having churned. Churn is the target to be predicted; the other fields are used as features.

Gender	Tenure	Online Security	Streaming TV	Payment Method
Senior Citizen	Phone Service	Online Backup	Streaming Movies	Monthly Charges
Partner	Multiple Lines	Device Protection	Contract	Total Charges
Dependents	Internet Service	Tech Support	Paperless Billing	Churn

Of the customers, 26.5% have churned, and it is this group we aim to predict accurately using the features above. The following table shows, for each categorical feature, the value with the highest churn rate.

Feature	Value	Churn Percentage
Payment Method	Electronic Check	45%
Contract	Month to Month	43%
Tech Support	No	42%
Internet Service	Fiber Optic	42%
Online Security	No	42%
Senior Citizen	Yes	42%
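A table like this can be built with a pandas group-by. The sketch below assumes the Kaggle column names `Contract` and `Churn`, with `Churn` stored as "Yes"/"No"; the six-row DataFrame is made up purely to show the mechanics, and `highest_churn_value` is a hypothetical helper name.

```python
import pandas as pd


def highest_churn_value(df: pd.DataFrame, feature: str, target: str = "Churn") -> tuple:
    """Return (value, churn_rate) for the value of `feature` with the highest churn rate."""
    # Convert "Yes"/"No" to 1/0 so that the mean within each group is a churn rate.
    churned = (df[target] == "Yes").astype(int)
    rates = churned.groupby(df[feature]).mean()
    return rates.idxmax(), float(rates.max())


# Tiny hypothetical sample, NOT the real dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})
value, rate = highest_churn_value(df, "Contract")
print(value, round(rate, 2))
```

Running the helper over each categorical column and sorting by rate gives a ranking in the same shape as the table above.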

Violin plots are useful for comparing the distributions of churned and non-churned customers.

The violin plots show a clear difference in the distributions of Monthly Charges and Tenure: churned customers tend to have a shorter tenure and a higher monthly fee. The Total Charges distribution, by contrast, is similar for churned and non-churned customers. This is likely because total charges grow with both tenure and monthly fee, so churned customers with a short tenure but high monthly charges end up with total charges similar to those of non-churned customers with low monthly charges but a longer tenure. The scatter plot demonstrates this.
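The Total Charges argument can be made concrete with a rough approximation, assuming a customer's total charge is about tenure × monthly charge (real bills vary from month to month, so this is a simplification). The figures below are hypothetical.

```python
def approx_total_charges(tenure_months: int, monthly_charge: float) -> float:
    # Simplifying assumption: a constant monthly charge over the whole tenure.
    return tenure_months * monthly_charge


short_expensive = approx_total_charges(12, 90.0)  # short tenure, high monthly fee (churner-like)
long_cheap = approx_total_charges(36, 30.0)       # long tenure, low monthly fee (stayer-like)
print(short_expensive, long_cheap)                # both 1080.0
```

Two very different customers land on the same total, which is why Total Charges on its own separates the groups less clearly than Tenure or Monthly Charges.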

Logistic Regression

First, let’s look at the performance of a logistic regression.

Accuracy: 78%
An accuracy of 78% is certainly not bad; however, in this case it is more important to correctly predict whether a user has churned. A recall of 53% for churned users is therefore poor performance.

    
Class	Precision	Recall	F1-Score	Support
Not Churned	84%	88%	86%	1545
Churned	61%	53%	57%	562
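A minimal scikit-learn sketch of this kind of evaluation is below. It is not the author’s code: synthetic data from `make_classification` (with roughly 26.5% positives) stands in for the encoded CRM features, and the 30% hold-out is an assumption based on the roughly 2,100 test customers in the table.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CRM data (assumption): about 26.5% of rows are "Churned" (class 1).
X, y = make_classification(n_samples=7000, n_features=20, n_informative=8,
                           weights=[0.735], random_state=0)

# Stratified 30% hold-out so the test set keeps the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.0%}")
# Per-class precision, recall, F1-score and support, as in the table above.
print(classification_report(y_test, y_pred, target_names=["Not Churned", "Churned"]))
```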

Feature Importance for Logistic Regression:

Here the model coefficients are used to determine feature importance, that is, which features matter most to the model’s predictions. For a logistic regression, each coefficient gives the change in the log-odds of churning for a one-unit change in that feature, holding the other features fixed, so the features with the largest coefficients (positive or negative) have the most influence.

In the logistic regression, the most important features are listed below. These features should be flagged as signals of how likely someone is to churn.

Feature Importance
Contract: Month to Month
Contract: Two Years
Online Security: No
Payment Method: Check
Paperless Billing
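To make the coefficient-based ranking concrete, here is a sketch on synthetic data, assuming one-hot encoded features with illustrative names (the effect sizes are made up). Ranking by absolute coefficient keeps features that lower churn, such as two-year contracts in this toy setup, alongside those that raise it, and `np.exp(coef)` converts a coefficient into an odds ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical one-hot encoded features; names are illustrative only.
contract = rng.integers(0, 3, size=n)  # 0 = month-to-month, 1 = one year, 2 = two years
X = np.column_stack([
    contract == 0,                 # Contract: Month to Month
    contract == 2,                 # Contract: Two Years
    rng.integers(0, 2, size=n),    # Online Security: No
    rng.integers(0, 2, size=n),    # Paperless Billing
]).astype(float)
feature_names = ["Contract: Month to Month", "Contract: Two Years",
                 "Online Security: No", "Paperless Billing"]

# Synthetic labels: month-to-month and no online security raise the odds of churn,
# two-year contracts lower them (made-up effect sizes).
logits = -1.0 + 1.5 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
coefs = model.coef_[0]  # change in log-odds per one-unit change in each feature

# Rank by absolute coefficient: the sign gives the direction, the magnitude the strength.
for i in np.argsort(-np.abs(coefs)):
    print(f"{feature_names[i]:<26} coef={coefs[i]:+.2f}  odds ratio={np.exp(coefs[i]):.2f}")
```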

Random Forest:

Let’s increase model complexity and investigate the performance of a Random Forest.

Accuracy: 76%
Performance dropped on almost every metric, most notably the recall of churned users, which fell from 53% to 44%.

    
Class	Precision	Recall	F1-Score	Support
Not Churned	81%	88%	85%	1550
Churned	57%	44%	49%	557

Feature Importance
Total Charges
Monthly Charges
Tenure
Contract: Month to Month
Online Security: No
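A comparable sketch for the random forest reads the impurity-based `feature_importances_` attribute of scikit-learn’s `RandomForestClassifier`. As before, the data and column names are synthetic and illustrative, not the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 2000

# Hypothetical columns: two continuous features and one binary feature.
tenure = rng.integers(1, 73, size=n)          # months as a customer
monthly = rng.uniform(20, 120, size=n)        # monthly charge
month_to_month = rng.integers(0, 2, size=n)   # 1 = month-to-month contract
X = np.column_stack([tenure, monthly, month_to_month]).astype(float)
feature_names = ["Tenure", "Monthly Charges", "Contract: Month to Month"]

# Synthetic labels: short tenure, high charges and month-to-month contracts raise churn odds.
logits = -1.0 - 0.04 * tenure + 0.02 * monthly + 1.0 * month_to_month
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranked = sorted(zip(feature_names, forest.feature_importances_), key=lambda p: -p[1])
for name, score in ranked:
    print(f"{name:<26} {score:.3f}")
```

Unlike logistic regression coefficients, these importances carry no sign: they measure how much a feature is used for splitting, not whether it raises or lowers churn.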

Comparing Feature Importance with the Logistic Regression:

Next, we check whether the random forest’s most important features overlap with those of the logistic regression.

The random forest clearly gives more weight to the continuous variables such as Total Charges, Monthly Charges and Tenure, partly because impurity-based importances tend to favour features with many possible split points. There is an overlap with month-to-month contracts and having no online security, which suggests these two features are genuinely important.
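The overlap check itself is a set intersection over the two top-five lists as reported in the tables above.

```python
# Top-five features exactly as listed for each model.
logistic_top5 = {"Contract: Month to Month", "Contract: Two Years", "Online Security: No",
                 "Payment Method: Check", "Paperless Billing"}
forest_top5 = {"Total Charges", "Monthly Charges", "Tenure",
               "Contract: Month to Month", "Online Security: No"}

overlap = logistic_top5 & forest_top5
print(sorted(overlap))  # ['Contract: Month to Month', 'Online Security: No']
```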

Deep Learning:

Now let’s go even more complex and try a neural network using TensorFlow, starting with a model that has a single hidden layer of 20 nodes.
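A minimal Keras sketch of a one-hidden-layer model is below. Only the 20-node hidden layer comes from the text; the ReLU activation, Adam optimiser, 30 input columns and random placeholder data are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf


def build_small_model(n_features: int) -> tf.keras.Model:
    """One hidden layer of 20 units; the sigmoid output is the predicted churn probability."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(20, activation="relu"),    # hidden layer (ReLU is an assumption)
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(churned)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.Recall(name="recall")])
    return model


# Placeholder data standing in for the encoded CRM features (NOT the real dataset).
rng = np.random.default_rng(0)
X = rng.random((256, 30)).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")

model = build_small_model(n_features=X.shape[1])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
print(model.count_params())  # 641 = (30*20 + 20) + (20*1 + 1)
```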

Accuracy: 76%

So there is a slight increase in recall for churned users, to 56% from 53% with the logistic regression. However, I would hope for better than 56% recall and an F1-score of 58%.

    
Class	Precision	Recall	F1-Score	Support
Not Churned	85%	87%	86%	1550
Churned	60%	56%	58%	557

The neural network is certainly not a bad route, but can we do better by making it more complex, allowing it to capture more of the non-linearities in the data?

Next, we use a deeper model with three hidden layers of 2000, 1000 and 500 nodes respectively. We also introduce dropout, checkpointing and early stopping to guard against overfitting.
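A sketch of the larger network with dropout, early stopping and checkpointing, again using Keras, follows. Only the layer sizes come from the text; the dropout rate (0.3), patience (3 epochs), validation split, checkpoint filename and placeholder data are assumptions.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_deep_model(n_features: int, dropout_rate: float = 0.3) -> tf.keras.Model:
    """Hidden layers of 2000, 1000 and 500 units, each followed by dropout."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for units in (2000, 1000, 500):
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
        layers.append(tf.keras.layers.Dropout(dropout_rate))  # rate is an assumption
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Placeholder data standing in for the encoded CRM features (NOT the real dataset).
rng = np.random.default_rng(0)
X = rng.random((400, 30)).astype("float32")
y = (X[:, 0] + X[:, 1] > 1.0).astype("float32")

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
callbacks = [
    # Stop once validation loss has not improved for 3 epochs, then restore the best weights.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
    # Save the weights each time validation loss improves.
    tf.keras.callbacks.ModelCheckpoint(checkpoint_path, monitor="val_loss",
                                       save_best_only=True, save_weights_only=True),
]

model = build_deep_model(n_features=X.shape[1])
history = model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32,
                    callbacks=callbacks, verbose=0)
```

`restore_best_weights=True` rolls the model back to the epoch with the lowest validation loss, so the checkpoint file is mainly useful for reloading those weights later with `model.load_weights(checkpoint_path)`.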

In fact, we did slightly worse: recall for churned users fell from 56% to 55%. Not ideal.

    
Class	Precision	Recall	F1-Score	Support
Not Churned	84%	87%	86%	1550
Churned	60%	55%	58%	557

Future Models to try:

K-Nearest Neighbours
Support Vector Machine

This is a detailed analysis from Kaggle.

Kevin Synnott, May 14, 2020