Credit Card Fraud Detection — Part 2

In this part, we’ll dive deeper into the models and the data-imbalance techniques we’re going to use.

We start by dropping all the other amount columns we had added to the data frame (keeping only the log-scaled amount) and putting all the remaining features in ‘X’. We have to predict ‘Class’, so we drop it from ‘X’ and store it in a label ‘y’.
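A minimal sketch of that step, assuming the data frame ‘df’ from Part 1; the column names ‘scaled_amount’ and ‘log_amount’ are placeholders, as the exact names added during EDA are not shown here:

# 'Amount' and 'scaled_amount' stand in for the extra amount columns from
# Part 1 (the names are assumptions); only the log-scaled amount is kept.
df = df.drop(columns=['Amount', 'scaled_amount'], errors='ignore')

# All remaining features go in X; the target 'Class' goes in y.
X = df.drop(columns=['Class'])
y = df['Class']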

As we saw in the EDA portion (Part-1), this dataset has a HUGE class imbalance.

Why is Class Imbalance a Problem?

When a statistical classifier is trained on a highly imbalanced dataset, it tends to pick up the patterns of the most frequent class and ignore the rest.

For example, in this dataset, 99.9% of the transactions are labelled ‘Not Fraud’ and the rest ‘Fraud’. So even a model that classifies everything it sees as ‘Not Fraud’ achieves 99.9% accuracy, which seems excellent.

But is the model good? NO, because it never classifies any transaction as ‘Fraud’. So even with an accuracy of 99.9%, it is completely useless!

We need strategies for working with such a dataset, and we need metrics other than accuracy, in such scenarios.

Dealing with Class Imbalance

In this blog, we’ll use 4 techniques to deal with the class imbalance.

1. Under Sample majority class

In under-sampling, samples in the majority class are removed at random until their number matches the number of samples in the minority class.

This can be data-inefficient: the loss of useful data can make the decision boundary between minority and majority samples harder to learn, especially for rule-based classifiers.

This technique is only effective when the minority class has sufficient data despite the severe imbalance.

2. Over Sample minority class

This is the exact opposite of under-sampling: samples in the minority class are duplicated at random until their number matches the number of samples in the majority class.

This can lead to overfitting, as it makes exact copies of the minority-class samples (it adds no new information to the model, it simply replicates existing data points).

3. Synthetic Minority Oversampling Technique (SMOTE)

SMOTE is an oversampling technique that creates new synthetic examples in the minority class, similar to the ones already there, instead of simply replicating them.

SMOTE first selects a minority class instance a at random and finds its k nearest minority class neighbors. The synthetic instance is then created by choosing one of the k nearest neighbors b at random and connecting a and b to form a line segment in the feature space. The synthetic instances are generated as a convex combination of the two chosen instances a and b.

Page 47, Imbalanced Learning: Foundations, Algorithms, and Applications, 2013.

The SMOTE process includes the following steps (a toy illustration follows the list):

  1. Identifying a minority-class feature vector and its nearest (minority-class) neighbor.
  2. Taking the difference between the two.
  3. Multiplying the difference by a random number between 0 and 1.
  4. Identifying a new point on the line segment by adding the scaled difference to the original feature vector.
  5. Repeating the process for the other identified feature vectors.
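A toy numpy illustration of steps 1 to 4, not the library implementation; ‘a’ and ‘b’ are made-up vectors:

import numpy as np

rng = np.random.default_rng(42)

a = np.array([1.0, 2.0])        # a minority-class feature vector
b = np.array([3.0, 1.0])        # one of its k nearest minority neighbours

gap = rng.random()              # step 3: random number in [0, 1)
synthetic = a + gap * (b - a)   # step 4: a new point on the segment from a to b
print(synthetic)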

A limitation of this approach is that the synthetic examples are created without considering the majority class, which can produce ambiguous samples when there is strong overlap between the classes.

4. Adaptive Synthetic Sampling (ADASYN)

The essential idea behind ADASYN is to use a weighted distribution over the minority-class examples according to their level of difficulty in learning: more synthetic data is generated for the minority-class examples that are harder to learn.

The ADASYN approach improves learning with respect to the data distributions in two ways:

  1. It reduces the bias introduced by the class imbalance.
  2. It adaptively shifts the classification decision boundary toward the difficult examples.

NOTE: It is important to split into train and test sets before any oversampling technique is applied. Oversampling before splitting can put copies of the same observation in both the train and test sets, which simply allows our model to memorize those data points (causing overfitting).

Importing Dependencies:
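The original import cell was embedded as an image; the following is a plausible reconstruction covering everything used below (scikit-learn, imbalanced-learn and xgboost), not the author’s exact cell:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN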

Let us prepare the datasets for all of these class-imbalance methods, splitting first as per the note above.
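A sketch of the split; test_size=0.3 and random_state=42 are assumptions, though the outputs below do suggest a test set of roughly 30% of the data:

# Split first, then resample only the training portion (see the NOTE above).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)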

Random Under Sample Dataset
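The original cell was an image; a minimal sketch with imbalanced-learn’s RandomUnderSampler (random_state=42 is an arbitrary choice):

# Randomly drop majority-class rows, from the training data only.
rus = RandomUnderSampler(random_state=42)
X_train_rus, y_train_rus = rus.fit_resample(X_train, y_train)
print(np.bincount(y_train_rus))  # both classes now at the minority-class count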

Output for Under Sampling dataset

Random Over Sampler Dataset
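A matching sketch with RandomOverSampler, which duplicates minority rows until the classes balance:

ros = RandomOverSampler(random_state=42)
X_train_ros, y_train_ros = ros.fit_resample(X_train, y_train)
print(np.bincount(y_train_ros))  # both classes now at the majority-class count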

Output for Over Sampling dataset

SMOTE Dataset
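A sketch with imbalanced-learn’s SMOTE; k_neighbors=5 is the library default, stated explicitly here for clarity:

# Interpolate new minority samples between nearest minority neighbours.
sm = SMOTE(random_state=42, k_neighbors=5)
X_train_sm, y_train_sm = sm.fit_resample(X_train, y_train)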

Output for SMOTE dataset

ADASYN Dataset
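And a sketch with ADASYN, which concentrates the synthetic points where minority samples are harder to learn:

ada = ADASYN(random_state=42, n_neighbors=5)
X_train_ada, y_train_ada = ada.fit_resample(X_train, y_train)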

Output for ADASYN dataset

Metrics

The CONFUSION MATRIX (error matrix) allows visualization of the performance of an algorithm.

True Positive (TP) : Fraud correctly identified as Fraud

True Negative (TN) : Non-fraud correctly identified as Non-fraud

False Positive (FP) : Non-fraud incorrectly identified as Fraud

False Negative (FN) : Fraud incorrectly identified as Non-fraud

Accuracy

(TP + TN) / (TP + TN + FP + FN)

As discussed before, we’ll need metrics other than accuracy to evaluate our models.

Precision

TP / (TP + FP)

Precision tells us how many of the cases predicted as positive actually turned out to be positive (how likely a positive-class prediction is to be correct).

Recall

TP / (TP + FN)

Recall tells us how many of the actual positive cases the model was able to predict correctly (how good the model is at recognizing the positive class).

F1 score

When we try to increase Precision, Recall tends to go down, and vice versa. The F1 score captures both trends in a single value: it is the harmonic mean of Precision and Recall.

F1 score: 2 x ((Precision x Recall) / (Precision + Recall))

For a given sum of Precision and Recall, the F1 score is highest when the two are equal.

For this dataset, we’re going to compare the results of the various models using the F1 score.

ROC Curve

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

To measure the performance of each classification model (on all five datasets), it is helpful to write a function that evaluates all the metrics mentioned above and stores them so they can be compared later.
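A sketch of such a helper; the names ‘evaluate’ and ‘results’ are made up here. Note that AUC is computed from the hard predictions, which is consistent with the 0.5 AUC printed for the all-negative models below:

results = []  # one entry per (classifier, dataset) combination

def evaluate(model, name, X_test, y_test):
    y_pred = model.predict(X_test)
    metrics = {
        'Model Name': name,
        'Test Accuracy': accuracy_score(y_test, y_pred),
        'Test AUC': roc_auc_score(y_test, y_pred),
        'Test Precision': precision_score(y_test, y_pred, zero_division=0),
        'Test Recall': recall_score(y_test, y_pred),
        'Test F1': f1_score(y_test, y_pred),
        'Confusion Matrix': confusion_matrix(y_test, y_pred),
    }
    results.append(metrics)
    for key, value in metrics.items():
        print(key, ':', value)
    return metrics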

Classification Algorithms

Here, we’ll discuss all these algorithms concisely before applying them.

For each of these classification algorithms, we’ll try all the class-imbalance techniques (on the datasets built above) and compare the results at the end using the metrics mentioned before; the fit-and-score loop follows the same pattern each time, sketched right after this paragraph. So, let’s get started!
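The pattern, shown here for Logistic Regression; the variable names carry over from the resampling sketches above:

datasets = {
    'IMBALANCED':  (X_train, y_train),
    'UNDERSAMPLE': (X_train_rus, y_train_rus),
    'OVERSAMPLE':  (X_train_ros, y_train_ros),
    'SMOTE':       (X_train_sm, y_train_sm),
    'ADASYN':      (X_train_ada, y_train_ada),
}

for label, (X_tr, y_tr) in datasets.items():
    clf = LogisticRegression(max_iter=1000)  # swap in any classifier below
    clf.fit(X_tr, y_tr)
    evaluate(clf, 'LR ' + label, X_test, y_test)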

1. Logistic Regression Classifier

Model Name : LR IMBALANCED
Test Accuracy :0.99826
Test AUC : 0.50000
Test Precision : 0.00000
Test Recall : 0.00000
Test F1 : 0.00000
Confusion Matrix :
[[84970 0]
[ 148 0]]


Model Name : LR UNDERSAMPLE
Test Accuracy :0.99826
Test AUC : 0.50000
Test Precision : 0.00000
Test Recall : 0.00000
Test F1 : 0.00000
Confusion Matrix :
[[84970 0]
[ 148 0]]


Model Name : LR OVERSAMPLE
Test Accuracy :0.99659
Test AUC : 0.79594
Test Precision : 0.27673
Test Recall : 0.59459
Test F1 : 0.37768
Confusion Matrix :
[[84740 230]
[ 60 88]]


Model Name : LR SMOTE
Test Accuracy :0.99659
Test AUC : 0.79594
Test Precision : 0.27673
Test Recall : 0.59459
Test F1 : 0.37768
Confusion Matrix :
[[84740 230]
[ 60 88]]


Model Name : LR ADASYN
Test Accuracy :0.99633
Test AUC : 0.82279
Test Precision : 0.26966
Test Recall : 0.64865
Test F1 : 0.38095
Confusion Matrix :
[[84710 260]
[ 52 96]]
ROC curve for Logistic Regression classifier

2. Random Forest Classifier

Model Name : RF IMBALANCED
Test Accuracy :0.99952
Test AUC : 0.88847
Test Precision : 0.93496
Test Recall : 0.77703
Test F1 : 0.84871
Confusion Matrix :
[[84962 8]
[ 33 115]]


Model Name : RF UNDERSAMPLE
Test Accuracy :0.96343
Test AUC : 0.93784
Test Precision : 0.04173
Test Recall : 0.91216
Test F1 : 0.07981
Confusion Matrix :
[[81870 3100]
[ 13 135]]


Model Name : RF OVERSAMPLE
Test Accuracy :0.99952
Test AUC : 0.87835
Test Precision : 0.95726
Test Recall : 0.75676
Test F1 : 0.84528
Confusion Matrix :
[[84965 5]
[ 36 112]]


Model Name : RF SMOTE
Test Accuracy :0.99947
Test AUC : 0.91542
Test Precision : 0.86014
Test Recall : 0.83108
Test F1 : 0.84536
Confusion Matrix :
[[84950 20]
[ 25 123]]


Model Name : RF ADASYN
Test Accuracy :0.99947
Test AUC : 0.91542
Test Precision : 0.86014
Test Recall : 0.83108
Test F1 : 0.84536
Confusion Matrix :
[[84950 20]
[ 25 123]]
ROC Curve for Random Forest classifier

3. Gaussian Naïve Bayes Classifier

Model Name : NB IMBALANCED
Test Accuracy :0.99316
Test AUC : 0.80097
Test Precision : 0.14658
Test Recall : 0.60811
Test F1 : 0.23622
Confusion Matrix :
[[84446 524]
[ 58 90]]


Model Name : NB UNDERSAMPLE
Test Accuracy :0.99026
Test AUC : 0.88046
Test Precision : 0.12541
Test Recall : 0.77027
Test F1 : 0.21570
Confusion Matrix :
[[84175 795]
[ 34 114]]


Model Name : NB OVERSAMPLE
Test Accuracy :0.99119
Test AUC : 0.87418
Test Precision : 0.13559
Test Recall : 0.75676
Test F1 : 0.22998
Confusion Matrix :
[[84256 714]
[ 36 112]]


Model Name : NB SMOTE
Test Accuracy :0.99161
Test AUC : 0.88113
Test Precision : 0.14358
Test Recall : 0.77027
Test F1 : 0.24204
Confusion Matrix :
[[84290 680]
[ 34 114]]


Model Name : NB ADASYN
Test Accuracy :0.99118
Test AUC : 0.89103
Test Precision : 0.13978
Test Recall : 0.79054
Test F1 : 0.23756
Confusion Matrix :
[[84250 720]
[ 31 117]]
ROC curve for Naive Bayes Classifier

4. Decision Tree Classifier

Model Name : DT IMBALANCED
Test Accuracy :0.99915
Test AUC : 0.86805
Test Precision : 0.76761
Test Recall : 0.73649
Test F1 : 0.75172
Confusion Matrix :
[[84937 33]
[ 39 109]]


Model Name : DT UNDERSAMPLE
Test Accuracy :0.90412
Test AUC : 0.91151
Test Precision : 0.01642
Test Recall : 0.91892
Test F1 : 0.03225
Confusion Matrix :
[[76821 8149]
[ 12 136]]


Model Name : DT OVERSAMPLE
Test Accuracy :0.99887
Test AUC : 0.84767
Test Precision : 0.66883
Test Recall : 0.69595
Test F1 : 0.68212
Confusion Matrix :
[[84919 51]
[ 45 103]]


Model Name : DT SMOTE
Test Accuracy :0.99807
Test AUC : 0.90461
Test Precision : 0.46875
Test Recall : 0.81081
Test F1 : 0.59406
Confusion Matrix :
[[84834 136]
[ 28 120]]


Model Name : DT ADASYN
Test Accuracy :0.99769
Test AUC : 0.89092
Test Precision : 0.41281
Test Recall : 0.78378
Test F1 : 0.54079
Confusion Matrix :
[[84805 165]
[ 32 116]]
ROC Curve for Decision Trees classifier

5. K-Nearest Neighbor Classifier

Model Name : KNN IMBALANCED
Test Accuracy :0.99834
Test AUC : 0.52365
Test Precision : 1.00000
Test Recall : 0.04730
Test F1 : 0.09032
Confusion Matrix :
[[84970 0]
[ 141 7]]


Model Name : KNN UNDERSAMPLE
Test Accuracy :0.64224
Test AUC : 0.61171
Test Precision : 0.00282
Test Recall : 0.58108
Test F1 : 0.00562
Confusion Matrix :
[[54580 30390]
[ 62 86]]


Model Name : KNN OVERSAMPLE
Test Accuracy :0.99823
Test AUC : 0.61802
Test Precision : 0.47945
Test Recall : 0.23649
Test F1 : 0.31674
Confusion Matrix :
[[84932 38]
[ 113 35]]


Model Name : KNN SMOTE
Test Accuracy :0.98089
Test AUC : 0.82180
Test Precision : 0.05851
Test Recall : 0.66216
Test F1 : 0.10752
Confusion Matrix :
[[83393 1577]
[ 50 98]]


Model Name : KNN ADASYN
Test Accuracy :0.98032
Test AUC : 0.83164
Test Precision : 0.05842
Test Recall : 0.68243
Test F1 : 0.10762
Confusion Matrix :
[[83342 1628]
[ 47 101]]
ROC Curve for KNNs classifier

6. XG Boost Classifier

Model Name : XGBOOST IMBALANCED
Test Accuracy :0.99954
Test AUC : 0.90871
Test Precision : 0.90977
Test Recall : 0.81757
Test F1 : 0.86121
Confusion Matrix :
[[84958 12]
[ 27 121]]


Model Name : XGBOOST UNDERSAMPLE
Test Accuracy :0.91485
Test AUC : 0.92025
Test Precision : 0.01858
Test Recall : 0.92568
Test F1 : 0.03643
Confusion Matrix :
[[77733 7237]
[ 11 137]]


Model Name : XGBOOST OVERSAMPLE
Test Accuracy :0.99948
Test AUC : 0.91206
Test Precision : 0.87143
Test Recall : 0.82432
Test F1 : 0.84722
Confusion Matrix :
[[84952 18]
[ 26 122]]


Model Name : XGBOOST SMOTE
Test Accuracy :0.99935
Test AUC : 0.91874
Test Precision : 0.80000
Test Recall : 0.83784
Test F1 : 0.81848
Confusion Matrix :
[[84939 31]
[ 24 124]]


Model Name : XGBOOST ADASYN
Test Accuracy :0.99928
Test AUC : 0.91533
Test Precision : 0.77358
Test Recall : 0.83108
Test F1 : 0.80130
Confusion Matrix :
[[84934 36]
[ 25 123]]
ROC curve for XGBOOST classifier

7. MLP Classifier

ROC curve for MLP classifier

Now we’ll compare the test-set F1 scores (since accuracy is not a good metric for an imbalanced dataset) across all the models and all the datasets.

For this, we’ll create a dictionary, ‘comparision’, in which the key is the label and the value is a list of the scores of all the models we appended earlier, as sketched below.
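A sketch of that comparison step, assuming the ‘results’ list from the helper above (‘comparision’ keeps the author’s original spelling):

# Group the stored F1 scores by imbalance-technique label. This assumes every
# classifier was evaluated on every dataset, in the same order.
comparision = {}
model_names = []
for r in results:
    model, label = r['Model Name'].split(' ', 1)
    comparision.setdefault(label, []).append(r['Test F1'])
    if model not in model_names:
        model_names.append(model)

# One group of bars per classifier, one bar per technique.
pd.DataFrame(comparision, index=model_names).plot.bar(figsize=(10, 5))
plt.ylabel('Test F1')
plt.show()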

Image Depicting Comparison of various Classifiers(applied with Data Imbalance techniques)

Going by the numbers above, the F1 score of XGBoost (Imbalanced) is the highest, followed closely by Random Forest (Imbalanced) and XGBoost (Over Sample).

Conclusion

In this blog, we saw how data imbalance is a major challenge when building a model, and we compared different techniques for dealing with it across several classification algorithms.

References

Imbalanced Learning: Foundations, Algorithms, and Applications, edited by Haibo He and Yunqian Ma, Wiley-IEEE Press, 2013 (quoted in the SMOTE section above).
