Clustering and cluster analysis template
Seetharam annepu
if (!require(cluster)) { install.packages("cluster"); library(cluster) }
## Loading required package: cluster
if (!require(fpc)) { install.packages("fpc"); library(fpc) }
## Loading required package: fpc
## Warning: package 'fpc' was built under R version 3.5.3
if (!require(NbClust)) { install.packages("NbClust"); library(NbClust) }
## Loading required package: NbClust
if (!require(dendextend)) { install.packages("dendextend"); library(dendextend) }
## Loading required package: dendextend
## Warning: package 'dendextend' was built under R version 3.5.3
##
## Attaching package: 'dendextend'
## The following object is masked from 'package:stats':
##
## cutree
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
This clustering exercise uses the CC General credit card data set from kaggle.com.
data.df <- read.csv(file.choose(), header = TRUE)
The first clustering method is hierarchical clustering with Euclidean distance. In this chunk, the number of meaningful clusters is determined from the dendrogram. Before plotting the dendrogram, the data needs to be normalized to eliminate the clustering bias that arises when attributes are measured on different scales. Normalizing brings all columns onto a single standard scale, so every column is given equal importance in the distance calculation.
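As a minimal sketch of what this normalization does, the example below applies `scale()` to a small made-up data frame (the `toy` values and column names are illustrative assumptions, not taken from the credit card data). With its default arguments, `scale()` centers each column to mean 0 and rescales it to standard deviation 1.

```r
# Toy data: two attributes on very different scales (values are made up).
toy <- data.frame(
  balance   = c(100, 2000, 15000, 800),
  frequency = c(0.1, 0.9, 1.0, 0.5)
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1.
toy_scaled <- scale(toy)

# The same result computed by hand for one column.
balance_by_hand <- (toy$balance - mean(toy$balance)) / sd(toy$balance)

print(round(colMeans(toy_scaled), 10))  # column means after scaling
print(apply(toy_scaled, 2, sd))         # column standard deviations after scaling
print(all.equal(as.numeric(toy_scaled[, "balance"]), balance_by_hand))
```

After scaling, a difference of one unit means "one standard deviation" in every column, so no single attribute dominates the Euclidean distance merely because of its units.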
Here the complete linkage method is used: the distance between two clusters equals the distance between their two farthest elements, one taken from each cluster.
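The complete linkage rule can be sketched on two tiny hypothetical clusters (the points `cluster_a` and `cluster_b` below are made up for illustration). The cluster distance is taken as the largest pairwise Euclidean distance, and the last line checks it against the height of the final merge reported by `hclust()`.

```r
# Two small hypothetical clusters of 2-D points.
cluster_a <- rbind(c(0, 0), c(1, 0))
cluster_b <- rbind(c(4, 0), c(6, 0))

# All pairwise Euclidean distances; rows 1-2 belong to A, rows 3-4 to B.
both <- rbind(cluster_a, cluster_b)
all_d <- as.matrix(dist(both, method = "euclidean"))
between <- all_d[1:2, 3:4]

# Complete linkage: the cluster distance is the largest pairwise distance.
complete_distance <- max(between)
print(complete_distance)

# hclust() with complete linkage joins A and B last, at the same height.
hc <- hclust(dist(both, method = "euclidean"), method = "complete")
print(max(hc$height))
```

Because the farthest pair decides the distance, a single extreme point can keep a cluster separate from the rest for a long time.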
####hierarchical clustering euclidean complete linkage method
data.df <- data.df[-1] #removing the ID attribute
normalized_data <- scale(data.df)
d <- dist(normalized_data, method = "euclidean")
fit <- hclust(d, method="complete")
plot(fit)
rect.hclust(fit, k=9, border="blue")
fit_c <- as.dendrogram(fit)
cdendo = color_branches(fit_c,k=20) #Coloured dendrogram branches
plot(cdendo)
By attaching the cluster ID to each customer, a summary of each cluster can be produced. The results show that hierarchical clustering is susceptible to outliers.
groups <- cutree(fit, k=20)
membership<-as.matrix(groups)
for(i in c(1:10)){
print(paste0("cluster",i))
print(summary(subset(data.df,membership[,1]==i)))
}
## [1] "cluster1"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 0.00 Min. :0.0000 Min. : 0.0 Min. : 0.0
## 1st Qu.: 98.68 1st Qu.:0.8182 1st Qu.: 20.0 1st Qu.: 0.0
## Median : 750.15 Median :1.0000 Median : 309.2 Median : 0.0
## Mean : 1358.75 Mean :0.8654 Mean : 638.1 Mean : 347.7
## 3rd Qu.: 1772.32 3rd Qu.:1.0000 3rd Qu.: 887.9 3rd Qu.: 412.0
## Max. :16304.89 Max. :1.0000 Max. :7479.9 Max. :7479.9
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.00 Min. : 0.0 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:0.08333
## Median : 59.24 Median : 0.0 Median :0.41667
## Mean : 290.64 Mean : 832.5 Mean :0.45797
## 3rd Qu.: 385.76 3rd Qu.: 1024.4 3rd Qu.:0.90000
## Max. :4538.84 Max. :14371.8 Max. :1.00000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.1429
## Mean :0.1621 Mean :0.3332
## 3rd Qu.:0.2500 3rd Qu.:0.6667
## Max. :1.0000 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. : 50
## 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 1500
## Median :0.0000 Median : 0.000 Median : 6.000 Median : 3000
## Mean :0.1260 Mean : 2.656 Mean : 9.969 Mean : 4128
## 3rd Qu.:0.1667 3rd Qu.: 4.000 3rd Qu.:13.000 3rd Qu.: 6000
## Max. :1.5000 Max. :40.000 Max. :88.000 Max. :30000
## NA's :1
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 0.019 Min. :0.0000 Min. : 6.00
## 1st Qu.: 354.1 1st Qu.: 163.755 1st Qu.:0.0000 1st Qu.:12.00
## Median : 749.5 Median : 279.545 Median :0.0000 Median :12.00
## Mean : 1299.5 Mean : 603.652 Mean :0.1508 Mean :11.48
## 3rd Qu.: 1587.6 3rd Qu.: 711.671 3rd Qu.:0.1429 3rd Qu.:12.00
## Max. :22889.2 Max. :11142.932 Max. :1.0000 Max. :12.00
## NA's :305
## [1] "cluster2"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 12.42 Min. :0.6364 Min. : 847.7 Min. : 0
## 1st Qu.: 599.48 1st Qu.:1.0000 1st Qu.: 2903.5 1st Qu.: 1502
## Median : 1820.22 Median :1.0000 Median : 4337.2 Median : 2604
## Mean : 2767.68 Mean :0.9902 Mean : 4885.1 Mean : 3288
## 3rd Qu.: 4007.18 3rd Qu.:1.0000 3rd Qu.: 5994.1 3rd Qu.: 4309
## Max. :18495.56 Max. :1.0000 Max. :15704.0 Max. :14215
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.2500
## 1st Qu.: 639.3 1st Qu.: 0.0 1st Qu.:1.0000
## Median :1277.0 Median : 0.0 Median :1.0000
## Mean :1598.2 Mean : 464.1 Mean :0.9644
## 3rd Qu.:2154.5 3rd Qu.: 295.9 3rd Qu.:1.0000
## Max. :7571.4 Max. :7540.3 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.6667 1st Qu.:0.6667
## Median :0.9000 Median :0.9167
## Mean :0.7740 Mean :0.7968
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.00 Min. : 5.00 Min. : 800
## 1st Qu.:0.00000 1st Qu.: 0.00 1st Qu.: 45.00 1st Qu.: 5000
## Median :0.00000 Median : 0.00 Median : 66.00 Median : 7500
## Mean :0.06668 Mean : 1.41 Mean : 73.57 Mean : 7823
## 3rd Qu.:0.08333 3rd Qu.: 1.00 3rd Qu.: 92.00 3rd Qu.:10150
## Max. :0.83333 Max. :23.00 Max. :309.00 Max. :22000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0 Min. : 41.85 Min. :0.0000 Min. : 6.00
## 1st Qu.: 2123 1st Qu.: 210.02 1st Qu.:0.0000 1st Qu.:12.00
## Median : 3646 Median : 553.12 Median :0.0000 Median :12.00
## Mean : 4312 Mean : 1179.01 Mean :0.2436 Mean :11.95
## 3rd Qu.: 5692 3rd Qu.: 1310.60 3rd Qu.:0.4167 3rd Qu.:12.00
## Max. :24199 Max. :13621.71 Max. :1.0000 Max. :12.00
## NA's :3
## [1] "cluster3"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 758.1 Min. :0.7273 Min. : 0.00 Min. : 0.00
## 1st Qu.:1418.3 1st Qu.:1.0000 1st Qu.: 22.16 1st Qu.: 0.00
## Median :2149.2 Median :1.0000 Median : 288.39 Median : 0.00
## Mean :2698.9 Mean :0.9873 Mean : 540.29 Mean : 130.39
## 3rd Qu.:3517.1 3rd Qu.:1.0000 3rd Qu.: 729.82 3rd Qu.: 45.65
## Max. :7957.0 Max. :1.0000 Max. :4004.00 Max. :2463.00
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.00 Min. :0.00000
## 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.:0.08333
## Median : 176.7 Median : 19.04 Median :0.50000
## Mean : 409.9 Mean : 697.90 Mean :0.49959
## 3rd Qu.: 558.2 3rd Qu.: 948.85 3rd Qu.:1.00000
## Max. :3887.0 Max. :10616.27 Max. :1.00000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.4167
## Mean :0.05059 Mean :0.4733
## 3rd Qu.:0.08333 3rd Qu.:1.0000
## Max. :0.41667 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 0.00 Min. : 1000
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 1200
## Median :0.08333 Median : 1.000 Median : 7.00 Median : 2500
## Mean :0.08724 Mean : 2.323 Mean :13.59 Mean : 2933
## 3rd Qu.:0.16667 3rd Qu.: 3.000 3rd Qu.:18.00 3rd Qu.: 4000
## Max. :0.50000 Max. :17.000 Max. :79.00 Max. :11000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 3572 Min. :0.000000 Min. :11.00
## 1st Qu.: 156.1 1st Qu.: 8320 1st Qu.:0.000000 1st Qu.:12.00
## Median : 415.8 Median :12026 Median :0.000000 Median :12.00
## Mean : 981.7 Mean :13557 Mean :0.002688 Mean :11.91
## 3rd Qu.:1221.5 3rd Qu.:17332 3rd Qu.:0.000000 3rd Qu.:12.00
## Max. :8735.6 Max. :31871 Max. :0.083333 Max. :12.00
## NA's :3
## [1] "cluster4"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 168.6 Min. :0.7273 Min. : 0.0 Min. : 0.0
## 1st Qu.: 2210.4 1st Qu.:1.0000 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 4350.6 Median :1.0000 Median : 421.1 Median : 230.0
## Mean : 4438.8 Mean :0.9828 Mean :1186.5 Mean : 756.1
## 3rd Qu.: 6064.0 3rd Qu.:1.0000 3rd Qu.:1858.7 3rd Qu.: 989.1
## Max. :12373.3 Max. :1.0000 Max. :7394.2 Max. :6678.3
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 922.7 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.: 3303.2 1st Qu.:0.0000
## Median : 62.4 Median : 5157.2 Median :0.3636
## Mean : 430.4 Mean : 6411.7 Mean :0.4501
## 3rd Qu.: 630.1 3rd Qu.: 8733.1 3rd Qu.:0.9091
## Max. :5106.0 Max. :18857.1 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.1667 Median :0.08333
## Mean :0.2759 Mean :0.32763
## 3rd Qu.:0.5000 3rd Qu.:0.75000
## Max. :1.0000 Max. :1.00000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.1667 Min. : 2.00 Min. : 0.00 Min. : 1000
## 1st Qu.:0.5000 1st Qu.:17.00 1st Qu.: 0.00 1st Qu.: 5000
## Median :0.6667 Median :26.00 Median : 7.00 Median : 7500
## Mean :0.6649 Mean :27.37 Mean : 19.04 Mean : 7745
## 3rd Qu.:0.8333 3rd Qu.:33.00 3rd Qu.: 29.00 3rd Qu.:10000
## Max. :1.0909 Max. :80.00 Max. :130.00 Max. :18500
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0 Min. : 114.6 Min. :0.00000 Min. : 6.00
## 1st Qu.: 1786 1st Qu.: 777.3 1st Qu.:0.00000 1st Qu.:12.00
## Median : 4590 Median : 1395.4 Median :0.00000 Median :12.00
## Mean : 6014 Mean : 1732.8 Mean :0.06113 Mean :11.64
## 3rd Qu.: 9737 3rd Qu.: 2079.8 3rd Qu.:0.08333 3rd Qu.:12.00
## Max. :21321 Max. :12245.9 Max. :0.91667 Max. :12.00
## NA's :1
## [1] "cluster5"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 2990 Min. :0.9091 Min. : 0 Min. : 0
## 1st Qu.: 6443 1st Qu.:1.0000 1st Qu.: 0 1st Qu.: 0
## Median : 8967 Median :1.0000 Median :1621 Median :1067
## Mean : 9017 Mean :0.9886 Mean :2060 Mean :1187
## 3rd Qu.:11712 3rd Qu.:1.0000 3rd Qu.:3920 3rd Qu.:1821
## Max. :14581 Max. :1.0000 Max. :4996 Max. :3403
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. :20277 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.:22178 1st Qu.:0.0000
## Median : 553.7 Median :24662 Median :0.4167
## Mean : 873.1 Mean :24478 Mean :0.4792
## 3rd Qu.:1467.5 3rd Qu.:26526 3rd Qu.:1.0000
## Max. :2859.2 Max. :29282 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.2500 Median :0.1667
## Mean :0.3438 Mean :0.3646
## 3rd Qu.:0.5625 3rd Qu.:0.6875
## Max. :1.0000 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.3333 Min. :10.00 Min. : 0.00 Min. : 7000
## 1st Qu.:0.5417 1st Qu.:22.75 1st Qu.: 0.00 1st Qu.: 8875
## Median :0.7083 Median :26.50 Median :19.00 Median :13000
## Mean :0.6657 Mean :30.38 Mean :21.12 Mean :12625
## 3rd Qu.:0.8333 3rd Qu.:31.50 3rd Qu.:34.50 3rd Qu.:15750
## Max. :0.9091 Max. :69.00 Max. :61.00 Max. :18500
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :18342 Min. : 1107 Min. :0.00000 Min. :11.00
## 1st Qu.:20079 1st Qu.: 1786 1st Qu.:0.00000 1st Qu.:12.00
## Median :20680 Median : 2906 Median :0.04167 Median :12.00
## Mean :22653 Mean : 5525 Mean :0.12626 Mean :11.88
## 3rd Qu.:25941 3rd Qu.: 5906 3rd Qu.:0.22917 3rd Qu.:12.00
## Max. :28233 Max. :21235 Max. :0.45454 Max. :12.00
## [1] "cluster6"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. :11631 Min. :1 Min. :14686 Min. : 0
## 1st Qu.:11637 1st Qu.:1 1st Qu.:14897 1st Qu.:1185
## Median :11643 Median :1 Median :15108 Median :2370
## Mean :14106 Mean :1 Mean :17268 Mean :3940
## 3rd Qu.:15343 3rd Qu.:1 3rd Qu.:18559 3rd Qu.:5910
## Max. :19043 Max. :1 Max. :22010 Max. :9449
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. :12561 Min. : 0 Min. :1
## 1st Qu.:12650 1st Qu.: 0 1st Qu.:1
## Median :12738 Median : 0 Median :1
## Mean :13328 Mean :1141 Mean :1
## 3rd Qu.:13712 3rd Qu.:1711 3rd Qu.:1
## Max. :14686 Max. :3423 Max. :1
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :1
## 1st Qu.:0.2500 1st Qu.:1
## Median :0.5000 Median :1
## Mean :0.4167 Mean :1
## 3rd Qu.:0.6250 3rd Qu.:1
## Max. :0.7500 Max. :1
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. :0.0000 Min. :216.0 Min. :12000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:257.0 1st Qu.:12800
## Median :0.00000 Median :0.0000 Median :298.0 Median :13600
## Mean :0.02778 Mean :0.6667 Mean :287.0 Mean :14533
## 3rd Qu.:0.04167 3rd Qu.:1.0000 3rd Qu.:322.5 3rd Qu.:15800
## Max. :0.08333 Max. :2.0000 Max. :347.0 Max. :18000
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :11401 Min. :10285 Min. :0 Min. :12
## 1st Qu.:13703 1st Qu.:10969 1st Qu.:0 1st Qu.:12
## Median :16005 Median :11653 Median :0 Median :12
## Mean :16808 Mean :13520 Mean :0 Mean :12
## 3rd Qu.:19512 3rd Qu.:15137 3rd Qu.:0 3rd Qu.:12
## Max. :23019 Max. :18621 Max. :0 Max. :12
## [1] "cluster7"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 1088 Min. :0.8182 Min. : 6336 Min. : 0
## 1st Qu.: 2593 1st Qu.:1.0000 1st Qu.: 8655 1st Qu.:1197
## Median : 4209 Median :1.0000 Median : 9970 Median :2560
## Mean : 5090 Mean :0.9835 Mean :10100 Mean :2697
## 3rd Qu.: 6552 3rd Qu.:1.0000 3rd Qu.:11049 3rd Qu.:3909
## Max. :13673 Max. :1.0000 Max. :14605 Max. :6489
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 4267 Min. : 0.00 Min. :0.3333
## 1st Qu.: 5669 1st Qu.: 0.00 1st Qu.:0.9167
## Median : 6308 Median : 0.00 Median :1.0000
## Mean : 7403 Mean : 603.28 Mean :0.9280
## 3rd Qu.: 9037 3rd Qu.: 84.84 3rd Qu.:1.0000
## Max. :12541 Max. :4446.46 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.1667
## 1st Qu.:0.4375 1st Qu.:0.9375
## Median :0.6250 Median :1.0000
## Mean :0.5795 Mean :0.9053
## 3rd Qu.:0.7500 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 5.0 Min. : 6000
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.:108.8 1st Qu.: 9200
## Median :0.00000 Median : 0.000 Median :137.0 Median :10500
## Mean :0.07197 Mean : 2.091 Mean :136.7 Mean :11991
## 3rd Qu.:0.06250 3rd Qu.: 0.750 3rd Qu.:188.2 3rd Qu.:15000
## Max. :0.75000 Max. :26.000 Max. :224.0 Max. :20000
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 1942 Min. : 204.8 Min. :0.0000 Min. :12
## 1st Qu.: 5614 1st Qu.: 514.5 1st Qu.:0.0000 1st Qu.:12
## Median : 8432 Median : 1244.9 Median :0.0000 Median :12
## Mean : 9392 Mean : 2870.8 Mean :0.1868 Mean :12
## 3rd Qu.:13299 3rd Qu.: 3187.7 3rd Qu.:0.2083 3rd Qu.:12
## Max. :16826 Max. :17494.9 Max. :1.0000 Max. :12
## [1] "cluster8"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. :2493 Min. :0.8182 Min. :21803 Min. :21803
## 1st Qu.:2676 1st Qu.:1.0000 1st Qu.:25812 1st Qu.:22451
## Median :2877 Median :1.0000 Median :26594 Median :24078
## Mean :4029 Mean :0.9697 Mean :26850 Mean :24213
## 3rd Qu.:4594 3rd Qu.:1.0000 3rd Qu.:27664 3rd Qu.:26166
## Max. :8152 Max. :1.0000 Max. :32540 Max. :26547
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.5833
## 1st Qu.: 325.8 1st Qu.: 0.0 1st Qu.:1.0000
## Median :2318.7 Median : 0.0 Median :1.0000
## Mean :2637.4 Mean : 295.3 Mean :0.9306
## 3rd Qu.:4729.8 3rd Qu.: 0.0 3rd Qu.:1.0000
## Max. :5992.4 Max. :1771.8 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.5000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.1458
## Median :1.0000 Median :0.4583
## Mean :0.9167 Mean :0.4861
## 3rd Qu.:1.0000 3rd Qu.:0.8333
## Max. :1.0000 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. :0.0000 Min. : 33.00 Min. : 9000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.: 59.50 1st Qu.:11875
## Median :0.00000 Median :0.0000 Median : 71.00 Median :14750
## Mean :0.01389 Mean :0.6667 Mean : 73.83 Mean :16167
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.: 91.50 3rd Qu.:16875
## Max. :0.08333 Max. :4.0000 Max. :114.00 Max. :30000
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :17575 Min. : 534.0 Min. :0.08333 Min. :12
## 1st Qu.:22895 1st Qu.: 543.8 1st Qu.:0.31250 1st Qu.:12
## Median :25591 Median : 1267.0 Median :0.83333 Median :12
## Mean :24732 Mean : 2987.4 Mean :0.65278 Mean :12
## 3rd Qu.:27104 3rd Qu.: 2341.9 3rd Qu.:0.97917 3rd Qu.:12
## Max. :30029 Max. :11853.8 Max. :1.00000 Max. :12
## [1] "cluster9"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 4.383 Min. :0.09091 Min. : 0.00 Min. : 0.0
## 1st Qu.: 268.577 1st Qu.:0.27273 1st Qu.: 4.44 1st Qu.: 0.0
## Median :1132.386 Median :0.45454 Median : 21.00 Median : 0.0
## Mean :3155.677 Mean :0.56566 Mean : 743.60 Mean : 525.5
## 3rd Qu.:6744.846 3rd Qu.:1.00000 3rd Qu.:1788.84 3rd Qu.: 0.0
## Max. :8248.178 Max. :1.00000 Max. :2483.26 Max. :2483.3
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.00 Min. : 0 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.: 0 1st Qu.:0.08333
## Median : 4.44 Median : 5626 Median :0.08333
## Mean : 220.54 Mean : 8490 Mean :0.25000
## 3rd Qu.: 21.00 3rd Qu.:14836 3rd Qu.:0.33333
## Max. :1788.84 Max. :21944 Max. :1.00000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.08333
## Mean :0.0463 Mean :0.19444
## 3rd Qu.:0.0000 3rd Qu.:0.16667
## Max. :0.3333 Max. :1.00000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 0.000 Min. : 6000
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:10000
## Median :0.08333 Median : 1.000 Median : 1.000 Median :13000
## Mean :0.15741 Mean : 5.111 Mean : 3.333 Mean :12889
## 3rd Qu.:0.33333 3rd Qu.:11.000 3rd Qu.: 5.000 3rd Qu.:16500
## Max. :0.41667 Max. :20.000 Max. :12.000 Max. :18000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :21440 Min. : 406.8 Min. :0.0000 Min. :12
## 1st Qu.:23151 1st Qu.:1009.6 1st Qu.:0.1000 1st Qu.:12
## Median :29272 Median :1471.7 Median :0.3333 Median :12
## Mean :29489 Mean :1376.9 Mean :0.3799 Mean :12
## 3rd Qu.:33486 3rd Qu.:1814.2 3rd Qu.:0.5000 3rd Qu.:12
## Max. :40628 Max. :2150.0 Max. :1.0000 Max. :12
## NA's :1
## [1] "cluster10"
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. :11548 Min. :1 Min. :41050 Min. :40624
## 1st Qu.:12030 1st Qu.:1 1st Qu.:43048 1st Qu.:40658
## Median :12513 Median :1 Median :45045 Median :40693
## Mean :12513 Mean :1 Mean :45045 Mean :40693
## 3rd Qu.:12996 3rd Qu.:1 3rd Qu.:47042 3rd Qu.:40727
## Max. :13479 Max. :1 Max. :49040 Max. :40761
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 426.3 Min. : 0.0 Min. :0.8333
## 1st Qu.:2389.3 1st Qu.:139.5 1st Qu.:0.8750
## Median :4352.3 Median :279.1 Median :0.9167
## Mean :4352.3 Mean :279.1 Mean :0.9167
## 3rd Qu.:6315.3 3rd Qu.:418.6 3rd Qu.:0.9583
## Max. :8278.3 Max. :558.2 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.6667 Min. :0.4167
## 1st Qu.:0.7500 1st Qu.:0.5417
## Median :0.8333 Median :0.6667
## Mean :0.8333 Mean :0.6667
## 3rd Qu.:0.9167 3rd Qu.:0.7917
## Max. :1.0000 Max. :0.9167
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. :0.00 Min. :101 Min. :17000
## 1st Qu.:0.02083 1st Qu.:0.25 1st Qu.:115 1st Qu.:18375
## Median :0.04167 Median :0.50 Median :129 Median :19750
## Mean :0.04167 Mean :0.50 Mean :129 Mean :19750
## 3rd Qu.:0.06250 3rd Qu.:0.75 3rd Qu.:143 3rd Qu.:21125
## Max. :0.08333 Max. :1.00 Max. :157 Max. :22500
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :36067 Min. : 2974 Min. :0.08333 Min. :12
## 1st Qu.:38783 1st Qu.: 6209 1st Qu.:0.12500 1st Qu.:12
## Median :41499 Median : 9444 Median :0.16667 Median :12
## Mean :41499 Mean : 9444 Mean :0.16667 Mean :12
## 3rd Qu.:44215 3rd Qu.:12679 3rd Qu.:0.20833 3rd Qu.:12
## Max. :46931 Max. :15914 Max. :0.25000 Max. :12
colnames(membership)[1] <- "Cluster_ID"
credit_cluster <- cbind(data.df,membership)
#credit_cluster
#write.csv(credit_cluster,file="credit_cluster.csv")
credit_cluster[["Cluster_ID"]]=as.factor(credit_cluster[["Cluster_ID"]])
credit_cluster %>%
group_by(Cluster_ID) %>%
summarise_all("mean")->cluster_means
cluster_means
credit_cluster %>%
group_by(Cluster_ID) %>%
summarise(Record_count=n())->count.df
cluster_means %>% inner_join(count.df, by = "Cluster_ID")
The table of clusters above shows that clusters 13, 16, 17, 18, 19 and 20 each contain only one record; these clusters can be assumed to hold outliers or eccentric records.
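One way to list such single-record clusters programmatically is sketched below on made-up one-dimensional data (the vector `x` and the choice `k = 2` are assumptions for illustration only). A single extreme value ends up in a cluster of its own under complete linkage, and `table()` exposes it.

```r
# Toy data: five ordinary values and one extreme value (made up).
x <- c(1, 2, 3, 4, 5, 100)

hc <- hclust(dist(x), method = "complete")

# stats::cutree is called explicitly because dendextend masks cutree().
toy_groups <- stats::cutree(hc, k = 2)

# Count records per cluster and list the clusters that hold a single record.
sizes <- table(toy_groups)
singletons <- names(sizes)[sizes == 1]

print(sizes)
print(singletons)
```

The same two lines (`table()` followed by the `== 1` filter) could be applied to the `groups` vector from the chunk above, assuming it is still in the workspace.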
cols<-names(cluster_means)
length(cols)
## [1] 18
l <- htmltools::tagList()
# One bar chart of cluster means per attribute (columns 2 to 15 of cluster_means)
for(i in (2:15)){
l[[i]]<-as_widget(plot_ly(cluster_means, x = ~Cluster_ID, y = cluster_means[[cols[i]]], type = 'bar', color = ~Cluster_ID )%>%
layout(title = cols[i],
xaxis = list(title = "Cluster ID"),
yaxis = list(title = cols[i])))
}
l
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning: Ignoring 1 observations
#plot_ly( x = ~Cluster_ID, y = ~CASH_ADVANCE, type = 'bar', name = 'Cash Advance')
#+ facet_wrap(~ key)
In this method, Ward linkage is used. Ward linkage compares the within-cluster sum of squares of each candidate merged group, and the pair of clusters whose merger gives the lowest increase in the sum of squares is combined first.
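The merging criterion can be illustrated with a short sketch. The helper functions `within_ss()` and `merge_cost()` and the three toy clusters are hypothetical names and values introduced here for illustration; they are not part of any package and are not how `hclust()` computes its result internally.

```r
# Sum of squared distances of the rows of m to their centroid.
within_ss <- function(m) {
  centroid <- colMeans(m)
  sum(sweep(m, 2, centroid)^2)
}

# Increase in total within-cluster sum of squares if clusters a and b are merged.
merge_cost <- function(a, b) {
  within_ss(rbind(a, b)) - within_ss(a) - within_ss(b)
}

# Three small hypothetical clusters of 2-D points.
c1 <- rbind(c(0, 0), c(0, 2))
c2 <- rbind(c(1, 1), c(3, 1))
c3 <- rbind(c(10, 10), c(12, 10))

costs <- c(
  c1_c2 = merge_cost(c1, c2),
  c1_c3 = merge_cost(c1, c3),
  c2_c3 = merge_cost(c2, c3)
)
print(costs)

# Ward's criterion merges the pair with the smallest increase first.
print(names(which.min(costs)))
```

In this toy case the two nearby clusters `c1` and `c2` are the cheapest pair to merge, so they would be combined before either is joined with the distant `c3`.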
d2 <- dist(normalized_data, method = "euclidean")
fit2 <- hclust(d2, method="ward.D2")
plot(fit2)
rect.hclust(fit2, k=11, border="blue")
#plot(cut(fit2, h=11)$upper)
cd <- as.dendrogram(fit2)
cd = color_branches(cd,k=11)
plot(cd)
nclust <- (nrow(normalized_data)-1):1 # Number of clusters at each step
hval<-11
# Find the number of clusters that first exceeds hval
k <- max(nclust[which(diff(fit2$height < hval) == -1) + 1])
#groups2 <- cutree(fit2, h=11)
# cutree.h <- function(tree,h) {
# # this line adapted from cutree(...) code
# k <- nrow(tree$merge) + 2L - apply(outer(c(hc$height, Inf), h, ">"), 2, which.max)
# return(cutree(tree,k=k))
# }
#groups2 <- cutree.h(hc, h = 11)
groups <- cutree(fit2, k=11)
membership<-as.matrix(groups)
#membership
colnames(membership)[1] <- "Cluster_ID"
for(i in c(1:11)){
print(summary(subset(data.df,membership[,1]==i)))
}
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 0.735 Min. :0.1818 Min. : 0.0 Min. : 0.0
## 1st Qu.: 723.855 1st Qu.:1.0000 1st Qu.: 0.0 1st Qu.: 0.0
## Median :1315.479 Median :1.0000 Median : 20.0 Median : 0.0
## Mean :1610.551 Mean :0.9666 Mean : 232.9 Mean : 196.2
## 3rd Qu.:2185.296 3rd Qu.:1.0000 3rd Qu.: 256.2 3rd Qu.: 182.5
## Max. :7659.192 Max. :1.0000 Max. :5080.9 Max. :4674.2
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.00 Min. : 0.0 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:0.00000
## Median : 0.00 Median : 365.0 Median :0.08333
## Mean : 36.98 Mean : 939.4 Mean :0.13317
## 3rd Qu.: 0.00 3rd Qu.:1418.9 3rd Qu.:0.25000
## Max. :1823.56 Max. :7981.2 Max. :1.00000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000
## Mean :0.08223 Mean :0.05117
## 3rd Qu.:0.08333 3rd Qu.:0.00000
## Max. :0.75000 Max. :1.00000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. : 50
## 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1500
## Median :0.1667 Median : 2.000 Median : 1.000 Median : 2500
## Mean :0.1733 Mean : 3.643 Mean : 2.556 Mean : 3483
## 3rd Qu.:0.2500 3rd Qu.: 5.000 3rd Qu.: 3.000 3rd Qu.: 5000
## Max. :0.9167 Max. :33.000 Max. :79.000 Max. :15500
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 5.038 Min. :0.00000 Min. : 8.00
## 1st Qu.: 368.5 1st Qu.: 257.774 1st Qu.:0.00000 1st Qu.:12.00
## Median : 650.7 Median : 456.684 Median :0.00000 Median :12.00
## Mean :1068.9 Mean : 939.296 Mean :0.02248 Mean :11.84
## 3rd Qu.:1266.7 3rd Qu.: 796.296 3rd Qu.:0.00000 3rd Qu.:12.00
## Max. :9282.0 Max. :20551.620 Max. :0.91667 Max. :12.00
## NA's :48
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 0.0 Min. :0.0000 Min. : 8.4 Min. : 8.4
## 1st Qu.: 125.9 1st Qu.:1.0000 1st Qu.: 720.6 1st Qu.: 637.6
## Median : 300.1 Median :1.0000 Median :1271.3 Median :1063.1
## Mean : 807.9 Mean :0.9699 Mean :1510.1 Mean :1283.3
## 3rd Qu.:1143.0 3rd Qu.:1.0000 3rd Qu.:2154.8 3rd Qu.:1798.7
## Max. :5694.0 Max. :1.0000 Max. :5078.1 Max. :4558.3
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.1667
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.:0.7500
## Median : 22.0 Median : 0.0 Median :1.0000
## Mean : 226.8 Mean : 170.7 Mean :0.8768
## 3rd Qu.: 296.7 3rd Qu.: 0.0 3rd Qu.:1.0000
## Max. :1979.8 Max. :5353.0 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.1667 Min. :0.00000
## 1st Qu.:0.7136 1st Qu.:0.00000
## Median :0.9167 Median :0.08333
## Mean :0.8435 Mean :0.24050
## 3rd Qu.:1.0000 3rd Qu.:0.41667
## Max. :1.0000 Max. :1.00000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 2.00 Min. : 300
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 12.00 1st Qu.: 2500
## Median :0.00000 Median : 0.000 Median : 20.00 Median : 4500
## Mean :0.03618 Mean : 0.676 Mean : 22.68 Mean : 5469
## 3rd Qu.:0.00000 3rd Qu.: 0.000 3rd Qu.: 29.00 3rd Qu.: 7500
## Max. :0.58333 Max. :17.000 Max. :101.00 Max. :25000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 3.198 Min. :0.0000 Min. : 6.00
## 1st Qu.: 731.3 1st Qu.: 169.393 1st Qu.:0.0000 1st Qu.:12.00
## Median :1357.4 Median : 190.250 Median :0.2500 Median :12.00
## Mean :1636.9 Mean : 388.394 Mean :0.3886 Mean :11.94
## 3rd Qu.:2246.8 3rd Qu.: 389.089 3rd Qu.:0.8258 3rd Qu.:12.00
## Max. :8987.5 Max. :3414.220 Max. :1.0000 Max. :12.00
## NA's :12
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 0.000 Min. :0.0000 Min. : 0.0 Min. : 0.0
## 1st Qu.: 7.826 1st Qu.:0.2727 1st Qu.: 58.0 1st Qu.: 0.0
## Median : 26.787 Median :0.4545 Median : 195.0 Median : 0.0
## Mean : 153.219 Mean :0.4197 Mean : 351.1 Mean : 175.6
## 3rd Qu.: 91.662 3rd Qu.:0.5455 3rd Qu.: 476.2 3rd Qu.: 159.0
## Max. :8248.178 Max. :1.0000 Max. :2971.8 Max. :2971.8
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.00000
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.:0.08333
## Median : 66.9 Median : 0.0 Median :0.25000
## Mean : 176.0 Mean : 351.6 Mean :0.32625
## 3rd Qu.: 265.4 3rd Qu.: 0.0 3rd Qu.:0.50000
## Max. :2166.6 Max. :10239.1 Max. :1.00000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.1667
## Mean :0.06618 Mean :0.2523
## 3rd Qu.:0.08333 3rd Qu.:0.4167
## Max. :0.63636 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.0000 Min. : 0.000 Min. : 300
## 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 1500
## Median :0.00000 Median : 0.0000 Median : 4.000 Median : 3000
## Mean :0.03494 Mean : 0.7874 Mean : 5.424 Mean : 3961
## 3rd Qu.:0.00000 3rd Qu.: 0.0000 3rd Qu.: 8.000 3rd Qu.: 5000
## Max. :0.57143 Max. :24.0000 Max. :50.000 Max. :23000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 0.019 Min. :0.00000 Min. : 6.00
## 1st Qu.: 187.0 1st Qu.: 74.921 1st Qu.:0.00000 1st Qu.:12.00
## Median : 445.6 Median : 118.675 Median :0.08333 Median :12.00
## Mean : 1349.1 Mean : 168.816 Mean :0.25889 Mean :11.81
## 3rd Qu.: 1160.7 3rd Qu.: 167.560 3rd Qu.:0.45454 3rd Qu.:12.00
## Max. :40627.6 Max. :3245.693 Max. :1.00000 Max. :12.00
## NA's :160
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 1.198 Min. :0.4545 Min. : 0.0 Min. : 0.00
## 1st Qu.: 161.820 1st Qu.:1.0000 1st Qu.: 372.2 1st Qu.: 0.00
## Median : 708.991 Median :1.0000 Median : 671.7 Median : 45.65
## Mean : 920.760 Mean :0.9808 Mean : 951.8 Mean : 310.32
## 3rd Qu.:1372.565 3rd Qu.:1.0000 3rd Qu.:1364.6 3rd Qu.: 449.14
## Max. :5315.946 Max. :1.0000 Max. :4065.7 Max. :3227.85
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.1667
## 1st Qu.: 264.9 1st Qu.: 0.0 1st Qu.:0.7500
## Median : 464.6 Median : 0.0 Median :0.9167
## Mean : 641.7 Mean : 209.2 Mean :0.8483
## 3rd Qu.: 835.6 3rd Qu.: 58.5 3rd Qu.:1.0000
## Max. :3927.1 Max. :4232.1 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.08333
## 1st Qu.:0.00000 1st Qu.:0.63636
## Median :0.08333 Median :0.90909
## Mean :0.14136 Mean :0.78767
## 3rd Qu.:0.25000 3rd Qu.:1.00000
## Max. :0.75000 Max. :1.00000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 0.00 Min. : 450
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 12.00 1st Qu.: 1500
## Median :0.00000 Median : 0.000 Median : 15.00 Median : 2500
## Mean :0.04987 Mean : 1.009 Mean : 20.02 Mean : 3298
## 3rd Qu.:0.08333 3rd Qu.: 1.000 3rd Qu.: 26.00 3rd Qu.: 4500
## Max. :0.58333 Max. :17.000 Max. :111.00 Max. :16000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 2.891 Min. :0.00000 Min. :10.00
## 1st Qu.: 409.6 1st Qu.: 175.229 1st Qu.:0.00000 1st Qu.:12.00
## Median : 764.2 Median : 269.758 Median :0.00000 Median :12.00
## Mean :1096.1 Mean : 570.588 Mean :0.08585 Mean :11.96
## 3rd Qu.:1411.0 3rd Qu.: 678.561 3rd Qu.:0.09091 3rd Qu.:12.00
## Max. :8481.0 Max. :6521.086 Max. :1.00000 Max. :12.00
## NA's :16
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 6.956 Min. :0.1818 Min. : 630.4 Min. : 0
## 1st Qu.: 599.694 1st Qu.:1.0000 1st Qu.: 3145.5 1st Qu.: 1561
## Median : 1619.811 Median :1.0000 Median : 4360.6 Median : 2679
## Mean : 2641.970 Mean :0.9815 Mean : 5111.4 Mean : 3266
## 3rd Qu.: 3631.351 3rd Qu.:1.0000 3rd Qu.: 6102.0 3rd Qu.: 4247
## Max. :19043.139 Max. :1.0000 Max. :22500.0 Max. :14215
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.00 Min. :0.2500
## 1st Qu.: 572.6 1st Qu.: 0.00 1st Qu.:0.9167
## Median : 1282.9 Median : 0.00 Median :1.0000
## Mean : 1846.7 Mean : 358.74 Mean :0.9456
## 3rd Qu.: 2348.0 3rd Qu.: 48.54 3rd Qu.:1.0000
## Max. :22500.0 Max. :15133.53 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.5833 1st Qu.:0.5833
## Median :0.8333 Median :0.9167
## Mean :0.7292 Mean :0.7597
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.0000 Min. : 5.00 Min. : 650
## 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 35.00 1st Qu.: 5000
## Median :0.00000 Median : 0.0000 Median : 54.00 Median : 7500
## Mean :0.04561 Mean : 0.9776 Mean : 68.46 Mean : 8012
## 3rd Qu.:0.08333 3rd Qu.: 1.0000 3rd Qu.: 86.00 3rd Qu.:10500
## Max. :1.00000 Max. :48.0000 Max. :347.00 Max. :30000
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0 Min. : 14.01 Min. :0.0000 Min. : 6.00
## 1st Qu.: 2254 1st Qu.: 218.11 1st Qu.:0.0000 1st Qu.:12.00
## Median : 3817 Median : 527.74 Median :0.0000 Median :12.00
## Mean : 4638 Mean : 1219.72 Mean :0.2231 Mean :11.95
## 3rd Qu.: 5878 3rd Qu.: 1248.17 3rd Qu.:0.3333 3rd Qu.:12.00
## Max. :24199 Max. :22011.78 Max. :1.0000 Max. :12.00
## NA's :4
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 166.3 Min. :0.5000 Min. : 0.0 Min. : 0.0
## 1st Qu.: 3615.3 1st Qu.:1.0000 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 5262.7 Median :1.0000 Median : 467.1 Median : 99.0
## Mean : 5431.1 Mean :0.9857 Mean : 896.2 Mean : 484.0
## 3rd Qu.: 6888.8 3rd Qu.:1.0000 3rd Qu.:1277.4 3rd Qu.: 685.3
## Max. :16304.9 Max. :1.0000 Max. :7394.2 Max. :4900.0
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.: 1930 1st Qu.:0.0000
## Median : 135.0 Median : 3369 Median :0.5000
## Mean : 412.3 Mean : 3754 Mean :0.4799
## 3rd Qu.: 582.5 3rd Qu.: 5095 3rd Qu.:0.9167
## Max. :5106.0 Max. :14827 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.08333 Median :0.2500
## Mean :0.19519 Mean :0.3725
## 3rd Qu.:0.33333 3rd Qu.:0.7500
## Max. :1.00000 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.0000 Min. : 0.00 Min. : 0.00 Min. : 1000
## 1st Qu.:0.2500 1st Qu.: 5.00 1st Qu.: 0.00 1st Qu.: 6000
## Median :0.4167 Median : 9.00 Median : 9.00 Median : 8000
## Mean :0.4051 Mean :10.59 Mean : 15.69 Mean : 8551
## 3rd Qu.:0.5833 3rd Qu.:15.00 3rd Qu.: 21.00 3rd Qu.:10500
## Max. :1.0000 Max. :62.00 Max. :142.00 Max. :21500
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0 Min. : 114.6 Min. :0.00000 Min. : 6.00
## 1st Qu.: 1375 1st Qu.: 1087.5 1st Qu.:0.00000 1st Qu.:12.00
## Median : 2036 Median : 1661.5 Median :0.00000 Median :12.00
## Mean : 2945 Mean : 1945.1 Mean :0.01605 Mean :11.73
## 3rd Qu.: 3499 3rd Qu.: 2433.3 3rd Qu.:0.00000 3rd Qu.:12.00
## Max. :17085 Max. :11142.9 Max. :0.91667 Max. :12.00
## NA's :8
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 0.00 Min. :0.0000 Min. : 0.0 Min. : 0.0
## 1st Qu.: 55.18 1st Qu.:0.7143 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 406.59 Median :0.8750 Median : 230.0 Median : 0.0
## Mean : 847.90 Mean :0.7978 Mean : 485.7 Mean : 278.0
## 3rd Qu.:1175.80 3rd Qu.:1.0000 3rd Qu.: 634.2 3rd Qu.: 248.9
## Max. :7801.51 Max. :1.0000 Max. :6520.0 Max. :6520.0
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.:0.0000
## Median : 0.0 Median : 253.9 Median :0.5000
## Mean : 207.7 Mean : 986.5 Mean :0.4758
## 3rd Qu.: 288.5 3rd Qu.:1469.9 3rd Qu.:0.8889
## Max. :3200.0 Max. :8422.6 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000
## Mean :0.1332 Mean :0.3366
## 3rd Qu.:0.1667 3rd Qu.:0.7500
## Max. :1.0000 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. : 300
## 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1200
## Median :0.1250 Median : 1.000 Median : 5.000 Median : 1500
## Mean :0.1893 Mean : 3.179 Mean : 6.744 Mean : 2544
## 3rd Qu.:0.3000 3rd Qu.: 5.000 3rd Qu.: 9.000 3rd Qu.: 3000
## Max. :1.5000 Max. :26.000 Max. :98.000 Max. :17500
## NA's :1
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. : 0.312 Min. :0.0000 Min. : 6.000
## 1st Qu.: 145.5 1st Qu.: 103.356 1st Qu.:0.0000 1st Qu.: 7.000
## Median : 316.6 Median : 160.332 Median :0.0000 Median : 8.000
## Mean : 619.9 Mean : 357.180 Mean :0.1682 Mean : 7.738
## 3rd Qu.: 696.5 3rd Qu.: 376.217 3rd Qu.:0.2000 3rd Qu.: 9.000
## Max. :7821.0 Max. :7243.733 Max. :1.0000 Max. :11.000
## NA's :64
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 198 Min. :0.4545 Min. : 0.0 Min. : 0.00
## 1st Qu.: 2269 1st Qu.:1.0000 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 4247 Median :1.0000 Median : 210.0 Median : 15.75
## Mean : 4769 Mean :0.9685 Mean : 824.8 Mean : 569.56
## 3rd Qu.: 7035 3rd Qu.:1.0000 3rd Qu.: 870.4 3rd Qu.: 549.95
## Max. :14581 Max. :1.0000 Max. :7194.5 Max. :6678.26
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 1535 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.: 4891 1st Qu.:0.0000
## Median : 0.0 Median : 8529 Median :0.2250
## Mean : 255.3 Mean : 9658 Mean :0.3425
## 3rd Qu.: 303.8 3rd Qu.:12384 3rd Qu.:0.6750
## Max. :2859.2 Max. :47137 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.04167 Median :0.0000
## Mean :0.18444 Mean :0.2209
## 3rd Qu.:0.27273 3rd Qu.:0.4167
## Max. :1.00000 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.08333 Min. : 1.00 Min. : 0.000 Min. : 1000
## 1st Qu.:0.55357 1st Qu.: 20.75 1st Qu.: 0.000 1st Qu.: 5000
## Median :0.75000 Median : 28.50 Median : 3.000 Median : 9000
## Mean :0.69193 Mean : 33.51 Mean : 9.821 Mean : 8489
## 3rd Qu.:0.83333 3rd Qu.: 40.25 3rd Qu.:12.250 3rd Qu.:11500
## Max. :1.09091 Max. :123.00 Max. :77.000 Max. :19600
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 332.4 Min. : 19.49 Min. :0.00000 Min. : 6.00
## 1st Qu.: 3421.8 1st Qu.: 682.91 1st Qu.:0.00000 1st Qu.:12.00
## Median : 7941.5 Median : 1312.28 Median :0.00000 Median :12.00
## Mean : 9268.3 Mean : 1912.02 Mean :0.08339 Mean :11.52
## 3rd Qu.:12508.5 3rd Qu.: 2205.30 3rd Qu.:0.09091 3rd Qu.:12.00
## Max. :39048.6 Max. :21235.06 Max. :0.80000 Max. :12.00
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 6.647 Min. :0.3636 Min. : 106.6 Min. : 0.0
## 1st Qu.: 43.745 1st Qu.:0.9091 1st Qu.: 495.8 1st Qu.: 0.0
## Median : 87.047 Median :1.0000 Median : 991.1 Median : 0.0
## Mean :118.810 Mean :0.9427 Mean :1264.7 Mean : 220.0
## 3rd Qu.:146.223 3rd Qu.:1.0000 3rd Qu.:1648.5 3rd Qu.: 187.8
## Max. :683.849 Max. :1.0000 Max. :7387.8 Max. :3894.5
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.000 Min. :0.2500
## 1st Qu.: 433.6 1st Qu.: 0.000 1st Qu.:0.8333
## Median : 807.5 Median : 0.000 Median :1.0000
## Mean :1044.9 Mean : 6.409 Mean :0.8944
## 3rd Qu.:1397.5 3rd Qu.: 0.000 3rd Qu.:1.0000
## Max. :4538.8 Max. :960.975 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.7500
## Median :0.0000 Median :0.9167
## Mean :0.1007 Mean :0.8481
## 3rd Qu.:0.1667 3rd Qu.:1.0000
## Max. :0.7500 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.000000 Min. :0.00000 Min. : 3.00 Min. : 400
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:12.00 1st Qu.: 2500
## Median :0.000000 Median :0.00000 Median :15.00 Median : 4000
## Mean :0.002027 Mean :0.02973 Mean :20.79 Mean : 4925
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:24.00 3rd Qu.: 6500
## Max. :0.166667 Max. :3.00000 Max. :75.00 Max. :19500
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 110.6 Min. : 55.72 Min. :0.4167 Min. :10.00
## 1st Qu.: 535.1 1st Qu.: 153.60 1st Qu.:0.7273 1st Qu.:12.00
## Median :1036.2 Median : 169.00 Median :0.8889 Median :12.00
## Mean :1346.1 Mean : 221.69 Mean :0.8477 Mean :11.98
## 3rd Qu.:1686.9 3rd Qu.: 178.84 3rd Qu.:1.0000 3rd Qu.:12.00
## Max. :7943.6 Max. :4553.16 Max. :1.0000 Max. :12.00
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 1269 Min. :0.2727 Min. :12552 Min. :10901
## 1st Qu.: 2709 1st Qu.:1.0000 1st Qu.:21952 1st Qu.:16330
## Median : 3392 Median :1.0000 Median :26402 Median :21803
## Mean : 4812 Mean :0.9561 Mean :27505 Mean :22417
## 3rd Qu.: 6170 3rd Qu.:1.0000 3rd Qu.:31920 3rd Qu.:25819
## Max. :13479 Max. :1.0000 Max. :49040 Max. :40761
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0 Min. :0.1667
## 1st Qu.: 767.9 1st Qu.: 0 1st Qu.:1.0000
## Median : 4732.3 Median : 0 Median :1.0000
## Mean : 5087.9 Mean : 1618 Mean :0.9051
## 3rd Qu.: 7345.8 3rd Qu.: 0 3rd Qu.:1.0000
## Max. :15497.2 Max. :19513 Max. :1.0000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.1667 Min. :0.0000
## 1st Qu.:0.6667 1st Qu.:0.5000
## Median :1.0000 Median :0.9167
## Mean :0.8464 Mean :0.7087
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.000 Min. : 3.0 Min. : 7500
## 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 71.0 1st Qu.:12000
## Median :0.00000 Median : 0.000 Median :101.0 Median :17000
## Mean :0.06159 Mean : 2.609 Mean :124.1 Mean :16000
## 3rd Qu.:0.00000 3rd Qu.: 0.000 3rd Qu.:148.5 3rd Qu.:18000
## Max. :0.66667 Max. :35.000 Max. :358.0 Max. :30000
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. :13002 Min. : 410.8 Min. :0.0000 Min. :10.00
## 1st Qu.:22446 1st Qu.: 566.1 1st Qu.:0.1250 1st Qu.:12.00
## Median :26652 Median : 1149.7 Median :0.5833 Median :12.00
## Mean :28139 Mean : 2599.1 Mean :0.5334 Mean :11.91
## 3rd Qu.:32847 3rd Qu.: 2606.0 3rd Qu.:0.9583 3rd Qu.:12.00
## Max. :50721 Max. :15914.5 Max. :1.0000 Max. :12.00
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## Min. : 2749 Min. :1 Min. : 0.0 Min. : 0.00
## 1st Qu.: 3670 1st Qu.:1 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 5010 Median :1 Median : 384.5 Median : 0.00
## Mean : 5511 Mean :1 Mean : 991.5 Mean : 67.77
## 3rd Qu.: 6994 3rd Qu.:1 3rd Qu.: 926.1 3rd Qu.: 5.00
## Max. :10571 Max. :1 Max. :7739.5 Max. :669.00
##
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## Min. : 0.0 Min. : 0.0 Min. :0.0000
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.:0.0000
## Median : 239.9 Median : 0.0 Median :0.4848
## Mean : 923.7 Mean : 925.1 Mean :0.5110
## 3rd Qu.: 926.1 3rd Qu.:1059.8 3rd Qu.:1.0000
## Max. :7739.5 Max. :4909.9 Max. :1.0000
##
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.4432
## Mean :0.03333 Mean :0.4860
## 3rd Qu.:0.02083 3rd Qu.:1.0000
## Max. :0.33333 Max. :1.0000
##
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## Min. :0.00000 Min. : 0.00 Min. : 0.00 Min. : 1700
## 1st Qu.:0.00000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 3750
## Median :0.00000 Median : 0.00 Median : 9.50 Median : 5250
## Mean :0.06667 Mean : 1.90 Mean : 25.95 Mean : 5752
## 3rd Qu.:0.08333 3rd Qu.: 2.25 3rd Qu.: 38.75 3rd Qu.: 8250
## Max. :0.33333 Max. :11.00 Max. :162.00 Max. :10200
##
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## Min. : 0.0 Min. :24302 Min. :0 Min. :11.00
## 1st Qu.: 219.6 1st Qu.:26626 1st Qu.:0 1st Qu.:12.00
## Median : 548.5 Median :29020 Median :0 Median :12.00
## Mean :1264.3 Mean :36723 Mean :0 Mean :11.95
## 3rd Qu.:1795.1 3rd Qu.:42881 3rd Qu.:0 3rd Qu.:12.00
## Max. :4560.8 Max. :76406 Max. :0 Max. :12.00
## NA's :1
# Attach the cluster assignments (membership) to the original data
credit_cluster2 <- cbind(data.df, membership)
credit_cluster2[["Cluster_ID"]] <- as.factor(credit_cluster2[["Cluster_ID"]])
# Mean of every attribute per cluster
cluster_means2 <- credit_cluster2 %>%
  group_by(Cluster_ID) %>%
  summarise_all("mean")
# Number of records per cluster
count.df2 <- credit_cluster2 %>%
  group_by(Cluster_ID) %>%
  summarise(Record_count = n())
cluster_means2 <- cluster_means2 %>% inner_join(count.df2, by = "Cluster_ID")
cluster_means2
cols <- names(cluster_means2)
l <- htmltools::tagList()
# One bar chart per attribute (columns 2 to 15), with the record count as hover text
for (i in 2:15) {
  l[[i]] <- as_widget(
    plot_ly(cluster_means2, x = ~Cluster_ID, y = cluster_means2[[cols[i]]],
            type = "bar", text = ~Record_count, color = ~Cluster_ID) %>%
      layout(title = cols[i],
             xaxis = list(title = "Cluster ID"),
             yaxis = list(title = cols[i])))
}
l
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning: Ignoring 1 observations
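The Set2 warning above appears because that ColorBrewer palette defines at most 8 colors, while the plots ask for one color per cluster. One possible workaround, sketched here on the assumption that the RColorBrewer package is installed (plotly depends on it), is to interpolate the 8 base colors up to the number of clusters. The names n_clusters and cluster_palette, and the value 12, are illustrative and not part of the original analysis.

```r
# Hypothetical number of clusters; in the analysis this could be
# nlevels(cluster_means2$Cluster_ID).
n_clusters <- 12

# Set2 has only 8 colors, so interpolate between them to get n_clusters colors.
base_colors <- RColorBrewer::brewer.pal(8, "Set2")
cluster_palette <- grDevices::colorRampPalette(base_colors)(n_clusters)

length(cluster_palette)
```

Passing this vector through the colors argument, for example plot_ly(..., color = ~Cluster_ID, colors = cluster_palette), should keep plotly from requesting more than 8 colors from Set2.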
kmeans() does not accept missing values, so we take an extra step: count the NA values in each column, then impute them. Here the missing values in MINIMUM_PAYMENTS and CREDIT_LIMIT are replaced with the column mean.
apply(data.df,2, function(x) sum(is.na(x)))
## BALANCE BALANCE_FREQUENCY
## 0 0
## PURCHASES ONEOFF_PURCHASES
## 0 0
## INSTALLMENTS_PURCHASES CASH_ADVANCE
## 0 0
## PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
## 0 0
## PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY
## 0 0
## CASH_ADVANCE_TRX PURCHASES_TRX
## 0 0
## CREDIT_LIMIT PAYMENTS
## 1 0
## MINIMUM_PAYMENTS PRC_FULL_PAYMENT
## 313 0
## TENURE
## 0
data.df[ which( is.na( data.df$MINIMUM_PAYMENTS )), "MINIMUM_PAYMENTS" ] <-
mean( data.df[["MINIMUM_PAYMENTS"]] , na.rm = TRUE )
data.df[ which( is.na( data.df$CREDIT_LIMIT )), "CREDIT_LIMIT" ] <-
mean( data.df[["CREDIT_LIMIT"]] , na.rm = TRUE )
normalized_data <- scale(data.df)
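As a small illustration of the two steps above (mean imputation, then scaling), here is a sketch on a made-up data frame. The helper impute_mean and the toy values are assumptions for demonstration only; they are not part of the original analysis.

```r
# Toy data frame standing in for data.df (values are made up for illustration).
toy <- data.frame(
  CREDIT_LIMIT     = c(1000, 2000, NA, 3000),
  MINIMUM_PAYMENTS = c(100, NA, 300, NA)
)

# Hypothetical helper: replace NAs in a numeric vector with the mean
# of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every column while keeping the data frame structure.
toy[] <- lapply(toy, impute_mean)

# scale() centers each column to mean 0 and rescales it to standard deviation 1.
toy_scaled <- scale(toy)

colSums(is.na(toy))              # no missing values should remain
round(colMeans(toy_scaled), 10)  # column means are (numerically) zero
apply(toy_scaled, 2, sd)         # column standard deviations are one
```

Mean imputation is simple but shrinks the variance of the imputed column; with 313 missing MINIMUM_PAYMENTS values, a median or model-based imputation may be worth comparing.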
Moving on to k-means clustering. To determine the number of clusters, let's draw a scree (elbow) plot of the total within-cluster sum of squares (yup, you are right, just like ANOVA) for different numbers of clusters. The point where the slope changes the most, the "elbow", suggests a reasonable number of clusters.
Cluster_Variability <- matrix(nrow=40, ncol=1)
for (i in 10:40) Cluster_Variability[i] <- kmeans(normalized_data,centers=i, nstart=i+30)$tot.withinss
## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 447500)
plot(10:40, Cluster_Variability[10:40], type="b", xlab="Number of clusters", ylab="Within groups sum of squares")
From the scree plot, 17 appears to be a reasonable number of clusters.
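As a minimal sketch of how the values behind such a scree plot can be produced, the chunk below computes the total within-cluster sum of squares for a range of k on synthetic data. The names toy_data, k_values, wss and relative_drop are illustrative assumptions and do not come from this analysis; iter.max is raised above the default of 10 only to make convergence warnings less likely.

```r
# Synthetic stand-in for the normalized data (assumption: three well-separated groups)
set.seed(42)
toy_data <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
toy_data <- scale(toy_data)

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy_data, centers = k, iter.max = 50, nstart = 10)$tot.withinss
})

# Relative improvement gained by each additional cluster;
# it falls off sharply once the "elbow" has been passed
relative_drop <- -diff(wss) / wss[-length(wss)]
names(relative_drop) <- paste0(k_values[-length(k_values)], "->", k_values[-1])
print(round(relative_drop, 3))
```

The wss vector can be passed to plot() in the same way as Cluster_Variability above. Reading the elbow remains a judgment call, so the relative drops are only a rough aid.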
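This chunk runs the actual k-means clustering, which takes a single call to kmeans(). Here iter.max = 20 allows more iterations than the 10 reported in the convergence warnings above, and nstart = 10 keeps the best of 10 random starts.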
fit <- kmeans(normalized_data, centers=17, iter.max=20, nstart=10)
kmeans_credit<-cbind(data.df, fit$cluster)
names(kmeans_credit)
## [1] "BALANCE" "BALANCE_FREQUENCY"
## [3] "PURCHASES" "ONEOFF_PURCHASES"
## [5] "INSTALLMENTS_PURCHASES" "CASH_ADVANCE"
## [7] "PURCHASES_FREQUENCY" "ONEOFF_PURCHASES_FREQUENCY"
## [9] "PURCHASES_INSTALLMENTS_FREQUENCY" "CASH_ADVANCE_FREQUENCY"
## [11] "CASH_ADVANCE_TRX" "PURCHASES_TRX"
## [13] "CREDIT_LIMIT" "PAYMENTS"
## [15] "MINIMUM_PAYMENTS" "PRC_FULL_PAYMENT"
## [17] "TENURE" "fit$cluster"
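# Store the cluster label as a factor so it is treated as a category
kmeans_credit[["fit$cluster"]] <- as.factor(kmeans_credit[["fit$cluster"]])
# Mean of every attribute within each cluster
kmeans_credit %>%
  group_by(fit$cluster) %>%
  summarise_all("mean") -> cluster_kmeans
# Number of records in each cluster
kmeans_credit %>%
  group_by(fit$cluster) %>%
  summarise(Record_count = n()) -> count.dfk
# Attach the record counts to the per-cluster means
cluster_kmeans %>% inner_join(count.dfk, by = "fit$cluster") -> cluster_kmeans
cluster_kmeans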
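The per-cluster summary can also be sketched with base R only, attaching the labels under an explicit column name instead of the auto-generated "fit$cluster". This is an illustration on made-up data, not the code used in this analysis: toy_df, toy_fit, Cluster_ID and cluster_summary are hypothetical names.

```r
set.seed(1)
# Hypothetical two-column stand-in for data.df
toy_df <- data.frame(
  BALANCE = c(rnorm(30, 100, 10), rnorm(30, 1000, 50)),
  PURCHASES = c(rnorm(30, 50, 5), rnorm(30, 500, 20))
)
toy_fit <- kmeans(scale(toy_df), centers = 2, nstart = 10)

# Attach the labels under an explicit, syntactic column name
toy_clustered <- cbind(toy_df, Cluster_ID = factor(toy_fit$cluster))

# Per-cluster mean of every attribute
cluster_means <- aggregate(. ~ Cluster_ID, data = toy_clustered, FUN = mean)

# Per-cluster record counts
cluster_counts <- as.data.frame(table(Cluster_ID = toy_clustered$Cluster_ID),
                                responseName = "Record_count")

# One row per cluster: means plus record count
cluster_summary <- merge(cluster_means, cluster_counts, by = "Cluster_ID")
print(cluster_summary)
```

A plain name such as Cluster_ID avoids the backticks and [["fit$cluster"]] lookups that the auto-generated name requires later on.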
cols <- names(cluster_kmeans)
l <- htmltools::tagList()
# One bar chart per summarised attribute (columns 2 to 18 of cluster_kmeans)
for (i in 2:18) {
  l[[i]] <- as_widget(
    plot_ly(cluster_kmeans, x = cluster_kmeans[["fit$cluster"]], y = cluster_kmeans[[cols[i]]],
            type = 'bar', color = cluster_kmeans[["fit$cluster"]]) %>%
      layout(title = cols[i],
             xaxis = list(title = "Cluster ID"),
             yaxis = list(title = cols[i])))
}
l
## Warning: textfont.color doesn't (yet) support data arrays
kmeans_credit[["fit$cluster"]]<-as.factor(kmeans_credit[["fit$cluster"]])
library(RColorBrewer)
colourCount <- length(unique(kmeans_credit[["fit$cluster"]]))
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))
cols <- names(kmeans_credit)
for (i in 1:17) {
  print(ggplot(kmeans_credit, aes(x = `fit$cluster`, y = kmeans_credit[[cols[i]]], fill = `fit$cluster`)) +
    geom_violin(scale = "width") + geom_boxplot(width = 0.1) +
    scale_fill_manual(values = getPalette(colourCount), guide = guide_legend(ncol = 2)) + theme_minimal() +
    labs(x = "Cluster ID", y = cols[[i]]))
}
Interpretation of clusters and inferences:
Since we have no documentation or explanation for any attribute of the dataset, all inferences made here rest on assumptions derived solely from the column names.
1) Cluster 7 has the lowest balance and balance frequency, which suggests these customers are prompt about their credit card payments. They also do not seem to make big purchases. An ad campaign detailing the EMI schemes could be targeted at them to increase their purchases.
2) Cluster 15 customers are the heavy users of the credit card, with high-value purchases and one-off purchases. We need to retain their loyalty.
3) Cluster 13 customers use the cash advance services of the credit card the most.
4) Cluster 1 customers make the most transactions, but of lower value.
5) Cluster 10 has the highest minimum payments.
6) Cluster 4 has the lowest one-off purchase frequency.
Hence, clustering lets us target specific groups, such as frugal customers and high spenders, with tailored ad campaigns.
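Statements such as "cluster X has the highest Y" can be read off the table of cluster means programmatically. The sketch below assumes a table shaped like cluster_kmeans; the values in toy_means are invented for illustration and are not results from this dataset.

```r
# Hypothetical per-cluster means with made-up values
toy_means <- data.frame(
  Cluster_ID = factor(1:4),
  BALANCE = c(1500, 120, 4300, 900),
  PURCHASES = c(300, 80, 650, 5200),
  CASH_ADVANCE = c(200, 10, 3900, 150)
)

attribute_cols <- setdiff(names(toy_means), "Cluster_ID")

# Label of the cluster at which a column reaches its minimum or maximum
cluster_at <- function(column, pick) {
  as.character(toy_means$Cluster_ID[pick(toy_means[[column]])])
}

# For each attribute, the clusters with the lowest and highest mean
extremes <- data.frame(
  attribute = attribute_cols,
  lowest = sapply(attribute_cols, cluster_at, pick = which.min, USE.NAMES = FALSE),
  highest = sapply(attribute_cols, cluster_at, pick = which.max, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(extremes)
```

A table like this only points to candidate clusters. The violin plots are still needed to judge whether a cluster's mean is driven by a few extreme records.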
Final words:
In this markdown, hierarchical clustering used only Euclidean distance with the complete and Ward linkage methods. Other linkage methods, such as median and single linkage, and other distance functions, such as Mahalanobis distance or absolute (Manhattan) distance, could also be tried. Additionally, we could try other variants of k-means, such as k-medians.
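A rough sketch of two of these alternatives on synthetic data is given below: Manhattan distance with single linkage, and k-medoids through pam() from the cluster package. Note that k-medoids is a related robust method swapped in here for illustration, not k-medians itself, and toy_data and the other object names are assumptions.

```r
library(cluster)  # provides pam(); a recommended package in standard R installations

set.seed(7)
# Synthetic data with two well-separated groups of 30 points each
toy_data <- scale(rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2)
))

# Absolute (Manhattan) distance instead of Euclidean distance
d_manhattan <- dist(toy_data, method = "manhattan")

# Hierarchical clustering with single linkage, cut into two groups
hc_single <- hclust(d_manhattan, method = "single")
groups_single <- cutree(hc_single, k = 2)

# k-medoids (PAM) with the same distance measure
pam_fit <- pam(toy_data, k = 2, metric = "manhattan")

# Cross-tabulate the two partitions to see how far the methods agree
print(table(single = groups_single, pam = pam_fit$clustering))
```

On real data the methods will usually disagree more than on this toy example, so comparing their partitions in a cross-table is a quick way to see how sensitive the clusters are to the choice of method.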