First, let's discuss what PCA is in simple terms:
PCA is based on a decomposition of the data matrix X into two matrices V and U:

X=U*V'

The matrix V is orthogonal, and the columns of U are mutually orthogonal. V is usually called the loadings matrix, and U is called the scores matrix. The loadings can be understood as the weights applied to each original variable (each data column of the original dataset) when calculating a principal component. The matrix U contains the original data in a rotated coordinate system. Furthermore, we can use PCA or SVD to reduce the dimensionality of the original data to a few important components that capture the highest variance, which speeds up data processing in machine learning. You will learn more about this in the next post.
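As a minimal sketch of this decomposition, assuming a small simulated matrix rather than the wine data, the scores and loadings can be obtained from base R's `svd()`. The names `X`, `U`, `V` and `k` are chosen here for illustration only. Note that `svd()` returns a separate diagonal factor `d`, which is folded into the scores so that X = U*V' holds as written above.

```r
set.seed(1)
# Simulated, column-centered data standing in for a real dataset (assumption)
X <- scale(matrix(rnorm(80), nrow = 20, ncol = 4), center = TRUE, scale = FALSE)

s <- svd(X)               # X = u %*% diag(d) %*% t(v)
V <- s$v                  # loadings: one column of weights per component
U <- s$u %*% diag(s$d)    # scores: the data in the rotated coordinate system

# V is orthogonal: t(V) %*% V is the identity matrix
max(abs(crossprod(V) - diag(4)))
# The scores are the data projected onto the loadings
max(abs(X %*% V - U))
# Using all components reconstructs X exactly (up to rounding)
max(abs(U %*% t(V) - X))

# Keeping only the first k components gives a lower-dimensional approximation
k <- 2
X_k <- U[, 1:k] %*% t(V[, 1:k])
sum((X - X_k)^2) / sum(X^2)   # fraction of the total variation that is lost
```

The last value shrinks as `k` grows, which is the sense in which a few components can stand in for the full dataset.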
ifelse(require(data.table),{library("data.table")},install.packages("data.table"))
## Loading required package: data.table
## [1] "data.table"
ifelse(require(dplyr),{library("dplyr")},install.packages("dplyr"))
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## [1] "dplyr"
ifelse(require(rgl),{library("rgl")},install.packages("rgl"))
## Loading required package: rgl
## Warning: package 'rgl' was built under R version 3.5.3
## [1] "rgl"
ifelse(require(pca3d),{library("pca3d")},install.packages("pca3d"))
## Loading required package: pca3d
## Warning: package 'pca3d' was built under R version 3.5.3
## [1] "pca3d"
ifelse(require(plotly),{library("plotly")},install.packages("plotly"))
## Loading required package: plotly
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## [1] "plotly"
dataWines<-read.delim(file.choose(),header = F, sep=",")
setnames(dataWines, old = c('V1','V2','V3','V4','V5','V6','V7','V8','V9','V10','V11','V12','V13','V14'),
new = c('Type','Alcohol','Malic acid','Ash','Alcalinity of ash','Magnesium','Total phenols','Flavanoids','Nonflavanoid phenols','Proanthocyanins','Color intensity','Hue','OD280/OD315 of diluted wines','Proline'))
apply(dataWines,2, function(x) sum(is.na(x)))
##                         Type                      Alcohol 
##                            0                            0 
##                   Malic acid                          Ash 
##                            0                            0 
##            Alcalinity of ash                    Magnesium 
##                            0                            0 
##                Total phenols                   Flavanoids 
##                            0                            0 
##         Nonflavanoid phenols              Proanthocyanins 
##                            0                            0 
##              Color intensity                          Hue 
##                            0                            0 
## OD280/OD315 of diluted wines                      Proline 
##                            0                            0
pca<-princomp(dataWines[,c(2:ncol(dataWines))], cor = TRUE, scores = TRUE, covmat = NULL)
summary(pca)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     2.1692972 1.5801816 1.2025273 0.9586313 0.92370351
## Proportion of Variance 0.3619885 0.1920749 0.1112363 0.0706903 0.06563294
## Cumulative Proportion  0.3619885 0.5540634 0.6652997 0.7359900 0.80162293
##                            Comp.6     Comp.7     Comp.8     Comp.9
## Standard deviation     0.80103498 0.74231281 0.59033665 0.53747553
## Proportion of Variance 0.04935823 0.04238679 0.02680749 0.02222153
## Cumulative Proportion  0.85098116 0.89336795 0.92017544 0.94239698
##                           Comp.10    Comp.11    Comp.12     Comp.13
## Standard deviation     0.50090167 0.47517222 0.41081655 0.321524394
## Proportion of Variance 0.01930019 0.01736836 0.01298233 0.007952149
## Cumulative Proportion  0.96169717 0.97906553 0.99204785 1.000000000
plot(pca)

The plot shows that the first three principal components capture the largest shares of the variance, about 67% cumulatively according to the summary above, so the data can be approximately reconstructed from these three components with limited information loss.
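To make the link between the summary table and the plot explicit, here is a small sketch that recomputes the variance proportions from the standard deviations printed by `summary(pca)` above. The values are copied from that output; the 80% threshold is an arbitrary choice for illustration, not something prescribed by the analysis.

```r
# Standard deviations copied from the summary(pca) output above
sdev <- c(2.1692972, 1.5801816, 1.2025273, 0.9586313, 0.92370351,
          0.80103498, 0.74231281, 0.59033665, 0.53747553,
          0.50090167, 0.47517222, 0.41081655, 0.321524394)

variances <- sdev^2                       # variance of each component
prop_var  <- variances / sum(variances)   # the "Proportion of Variance" row
cum_var   <- cumsum(prop_var)             # the "Cumulative Proportion" row

round(cum_var[1:3], 4)
# Smallest number of components whose cumulative share reaches the threshold
which(cum_var >= 0.80)[1]
```

On a fitted object the same quantities are available as `pca$sdev^2 / sum(pca$sdev^2)`.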

loadings(pca)
## 
## Loadings:
##                              Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Alcohol                       0.144  0.484  0.207         0.266  0.214
## Malic acid                   -0.245  0.225        -0.537         0.537
## Ash                                  0.316 -0.626  0.214  0.143  0.154
## Alcalinity of ash            -0.239        -0.612               -0.101
## Magnesium                     0.142  0.300 -0.131  0.352 -0.727       
## Total phenols                 0.395        -0.146 -0.198  0.149       
## Flavanoids                    0.423        -0.151 -0.152  0.109       
## Nonflavanoid phenols         -0.299        -0.170  0.203  0.501 -0.259
## Proanthocyanins               0.313        -0.149 -0.399 -0.137 -0.534
## Color intensity                      0.530  0.137               -0.419
## Hue                           0.297 -0.279         0.428  0.174  0.106
## OD280/OD315 of diluted wines  0.376 -0.164 -0.166 -0.184  0.101  0.266
## Proline                       0.287  0.365  0.127  0.232  0.158  0.120
##                              Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
## Alcohol                              0.396  0.509  0.212   0.226   0.266 
## Malic acid                   -0.421               -0.309          -0.122 
## Ash                           0.149 -0.170 -0.308          0.499         
## Alcalinity of ash             0.287  0.428  0.200         -0.479         
## Magnesium                    -0.323 -0.156  0.271                        
## Total phenols                       -0.406  0.286 -0.320  -0.304   0.304 
## Flavanoids                          -0.187        -0.163                 
## Nonflavanoid phenols         -0.595 -0.233  0.196  0.216  -0.117         
## Proanthocyanins              -0.372  0.368 -0.209  0.134   0.237         
## Color intensity               0.228               -0.291          -0.604 
## Hue                          -0.232  0.437        -0.522          -0.259 
## OD280/OD315 of diluted wines                0.137  0.524          -0.601 
## Proline                              0.120 -0.576  0.162  -0.539         
##                              Comp.13
## Alcohol                             
## Malic acid                          
## Ash                          -0.141 
## Alcalinity of ash                   
## Magnesium                           
## Total phenols                -0.464 
## Flavanoids                    0.832 
## Nonflavanoid phenols          0.114 
## Proanthocyanins              -0.117 
## Color intensity                     
## Hue                                 
## OD280/OD315 of diluted wines -0.157 
## Proline                             
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.077  0.077  0.077  0.077  0.077  0.077  0.077  0.077
## Cumulative Var  0.077  0.154  0.231  0.308  0.385  0.462  0.538  0.615
##                Comp.9 Comp.10 Comp.11 Comp.12 Comp.13
## SS loadings     1.000   1.000   1.000   1.000   1.000
## Proportion Var  0.077   0.077   0.077   0.077   0.077
## Cumulative Var  0.692   0.769   0.846   0.923   1.000
pca$loadings  ## same object as loadings(pca); prints the identical table shown above
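A note on reading the loadings table: the "SS loadings" row is 1.000 for every component because each loading column has unit length, and "Proportion Var" is simply that value divided by the number of variables (1/13 here), so these rows do not describe the variance explained by each component. The sketch below illustrates this on the built-in `USArrests` dataset, which is used purely as a stand-in (an assumption, not the wine data), and shows that with `cor = TRUE` the loadings agree with the eigenvectors of the correlation matrix up to sign.

```r
# USArrests is a built-in dataset used here as a stand-in (assumption)
X <- USArrests
p <- princomp(X, cor = TRUE)
e <- eigen(cor(X))

L <- unclass(p$loadings)             # plain numeric matrix of loadings

colSums(L^2)                         # each column has unit length ("SS loadings" = 1)
max(abs(abs(L) - abs(e$vectors)))    # loadings match the eigenvectors up to sign
max(abs(p$sdev^2 - e$values))        # component variances are the eigenvalues
```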
biplot(pca$scores[,1:2],pca$loadings[,1:2]) ## scores vs. loadings, if you don't want rescaling of the axes

biplot(pca)

pc_scores <- pca$scores
#pc_scores
gr <- factor(dataWines[,1])
pca2d(pca, col=dataWines$Type,group=gr, biplot=TRUE )

#pca3d(pca, col=dataWines$Type,group=gr)
wines <-dataWines[,c(2:ncol(dataWines))] 
normalized_data <- scale(wines)
wineCluster_Variability <- matrix(nrow=10, ncol=1)
for (i in 1:10) wineCluster_Variability[i] <- kmeans(normalized_data,centers=i, nstart=10)$tot.withinss
plot(1:10, wineCluster_Variability, type="b", xlab="Number of clusters", ylab="Within groups sum of squares")

winesk <- kmeans(normalized_data, centers=3, iter.max=20, nstart=5)
wine_clust<-cbind(dataWines, winesk$cluster)
#write.csv(wine_clust,file="wine_clusters_k.csv")
#summary(wine_clust[which(wine_clust$`winesk$cluster`==3),])
with(wine_clust, table(`winesk$cluster`, Type))
##               Type
## winesk$cluster  1  2  3
##              1 59  3  0
##              2  0  3 48
##              3  0 65  0
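As a rough way to summarise the cross-tabulation above, the sketch below computes how many wines fall in the most frequent type of their cluster. The counts are copied from the table; the names `tab`, `majority_type` and `purity` are illustrative.

```r
# Counts copied from the cross-tabulation above
# (rows: k-means cluster, columns: wine Type)
tab <- matrix(c(59,  3,  0,
                 0,  3, 48,
                 0, 65,  0),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = 1:3, Type = 1:3))

# Most frequent wine type within each cluster
majority_type <- apply(tab, 1, which.max)
# Share of wines that belong to the majority type of their cluster
purity <- sum(apply(tab, 1, max)) / sum(tab)

majority_type
round(purity, 3)
```

Cluster numbers are arbitrary labels assigned by `kmeans`, so a cluster whose majority type has a different number is not in itself a mismatch.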
wine_clust[["Type"]]=as.factor(wine_clust[["Type"]])
wine_clust %>%
group_by(Type) %>%
    summarise_all("mean")->cluster_means
cluster_means
## # A tibble: 3 x 15
##   Type  Alcohol `Malic acid`   Ash `Alcalinity of ~ Magnesium
##   <fct>   <dbl>        <dbl> <dbl>            <dbl>     <dbl>
## 1 1        13.7         2.01  2.46             17.0     106. 
## 2 2        12.3         1.93  2.24             20.2      94.5
## 3 3        13.2         3.33  2.44             21.4      99.3
## # ... with 9 more variables: `Total phenols` <dbl>, Flavanoids <dbl>,
## #   `Nonflavanoid phenols` <dbl>, Proanthocyanins <dbl>, `Color
## #   intensity` <dbl>, Hue <dbl>, `OD280/OD315 of diluted wines` <dbl>,
## #   Proline <dbl>, `winesk$cluster` <dbl>
cols<-names(cluster_means)
length(cols)
## [1] 15
l <- htmltools::tagList()
# One bar chart per summarised column (column 1 is Type, so start at 2)
for (i in 2:length(cols)) {
  l[[i - 1]] <- as_widget(
    plot_ly(cluster_means, x = ~`winesk$cluster`, y = cluster_means[[cols[i]]],
            type = 'bar', color = ~`winesk$cluster`) %>%
      layout(title = cols[i],
             xaxis = list(title = "Cluster ID"),
             yaxis = list(title = cols[i])))
}
l
## Warning: textfont.color doesn't (yet) support data arrays


With PCA we can find which attributes of the wines are correlated and how these attributes differentiate each wine type. PCA fundamentally captures the variance across the attributes of the wines, which can help wine makers decide which attributes to vary in order to produce different types of wine.

Now let's use Tableau for a more detailed analysis.

Tableau analysis


We can subset the features by noting which ones vary widely in composition across the clusters. The wines differ mostly in color intensity, magnesium, alcalinity of ash, malic acid, proline and total phenols. In the visualization below, all correlated features of the wines are shown in one color.

Tableau analysis

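The idea of picking out features that vary widely across clusters can also be sketched in R. The example below is only an illustration under stated assumptions: it uses the built-in `iris` measurements as a stand-in for the wine data, and it ranks features by the range of their standardized cluster means, which is one simple measure of spread among several possible ones.

```r
# iris measurements as a stand-in dataset (assumption); 3 clusters as in the post
set.seed(42)
X  <- scale(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 10)

# Mean of each standardized feature within each cluster (one row per cluster)
centers <- aggregate(as.data.frame(X), by = list(cluster = km$cluster), FUN = mean)

# Range of the cluster means per feature: larger values separate the clusters more
spread <- sapply(centers[, -1], function(m) diff(range(m)))
sort(spread, decreasing = TRUE)
```

Features at the top of the sorted vector are the ones on which the clusters differ most.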