1 Preparing for the analysis
1.1 Install and load the package odetector
This vignette introduces the R package ‘odetector’ (Cebeci et al, 2022). You can install the latest release of the package from CRAN with the following command:
install.packages("odetector")
You can also install the development version of the package from GitHub as follows:
if(!require(devtools))
  install.packages("devtools", repos="https://cloud.r-project.org")
devtools::install_github("zcebeci/odetector")
If you have already installed ‘odetector’, you can load it into the R working environment with the following command:
library(odetector)
1.2 Load the data set
We demonstrate outlier detection with ‘odetector’ on a synthetic data set consisting of three features (p1, p2 and p3) and four clusters. This three-dimensional data set was created with the R package ‘MixSim’ (Melnykov et al, 2013). It contains a total of 130 data objects: 30 in each cluster, plus 10 outliers placed at the bottom of the data set.
In the following code chunk, the data set is loaded into the R working environment, and its first and last rows are displayed to give an idea of its content.
data(x3p4c)
head(x3p4c)
## p1 p2 p3 cl
## [1,] 0.9968195 0.3472756 0.4891324 1
## [2,] 0.9933293 0.3108594 0.5058799 1
## [3,] 1.0163660 0.3563446 0.5144635 1
## [4,] 0.9969506 0.4276673 0.4889330 1
## [5,] 0.9883648 0.3068805 0.5345190 1
## [6,] 0.9213208 0.2802930 0.6342422 1
tail(x3p4c)
## p1 p2 p3 cl
## [125,] 0.68032778 0.4338203 0.24720960 0
## [126,] 0.09832769 0.6726663 0.71949486 0
## [127,] 0.78069212 0.5256899 0.82378164 0
## [128,] 0.25003095 0.5115713 0.29874354 0
## [129,] 0.19075014 0.9381229 0.05666282 0
## [130,] 0.62097163 0.8947515 0.66280757 0
The following command plots the data set by cluster. The objects marked in black are the outliers.
pairs(x3p4c[,-4], col=x3p4c[,4]+1)
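For readers who want to experiment without the packaged data, a data set with the same layout can be simulated in base R. The sketch below is only a stand-in: the cluster centers, the spread and the seed are assumptions chosen for illustration, and the procedure is not the MixSim-based one used to create ‘x3p4c’.

```r
# Hypothetical stand-in for x3p4c: four Gaussian clusters plus uniform noise.
# Centers, spread and seed are assumptions, not the values behind x3p4c.
set.seed(42)
centers <- rbind(c(0.95, 0.35, 0.55),
                 c(0.55, 0.05, 0.75),
                 c(0.90, 1.00, 0.35),
                 c(0.40, 0.40, 0.85))
n_per_cluster <- 30
n_outliers <- 10

# Draw 30 points around each center
clustered <- do.call(rbind, lapply(seq_len(nrow(centers)), function(j) {
  pts <- matrix(rnorm(n_per_cluster * 3, sd = 0.05), ncol = 3)
  sweep(pts, 2, centers[j, ], "+")
}))

# Scatter 10 outliers uniformly over the unit cube and label them 0
noise <- matrix(runif(n_outliers * 3), ncol = 3)
toy <- cbind(rbind(clustered, noise),
             cl = c(rep(1:4, each = n_per_cluster), rep(0, n_outliers)))
colnames(toy)[1:3] <- c("p1", "p2", "p3")

dim(toy)
table(toy[, "cl"])
```

As in ‘x3p4c’, the outliers sit in the last rows and carry the label 0, so the same `pairs()` call with `col = cl + 1` would draw them in black.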
2 Possibilistic Fuzzy C-Means Cluster Analysis
The outlier detection algorithm in the package ‘odetector’ uses the typicality degrees produced by a possibilistic clustering algorithm such as Possibilistic C-means (PCM), Fuzzy Possibilistic C-means (FPCM), Possibilistic Fuzzy C-means (PFCM) or Unsupervised Possibilistic Fuzzy C-means (UPFC). In this example, we apply outlier detection to the results of the UPFC algorithm (Wu et al, 2010) implemented in the package ‘ppclust’ (Cebeci, 2018). For details, see the manual and vignettes of the R package ‘ppclust’ at https://CRAN.R-project.org/package=ppclust. If required, ‘ppclust’ can be installed and loaded into the working environment as follows:
if(!require(ppclust)){
  install.packages("ppclust", repos="https://cloud.r-project.org")
  library(ppclust)
}
For clustering, we select the feature columns of the data set ‘x3p4c’ and store them in an object named x as follows:
x <- x3p4c[,-4]
head(x)
## p1 p2 p3
## [1,] 0.9968195 0.3472756 0.4891324
## [2,] 0.9933293 0.3108594 0.5058799
## [3,] 1.0163660 0.3563446 0.5144635
## [4,] 0.9969506 0.4276673 0.4889330
## [5,] 0.9883648 0.3068805 0.5345190
## [6,] 0.9213208 0.2802930 0.6342422
tail(x)
## p1 p2 p3
## [125,] 0.68032778 0.4338203 0.24720960
## [126,] 0.09832769 0.6726663 0.71949486
## [127,] 0.78069212 0.5256899 0.82378164
## [128,] 0.25003095 0.5115713 0.29874354
## [129,] 0.19075014 0.9381229 0.05666282
## [130,] 0.62097163 0.8947515 0.66280757
Since the data set ‘x3p4c’ has four clusters, we run UPFC with 4 clusters and display the first rows of the resulting typicality matrix with the following commands:
require(ppclust)
res.upfc <- upfc(x, centers=4)
head(res.upfc$t)
## Cluster 1 Cluster 2 Cluster 3 Cluster 4
## 1 0.0001413868 2.244214e-04 0.9218609 0.0018554019
## 2 0.0001624117 8.354150e-05 0.9068443 0.0033327338
## 3 0.0001264145 2.301656e-04 0.9104836 0.0015015496
## 4 0.0001618827 1.438301e-03 0.8493402 0.0006668342
## 5 0.0002465250 6.533282e-05 0.9175368 0.0047133546
## 6 0.0021139592 1.646894e-05 0.7123001 0.0281144849
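Unlike fuzzy memberships, the typicality degrees in a row do not have to sum to 1, which is why an object far from every cluster can receive a low value in all columns. The sketch below illustrates this with the classic PCM typicality formula for fixed centers. It is an illustration only: the function name `pcm_typicality`, the toy data and the `eta` values are assumptions, and UPFC in ‘ppclust’ uses its own update rule rather than this one.

```r
# Classic PCM typicality, shown for illustration only;
# UPFC in 'ppclust' uses its own update rule.
pcm_typicality <- function(X, V, eta, m = 2) {
  # Squared Euclidean distance of every object (row of X) to every center (row of V)
  d2 <- sapply(seq_len(nrow(V)), function(j) rowSums(sweep(X, 2, V[j, ])^2))
  d2 <- matrix(d2, nrow = nrow(X))
  # t_ij = 1 / (1 + (d_ij^2 / eta_j)^(1 / (m - 1)))
  1 / (1 + sweep(d2, 2, eta, "/")^(1 / (m - 1)))
}

# Toy data: three objects near the centers and one far from both (values made up)
X <- rbind(c(0.0, 0.0), c(0.1, 0.0), c(1.0, 1.0), c(5.0, 5.0))
V <- rbind(c(0, 0), c(1, 1))
tmat <- pcm_typicality(X, V, eta = c(0.5, 0.5))
round(tmat, 3)
```

The last object is atypical of both clusters, so both of its typicality degrees are close to zero. That pattern is what typicality-based outlier detection looks for.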
In clustering-based outlier detection, using the optimal number of clusters is critical for partitioning a data set properly, because the number of clusters must be fixed before the clustering algorithm starts and it strongly affects the result. One can determine it by running an appropriate clustering algorithm for each number of clusters in a range, from ‘c1’ to ‘c2’, and computing clustering validation indices for each run. Most validation indices have been proposed for the results of hard clustering algorithms, e.g. K-means. For validating the partitions produced by fuzzy clustering algorithms, many indices have also been proposed, e.g. partition entropy (PE), partition coefficient (PC), Xie-Beni index (XB), Kwon index (Kwon) and Fuzzy Hypervolume index (FHV). The R implementations of such fuzzy and possibilistic validation indices are described in Cebeci (2020). The example below uses the R package ‘fcvalid’ to determine the optimal number of clusters (k) in the data set ‘x3p4c’. The package can be installed from GitHub as follows:
if(!require(devtools))
  install.packages("devtools", repos="https://cloud.r-project.org")
suppressMessages(devtools::install_github("zcebeci/fcvalid"))
After installing the package, run the fcm function of the ppclust package for each number of clusters from c1 to c2. Then compute the fuzzy validity index values with the corresponding functions of the package ‘fcvalid’:
library(ppclust)
library(fcvalid)
c1 <- 2 # Starting number of clusters
c2 <- 5 # Final number of clusters
indnames <- c("PC","MPC","PE","XB","Kwon", "TSS", "CL", "FS", "PBMF","FSIL","FHV", "APD")
indvals <- matrix(ncol=length(indnames), nrow=(c2-c1+1))
colnames(indvals) <- indnames
rownames(indvals) <- paste0("c=", c1:c2)
i <- 1
for(c in c1:c2){
  resfcm <- ppclust::fcm(x=x, centers=c, nstart=3)
  indvals[i,1] <- pc(resfcm)
  indvals[i,2] <- mpc(resfcm)
  indvals[i,3] <- pe(resfcm)
  indvals[i,4] <- xb(resfcm)
  indvals[i,5] <- kwon(resfcm)
  indvals[i,6] <- tss(resfcm)
  indvals[i,7] <- cl(resfcm)
  indvals[i,8] <- fs(resfcm)
  indvals[i,9] <- pbm(resfcm)
  indvals[i,10] <- si(resfcm)$sif
  indvals[i,11] <- fhv(resfcm)
  indvals[i,12] <- apd(resfcm)
  i <- i+1
}
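To make the role of such indices more concrete, the two simplest ones can be written in a few lines of base R. The sketch below uses the textbook definitions of the partition coefficient and the partition entropy on a membership matrix. The function names and toy matrices are assumptions for illustration, and the ‘fcvalid’ functions may differ in details such as the logarithm base.

```r
# Partition coefficient: mean of squared memberships (1/c <= PC <= 1, higher = crisper)
partition_coefficient <- function(U) sum(U^2) / nrow(U)

# Partition entropy: -(1/n) * sum(u * log(u)), with 0 * log(0) taken as 0
partition_entropy <- function(U) {
  terms <- ifelse(U > 0, U * log(U), 0)
  -sum(terms) / nrow(U)
}

# Toy membership matrices (rows = objects, columns = clusters; rows sum to 1)
U_crisp <- rbind(c(1, 0), c(0, 1), c(1, 0))
U_fuzzy <- matrix(0.5, nrow = 3, ncol = 2)

partition_coefficient(U_crisp)  # 1
partition_coefficient(U_fuzzy)  # 0.5
partition_entropy(U_crisp)      # 0
partition_entropy(U_fuzzy)      # log(2)
```

A crisp partition gives the best value of both indices, while a maximally fuzzy partition gives the worst. Comparing such values across runs with different numbers of clusters is the idea behind the loop above.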
In the output of the R script above, you will see that most of the fuzzy indices suggest 3 as the optimal number of clusters, while some suggest 4. For example, the Fuzzy Hypervolume (FHV) index, a powerful fuzzy index, suggests 4 clusters for the data set. We can therefore extract this number as the optimal number of clusters as follows:
# Display the fuzzy indices in various runs of FCM
indvals <- round(t(indvals),3)
print(indvals)
# Optimal number of clusters with Fuzzy Hypervolume (FHV) index
optk <- colnames(indvals)[which.min(indvals["FHV",])]
optk
k <- unname(which.min(indvals["FHV",])) + 1
k
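The `+ 1` in the chunk above converts a position in the range into a cluster count, and it is correct only because the range starts at c1 = 2. A slightly more general way is to index the range itself, as sketched below. The FHV values here are made up for illustration and are not results from ‘x3p4c’.

```r
c1 <- 2
c2 <- 5
# Hypothetical FHV values for c = 2..5 (made up for illustration; lower is better)
fhv <- c("c=2" = 0.031, "c=3" = 0.024, "c=4" = 0.019, "c=5" = 0.022)

pos <- unname(which.min(fhv))   # position within c1:c2
k <- (c1:c2)[pos]               # cluster count at that position
k
```

With this form, changing c1 to another starting value does not require touching the line that computes k.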
The example below runs the UPFC algorithm with the optimal number of clusters found in the previous example. In the result object named res.upfc, the component t contains the typicality degrees to be used in outlier detection.
res.upfc <- upfc(x, centers=k)
head(res.upfc$t)
3 Outlier Detection
The outlier detection algorithm takes an object of class ppclust, as returned by the possibilistic and fuzzy clustering algorithms shown in the previous section. Outlier detection is run with the default argument values of the function detect.outliers as follows:
res.out <- detect.outliers(res.upfc)
To change the typicality thresholds, set the arguments alpha and alpha2 to different values. In the following command, alpha is set to 0.05 for Approach 1 and alpha2 is set to 0.4 for Approach 2. See the package manual for details about these arguments.
res.out <- detect.outliers(res.upfc, alpha=0.05, alpha2=0.4)
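To give an intuition for what a typicality threshold does, the sketch below applies a generic rule to a toy typicality matrix: an object is flagged when its typicality is below the threshold for every cluster. This rule is an assumption for illustration. It is not taken from the source of detect.outliers, whose approaches are defined in the package manual and in Cebeci et al (2022).

```r
# Toy typicality matrix (rows = objects, columns = clusters); values are made up
tmat <- rbind(c(0.92, 0.01, 0.00),
              c(0.03, 0.85, 0.02),
              c(0.02, 0.01, 0.04),   # atypical of every cluster
              c(0.00, 0.07, 0.78))
alpha <- 0.05

# Generic rule (an assumption, not the package source): an object is flagged
# when its typicality stays below alpha for every cluster
flagged <- which(apply(tmat, 1, max) < alpha)
flagged
```

Only the third object is flagged, because each of the others is clearly typical of one cluster.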
The structure of the result object is as follows:
str(res.out)
## List of 5
## $ X : num [1:130, 1:3] 0.997 0.993 1.016 0.997 0.988 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:130] "1" "2" "3" "4" ...
## .. ..$ : chr [1:3] "p1" "p2" "p3"
## $ outliers1: Named int [1:10] 121 122 123 124 125 126 127 128 129 130
## ..- attr(*, "names")= chr [1:10] "Obj.1" "Obj.2" "Obj.3" "Obj.4" ...
## $ outliers2: Named int [1:11] 66 121 122 123 124 125 126 127 128 129 ...
## ..- attr(*, "names")= chr [1:11] "Obj.1" "Obj.2" "Obj.3" "Obj.4" ...
## $ outliers3: Named int(0)
## ..- attr(*, "names")= chr(0)
## $ call : language detect.outliers(x = res.upfc, alpha = 0.05, alpha2 = 0.4)
## - attr(*, "class")= chr "outliers"
3.1 Print the outliers
The result of detect.outliers is an object of class outliers. The components of this object can be displayed individually. For example, the first command below displays the outliers detected with Approach 1, and the second displays those detected with Approach 2:
res.out$outliers1
## Obj.1 Obj.2 Obj.3 Obj.4 Obj.5 Obj.6 Obj.7 Obj.8 Obj.9 Obj.10
## 121 122 123 124 125 126 127 128 129 130
res.out$outliers2
## Obj.1 Obj.2 Obj.3 Obj.4 Obj.5 Obj.6 Obj.7 Obj.8 Obj.9 Obj.10 Obj.11
## 66 121 122 123 124 125 126 127 128 129 130
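Since both components are plain index vectors, the two approaches can be compared with base R set operations. The sketch below copies the indices from the output above instead of calling the package, so it runs on its own.

```r
# Index vectors copied from the output above
outliers1 <- 121:130
outliers2 <- c(66L, 121:130)

setdiff(outliers2, outliers1)      # flagged only by Approach 2
setdiff(outliers1, outliers2)      # flagged only by Approach 1
intersect(outliers1, outliers2)    # flagged by both
```

Here the only disagreement is object 66, which Approach 2 flags in addition to the ten objects at the bottom of the data set.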
The full list of outliers obtained with all of the approaches can be displayed together with the function print.outliers as follows:
print(res.out)
## Outliers from detect.outliers(x = res.upfc, alpha = 0.05, alpha2 = 0.4)
## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
## List of outliers computed with Approach 1
## p1 p2 p3
## 121 0.48240696 0.6277246 0.12007434
## 122 0.56841818 0.7060801 0.49683778
## 123 0.21629682 0.8977075 0.50038525
## 124 0.74970854 0.7773038 0.84919884
## 125 0.68032778 0.4338203 0.24720960
## 126 0.09832769 0.6726663 0.71949486
## 127 0.78069212 0.5256899 0.82378164
## 128 0.25003095 0.5115713 0.29874354
## 129 0.19075014 0.9381229 0.05666282
## 130 0.62097163 0.8947515 0.66280757
##
## List of data set with the outliers (marked with *) by using Approach 1
## 0.997 0.347 0.489
## 0.993 0.311 0.506
## 1.016 0.356 0.514
## 0.997 0.428 0.489
## 0.988 0.307 0.535
## 0.921 0.280 0.634
## 0.975 0.389 0.484
## 0.965 0.429 0.504
## 0.947 0.409 0.574
## 0.890 0.300 0.542
## 0.961 0.337 0.563
## 0.838 0.233 0.722
## 0.927 0.440 0.589
## 0.972 0.351 0.466
## 0.910 0.391 0.569
## 0.967 0.404 0.465
## 0.907 0.365 0.520
## 0.982 0.444 0.486
## 0.923 0.373 0.597
## 0.862 0.296 0.596
## 0.957 0.312 0.561
## 0.965 0.369 0.552
## 0.920 0.344 0.527
## 1.003 0.321 0.541
## 0.932 0.285 0.488
## 0.986 0.328 0.506
## 0.610 0.022 0.721
## 0.608 0.065 0.816
## 0.478 0.078 0.756
## 0.500 0.123 0.779
## 0.540 0.041 0.727
## 0.499 0.098 0.734
## 0.575 -0.010 0.662
## 0.584 0.038 0.777
## 0.598 0.007 0.675
## 0.616 0.034 0.659
## 0.477 0.043 0.696
## 0.595 0.039 0.847
## 0.600 -0.010 0.684
## 0.611 -0.043 0.633
## 0.512 0.124 0.712
## 0.543 0.055 0.833
## 0.642 -0.013 0.747
## 0.701 -0.064 0.746
## 0.554 0.120 0.836
## 0.563 0.053 0.674
## 0.596 0.023 0.771
## 0.600 -0.029 0.677
## 0.448 0.120 0.713
## 0.545 0.042 0.802
## 0.600 -0.039 0.697
## 0.506 0.091 0.691
## 0.598 0.048 0.804
## 0.536 0.065 0.709
## 0.536 0.039 0.694
## 0.529 0.067 0.657
## 0.492 0.077 0.743
## 0.549 0.074 0.663
## 0.580 0.071 0.786
## 0.587 0.027 0.629
## 0.674 0.027 0.784
## 0.591 0.088 0.769
## 0.596 0.030 0.813
## 0.560 0.076 0.692
## 0.860 0.837 0.286
## 0.789 0.817 0.305
## 0.875 1.082 0.402
## 0.969 1.080 0.351
## 0.960 0.954 0.291
## 0.950 1.094 0.484
## 0.940 0.932 0.394
## 0.999 1.040 0.401
## 0.940 0.960 0.378
## 1.025 1.068 0.302
## 0.911 1.058 0.334
## 0.895 0.973 0.341
## 0.943 1.067 0.374
## 0.893 1.075 0.407
## 0.981 1.039 0.426
## 0.876 1.002 0.370
## 0.875 1.099 0.411
## 0.941 0.998 0.314
## 0.949 1.050 0.473
## 0.872 0.785 0.325
## 0.808 0.900 0.407
## 0.847 0.934 0.320
## 0.911 1.039 0.407
## 0.831 0.964 0.400
## 0.939 0.917 0.358
## 0.447 0.330 0.915
## 0.333 0.410 0.959
## 0.446 0.468 0.832
## 0.353 0.362 0.820
## 0.368 0.262 0.797
## 0.401 0.401 0.919
## 0.445 0.352 0.779
## 0.490 0.501 0.681
## 0.446 0.460 0.834
## 0.327 0.306 0.950
## 0.408 0.424 0.746
## 0.345 0.388 0.807
## 0.399 0.501 0.786
## 0.425 0.540 0.668
## 0.368 0.468 0.793
## 0.408 0.252 0.963
## 0.524 0.528 0.726
## 0.365 0.460 0.884
## 0.420 0.465 0.854
## 0.422 0.417 0.827
## 0.524 0.398 0.758
## 0.395 0.382 0.912
## 0.574 0.230 0.720
## 0.452 0.509 0.715
## 0.428 0.367 0.856
## 0.473 0.491 0.794
## 0.359 0.421 0.875
## 0.368 0.498 0.801
## 0.346 0.485 0.779
## 0.372 0.403 0.881
## 0.373 0.389 0.924
## * 0.482 0.628 0.120
## * 0.568 0.706 0.497
## * 0.216 0.898 0.500
## * 0.750 0.777 0.849
## * 0.680 0.434 0.247
## * 0.098 0.673 0.719
## * 0.781 0.526 0.824
## * 0.250 0.512 0.299
## * 0.191 0.938 0.057
## * 0.621 0.895 0.663
##
## List of outliers computed with Approach 2
## p1 p2 p3
## 66 0.78902573 0.8172686 0.30533197
## 121 0.48240696 0.6277246 0.12007434
## 122 0.56841818 0.7060801 0.49683778
## 123 0.21629682 0.8977075 0.50038525
## 124 0.74970854 0.7773038 0.84919884
## 125 0.68032778 0.4338203 0.24720960
## 126 0.09832769 0.6726663 0.71949486
## 127 0.78069212 0.5256899 0.82378164
## 128 0.25003095 0.5115713 0.29874354
## 129 0.19075014 0.9381229 0.05666282
## 130 0.62097163 0.8947515 0.66280757
##
## List of data set with the outliers (marked with *) by using Approach 2
## 0.997 0.347 0.489
## 0.993 0.311 0.506
## 1.016 0.356 0.514
## 0.997 0.428 0.489
## 0.988 0.307 0.535
## 0.921 0.280 0.634
## 0.975 0.389 0.484
## 0.965 0.429 0.504
## 0.947 0.409 0.574
## 0.890 0.300 0.542
## 0.961 0.337 0.563
## 0.838 0.233 0.722
## 0.927 0.440 0.589
## 0.972 0.351 0.466
## 0.910 0.391 0.569
## 0.967 0.404 0.465
## 0.907 0.365 0.520
## 0.982 0.444 0.486
## 0.923 0.373 0.597
## 0.862 0.296 0.596
## 0.957 0.312 0.561
## 0.965 0.369 0.552
## 0.920 0.344 0.527
## 1.003 0.321 0.541
## 0.932 0.285 0.488
## 0.986 0.328 0.506
## 0.610 0.022 0.721
## 0.608 0.065 0.816
## 0.478 0.078 0.756
## 0.500 0.123 0.779
## 0.540 0.041 0.727
## 0.499 0.098 0.734
## 0.575 -0.010 0.662
## 0.584 0.038 0.777
## 0.598 0.007 0.675
## 0.616 0.034 0.659
## 0.477 0.043 0.696
## 0.595 0.039 0.847
## 0.600 -0.010 0.684
## 0.611 -0.043 0.633
## 0.512 0.124 0.712
## 0.543 0.055 0.833
## 0.642 -0.013 0.747
## 0.701 -0.064 0.746
## 0.554 0.120 0.836
## 0.563 0.053 0.674
## 0.596 0.023 0.771
## 0.600 -0.029 0.677
## 0.448 0.120 0.713
## 0.545 0.042 0.802
## 0.600 -0.039 0.697
## 0.506 0.091 0.691
## 0.598 0.048 0.804
## 0.536 0.065 0.709
## 0.536 0.039 0.694
## 0.529 0.067 0.657
## 0.492 0.077 0.743
## 0.549 0.074 0.663
## 0.580 0.071 0.786
## 0.587 0.027 0.629
## 0.674 0.027 0.784
## 0.591 0.088 0.769
## 0.596 0.030 0.813
## 0.560 0.076 0.692
## 0.860 0.837 0.286
## * 0.789 0.817 0.305
## 0.875 1.082 0.402
## 0.969 1.080 0.351
## 0.960 0.954 0.291
## 0.950 1.094 0.484
## 0.940 0.932 0.394
## 0.999 1.040 0.401
## 0.940 0.960 0.378
## 1.025 1.068 0.302
## 0.911 1.058 0.334
## 0.895 0.973 0.341
## 0.943 1.067 0.374
## 0.893 1.075 0.407
## 0.981 1.039 0.426
## 0.876 1.002 0.370
## 0.875 1.099 0.411
## 0.941 0.998 0.314
## 0.949 1.050 0.473
## 0.872 0.785 0.325
## 0.808 0.900 0.407
## 0.847 0.934 0.320
## 0.911 1.039 0.407
## 0.831 0.964 0.400
## 0.939 0.917 0.358
## 0.447 0.330 0.915
## 0.333 0.410 0.959
## 0.446 0.468 0.832
## 0.353 0.362 0.820
## 0.368 0.262 0.797
## 0.401 0.401 0.919
## 0.445 0.352 0.779
## 0.490 0.501 0.681
## 0.446 0.460 0.834
## 0.327 0.306 0.950
## 0.408 0.424 0.746
## 0.345 0.388 0.807
## 0.399 0.501 0.786
## 0.425 0.540 0.668
## 0.368 0.468 0.793
## 0.408 0.252 0.963
## 0.524 0.528 0.726
## 0.365 0.460 0.884
## 0.420 0.465 0.854
## 0.422 0.417 0.827
## 0.524 0.398 0.758
## 0.395 0.382 0.912
## 0.574 0.230 0.720
## 0.452 0.509 0.715
## 0.428 0.367 0.856
## 0.473 0.491 0.794
## 0.359 0.421 0.875
## 0.368 0.498 0.801
## 0.346 0.485 0.779
## 0.372 0.403 0.881
## 0.373 0.389 0.924
## * 0.482 0.628 0.120
## * 0.568 0.706 0.497
## * 0.216 0.898 0.500
## * 0.750 0.777 0.849
## * 0.680 0.434 0.247
## * 0.098 0.673 0.719
## * 0.781 0.526 0.824
## * 0.250 0.512 0.299
## * 0.191 0.938 0.057
## * 0.621 0.895 0.663
##
## List of outliers computed with Approach 3
## No outliers detected.
##
## List of outliers computed with Approach 4
## No outliers detected.
3.2 Summarize the outliers
The function summary.outliers calculates descriptive statistics of the detected outliers.
summary(res.out)
## Summary of Outliers from detect.outliers(x = res.upfc, alpha = 0.05, alpha2 = 0.4)
## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
##
## Summary of outliers computed with Approach 1
## p1 p2 p3
## Min. :0.09833 Min. :0.4338 Min. :0.05666
## 1st Qu.:0.22473 1st Qu.:0.5512 1st Qu.:0.26009
## Median :0.52541 Median :0.6894 Median :0.49861
## Mean :0.46379 Mean :0.6985 Mean :0.47752
## 3rd Qu.:0.66549 3rd Qu.:0.8654 3rd Qu.:0.70532
## Max. :0.78069 Max. :0.9381 Max. :0.84920
##
## Summary of outliers computed with Approach 2
## p1 p2 p3
## Min. :0.09833 Min. :0.4338 Min. :0.05666
## 1st Qu.:0.23316 1st Qu.:0.5767 1st Qu.:0.27298
## Median :0.56842 Median :0.7061 Median :0.49684
## Mean :0.49336 Mean :0.7093 Mean :0.46187
## 3rd Qu.:0.71502 3rd Qu.:0.8560 3rd Qu.:0.69115
## Max. :0.78903 Max. :0.9381 Max. :0.84920
##
## Summary of outliers computed with Approach 3
## No outliers detected
## Summary of outliers in small clusters
## No outliers detected
##
## Available components:
## [1] "X" "outliers1" "outliers2" "outliers3" "call"
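The Approach 1 figures above can be cross-checked with base R alone. The sketch below rebuilds the ten outlier rows from the listing printed in Section 3.1 and summarizes them with the generic `summary()`. The object name `out1` is introduced here only for illustration.

```r
# The ten Approach 1 outliers, copied from the printed listing in Section 3.1
out1 <- matrix(c(
  0.48240696, 0.6277246, 0.12007434,
  0.56841818, 0.7060801, 0.49683778,
  0.21629682, 0.8977075, 0.50038525,
  0.74970854, 0.7773038, 0.84919884,
  0.68032778, 0.4338203, 0.24720960,
  0.09832769, 0.6726663, 0.71949486,
  0.78069212, 0.5256899, 0.82378164,
  0.25003095, 0.5115713, 0.29874354,
  0.19075014, 0.9381229, 0.05666282,
  0.62097163, 0.8947515, 0.66280757
), ncol = 3, byrow = TRUE,
dimnames = list(as.character(121:130), c("p1", "p2", "p3")))

summary(out1)     # per-feature quartiles and mean
colMeans(out1)    # means only
```

The spread of each feature over most of the unit interval reflects that these outliers are scattered between the clusters and do not form a group of their own.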
3.3 Visualization of the outliers
There are many ways to visualize the results of an outlier detection analysis. A traditional way is to plot them with the functions plot.outliers and pairs.outliers.
3.3.1 Plot the outliers
The function plot.outliers draws a scatter plot of the outliers. The argument ot selects the approach used to compute the outliers; it ranges between 1 and 4.
plot(res.out, ot=1)
plot(res.out, ot=2)
3.3.2 Pairwise Scatter Plots
To display the outliers by pairs of features, use the function pairs.outliers as follows:
pairs(res.out)
3.4 Remove the outliers
For further analysis of the data, the outliers can be removed from the original data set with the function remove.outliers as follows:
Xr <- remove.outliers(res.out, sc=FALSE)
In the above command, the option sc should be set to TRUE if the data objects in small clusters are to be treated as collective outliers. Compare the following figure with the figure plotted for the original data in Section 1.2.
pairs(Xr, col=x3p4c[,4]+1)
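When rows are removed from a data set, any per-object vector used alongside it, such as the cluster labels used for coloring, should be subset in the same way so that the two stay aligned. The sketch below shows this in base R on made-up toy data. The index vector `outlier_idx` stands in for a component such as res.out$outliers1, and nothing here relies on how remove.outliers works internally.

```r
# Toy data: six objects, the last two are outliers (cluster label 0); values are made up
X <- cbind(p1 = c(0.9, 1.0, 0.5, 0.6, 0.1, 0.7),
           p2 = c(0.3, 0.4, 0.0, 0.1, 0.9, 0.9),
           p3 = c(0.5, 0.5, 0.7, 0.8, 0.1, 0.6))
cl <- c(1, 1, 2, 2, 0, 0)
outlier_idx <- c(5, 6)   # stand-in for an index vector such as res.out$outliers1

# Keep every row that is not an outlier; setdiff() is also safe when
# outlier_idx is empty, unlike negative indexing with X[-outlier_idx, ]
keep <- setdiff(seq_len(nrow(X)), outlier_idx)
Xr  <- X[keep, , drop = FALSE]
clr <- cl[keep]

# Data and labels have the same length, so colors match the remaining objects
pairs(Xr, col = clr + 1)
```

In the vignette data the ten simulated outliers are the last rows, so the first labels of the full label vector still line up after their removal. Subsetting the labels explicitly also covers cases where an object in the middle of the data set, such as object 66, is removed.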
4 Using different threshold values
By default, the outlier detection algorithm uses a threshold typicality level (alpha) of 0.05. More outliers are expected with higher values of this argument, while lower values result in fewer outliers. In the following command, alpha is set to 0.1 instead of the default value of 0.05.
res.out <- detect.outliers(res.upfc, alpha=0.1)
As seen below, data object 66 is also identified as an outlier with the setting alpha=0.1. For details, see the package manual of ‘odetector’.
res.out$outliers1
## Obj.1 Obj.2 Obj.3 Obj.4 Obj.5 Obj.6 Obj.7 Obj.8 Obj.9 Obj.10 Obj.11
## 66 121 122 123 124 125 126 127 128 129 130
plot(res.out, ot=1)
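The effect of the threshold can also be explored by counting flagged objects over a grid of values. The sketch below does this on a made-up vector of maximum typicality degrees. It assumes a generic rule that flags an object when its maximum typicality over all clusters is below alpha, which may differ from the rules implemented in detect.outliers.

```r
# Made-up maximum typicality of eight objects over all clusters
max_typ <- c(0.91, 0.85, 0.78, 0.09, 0.04, 0.03, 0.02, 0.01)
alphas  <- c(0.01, 0.05, 0.10, 0.20)

# Generic rule (an assumption): flagged when the maximum typicality is below alpha
n_flagged <- sapply(alphas, function(a) sum(max_typ < a))
data.frame(alpha = alphas, n_flagged = n_flagged)
```

Under such a rule the count can only stay the same or grow as alpha increases, which matches the behavior described above for alpha=0.1.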
Citing this vignette and the package odetector
Cebeci, Z., Cebeci, C., Tahtali, Y. and Bayyurt, L. 2022. Two novel outlier detection approaches based on unsupervised possibilistic and fuzzy clustering. PeerJ Computer Science, 8:e1060. https://doi.org/10.7717/peerj-cs.1060.
References
Wu, X., Wu, B., Sun, J. & Fu, H. (2010). Unsupervised possibilistic fuzzy clustering. J. of Information & Computational Sci., 7(5): 1075-1080.
Melnykov, V., Chen, W-C. & Maitra, R. (2013). MixSim: An R package for simulating data to study performance of clustering algorithms. J. of Statistical Software, 51(12):1-25. DOI: https://doi.org/10.18637/jss.v051.i12.
Cebeci, Z. (2018). Comparison of internal validity indices for fuzzy clustering. Journal of Agricultural Informatics, 10(2):1-14. DOI: https://doi.org/10.17700/jai.2019.10.2.537.
Cebeci, Z. (2020). fcvalid: An R Package for Internal Validation of Probabilistic and Possibilistic Clustering. Sakarya University Journal of Computer and Information Sciences, 3(1):11-27. DOI: https://doi.org/10.35377/saucis.03.01.664560.
Cebeci, Z., Cebeci, C., Tahtali, Y. & Bayyurt, L. (2022). Two novel outlier detection approaches based on unsupervised possibilistic and fuzzy clustering. PeerJ Computer Science, 8:e1060. DOI: https://doi.org/10.7717/peerj-cs.1060.