In the first step, we generate a simple dataset in which C1 and C2 are dominated by C3, C3 is dominated by C4, and C4 is dominated by C5. There is no dominant-distribution relation between C1 and C2.
# Simulation section
nInv<-100
initMean=10
stepMean=20
std=8
simData1<-list() # container for the Values and Group vectors
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("C1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C5"),times=nInv) )
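As a sanity check on the design described above, the same five groups can be generated in a more compact form and their sample means inspected. This is only a sketch: the seed, the `offsets` vector, and the `simCheck` and `groupMeans` names are illustrative assumptions and do not appear in the original simulation.

```r
# Compact, reproducible variant of the simulation above (a sketch).
set.seed(1)          # assumed seed, used only to make the run repeatable
nInv     <- 100
initMean <- 10
stepMean <- 20
std      <- 8

# Mean offsets in units of stepMean: C1 and C2 share a mean, C3 < C4 < C5.
offsets <- c(C1 = 0, C2 = 0, C3 = 2, C4 = 3, C5 = 4)

simCheck <- data.frame(
  Values = unlist(
    lapply(offsets, function(k) rnorm(nInv, mean = initMean + k * stepMean, sd = std)),
    use.names = FALSE
  ),
  Group = rep(names(offsets), each = nInv)
)

# Sample mean per group; C1 and C2 should be close, the rest increasing.
groupMeans <- tapply(simCheck$Values, simCheck$Group, mean)
print(round(groupMeans, 2))
```

The sample means of C1 and C2 should be close to each other, while those of C3, C4, and C5 should increase in steps of roughly `stepMean`.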
We then apply the framework to the simulated data.
# Simple ordering inference section
library(EDOIF)
## Loading required package: boot
# parameter setting
bootT=1000 # number of bootstrap resamples (sampling with replacement)
alpha=0.05 # significance level
#======= input
Values=simData1$Values
Group=simData1$Group
#=============
A1<-EDOIF(Values,Group,bootT = bootT, alpha=alpha )
We print the result of our framework below.
print(A1) # print results in text
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.900000
## Distribution: C1
## Mean:9.074610 95CI:[ 7.416565,10.913566]
## Distribution: C2
## Mean:9.571799 95CI:[ 8.115621,10.991122]
## Distribution: C3
## Mean:48.641842 95CI:[ 46.935907,50.410831]
## Distribution: C4
## Mean:70.433208 95CI:[ 69.031927,72.161735]
## Distribution: C5
## Mean:90.130574 95CI:[ 88.492751,91.875444]
## =======================================================
## Mean difference of C2 (n=100) minus C1 (n=100): C1 ⊀ C2
## :p-val 0.4154
## Mean Diff:0.497189 95CI:[ -1.632478,2.792858]
##
## Mean difference of C3 (n=100) minus C1 (n=100): C1 ≺ C3
## :p-val 0.0000
## Mean Diff:39.567232 95CI:[ 37.120519,42.246510]
##
## Mean difference of C4 (n=100) minus C1 (n=100): C1 ≺ C4
## :p-val 0.0000
## Mean Diff:61.358598 95CI:[ 58.905366,63.793044]
##
## Mean difference of C5 (n=100) minus C1 (n=100): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:81.055964 95CI:[ 78.631608,83.752169]
##
## Mean difference of C3 (n=100) minus C2 (n=100): C2 ≺ C3
## :p-val 0.0000
## Mean Diff:39.070044 95CI:[ 36.670593,41.334377]
##
## Mean difference of C4 (n=100) minus C2 (n=100): C2 ≺ C4
## :p-val 0.0000
## Mean Diff:60.861409 95CI:[ 58.803587,62.864864]
##
## Mean difference of C5 (n=100) minus C2 (n=100): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:80.558775 95CI:[ 78.389285,82.837822]
##
## Mean difference of C4 (n=100) minus C3 (n=100): C3 ≺ C4
## :p-val 0.0000
## Mean Diff:21.791366 95CI:[ 19.330900,24.172359]
##
## Mean difference of C5 (n=100) minus C3 (n=100): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:41.488731 95CI:[ 39.163272,43.771835]
##
## Mean difference of C5 (n=100) minus C4 (n=100): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:19.697366 95CI:[ 17.363485,22.178291]
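To make the entries of the printed result easier to read, the sketch below reproduces the same kind of quantities for a single pair of groups in base R: a mean difference, a percentile bootstrap confidence interval, and a Mann-Whitney p-value. It is a hand-rolled illustration, not the package's internal routine. The seed, the one-sided alternative of the test, and the names `a`, `b`, `bootDiff`, `ci`, and `mw` are assumptions.

```r
# Hand-rolled percentile bootstrap and Mann-Whitney test for one pair (a sketch).
set.seed(42)                          # assumed seed, for repeatability only
a <- rnorm(100, mean = 10, sd = 8)    # plays the role of C1
b <- rnorm(100, mean = 50, sd = 8)    # plays the role of C3

bootT <- 1000
alpha <- 0.05

# Resample each group with replacement and record the difference of means.
bootDiff <- replicate(
  bootT,
  mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE))
)

# Percentile confidence interval of the mean difference.
ci <- quantile(bootDiff, probs = c(alpha / 2, 1 - alpha / 2))

# One-sided Mann-Whitney (Wilcoxon rank-sum) test: is b shifted above a?
# The one-sided alternative is an assumption made for this sketch.
mw <- wilcox.test(b, a, alternative = "greater")

cat(sprintf("Mean Diff: %.3f  95%% CI: [%.3f, %.3f]  p-val: %.4f\n",
            mean(b) - mean(a), ci[1], ci[2], mw$p.value))
```

A confidence interval that lies entirely above zero, together with a p-value below `alpha`, corresponds to the case where the result reports that the first group is dominated by the second.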
The first plot shows the confidence intervals of the mean differences.
plot(A1,options =1)
The second plot shows the confidence intervals of the group means.
plot(A1,options =2)
The third plot is a dominant-distribution network.
out<-plot(A1,options =3)
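The network density of 0.9 reported in the printed result can be checked by hand. The sketch below assumes that density is the number of inferred dominance relations divided by the number of unordered pairs of groups; this definition is an assumption that happens to match the reported value, and the `edges` table is transcribed from the printed pairs.

```r
# Dominance relations reported in the first printed result:
# each row reads "from is dominated by to".
edges <- data.frame(
  from = c("C1", "C1", "C1", "C2", "C2", "C2", "C3", "C3", "C4"),
  to   = c("C3", "C4", "C5", "C3", "C4", "C5", "C4", "C5", "C5")
)

nGroups <- 5
# Assumed definition: inferred relations over all unordered pairs of groups.
density <- nrow(edges) / choose(nGroups, 2)
print(density)
```

Only the pair C1 and C2 has no relation, so 9 of the 10 possible pairs are connected.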
Next, we generate a more complicated dataset of mixture distributions. C1, C2, C3, and C4 are dominated by C5. There is no dominant-distribution relation among C1, C2, C3, and C4.
library(EDOIF)
# parameter setting
bootT=1000
alpha=0.05
nInv<-1200
start_time <- Sys.time()
#======= input
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
Values=simData3$Values
Group=simData3$Group
#=============
A3<-EDOIF(Values,Group, bootT=bootT, alpha=alpha, methodType ="perc")
A3
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.500000
## Distribution: C3
## Mean:81.449593 95CI:[ 78.867804,84.063172]
## Distribution: C2
## Mean:81.809903 95CI:[ 80.339347,83.227538]
## Distribution: C4
## Mean:82.385487 95CI:[ 80.101899,84.951571]
## Distribution: C1
## Mean:82.953944 95CI:[ 80.338677,85.769927]
## Distribution: C5
## Mean:139.957369 95CI:[ 136.685226,142.578980]
## =======================================================
## Mean difference of C2 (n=1200) minus C3 (n=1200): C3 ⊀ C2
## :p-val 0.6222
## Mean Diff:0.360310 95CI:[ -2.635325,3.304207]
##
## Mean difference of C4 (n=1200) minus C3 (n=1200): C3 ⊀ C4
## :p-val 0.3823
## Mean Diff:0.935894 95CI:[ -2.781728,4.160345]
##
## Mean difference of C1 (n=1200) minus C3 (n=1200): C3 ⊀ C1
## :p-val 0.0624
## Mean Diff:1.504351 95CI:[ -2.213064,5.157605]
##
## Mean difference of C5 (n=1200) minus C3 (n=1200): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:58.507776 95CI:[ 54.391212,62.239108]
##
## Mean difference of C4 (n=1200) minus C2 (n=1200): C2 ⊀ C4
## :p-val 0.2846
## Mean Diff:0.575584 95CI:[ -2.120584,3.430463]
##
## Mean difference of C1 (n=1200) minus C2 (n=1200): C2 ≺ C1
## :p-val 0.0292
## Mean Diff:1.144040 95CI:[ -1.813826,4.160002]
##
## Mean difference of C5 (n=1200) minus C2 (n=1200): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:58.147466 95CI:[ 54.657763,61.312185]
##
## Mean difference of C1 (n=1200) minus C4 (n=1200): C4 ⊀ C1
## :p-val 0.1128
## Mean Diff:0.568457 95CI:[ -3.024515,4.198713]
##
## Mean difference of C5 (n=1200) minus C4 (n=1200): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:57.571882 95CI:[ 53.520551,61.123886]
##
## Mean difference of C5 (n=1200) minus C1 (n=1200): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:57.003426 95CI:[ 53.037586,60.890256]
plot(A3)
end_time <- Sys.time()
end_time - start_time
## Time difference of 17.16731 secs
Finally, we generate data in which \(A\) dominates \(B\) under different degrees of uniform noise.
library(ggplot2)
nInv<-1000
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
#plot(density(simData3$V3))
dat <- data.frame(dens = c(simData3$V3, simData3$V5)
, lines = rep(c("B", "A"), each = nInv))
#Plot.
p1<-ggplot(dat, aes(x = dens, fill = lines)) +
  geom_density(alpha = 0.5) +
  xlim(-400, 400) + ylim(0, 0.07) +
  ylab("Density [0,1]") + xlab("Values") +
  theme(axis.text.x = element_text(face="bold", size=12))
theme_update(text = element_text(face="bold", size=12) )
p1$labels$fill<-"Categories"
plot(p1)
## Warning: Removed 7 rows containing non-finite values (stat_density).
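To illustrate what "different degrees of uniform noise" can mean, the base R sketch below contaminates two samples with an increasing fraction of uniform noise and tracks how the mean difference and a one-sided Mann-Whitney p-value respond. Everything here is an assumption made for illustration: `addUniformNoise` is a hypothetical helper that is not part of EDOIF and does not claim to reproduce `SimNonNormalDist`, and the means, standard deviation, noise range, and seed are arbitrary choices.

```r
# Hypothetical helper (not part of EDOIF): replace a fraction noisePer of a
# sample with uniform noise drawn from [lo, hi].
addUniformNoise <- function(x, noisePer, lo = -400, hi = 400) {
  nNoise <- round(noisePer * length(x))
  idx <- sample(length(x), nNoise)
  x[idx] <- runif(nNoise, min = lo, max = hi)
  x
}

set.seed(7)                              # assumed seed
B <- rnorm(1000, mean = 80,  sd = 16)    # assumed parameters for B
A <- rnorm(1000, mean = 140, sd = 16)    # assumed parameters for A

noiseLevels <- c(0.01, 0.05, 0.20)
res <- do.call(rbind, lapply(noiseLevels, function(noisePer) {
  An <- addUniformNoise(A, noisePer)
  Bn <- addUniformNoise(B, noisePer)
  data.frame(
    noisePer = noisePer,
    meanDiff = mean(An) - mean(Bn),
    pValue   = wilcox.test(An, Bn, alternative = "greater")$p.value
  )
}))
print(res)
```

In this setup the mean difference shrinks as the noise fraction grows, because both samples are pulled toward the centre of the noise range, while the ordering of \(A\) above \(B\) stays detectable.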