The independenceWeights
package constructs weights
designed to minimize the weighted statistical dependence between a
continuous exposure variable and a vector of confounder variables and
implements the methods of Huling, Greifer, and Chen (2021) for doing so.
In estimating a causal dose-response function, confounding bias is a
function of the dependence between confounders and the
exposure/treatment, so weights that minimize the dependence aim to
mitigate confounding bias directly.
Install the development version from GitHub with:
# install.packages("devtools")
::install_github("jaredhuling/independenceWeights") devtools
This is a basic example which shows how to estimate and utilize the distance covariance optimal weights (DCOWs) of Huling, Greifer, and Chen (2021):
library(independenceWeights)
Simulate data with a continuous treatment that has a confounded relationship with a response. Data are simulated according to the simulation setup of Vegetabile et al. (2021).
<- simulate_confounded_data(seed = 999, nobs = 500)
simdat <- simdat$data$Y ## response
y <- simdat$data$A ## treatment
A <- as.matrix(simdat$data[c("Z1", "Z2", "Z3", "Z4", "Z5")]) ## confounders X
Now estimate weights to adjust for confounders using the distance covariance optimal weights (DCOWs), which aim to mitigate the dependence between A and X:
<- independence_weights(A, X)
dcows
dcows#> Unweighted distance covariance: 0.3963
#> Optimized weighted dependence distance: 0.0246
#> Effective sample size: 264.0099
#>
#> Weight ranges:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.2767 0.8215 1.0000 1.4120 5.7360
Alternatively, information about any set of weights can be printed
via weighted_energy_stats()
. The unweighted distance
covariance is a measure of the unadjusted dependence between confounders
and treatment. The weighted dependence distance is a measure of this
dependence after weighting. The weighted energy distance
between A and weighted A measures how well the marginal distribution of
A is preserved after weighting. Similarly for X. The effective sample
size estimates the effective sample size after weighting.
weighted_energy_stats(A, X, dcows$weights)
#> Unweighted distance covariance: 0.3963
#> Weighted dependence distance: 0.0246
#> Weighted energy distance(A, weighted A): 0.0014
#> Weighted energy distance(X, weighted X): 0.0025
#> Effective sample size: 264.0099
We can assess traditional measures of balance such as marginal
weighted correlations with cobalt
:
library(cobalt)
Even though the DCOWs aim to mitigate joint dependence between X and
A, they result in very small marginal weighted correlations between them
as well. (changing poly
to 10 below would reveal that even
up to 10th degree polynomials of covariates as well-decorrelated):
bal.tab(A ~ Z1 + Z2 + Z3 + Z4 + Z5, data = data.frame(A = A, X),
weights = list(DCOW = dcows$weights),
un = TRUE, int = TRUE, poly = 1, stats = "cor")
#> Balance Measures
#> Type Corr.Un Corr.Adj
#> Z1 Contin. -0.1351 -0.0021
#> Z2 Contin. 0.4445 0.0161
#> Z3 Contin. -0.0397 -0.0284
#> Z4 Contin. 0.0267 -0.0057
#> Z5 Binary 0.0691 0.0009
#> Z1 * Z2 Contin. -0.0944 0.0016
#> Z1 * Z3 Contin. -0.1539 -0.0080
#> Z1 * Z4 Contin. -0.0168 -0.0007
#> Z1 * Z5_0 Contin. -0.1445 -0.0101
#> Z1 * Z5_1 Contin. 0.0438 0.0101
#> Z2 * Z3 Contin. 0.2647 -0.0129
#> Z2 * Z4 Contin. 0.0488 -0.0027
#> Z2 * Z5_0 Contin. -0.0266 -0.0005
#> Z2 * Z5_1 Contin. 0.0884 0.0027
#> Z3 * Z4 Contin. 0.0260 -0.0077
#> Z3 * Z5_0 Contin. -0.0730 -0.0064
#> Z3 * Z5_1 Contin. 0.0669 0.0019
#> Z4 * Z5_0 Contin. -0.0016 -0.0038
#> Z4 * Z5_1 Contin. 0.0486 -0.0037
#>
#> Effective sample sizes
#> Total
#> Unadjusted 500.
#> Adjusted 264.01
We can also minimize the weighted dependence subject to the constraint that marginal moments of X and A are decorrelated:
<- independence_weights(A, X,
dcows_decor decorrelate_moments = TRUE)
dcows_decor#> Unweighted distance covariance: 0.3963
#> Optimized weighted dependence distance: 0.0251
#> Effective sample size: 260.8907
#>
#> Weight ranges:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.2491 0.8344 1.0000 1.4330 5.7290
bal.tab(A ~ Z1 + Z2 + Z3 + Z4 + Z5, data = data.frame(A = A, X),
weights = list(DCOW = dcows$weights,
DCOW_decor = dcows_decor$weights),
un = TRUE, int = TRUE, stats = "cor")
#> Balance Measures
#> Type Corr.Un Corr.DCOW Corr.DCOW_decor
#> Z1 Contin. -0.1351 -0.0021 0.0000
#> Z2 Contin. 0.4445 0.0161 -0.0004
#> Z3 Contin. -0.0397 -0.0284 -0.0002
#> Z4 Contin. 0.0267 -0.0057 -0.0002
#> Z5 Binary 0.0691 0.0009 -0.0000
#> Z1 * Z2 Contin. -0.0944 0.0016 0.0023
#> Z1 * Z3 Contin. -0.1539 -0.0080 -0.0021
#> Z1 * Z4 Contin. -0.0168 -0.0007 0.0058
#> Z1 * Z5_0 Contin. -0.1445 -0.0101 -0.0087
#> Z1 * Z5_1 Contin. 0.0438 0.0101 0.0104
#> Z2 * Z3 Contin. 0.2647 -0.0129 -0.0024
#> Z2 * Z4 Contin. 0.0488 -0.0027 0.0019
#> Z2 * Z5_0 Contin. -0.0266 -0.0005 -0.0012
#> Z2 * Z5_1 Contin. 0.0884 0.0027 0.0012
#> Z3 * Z4 Contin. 0.0260 -0.0077 -0.0011
#> Z3 * Z5_0 Contin. -0.0730 -0.0064 -0.0020
#> Z3 * Z5_1 Contin. 0.0669 0.0019 0.0019
#> Z4 * Z5_0 Contin. -0.0016 -0.0038 0.0018
#> Z4 * Z5_1 Contin. 0.0486 -0.0037 -0.0032
#>
#> Effective sample sizes
#> Total
#> All 500.
#> DCOW 264.01
#> DCOW_decor 260.89
Now use the weights to estimate the causal average dose response function (ADRF)
## create grid
<- seq(min(simdat$data$A), 50, length.out=500)
trt_vec
## estimate ADRF
<- weighted_kernel_est(A, y, dcows$weights, trt_vec)$est
adrf_hat
## estimate naively without weights
<- weighted_kernel_est(A, y, rep(1, length(y)), trt_vec)$est
adrf_hat_unwtd
<- c(-4.75, 4.75)
ylims plot(x = simdat$data$A, y = simdat$data$Y, ylim = ylims,
xlim = c(0,50),
xlab = "A", ylab = "Y")
## true ADRF
lines(x = trt_vec, y = simdat$true_adrf(trt_vec), col = "blue", lwd=2)
## estimated ADRF
lines(x = trt_vec, y = adrf_hat, col = "red", lwd=2)
## naive estimate
lines(x = trt_vec, y = adrf_hat_unwtd, col = "green", lwd=2)
legend("bottomleft", c("True ADRF", "Unweighted Est.", "DCOW-weighted Est."),
col = c("blue", "green", "red"), lty = 1, lwd = 2)