This vignette shows a small example how functions to work with labelled data can be implemented in a typical data visualization workflow.
In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:
factor(c("low", "high", "mid", "high", "low"))
#> [1] low high mid high low
#> Levels: high low mid
Reading SPSS-data with haven or sjlabelled keeps the numeric values for variables and adds the value and variable labels as attributes. See following example from the sample-dataset efc, which is part of the sjlabelled-package:
library(sjlabelled)
data(efc)
str(efc$e42dep)
#> num [1:908] 3 3 3 4 4 4 4 4 4 4 ...
#> - attr(*, "label")= chr "elder's dependency"
#> - attr(*, "labels")= Named num [1:4] 1 2 3 4
#> ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
While all plotting and table functions of the sjPlot-package make use of these attributes, many packages and/or functions do not consider these attributes, e.g. R base graphics:
library(sjlabelled)
data(efc)
barplot(
table(efc$e42dep, efc$e16sex),
beside = TRUE,
legend.text = TRUE
)
As you can see in the above figure, the plot has neither axis nor legend labels.
as_label()
is a sjlabelled-function that converts a numeric variable into a factor and sets attribute-value-labels as factor levels. When using factors with valued levels, the bar plot will be labelled.
barplot(
table(sjlabelled::as_label(efc$e42dep),
::as_label(efc$e16sex)),
sjlabelledbeside = TRUE,
legend.text = TRUE
)
There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:
get_label()
to get variable labelsget_labels()
to get value labelsset_label()
to set variable labels (add them as vector attribute)set_labels()
to set value labels (add them as vector attribute)With this function, you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted.
barplot(
table(sjlabelled::as_label(efc$e42dep),
::as_label(efc$e16sex)),
sjlabelledbeside = TRUE,
legend.text = TRUE,
main = get_label(efc$e42dep)
)
The base subset()
function drops label attributes (or vector attributes in general) when subsetting data. In the sjlabelled-package, there are handy functions to deal with this problem: copy_labels()
and remove_labels()
.
copy_labels()
adds back labels to a subsetted data frame based on the original data frame. And remove_labels()
removes all label attributes.
<- subset(efc, subset = e16sex == 1, select = c(4:8))
efc.sub str(efc.sub)
#> 'data.frame': 296 obs. of 5 variables:
#> $ e17age : num 74 68 80 72 94 79 67 80 76 88 ...
#> $ e42dep : num 4 4 1 3 3 4 3 4 2 4 ...
#> $ c82cop1: num 4 3 3 4 3 3 4 2 2 3 ...
#> $ c83cop2: num 2 4 2 2 2 2 1 3 2 2 ...
#> $ c84cop3: num 4 4 1 1 1 4 2 4 2 4 ...
<- copy_labels(efc.sub, efc)
efc.sub str(efc.sub)
#> 'data.frame': 296 obs. of 5 variables:
#> $ e17age : num 74 68 80 72 94 79 67 80 76 88 ...
#> ..- attr(*, "label")= chr "elder' age"
#> $ e42dep : num 4 4 1 3 3 4 3 4 2 4 ...
#> ..- attr(*, "label")= chr "elder's dependency"
#> ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#> .. ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
#> $ c82cop1: num 4 3 3 4 3 3 4 2 2 3 ...
#> ..- attr(*, "label")= chr "do you feel you cope well as caregiver?"
#> ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#> .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
#> $ c83cop2: num 2 4 2 2 2 2 1 3 2 2 ...
#> ..- attr(*, "label")= chr "do you find caregiving too demanding?"
#> ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#> .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
#> $ c84cop3: num 4 4 1 1 1 4 2 4 2 4 ...
#> ..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your friends?"
#> ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#> .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
When working with labelled data, especially when working with data sets imported from other software packages, it comes very handy to make use of the label attributes. The sjlabelled-package supports this feature and offers useful functions for these tasks.