---
title: "Introduction"
output: volker::html_report
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r include=FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  echo = TRUE,
  message = FALSE,
  knitr.table.format = "html"
)

options(
  vlkr.fig.settings=list(
    html = list(
      dpi = 96, scale = 1, width = 910, pxperline = 12
    )
  )
)
```

## How to use the volkeR package?

First, load the package, set the plot theme and get some data.

```{r, warning=FALSE}
# Load the package
library(volker)

# Set the basic plot theme
theme_set(theme_vlkr())

# Load an example dataset ds from the package
ds <- volker::chatgpt
```

## How to generate tables and plots?

Decide whether your data is categorical or metric and choose the appropriate function:

- `tab_counts()` shows frequency tables.
- `plot_counts()` generates simple and stacked bar charts.
- `effect_counts()` calculates test statistics for categorical data.
- `tab_metrics()` creates tables with distribution parameters.
- `plot_metrics()` visualises distributions in density plots, box plots or scatter plots.
- `effect_metrics()` calculates test statistics for metric data.

The column selection determines whether to analyse single variables, item lists or to compare and correlate multiple variables.

**Try it out!**

### Categorical variables

```{r}
# A single variable
tab_counts(ds, use_private)
```

```{r}
# A list of variables
tab_counts(ds, c(use_private, use_work))
```

```{r}
# Variables matched by a pattern
tab_counts(ds, starts_with("use_"))
```

### Metric variables

```{r}
# One metric variable
tab_metrics(ds, sd_age)
```

```{r}
# Multiple metric items
tab_metrics(ds, starts_with("cg_adoption_"))
```

### Cross tabulation and group comparison

Provide a grouping column in the third parameter to compare different groups.

```{r}
tab_counts(ds, adopter, sd_gender)
```

For metric variables, you can compare the mean values.

```{r}
# Compare the means across the groups of one grouping variable (including the confidence interval)
tab_metrics(ds, sd_age, sd_gender, ci = TRUE)
```

By default, the crossing variable is treated as categorical. Set the `metric` parameter to treat it as metric and calculate correlations instead:

```{r}
# Correlate two metric variables
tab_metrics(ds, sd_age, use_work, metric = TRUE, ci = TRUE)
```

Each table function has a corresponding plot function with parameters to fine-tune the result. See the function help (F1 key) for the available options. For example, the `prop` parameter scales the bars to 100%, and the `numbers` parameter prints frequencies and percentages onto the bars.
```{r}
ds |>
  filter(sd_gender != "diverse") |>
  plot_counts(adopter, sd_gender, prop="rows", numbers=c("p","n"))
```

In addition, the effect functions conduct statistical tests:

```{r}
ds |>
  filter(sd_gender != "diverse") |>
  effect_counts(adopter, sd_gender)
```

# Automatically generate reports

## Getting started

Reports combine plots, tables and effect calculations. Optionally, for item batteries, an index, clusters or factors are calculated and reported. To see an example or develop your own reports, use the volker report template in RStudio:

- Create a new R Markdown document from the main menu.
- In the popup, select the "From Template" option.
- Select the volker template.
- The template contains a working example. Just click knit to see the result.

Have fun developing your own reports!

## Custom reports

To generate a volker report from any R Markdown document, add `volker::html_report` to its output options:

```
---
title: "How to create reports?"
output: volker::html_report
---
```

Then, you can generate combined outputs using the report functions. One advantage of the report functions is that plots are automatically scaled to fit the page. See the function help for further options (F1 key).

```{r}
ds %>%
  filter(sd_gender != "diverse") %>%
  report_metrics(starts_with("cg_adoption_"), sd_gender, box=TRUE, ci=TRUE)
```

## Custom tab sheets

By default, a header and tabsheets are automatically created. You can mix in custom content.

- If you want to add content before the report outputs, set the `title` parameter to `FALSE` and add your own title.
- A good place for methodological details is a custom tabsheet next to the "Plot" and the "Table" buttons. You can add a tab by setting the `close` parameter to `FALSE` and adding a new header on the fifth level (5 x # followed by the tab name). Close your custom tabsheet with `#### {-}` (4 x #).

Taken together, the following pattern generates the report output shown below it:

```{r}
#> ### Adoption types
#>
#> ```{r echo=FALSE}
#> ds %>%
#>   filter(sd_gender != "diverse") %>%
#>   report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE, box=TRUE, ci=TRUE)
#> ```
#>
#> ##### Method
#> Basis: Only male and female respondents.
#>
#> #### {-}
```

### Adoption types

```{r echo=FALSE}
ds %>%
  filter(sd_gender != "diverse") %>%
  report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE, box=TRUE, ci=TRUE)
```

##### Method
Basis: Only male and female respondents.

#### {-}

# Theming

Plot and table functions share a number of parameters that can be used to customize the outputs. Look up the available parameters in the help of the specific function. The `theme_vlkr()` function lets you customise colors:

```{r}
theme_set(theme_vlkr(
  base_fill = c("#F0983A","#3ABEF0","#95EF39","#E35FF5","#7A9B59"),
  base_gradient = c("#FAE2C4","#F0983A")
))
```

# Custom labels

Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the `codebook()` function:

```{r}
codebook(ds)
```

You can set specific column labels by providing a named list to the `items` parameter of `labs_apply()`:

```{r}
ds %>%
  labs_apply(
    items = list(
      "cg_adoption_advantage_01" = "Allgemeine Vorteile",
      "cg_adoption_advantage_02" = "Finanzielle Vorteile",
      "cg_adoption_advantage_03" = "Vorteile bei der Arbeit",
      "cg_adoption_advantage_04" = "Macht mehr Spaß"
    )
  ) %>%
  tab_metrics(starts_with("cg_adoption_advantage_"))
```

Labels for values inside a column can be adjusted by providing a named list to the `values` parameter of `labs_apply()`.
In addition, use the `cols` parameter to select the columns whose value labels should be changed:

```{r}
ds %>%
  labs_apply(
    cols=starts_with("cg_adoption"),
    values = list(
      "1" = "Stimme überhaupt nicht zu",
      "2" = "Stimme nicht zu",
      "3" = "Unentschieden",
      "4" = "Stimme zu",
      "5" = "Stimme voll und ganz zu"
    )
  ) %>%
  plot_metrics(starts_with("cg_adoption"))
```

To conveniently manage all labels of a dataset, save the result of `codebook()` to an Excel file, change the labels manually in a copy of the Excel file, and finally call `labs_apply()` with your revised codebook.

```{r, eval = FALSE}
library(readxl)
library(writexl)

# Save codebook to a file
codes <- codebook(ds)
write_xlsx(codes,"codebook.xlsx")

# Load the revised codebook from a file and apply it to the dataset
codes <- read_xlsx("codebook_revised.xlsx")
ds <- labs_apply(ds, codes)
```

Be aware that some data operations such as `mutate()` from the tidyverse lose labels along the way. In this case, store the labels (in the codebook attribute of the data frame) before the operation and restore them afterwards:

```{r}
ds %>%
  labs_store() %>%
  mutate(sd_age = 2024 - sd_age) %>%
  labs_restore() %>%
  tab_metrics(sd_age)
```

# Index calculation for item batteries

You can calculate mean indexes from a set of items using `add_index()`. A new column is created with the average value of all selected columns for each case. Reliability and number of items are calculated with `psych::alpha()` and stored as a column attribute named "psych.alpha". The reliability values are printed by `tab_metrics()`.
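
As a rough illustration of what a mean index and its reliability amount to, the following base R sketch computes row means and Cronbach's alpha by hand. The data frame `toy_items` and its columns are made up for this example, and the code is not taken from the volker internals, which rely on `psych::alpha()` as described above.

```{r}
# Hypothetical toy data: five respondents, three items on a 1-5 scale
toy_items <- data.frame(
  item_01 = c(1, 2, 3, 4, 5),
  item_02 = c(2, 2, 3, 5, 5),
  item_03 = c(1, 3, 3, 4, 4)
)

# Mean index: the average of all items for each case
toy_index <- rowMeans(toy_items)

# Cronbach's alpha:
# k / (k - 1) * (1 - sum of item variances / variance of the sum score)
k <- ncol(toy_items)
item_vars <- sapply(toy_items, var)
total_var <- var(rowSums(toy_items))
toy_alpha <- k / (k - 1) * (1 - sum(item_vars) / total_var)

toy_index
toy_alpha
```

The closer alpha is to 1, the more consistently the items measure the same construct, which is what justifies collapsing them into a single index column.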

**Add a single index**

```{r}
ds %>%
  add_index(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption)
```

**Compare the index values by group**

```{r}
ds %>%
  add_index(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption, adopter)
```

**Add multiple indexes and summarize them**

```{r}
ds %>%
  add_index(starts_with("cg_adoption_")) %>%
  add_index(starts_with("cg_adoption_advantage")) %>%
  add_index(starts_with("cg_adoption_fearofuse")) %>%
  add_index(starts_with("cg_adoption_social")) %>%
  tab_metrics(starts_with("idx_cg_adoption"))
```

# Factor and cluster analysis

The easiest way to conduct factor or cluster analyses is to use the respective parameters of the `report_metrics()` function.

```{r}
ds |>
  report_metrics(starts_with("cg_adoption"), factors = TRUE, clusters = TRUE)
```

Currently, cluster analysis is performed using k-means and factor analysis is a principal component analysis. Setting the parameters to `TRUE` automatically generates scree plots and selects the number of factors or clusters. Alternatively, you can explicitly specify the numbers.

If you want to work with the results, use `add_factors()` and `add_clusters()` respectively. For factor analysis, new columns prefixed with "fct_" are created to store the factor loadings based on the specified number of factors. For clustering, an additional column prefixed with "cls_" is added that assigns each observation to a cluster number. In the next step, you can use the new columns as shown below. To automatically determine the optimal number of factors or clusters based on diagnostics, set `k = NULL`.
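
To make the two methods more concrete, here is a minimal base R sketch of a principal component analysis and a k-means clustering on made-up data. It calls `stats::prcomp()` and `stats::kmeans()` directly. The data frame `toy_scores`, the new column names, and the choice of two components and two clusters are illustrative assumptions and do not reproduce what `add_factors()` and `add_clusters()` do internally.

```{r}
# Hypothetical toy data: two groups of respondents with different answer levels
set.seed(42)
toy_scores <- data.frame(
  item_a = c(rnorm(10, mean = 2), rnorm(10, mean = 4)),
  item_b = c(rnorm(10, mean = 2), rnorm(10, mean = 4)),
  item_c = c(rnorm(10, mean = 4), rnorm(10, mean = 2))
)
toy_cols <- c("item_a", "item_b", "item_c")

# Principal component analysis on the standardised items
toy_pca <- prcomp(toy_scores[, toy_cols], center = TRUE, scale. = TRUE)

# Share of variance explained by each component (the basis of a scree plot)
toy_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

# Keep the component scores of the first two components as new columns
toy_scores$fct_1 <- toy_pca$x[, 1]
toy_scores$fct_2 <- toy_pca$x[, 2]

# k-means clustering with two clusters on the standardised items
toy_km <- kmeans(scale(toy_scores[, toy_cols]), centers = 2, nstart = 10)
toy_scores$cls <- toy_km$cluster

round(toy_explained, 2)
table(toy_scores$cls)
```

Standardising the items first keeps a single item with a wide range from dominating either the components or the cluster distances.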

**Add factor analysis results**

```{r}
ds |>
  add_factors(starts_with("cg_adoption"), k = 3) |>
  report_metrics(fct_cg_adoption_1, fct_cg_adoption_2, metric = TRUE)
```

**Automatically determine the number of factors**

```{r}
ds |>
  add_factors(starts_with("cg_adoption"), k = NULL) |>
  factor_tab(starts_with("fct_cg_adoption"))
```

**Compare values by cluster**

```{r}
ds |>
  add_clusters(starts_with("cg_adoption"), k = 3) |>
  report_counts(sd_gender, cls_cg_adoption, prop = "cols")
```

# What's behind the scenes?

The volker package is based on standard methods for data handling and visualisation. You can produce all outputs with a handful of functions. The package just keeps your code DRY (don't repeat yourself) and wraps frequently used snippets into a simple interface.

The package provides print and knit functions that polish console and markdown output. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (`vlkr_df`, `vlkr_plt`, `vlkr_tbl`, `vlkr_list`, `vlkr_rprt`).

Basically, all table values are calculated by two functions:

- `count()` from dplyr is used to produce counts
- `skim()` from skimr is used to produce metrics

To shape the data frames, two essential functions come into play:

- `group_by()` is used to calculate grouped outputs
- `pivot_longer()` brings multiple items into a format where the item name becomes a grouping variable.

Plots are generated by `ggplot()`.

Statistical tests, clustering and factor analysis are largely based on the stats, psych, car and effectsize packages.

Thanks to all the maintainers, authors and contributors of the packages that make the world of data a magical place.
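
As a closing illustration, the building blocks named in this section can be combined by hand. The following sketch uses made-up data (`toy`, with the hypothetical columns `gender`, `use_private` and `use_work`) to count one variable, summarise it by group, and stack two items with `pivot_longer()` so that the item name becomes a grouping variable. It only shows the general pattern under these assumptions, not the package's actual implementation.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical toy data with one grouping column and two items
toy <- tibble(
  gender      = c("female", "male", "female", "male", "female"),
  use_private = c(1, 3, 3, 5, 1),
  use_work    = c(2, 2, 4, 5, 1)
)

# Simple frequency table for one variable
toy_counts <- toy %>%
  count(use_private)

# Grouped output: mean of one item per group
toy_means <- toy %>%
  group_by(gender) %>%
  summarise(mean_private = mean(use_private), .groups = "drop")

# Long format: the item name becomes a grouping variable
toy_long <- toy %>%
  pivot_longer(starts_with("use_"), names_to = "item", values_to = "value") %>%
  group_by(item) %>%
  summarise(mean = mean(value), n = n(), .groups = "drop")

toy_counts
toy_means
toy_long
```

Repeating these few steps for every variable and every comparison is exactly the kind of boilerplate that a wrapper function removes.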