Getting started
df2yaml
is an R package distributed as part of the CRAN. To install the package, start R and enter:
# install via CRAN
install.package("df2yaml")
# install via Github
# install.package("remotes") #In case you have not installed it.
::install_github("showteeth/df2yaml") remotes
In general, it is recommended to install from Github repository (update more timely).
Once df2yaml
is installed, it can be loaded by the following command.
library(df2yaml)
Introduction
The goal of df2yaml
is simplify the process of converting dataframe to YAML. The dataframe with multiple key columns and one value column (this column can also contain key-value pair(s)) will be converted to multi-level hierarchy.
Usage
Load the test data, this test data contains two key columns (paras
and subcmd
) and one value column, the value column also contains key and value pair(s) separated by “:”.
# library
library(df2yaml)
# load test file
<- system.file("extdata", "df2yaml_l3.txt", package = "df2yaml")
test_file = read.table(file = test_file, header = T, sep = "\t")
test_data head(test_data)
#> paras subcmd values
#> 1 picard insert_size MINIMUM_PCT: 0.5
#> 2 picard markdup CREATE_INDEX: true; VALIDATION_STRINGENCY: SILENT
#> 3 preseq -r 100 -seg_len 100000000
#> 4 qualimap --java-mem-size=20G -outformat HTML
#> 5 rseqc mapq: 30; percentile-floor: 5; percentile-step: 5
# output yaml string
= df2yaml(df = test_data, key_col = c("paras", "subcmd"), val_col = "values")
yaml_res cat(yaml_res)
#> preseq: -r 100 -seg_len 100000000
#> qualimap: --java-mem-size=20G -outformat HTML
#> rseqc:
#> mapq: 30
#> percentile-floor: 5
#> percentile-step: 5
#> picard:
#> insert_size:
#> MINIMUM_PCT: 0.5
#> markdup:
#> CREATE_INDEX: true
#> VALIDATION_STRINGENCY: SILENT
Convert above dataframe to YAML:
= df2yaml(df = test_data, key_col = c("paras", "subcmd"), val_col = "values")
yaml_res cat(yaml_res)
#> preseq: -r 100 -seg_len 100000000
#> qualimap: --java-mem-size=20G -outformat HTML
#> rseqc:
#> mapq: 30
#> percentile-floor: 5
#> percentile-step: 5
#> picard:
#> insert_size:
#> MINIMUM_PCT: 0.5
#> markdup:
#> CREATE_INDEX: true
#> VALIDATION_STRINGENCY: SILENT
There is no limit to the number of key columns used to convert.
Session info
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-conda-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /home/softwares/anaconda3/envs/r4.0/lib/libopenblasp-r0.3.12.so
#>
#> locale:
#> [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
#> [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] df2yaml_0.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.37 magrittr_2.0.1 tidyselect_1.1.0
#> [5] R6_2.5.0 rlang_1.0.3 fastmap_1.1.0 fansi_0.4.2
#> [9] stringr_1.4.0 dplyr_1.0.5 tools_4.0.3 xfun_0.30
#> [13] utf8_1.2.1 rrapply_1.2.6 DBI_1.1.1 cli_3.3.0
#> [17] jquerylib_0.1.3 ellipsis_0.3.2 htmltools_0.5.2 assertthat_0.2.1
#> [21] yaml_2.2.1 digest_0.6.27 tibble_3.1.0 lifecycle_1.0.0
#> [25] crayon_1.4.1 purrr_0.3.4 sass_0.4.1 prettydoc_0.4.1
#> [29] vctrs_0.4.1 glue_1.6.2 evaluate_0.14 rmarkdown_2.14
#> [33] stringi_1.5.3 pillar_1.5.1 compiler_4.0.3 bslib_0.3.1
#> [37] generics_0.1.0 jsonlite_1.7.2 pkgconfig_2.0.3