This document describes how to use the Python package xptcleaner to apply JSON ontology terms when cleaning SEND xpt files.
Before using the functions in the package, make sure the following prerequisites are met.
The code was developed and tested with R version 4.1.2 and above and Python 3.9.6 and above. Other versions may work, but version-dependent issues can arise.
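As a quick sanity check on the Python side, a minimal sketch like the following compares the running interpreter against the tested version. The helper name `meets_minimum` is hypothetical and is not part of xptcleaner.

```python
import sys

# Python version the code was developed and tested with, per the text above.
TESTED_PYTHON = (3, 9, 6)


def meets_minimum(version=None, minimum=TESTED_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is at or above the tested version.")
    else:
        print(f"Python {current} is older than the tested version; issues may arise.")
```

An older interpreter is not necessarily unusable; the check only flags that it falls outside the tested range.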
The easiest way is probably to install with pip from your conda environment, virtualenv, or base installation:
pip install xptcleaner
If you are running on a machine without admin rights and want to install against your base installation, you can do:
pip install xptcleaner --user
In addition to installing from the Python Package Index (PyPI), you can install from the source archive or the wheel archive.
The source archive and the wheel for xptcleaner can be obtained from the sendigR GitHub repository (sendigR - xptcleaner).
$ py -m pip install ./dist/xptcleaner-{version}.tar.gz
$ py -m pip install ./dist/xptcleaner-{version}-py3-none-any.whl
The following required Python packages are installed along with xptcleaner:
* pandas
* pyreadstat
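To confirm that the dependencies listed above are present after installation, a small sketch along these lines can help. It uses only the standard library; the helper name `missing_packages` is hypothetical and is not provided by xptcleaner.

```python
import importlib.util

# Packages listed above as installed alongside xptcleaner.
REQUIRED = ("pandas", "pyreadstat")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The lookup does not import the packages, so it stays fast and has no side effects; adding `"xptcleaner"` to the tuple would check for the package itself in the same way.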
Install the sendigR package; refer to the README for more details.
# Get CRAN version
install.packages("sendigR")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github('phuse-org/sendigR')
sendigR is located at: https://github.com/phuse-org/sendigR/
The importStudies.R script is located at: https://github.com/phuse-org/sendigR/blob/main/importStudies.R
The Python code to generate a JSON file for the XPT cleanup is located at: https://github.com/phuse-org/sendigR/tree/main/python/xptcleaner
The sample CDISC CT file and the extensible CT file are located under the ‘data-raw’ subfolder of the sendigR package.
library(reticulate)
library(sendigR)
#input CDISC and Extensible CT files.
infile1 <- "{path to CT file}/SEND_Terminology_EXTENSIBLE.txt"
infile2 <- "{path to CT file}/SEND Terminology_2021_12_17.txt"
#output JSON file
jsonfile <- "{path to CT file to be created}/SENDct.json"
#Call the gen_vocab function with the input and output files
sendigR::gen_vocab(list(infile1, infile2), jsonfile)
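After the JSON file has been generated, it can be worth confirming that it parses before using it for cleaning. The sketch below is an assumption-laden check, not part of sendigR or xptcleaner: the layout of the JSON file is not described here, so it reports only the top-level type and entry count. The helper name `summarize_json` is hypothetical, and UTF-8 encoding is assumed.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of top-level entries)."""
    # Assumption: the file is UTF-8 (or plain ASCII) encoded.
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Only containers have a meaningful length; a scalar counts as a single entry.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    # Placeholder path copied from the example above; replace it with a real location.
    jsonfile = "{path to CT file to be created}/SENDct.json"
    if Path(jsonfile).is_file():
        kind, size = summarize_json(jsonfile)
        print(f"{jsonfile}: top-level {kind} with {size} entries")
    else:
        print(f"{jsonfile} not found; run gen_vocab first.")
```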
library(reticulate)
library(sendigR)
#JSON file used for the xpt cleaning
jsonfile <- "{path to CT file to be created}/SENDct.json"

#folder containing the source xpt files
rawXptFolder <- "{path to xpt files}/96298/"
#folder containing the cleaned target xpt files
cleanXptFolder <- "{path to cleaned xpt files}/96298/"
#Call the standardize_file function to clean the xpt file
sendigR::standardize_file(rawXptFolder, cleanXptFolder, jsonfile)
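Before and after cleaning, it can help to see which xpt files are actually present in the source and target folders. The following is a minimal standard-library sketch; `list_xpt_files` is a hypothetical helper and is not part of xptcleaner or sendigR. It does not read or validate the files themselves.

```python
from pathlib import Path


def list_xpt_files(folder):
    """Return the sorted names of .xpt files directly inside `folder` (case-insensitive)."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is reported as empty rather than raising an error.
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".xpt"
    )


if __name__ == "__main__":
    # Placeholder folders copied from the example above; replace them with real paths.
    raw_folder = "{path to xpt files}/96298/"
    clean_folder = "{path to cleaned xpt files}/96298/"
    print("source xpt files: ", list_xpt_files(raw_folder))
    print("cleaned xpt files:", list_xpt_files(clean_folder))
```

Comparing the two lists gives a rough indication of whether every source file produced an output, assuming the cleaned files keep the names of their sources, which is not stated here.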