The cdata package is a demonstration of the “coordinatized data” theory and includes an implementation of the “fluid data” methodology.
Briefly, cdata supplies data transform operators that:
- work on local data or with any DBI data source;
- generalize the operators commonly called pivot and un-pivot.
A quick example:
library("cdata")
# first few rows of the iris data as an example
d <- wrapr::build_frame(
"Sepal.Length" , "Sepal.Width", "Petal.Length", "Petal.Width", "Species" |
5.1 , 3.5 , 1.4 , 0.2 , "setosa" |
4.9 , 3 , 1.4 , 0.2 , "setosa" |
4.7 , 3.2 , 1.3 , 0.2 , "setosa" |
4.6 , 3.1 , 1.5 , 0.2 , "setosa" |
5 , 3.6 , 1.4 , 0.2 , "setosa" |
5.4 , 3.9 , 1.7 , 0.4 , "setosa" )
d$iris_id <- seq_len(nrow(d))
knitr::kable(d)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | iris_id |
---|---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
4.9 | 3.0 | 1.4 | 0.2 | setosa | 2 |
4.7 | 3.2 | 1.3 | 0.2 | setosa | 3 |
4.6 | 3.1 | 1.5 | 0.2 | setosa | 4 |
5.0 | 3.6 | 1.4 | 0.2 | setosa | 5 |
5.4 | 3.9 | 1.7 | 0.4 | setosa | 6 |
Now suppose we want to take the above “all facts about each iris are in a single row” representation and convert it into a per-iris record block with the following structure.
record_example <- wrapr::qchar_frame(
"plant_part" , "measurement", "value" |
"sepal" , "width" , Sepal.Width |
"sepal" , "length" , Sepal.Length |
"petal" , "width" , Petal.Width |
"petal" , "length" , Petal.Length )
knitr::kable(record_example)
plant_part | measurement | value |
---|---|---|
sepal | width | Sepal.Width |
sepal | length | Sepal.Length |
petal | width | Petal.Width |
petal | length | Petal.Length |
The above sort of transformation may seem exotic, but it is fairly common when we want to plot many aspects of a record at the same time.
To specify our transformation we combine the record example with information about how records are keyed (recordKeys showing which rows go together to form a record, and controlTableKeys specifying the internal structure of a data record).
layout <- rowrecs_to_blocks_spec(
record_example,
controlTableKeys = c("plant_part", "measurement"),
recordKeys = c("iris_id", "Species"))
print(layout)
## {
## row_record <- wrapr::qchar_frame(
## "iris_id" , "Species", "Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length" |
## . , . , Sepal.Width , Sepal.Length , Petal.Width , Petal.Length )
## row_keys <- c('iris_id', 'Species')
##
## # becomes
##
## block_record <- wrapr::qchar_frame(
## "iris_id" , "Species", "plant_part", "measurement", "value" |
## . , . , "sepal" , "width" , Sepal.Width |
## . , . , "sepal" , "length" , Sepal.Length |
## . , . , "petal" , "width" , Petal.Width |
## . , . , "petal" , "length" , Petal.Length )
## block_keys <- c('iris_id', 'Species', 'plant_part', 'measurement')
##
## # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
## }
In the above we have used the common and useful data-organizing trick of specifying a dependent column (Species being a function of iris_id) as an additional key.
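The dependency claimed here can be checked directly. The following is a minimal base R sketch, not part of cdata: it rebuilds the six example rows from the built-in iris data and confirms that each iris_id maps to exactly one Species, which is why adding Species to the keys does not change how rows group into records. The names species_per_id and is_dependent are illustrative only.

```r
# Rebuild the six example rows from the built-in iris data
d <- head(iris, 6)
d$Species <- as.character(d$Species)
d$iris_id <- seq_len(nrow(d))

# Count the distinct Species values seen for each iris_id
# (species_per_id is an illustrative name, not a cdata function)
species_per_id <- tapply(d$Species, d$iris_id, function(s) length(unique(s)))

# TRUE when every iris_id has exactly one Species,
# that is, when Species is a function of iris_id
is_dependent <- all(species_per_id == 1)
print(is_dependent)
```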
This layout then specifies and implements the data transform. We can transform the data by sending it to the layout.
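One way this step might be written is sketched below, assuming the wrapr dot arrow pipe (%.>%) or, equivalently, layout_by(). The example data is rebuilt from the built-in iris data so the block runs on its own; the result names d_transformed and d_transformed_2 are our own choices.

```r
library(cdata)
library(wrapr)  # supplies the %.>% dot arrow pipe

# Rebuild the six example rows from the built-in iris data
d <- head(iris, 6)
d$Species <- as.character(d$Species)
d$iris_id <- seq_len(nrow(d))

# The record example and layout, as defined earlier in this document
record_example <- wrapr::qchar_frame(
  "plant_part", "measurement", "value"      |
  "sepal"     , "width"      , Sepal.Width  |
  "sepal"     , "length"     , Sepal.Length |
  "petal"     , "width"      , Petal.Width  |
  "petal"     , "length"     , Petal.Length )
layout <- rowrecs_to_blocks_spec(
  record_example,
  controlTableKeys = c("plant_part", "measurement"),
  recordKeys = c("iris_id", "Species"))

# Send the data to the layout with the dot arrow pipe
# (d_transformed is an assumed variable name)
d_transformed <- d %.>% layout

# The same transform written as an explicit function call
d_transformed_2 <- layout_by(layout, d)

print(d_transformed)
```

Each of the six single-row records becomes a four-row block, so the result has 24 rows, as in the table that follows.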
iris_id | Species | plant_part | measurement | value |
---|---|---|---|---|
1 | setosa | sepal | width | 3.5 |
1 | setosa | sepal | length | 5.1 |
1 | setosa | petal | width | 0.2 |
1 | setosa | petal | length | 1.4 |
2 | setosa | sepal | width | 3.0 |
2 | setosa | sepal | length | 4.9 |
2 | setosa | petal | width | 0.2 |
2 | setosa | petal | length | 1.4 |
3 | setosa | sepal | width | 3.2 |
3 | setosa | sepal | length | 4.7 |
3 | setosa | petal | width | 0.2 |
3 | setosa | petal | length | 1.3 |
4 | setosa | sepal | width | 3.1 |
4 | setosa | sepal | length | 4.6 |
4 | setosa | petal | width | 0.2 |
4 | setosa | petal | length | 1.5 |
5 | setosa | sepal | width | 3.6 |
5 | setosa | sepal | length | 5.0 |
5 | setosa | petal | width | 0.2 |
5 | setosa | petal | length | 1.4 |
6 | setosa | sepal | width | 3.9 |
6 | setosa | sepal | length | 5.4 |
6 | setosa | petal | width | 0.4 |
6 | setosa | petal | length | 1.7 |
And it is easy to invert these transforms using the t() transpose/adjoint notation.
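A sketch of the inversion, under the same assumptions as the rest of this example: t() is applied to the layout, and the block records are then sent to the inverted layout. The names inv_layout, d_blocks, and d_back are our own choices, and the data is rebuilt here so the block runs on its own.

```r
library(cdata)
library(wrapr)  # supplies the %.>% dot arrow pipe

# Rebuild the six example rows from the built-in iris data
d <- head(iris, 6)
d$Species <- as.character(d$Species)
d$iris_id <- seq_len(nrow(d))

# The record example and layout, as defined earlier in this document
record_example <- wrapr::qchar_frame(
  "plant_part", "measurement", "value"      |
  "sepal"     , "width"      , Sepal.Width  |
  "sepal"     , "length"     , Sepal.Length |
  "petal"     , "width"      , Petal.Width  |
  "petal"     , "length"     , Petal.Length )
layout <- rowrecs_to_blocks_spec(
  record_example,
  controlTableKeys = c("plant_part", "measurement"),
  recordKeys = c("iris_id", "Species"))

# Invert the layout: block records back to single-row records
inv_layout <- t(layout)
print(inv_layout)

# Round trip: rows -> blocks -> rows
d_blocks <- d %.>% layout
d_back <- d_blocks %.>% inv_layout
print(d_back)
```

Printing inv_layout gives a listing like the one that follows, and d_back holds one row per iris again.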
## {
## block_record <- wrapr::qchar_frame(
## "iris_id" , "Species", "plant_part", "measurement", "value" |
## . , . , "sepal" , "width" , Sepal.Width |
## . , . , "sepal" , "length" , Sepal.Length |
## . , . , "petal" , "width" , Petal.Width |
## . , . , "petal" , "length" , Petal.Length )
## block_keys <- c('iris_id', 'Species', 'plant_part', 'measurement')
##
## # becomes
##
## row_record <- wrapr::qchar_frame(
## "iris_id" , "Species", "Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length" |
## . , . , Sepal.Width , Sepal.Length , Petal.Width , Petal.Length )
## row_keys <- c('iris_id', 'Species')
##
## # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
## }
iris_id | Species | Sepal.Width | Sepal.Length | Petal.Width | Petal.Length |
---|---|---|---|---|---|
1 | setosa | 3.5 | 5.1 | 0.2 | 1.4 |
2 | setosa | 3.0 | 4.9 | 0.2 | 1.4 |
3 | setosa | 3.2 | 4.7 | 0.2 | 1.3 |
4 | setosa | 3.1 | 4.6 | 0.2 | 1.5 |
5 | setosa | 3.6 | 5.0 | 0.2 | 1.4 |
6 | setosa | 3.9 | 5.4 | 0.4 | 1.7 |
The layout specifications themselves are just simple lists with “pretty print methods” (the control table being simply an example record in the form of a data.frame).
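To see the underlying list, one option is to strip the class with base R's unclass() so that the default list printing is used. This is a sketch under that assumption; layout_fields is an illustrative name.

```r
library(cdata)

# The record example and layout, as defined earlier in this document
record_example <- wrapr::qchar_frame(
  "plant_part", "measurement", "value"      |
  "sepal"     , "width"      , Sepal.Width  |
  "sepal"     , "length"     , Sepal.Length |
  "petal"     , "width"      , Petal.Width  |
  "petal"     , "length"     , Petal.Length )
layout <- rowrecs_to_blocks_spec(
  record_example,
  controlTableKeys = c("plant_part", "measurement"),
  recordKeys = c("iris_id", "Species"))

# Drop the class attribute so the plain list structure is printed
# (layout_fields is an assumed variable name)
layout_fields <- unclass(layout)
print(layout_fields)
```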
## $controlTable
## plant_part measurement value
## 1 sepal width Sepal.Width
## 2 sepal length Sepal.Length
## 3 petal width Petal.Width
## 4 petal length Petal.Length
##
## $recordKeys
## [1] "iris_id" "Species"
##
## $controlTableKeys
## [1] "plant_part" "measurement"
##
## $checkNames
## [1] TRUE
##
## $checkKeys
## [1] FALSE
##
## $strict
## [1] FALSE
##
## $allow_rqdatatable
## [1] FALSE
Notice that almost all of the time and space in using cdata is spent in specifying how your data is currently structured and how it is to be structured.
The main cdata interfaces are given by the following set of methods:
- rowrecs_to_blocks_spec(), for specifying how single-row records map to general multi-row (or block) records.
- blocks_to_rowrecs_spec(), for specifying how multi-row block records map to single-row records.
- layout_specification(), for specifying transforms from multi-row records to other multi-row records.
- layout_by() or the wrapr dot arrow pipe, for applying a layout to re-arrange data.
- t() (transpose/adjoint), to invert or reverse layout specifications.
Some convenience functions include:
- pivot_to_rowrecs(), for moving data from multi-row block records with one value per row (a single column of values) to single-row records [spread or dcast].
- pivot_to_blocks()/unpivot_to_blocks(), for moving data from single-row records to possibly multi-row block records with one row per value (a single column of values) [gather or melt].
- wrapr::qchar_frame(), a helper function for specifying record control table layout specifications.
- wrapr::build_frame(), a helper function for specifying data frames.
The package vignettes can be found in the “Articles” tab of the cdata documentation site.
The (older) recommended tutorial is: Fluid data reshaping with cdata. We also have an (older) short free cdata screencast (and another example can be found here).
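As a brief illustration of the convenience functions listed above, the following sketch un-pivots the four measurement columns of the example data into one value per row and then pivots them back. The column names "measure" and "value" and the variable names long and wide are our own choices; the argument names are those of unpivot_to_blocks() and pivot_to_rowrecs() as we understand them, so check the package reference for your installed version.

```r
library(cdata)

# Rebuild the six example rows from the built-in iris data
d <- head(iris, 6)
d$Species <- as.character(d$Species)
d$iris_id <- seq_len(nrow(d))

# Single-row records -> one row per value (similar to gather / melt)
long <- unpivot_to_blocks(
  d,
  nameForNewKeyColumn = "measure",
  nameForNewValueColumn = "value",
  columnsToTakeFrom = c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width"))

# One row per value -> single-row records (similar to spread / dcast)
wide <- pivot_to_rowrecs(
  long,
  columnToTakeKeysFrom = "measure",
  columnToTakeValuesFrom = "value",
  rowKeyColumns = c("iris_id", "Species"))

print(long)
print(wide)
```

These helpers cover the case where a block record has a single key column and a single value column; the layout specifications above handle more general record shapes.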