In our cdata
R
package and training
materials we emphasize the record-oriented
thinking and how to
design a transform control table. We now have an additional exciting
new feature: control table keys.
The user can now control which columns of a cdata
control table are the keys, including now using composite keys (that is
keys that are spread across more than one column). This is easiest to
demonstrate with an example.
Consider the simple data frame (the first couple of rows of the
famous iris
data set).
library("cdata")
d <- wrapr::build_frame(
"Sepal.Length" , "Sepal.Width", "Petal.Length", "Petal.Width", "Species" |
5.1 , 3.5 , 1.4 , 0.2 , "setosa" |
4.9 , 3 , 1.4 , 0.2 , "setosa" )
d$id <- seq_len(nrow(d))
knitr::kable(d)
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | id |
---|---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
4.9 | 3.0 | 1.4 | 0.2 | setosa | 2 |
Suppose we wish to land the dimensions of different irises in rows
keyed by two columns: Part
and Measure
. That
is we want the data to look like the following.
expect <- wrapr::build_frame(
"id" , "Species", "Part" , "Measure", "Value" |
1L , "setosa" , "Sepal", "Length" , 5.1 |
1L , "setosa" , "Sepal", "Width" , 3.5 |
1L , "setosa" , "Petal", "Length" , 1.4 |
1L , "setosa" , "Petal", "Width" , 0.2 |
2L , "setosa" , "Sepal", "Length" , 4.9 |
2L , "setosa" , "Sepal", "Width" , 3 |
2L , "setosa" , "Petal", "Length" , 1.4 |
2L , "setosa" , "Petal", "Width" , 0.2 )
knitr::kable(expect)
id | Species | Part | Measure | Value |
---|---|---|---|---|
1 | setosa | Sepal | Length | 5.1 |
1 | setosa | Sepal | Width | 3.5 |
1 | setosa | Petal | Length | 1.4 |
1 | setosa | Petal | Width | 0.2 |
2 | setosa | Sepal | Length | 4.9 |
2 | setosa | Sepal | Width | 3.0 |
2 | setosa | Petal | Length | 1.4 |
2 | setosa | Petal | Width | 0.2 |
This may seem like an exotic transform, but it seems to be common when you want to use multiple display aesthetics in a plot (such as x, y, color, and facet).
With multiple control table keys specifying these sorts of transforms can be made easy.
First define the control table specifying what a single row-record looks like after transformation to a multi-row block-record.
control_table <- wrapr::qchar_frame(
"Part" , "Measure", "Value" |
"Sepal", "Length" , Sepal.Length |
"Sepal", "Width" , Sepal.Width |
"Petal", "Length" , Petal.Length |
"Petal", "Width" , Petal.Width )
layout <- rowrecs_to_blocks_spec(
control_table,
controlTableKeys = c("Part", "Measure"),
recordKeys = c("id", "Species"))
print(layout)
#> {
#> row_record <- wrapr::qchar_frame(
#> "id" , "Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" |
#> . , . , Sepal.Length , Sepal.Width , Petal.Length , Petal.Width )
#> row_keys <- c('id', 'Species')
#>
#> # becomes
#>
#> block_record <- wrapr::qchar_frame(
#> "id" , "Species", "Part" , "Measure", "Value" |
#> . , . , "Sepal", "Length" , Sepal.Length |
#> . , . , "Sepal", "Width" , Sepal.Width |
#> . , . , "Petal", "Length" , Petal.Length |
#> . , . , "Petal", "Width" , Petal.Width )
#> block_keys <- c('id', 'Species', 'Part', 'Measure')
#>
#> # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
#> }
Notice we are specifying that the columns Part
and
Measure
are the control table row-keys (identifying rows
within the block record) and “id” and “Species” are record keys
(specifying which rows are in the same record).
We have designed our desired transform above (as taught here).
We are introducing a new convention of putting the key values in quotes
(even though wrapr::qcar_frame()
does not need this).
Notice the quoted fields are column-names of the original row-oriented
data, and the quote is indicating that these are names standing in for
values.
We can now perform the transform.
id | Species | Part | Measure | Value |
---|---|---|---|---|
1 | setosa | Sepal | Length | 5.1 |
1 | setosa | Sepal | Width | 3.5 |
1 | setosa | Petal | Length | 1.4 |
1 | setosa | Petal | Width | 0.2 |
2 | setosa | Sepal | Length | 4.9 |
2 | setosa | Sepal | Width | 3.0 |
2 | setosa | Petal | Length | 1.4 |
2 | setosa | Petal | Width | 0.2 |
And, as one comes to expect with coordinatized/fluid-data transforms, the process is easy to reverse (modulo column and row order).
inv_layout <- t(layout)
print(inv_layout)
#> {
#> block_record <- wrapr::qchar_frame(
#> "id" , "Species", "Part" , "Measure", "Value" |
#> . , . , "Sepal", "Length" , Sepal.Length |
#> . , . , "Sepal", "Width" , Sepal.Width |
#> . , . , "Petal", "Length" , Petal.Length |
#> . , . , "Petal", "Width" , Petal.Width )
#> block_keys <- c('id', 'Species', 'Part', 'Measure')
#>
#> # becomes
#>
#> row_record <- wrapr::qchar_frame(
#> "id" , "Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" |
#> . , . , Sepal.Length , Sepal.Width , Petal.Length , Petal.Width )
#> row_keys <- c('id', 'Species')
#>
#> # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
#> }
back <- res %.>% inv_layout
knitr::kable(back)
id | Species | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
---|---|---|---|---|---|
1 | setosa | 5.1 | 3.5 | 1.4 | 0.2 |
2 | setosa | 4.9 | 3.0 | 1.4 | 0.2 |
Recle Etino Vibal (who asked for this
feature in an issue) has an interesting article
https://amateurdatasci.rbind.io/post/table-another-back-again-cdata/
trying some variations on the data shaping concepts.
The cdata
unit test include the following variations of
the above example:
We think cdata
(and the accompanying fluid data
methodology, plus extensions)
is a very deep and powerful way of wrangling data. Once you take the
time to learn the methodology (which is “draw what you want to happen to
one record, type that in as your control table, and you are done!”) it
is very easy to use.