BEDMatrix object) caused popkin to die. Now popkin behaves as expected. New unit test cases were added to test function inputs (previously this case was untested).
All doc examples are now run (all used to be dontrun).
Other minor non-code changes for first CRAN submission.
lfa is not available (needed for CRAN tests). This change is not visible in the rendered vignette included in the package.
plotPopkin now allows NULL elements in input list x, and makes empty plots with titles (good for placeholders or other non-existent data).
Clarified plotPopkin documentation (that marPad is added to xMar values if set).
README.md now contains instructions for installing from CRAN as well as from GitHub.
printLabs (used by plotPopkin) is now more flexible in where it places its labels (new args side1 and side2).
RColorBrewer.
fst, inbr, plotPopkin).
neff function (estimates effective sample size given a kinship matrix and weights; can find optimal weights that are non-negative or sign-unconstrained, yielding maximum neff values).
Now the popkin function preserves the individual names if they are present in the input genotype matrix. These names get copied to the rows and columns of the output kinship matrix.
Converted the vignette from PDF to HTML
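The name preservation described above (individual names copied from the genotype matrix to the rows and columns of the kinship matrix) can be illustrated with a minimal base R sketch. It assumes individuals are on the columns of the genotype matrix. The helper copy_ind_names and the toy data are hypothetical and are not part of popkin, and the placeholder matrix is not a popkin estimate.

```r
# Toy genotype matrix: loci on rows, individuals on columns (made-up data)
X <- matrix(
    c(0, 1, 2,
      1, 1, 0,
      2, 0, 1,
      0, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(NULL, c("ind1", "ind2", "ind3"))
)

# Hypothetical helper: copy individual names (if any) from X
# to both the rows and the columns of an n x n matrix
copy_ind_names <- function(kinship, X) {
    ids <- colnames(X)
    if (!is.null(ids)) dimnames(kinship) <- list(ids, ids)
    kinship
}

# Placeholder n x n matrix standing in for a kinship estimate
kinship <- diag(0.5, ncol(X))
kinship <- copy_ind_names(kinship, X)
```

When the input has no names, the helper returns the matrix unchanged.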
inbrDiag -> inbr_diag
neff -> n_eff
plotPopkin -> plot_popkin
rescalePopkin -> rescale_popkin
weightsSubpops -> weights_subpops
plot_popkin).
plotPopkin retains the older argument names.
inbr_diag now accepts lists of kinship matrices to transform (for easier plotting of multiple matrices).
plot_popkin now requires its non-NULL inputs to be proper kinship matrices. Previously, the code loosely allowed non-square matrices to be visualized, but this case was not guaranteed to work. The code is cleaner under the assumption of symmetric square matrices.
validate_kinship, mean_kinship
plot_popkin bug fixes and enhancements!
plot_popkin now resets graphical parameters when done and after every panel as needed.
NULL (default) for subsequent panels, the original margins were not reset (instead, the last values were incorrectly propagated).
par values) is now reset after plotting is complete.
plot_popkin option panel_letters (default is A-Z, so the default remains to not show letters for a single panel).
leg_cex option to plot_popkin.
popkin function the deprecated parameter names lociOnCols and memLim alongside the new names, to prevent breaking existing code (generate warnings).
inbr_diag now handles NULL inputs correctly (preserves them as NULL without throwing errors).
plot_popkin has a new logical option null_panel_data, to change behavior in the presence of NULL kinship matrices (whether they must or must not have titles and other parameters).
NULL panels.
.Rbuildignore to stop ignoring README; also removed non-existent files from list
popkin.Rproj file
solve_m_mem_lim, which generalizes previous behavior to estimate chunk sizes (in number of loci) given a limited memory and number of individuals for various numbers of matrices (of dimensions (m,n) or (n,n)) and vectors (lengths m or n). This function is shared with related projects (such as popkinsuppl on GitHub).
solve_m_mem_lim always returns integer chunk sizes (number of loci). Previously the function returned non-integers only if the total matrix size m was not provided.
solve_m_mem_lim in other dependent packages. In particular, the internal function get_mem_lim_m was removed.
popkin function accepts the new parameter mem_factor.
plot_popkin updates:
labs_even = TRUE were not placed correctly. The error was most evident for very small samples (e.g. n = 3 individuals), and was imperceptible otherwise (e.g. n = 100 or more).
diag_line = TRUE did not extend fully to extremes. This error was again most evident for very small samples, and was imperceptible otherwise.
weights option, to change width of every individual to highlight individuals with more weight.
raster option, equivalent to useRaster option in the image function used internally. If weights are not NULL, raster is forced to FALSE (required for image to work in this setting). So its only use is to set it when weights are NULL, as needed.
Memory control bugfixes
BEDMatrix object is analyzed
solve_m_mem_lim now returns memory limit from get_mem_lim or user, in addition to the chunk size in both number of loci and in expected memory usage.
Other enhancements
n_eff function now ensures output n_eff estimates are in the theoretically valid range of [1, 2*n]. Numerical issues in small and noisy kinship matrix estimates could lead to out-of-bounds estimates, which are now replaced with their closest boundary values.
plot_popkin, added option names_las
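The restriction of n_eff estimates to the range [1, 2*n] noted above amounts to replacing out-of-bounds values with the nearest boundary. A minimal base R sketch of that idea follows. The helper name clamp_n_eff is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper: force effective sample size estimates into [1, 2*n],
# where n is the number of individuals
clamp_n_eff <- function(n_eff, n) {
    stopifnot(is.numeric(n_eff), is.numeric(n), length(n) == 1, n >= 1)
    # raise values below 1 to 1, then lower values above 2*n to 2*n
    pmin(pmax(n_eff, 1), 2 * n)
}

# With n = 10 the valid range is [1, 20]:
# 0.3 is raised to 1, 5.2 is kept, 25 is lowered to 20
clamped <- clamp_n_eff(c(0.3, 5.2, 25), n = 10)
```

Estimates already inside the range pass through unchanged, so the clamp only affects the noisy cases mentioned in the entry.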
plot_popkin_single:
kinship_range to agree with the default of plot_popkin when a single kinship matrix is plotted (as a result, default colors now agree in that case too).
breaks is now invisible.
plot_popkin are visible, differences are only noticeable calling this internal function plot_popkin_single directly.
Improvements to function plot_popkin:
leg_per_panel, which if true allows each kinship panel to have a different scale (each gets its own legend/color key).
leg_* options to be able to take on different values per panel.
leg_width to control the width of the legend panels. Increased the default width of this legend/color key (from 0.1 to 0.3, as a fraction of the width of the kinship panels), which changes the behavior in the original case when this legend is shared across kinship panels. Now the full legend fits in the panel, without needing an outer margin to the right.
leg_mar behavior changed. Now leg_mar can be a scalar, which sets the right margin of the legend panel. New default is leg_mar = 3, again necessary so the label of the legend fits in the panel. Previous behaviors of leg_mar = NULL and a full margin specification are retained.
More improvements to function plot_popkin:
oma, which sets outer margins via par(oma) but provides additional useful shortcuts and defaults. This changes the default behavior of plot_popkin by setting the left outer margin to 1.5 (all other values are zero), whereas before plot_popkin did not set any outer margins. This new default behavior makes the “Individuals” outer label appear automatically in plots (whereas before, simply calling plot_popkin without setting outer margins resulted in this outer-margin y-axis label being hidden from view).
mar to accept various shortcuts (scalar values set only bottom and left margin, whereas the second value of a vector of length 2 sets the top margin, which is otherwise zero; in these two cases the right margin is zero). Default behavior remains to not change existing margins.
inbrDiag, neff, plotPopkin, rescalePopkin, weightsSubpops.
popkin function:
lociOnCols, memLim.
class usage now that matrices return a two-element array in R-devel (required by CRAN).
calc_leg_width_min internal function, though it is unfinished and unused.
man/figures/
x_local parameter to function fst, which permits estimation of FST when there is known local inbreeding (estimated from a pedigree or IBD blocks).
validate_kinship now tests for symmetry in input kinship matrices too.
solve_m_mem_lim now avoids a rare integer overflow caused when input number of individuals n was encoded as an integer and was greater than sqrt(.Machine$integer.max), or 46340.95.
validate_kinship now has sym option that, if FALSE, skips symmetry test (defaults to TRUE).
plot_popkin has the same sym option passed to validate_kinship, but here it defaults to FALSE (there is no inherent error caused by plotting non-symmetric matrices).
popkin
want_M option, which if TRUE returns a list containing the kinship matrix as well as the pairwise complete count matrix M.
m_chunk_max option (default 1000), which sets the maximum number of loci to process at a time. The new default behavior reduces memory usage a lot, especially on machines with large memory, without sacrificing speed. The original version would use a lot of memory just because it was available, which could be inconvenient when trying to run other processes, and did not result in increased speed, so it was unnecessary at best.
popkin_A (used to be unexported get_A) and popkin_A_min_subpops (used to be unexported min_mean_subpops)
popkin function.
popkin method
validate_kinship added option name (default “kinship”) for clear error reports when the matrix being tested is not actually a kinship matrix
name = "A" to validate A in popkin_A_min_subpops.
DESCRIPTION, README.md and the vignette, to point to the published method in PLoS Genetics, and also a related preprint of human analysis on bioRxiv.
popkin function is run. Free memory is not calculated in these systems and defaults to 1GB, which threw a warning since it could cause problems if the actual memory available is less. However, since free memory is rarely below 1GB on reasonable systems, throwing this warning had become more problematic than it was useful (it interfered with internal unit testing), so I decided to remove the warning.
popkin_af, which is the analog of popkin but for allele frequency matrices instead of genotypes, and as a consequence it estimates coancestry instead of kinship.
Overall added tree plotting capabilities and more plotting fine control.
plot_phylo for plotting phylo trees. This is a wrapper around ape::plot.phylo that makes several adjustments so plots agree more with accompanying kinship matrices (package ape is now a dependency for this feature).
plot_popkin had the following updates:
phylo and function are now accepted elements in input list kinship (first argument). If phylo, these trees are plotted via plot_phylo. If function, its code is executed without arguments, which is expected to plot a single panel.
ylab_side to allow placing labels on x-axis (bottom, but also top, and right side) instead of the default y-axis (left side).
leg_column for placing legend/color key in any column (default last column, which was the only choice before).
panel_letters_adj for positioning panel letters more finely, farther into the margin. Also, previous hardcoded default of 0 (inside x-axis range) was changed now to -0.1 (just outside the x-axis range in most cases).
names = TRUE) are now always plotted entirely, even if overlapping. The old behavior (R’s default) plotted names in order and skipped overlapping labels (see ?axis), which looks prettier but was confusing for this plot as it suggested incorrectly that some individuals or subpopulations were not present. The solution is unfortunately a hack, to pass gap.axis = -1 to axis (suggested in ?axis), which hopefully does not break in the future.
validate_kinship now has option logical = TRUE to return a logical value instead of throwing errors.
inst/CITATION (missed last time I updated them in other locations).
weights_subpops updates:
subsubpops for calculating weights on two levels.
table).
LazyData: true from DESCRIPTION (to avoid a new “NOTE” on CRAN).
NEWS.md slightly to improve its automatic parsing.
avg_kinship_subpops.
popkin_A_min_subpops:
avg_kinship_subpops internally to perform the bulk of the calculations
subpops = NULL, calculation now returns minimum A among off-diagonal elements only (excluding diagonal) rather than the overall minimum of A. There’s no difference when A is calculated from genotypes (diagonal values are much greater than off-diagonal values), but made the change for consistency when it might differ for arbitrary inputs.
README updated GitHub install instructions for building vignettes.
plot_popkin fixed a bug when null_panel_data = TRUE in which titles that went over panels with NULL kinship were incorrectly omitted.
Old: retrieved MemFree (from /proc/meminfo). This could underestimate available memory when Buffers and Cached memory are large (these count as available memory!), and in some cases cause this error:
Error in solve_m_mem_lim :
The resulting `m_chunk` was negative! This is because either `mat_n_n` or `vec_n` are non-zero and `n` alone is too large for the available memory (even for `m_chunk == 0`). The solution is to free more memory (ideal) or to reduce `n` if possible.
New: retrieve MemAvailable (still from /proc/meminfo), which is ideal but is absent in older Linux kernels (<3.14), otherwise fall back to retrieving and returning the sum of MemFree, Buffers, and Cached. Either way available memory is greater than MemFree alone and is also more accurate.
Under the hood, cleaned the parser considerably and now check for several trouble scenarios that were previously taken for granted.
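The new lookup order (MemAvailable first, otherwise the sum of MemFree, Buffers, and Cached) can be sketched in base R as follows. This is only an illustration under stated assumptions: the function name mem_available_kb is hypothetical, it takes the lines of a /proc/meminfo-style file as a character vector so it can be tried on any system, and it is not the parser used inside popkin.

```r
# Hypothetical helper: available memory in kB from /proc/meminfo-style lines
mem_available_kb <- function(lines) {
    # keep only lines of the form "Key:   12345 kB"
    pattern <- "^([A-Za-z_()]+):[[:space:]]+([0-9]+)[[:space:]]*kB[[:space:]]*$"
    lines <- lines[grepl(pattern, lines)]
    keys <- sub(pattern, "\\1", lines)
    vals <- as.numeric(sub(pattern, "\\2", lines))
    names(vals) <- keys
    # preferred field, absent in older kernels
    if ("MemAvailable" %in% keys) return(vals[["MemAvailable"]])
    # fallback: MemFree + Buffers + Cached
    needed <- c("MemFree", "Buffers", "Cached")
    if (!all(needed %in% keys)) stop("required fields missing from meminfo lines")
    sum(vals[needed])
}

# Made-up example input, with and without MemAvailable
old_kernel <- c(
    "MemTotal:       16000000 kB",
    "MemFree:         1000000 kB",
    "Buffers:          500000 kB",
    "Cached:          4000000 kB"
)
new_kernel <- c(old_kernel, "MemAvailable:    6000000 kB")

mem_old <- mem_available_kb(old_kernel)  # fallback sum: 5500000
mem_new <- mem_available_kb(new_kernel)  # MemAvailable: 6000000
```

On a Linux machine the same function could be given readLines("/proc/meminfo"). In both made-up cases the result is larger than MemFree alone, which matches the motivation given in the entry.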
plot_admix for making admixture/structure plots with most of the same options as plot_popkin!
print_labels_multi, print_labels.
plot_admix added options leg_title_line and leg_las, and changed the default of leg_mar, to better accommodate numerous long ancestry labels.
plot_admix:
admix_order_cols: to automatically order ancestries given ordered individuals.
admix_label_cols: to automatically assign labels to ancestries given labels to individuals.
popkin, popkin_A:
mean_of_ratios, default FALSE is original estimator, TRUE gives a new estimator that upweights rare variants, which resembles in this way the standard kinship estimator, and which appears to improve performance in association testing.
M (one of the return values when want_M = TRUE) did not inherit individual names from X even though A and kinship did, and similarly all inherit names when X is a function (fixed accidentally when replacing Rcpp code with pure R).
mean_of_ratios = FALSE) replaced Rcpp code with pure R version, which results in large speedups, at a cost of higher memory use (despite my best attempts at improving the original Rcpp code, the simpler R code is doing something magically fast I don’t understand). Rcpp, RcppEigen dependencies have been dropped as a consequence.
print_labels fixed bug when even = TRUE and the minimum xb_ind is not zero, which caused the maximum to be off by xb_ind.
plot_popkin or plot_admix) because the minimum xb_ind was always zero in those cases.
plot_popkin
ylab_per_panel to allow single-panel figures to place y-axis label in inner margin (before that case was forced to use outer margin).
oma and layout_add, as in some cases you may want to turn off both features to avoid unexpected behaviors (though there are cases where turning off one but not the other also makes sense).
plot_phylo added option edge_width, which defaults to 1.
ape version 5.5 and prior, where its function plot.phylo (which popkin::plot_phylo wraps) had its parameter edge.width default to 1.
ape version 5.6 (2021-12-20), edge.width defaults to NULL, which results in setting it to par('lwd'), which had undesirable consequences in my use cases and which is why the old default is overridden in popkin.
plot_popkin’s old default edge widths for trees of class phylo are also restored.
hgdp_subset sample data, copied from lfa.
lfa dependency, which was only used for this sample data. lfa has become unreliable in external testing servers, particularly as it is on Bioconductor and sometimes hard to install on R-devel, so its removal simplifies automatic testing considerably.
popkin, popkin_A, and popkin_af
plot_popkin clarified documentation.