Compute redundancy analysis, a type of canonical analysis.
State: Experimental as of 0.4.0.
It is related to PCA and multiple regression because the explained
variables y are fitted to the explanatory variables x and PCA
is then performed on the fitted values. A similar process is
performed on the residuals.
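The two steps above (regression, then PCA on the fitted values and on the residuals) can be sketched with plain NumPy. This is a minimal illustration rather than the library's implementation: the function name `rda_sketch` is hypothetical, and the eigenvalue convention used here (squared singular values divided by \(n - 1\)) is an assumption that may differ from what this function returns.

```python
import numpy as np


def rda_sketch(Y, X):
    """Illustrative RDA: regress Y on X, then run PCA on fitted values and residuals."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]

    # Center both matrices column-wise so the regression needs no intercept.
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)

    # Multiple regression of every response column on the explanatory variables.
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_hat = Xc @ B        # fitted values (constrained part)
    Y_res = Yc - Y_hat    # residuals (unconstrained part)

    # PCA via SVD. Assumed convention: eigenvalue = singular value ** 2 / (n - 1).
    s_con = np.linalg.svd(Y_hat, compute_uv=False)
    s_unc = np.linalg.svd(Y_res, compute_uv=False)
    return s_con**2 / (n - 1), s_unc**2 / (n - 1)


# Synthetic data: 10 samples, 2 explanatory variables, 4 response variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
Y = X @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(10, 4))
constrained, unconstrained = rda_sketch(Y, X)
```

Because the fitted values and the residuals are orthogonal, the constrained and unconstrained eigenvalues together add up to the total variance of y, and at most \(m\) constrained eigenvalues are non-zero.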
RDA should be chosen if the studied gradient is small, and CCA
(canonical correspondence analysis) when it is large enough that the
contingency table is sparse.
Parameters: | y : pd.DataFrame
\(n \times p\) response matrix, where \(n\) is the number
of samples and \(p\) is the number of features. Its columns
need to be dimensionally homogeneous (or you can set scale_Y=True).
This matrix is also referred to as the community matrix, which
commonly stores information about species abundances.
x : pd.DataFrame
\(n \times m, n \geq m\) matrix of explanatory
variables, where \(n\) is the number of samples and
\(m\) is the number of metadata variables. Its columns
need not be standardized, but doing so turns regression
coefficients into standard regression coefficients.
scale_Y : bool, optional
Controls whether the response matrix columns are scaled to
have unit standard deviation. Defaults to False.
scaling : int
Scaling type 1 produces a distance biplot. It focuses on
the ordination of rows (samples) because their transformed
distances approximate their original Euclidean
distances. This scaling is especially useful when most
explanatory variables are binary.
Scaling type 2 produces a correlation biplot. It focuses
on the relationships among explained variables (y). It
is interpreted like scaling type 1, but taking into
account that distances between objects do not approximate
their Euclidean distances.
See more details about distance and correlation biplots in
[R91], § 9.1.4.
|
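The difference between the two scaling types can be illustrated on an ordinary PCA. The sketch below follows the usual PCA biplot conventions (eigenvectors of unit length for scaling 1, eigenvectors stretched to length \(\sqrt{\lambda}\) for scaling 2). It is an assumption-laden illustration, not necessarily how this function computes its scores, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 3))          # 6 samples, 3 features (synthetic data)
n = Y.shape[0]
Yc = Y - Y.mean(axis=0)              # column-centered data

U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
eigvals = s**2 / (n - 1)             # eigenvalues of the covariance matrix

# Scaling 1 (distance biplot): eigenvectors keep unit length.
sites_1 = Yc @ Vt.T
features_1 = Vt.T

# Scaling 2 (correlation biplot): eigenvectors are stretched to length sqrt(eigenvalue),
# and the sample scores are shrunk by the same factor.
sites_2 = sites_1 / np.sqrt(eigvals)
features_2 = Vt.T * np.sqrt(eigvals)


def pairwise(M):
    """Euclidean distances between all pairs of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


# Scaling 1 preserves the Euclidean distances among samples (all axes kept).
same_distances = np.allclose(pairwise(sites_1), pairwise(Yc))

# Scaling 2 reproduces the covariances among features
# (correlations, if the columns are standardized first).
same_covariances = np.allclose(features_2 @ features_2.T, np.cov(Yc, rowvar=False))
```

With all axes retained, `same_distances` and `same_covariances` both evaluate to `True`, while the sample distances under scaling 2 generally differ from the original ones.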
Returns: | OrdinationResults
Object that stores the computed eigenvalues, the
proportion explained by each of them (per unit),
transformed coordinates for features and samples, biplot
scores, sample constraints, etc.
|
Notes
The algorithm is based on [R91], § 11.1, and is expected to
give the same results as rda(y, x) in R’s package vegan.
The eigenvalues reported in vegan are re-normalized to
\(\sqrt{\frac{s}{n-1}}\), where \(n\) is the number of samples
and \(s\) is an original eigenvalue. Here we only return
the original eigenvalues, as recommended in [R91].
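To compare the returned eigenvalues with vegan's output, the re-normalization can be applied by hand. The helper below simply takes the formula in this note at face value. The name `renormalize` is hypothetical and is not part of either library.

```python
import math


def renormalize(eigenvalues, n_samples):
    """Apply sqrt(s / (n - 1)) to each original eigenvalue s, as stated in the note."""
    return [math.sqrt(s / (n_samples - 1)) for s in eigenvalues]


# Made-up eigenvalues from a data set with 5 samples.
renormalized = renormalize([4.0, 1.0], n_samples=5)
print(renormalized)  # [1.0, 0.5]
```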
References
[R91] | Legendre P. and Legendre L. 1998. Numerical
Ecology. Elsevier, Amsterdam. |