bpca(pcaMethods)
bpca() is part of the R package pcaMethods
Bayesian PCA Missing Value Estimator
----------Description----------
Implements a Bayesian PCA (BPCA) missing value estimator. The script is a port of the Matlab version provided by Shigeyuki OBA; see also http://hawaii.aist-nara.ac.jp/%7Eshige-o/tools/. BPCA combines an EM approach for PCA with a Bayesian model. In standard PCA, data far from the training set but close to the principal subspace may have the same reconstruction error. BPCA defines a likelihood function such that the likelihood for data far from the training set is much lower, even if they are close to the principal subspace.
----------Usage----------
----------Arguments----------
Argument: Matrix
matrix – Pre-processed matrix (centered, scaled) with variables in columns and observations in rows. The data may contain missing values, denoted as NA.
Argument: nPcs
numeric – Number of components used for re-estimation. Choosing too few components may decrease the estimation precision.
Argument: maxSteps
numeric – Maximum number of estimation steps.
Argument: verbose
boolean – If set to TRUE, BPCA prints the number of steps and the increase in precision. Default is interactive().
Argument: threshold
Convergence threshold.
Argument: ...
Reserved for future use. Currently no further parameters are used.
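The Matrix argument is expected to be centered and scaled already, with NA marking missing cells. As a minimal sketch of what that pre-processing might look like, the following uses base R's scale() on a small made-up matrix (the data and the names X and Xs are illustrative assumptions, not part of pcaMethods). The package may offer its own pre-processing options through pca(); this only shows the idea.

```r
# Toy data (hypothetical): 6 observations in rows, 3 variables in columns,
# with two missing values.
X <- matrix(c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8,
              10,  NA,  30,  41,  52,  59,
              0.5, 0.4, NA,  0.9, 1.1, 1.2),
            nrow = 6, ncol = 3,
            dimnames = list(NULL, c("v1", "v2", "v3")))

# scale() subtracts the column means and divides by the column standard
# deviations. Missing values are ignored in both computations and stay NA.
Xs <- scale(X, center = TRUE, scale = TRUE)

colMeans(Xs, na.rm = TRUE)      # approximately 0 for every column
apply(Xs, 2, sd, na.rm = TRUE)  # approximately 1 for every column
sum(is.na(Xs))                  # the missing cells are preserved
```

The result keeps variables in columns and observations in rows, which is the layout the Matrix argument describes.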
----------Details----------
Scores and loadings obtained with Bayesian PCA differ slightly from those obtained with conventional PCA, because BPCA was developed specifically for missing value estimation. The algorithm does not force orthogonality between factor loadings, so the loadings are not necessarily orthogonal. The BPCA authors found that including an orthogonality criterion made the predictions worse.
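One way to see the missing orthogonality is to inspect the Gram matrix of the loadings. The sketch below assumes the pcaMethods package and its metaboliteData example set are installed, and it is skipped otherwise. The names P and G are illustrative.

```r
if (requireNamespace("pcaMethods", quietly = TRUE)) {
  library(pcaMethods)
  data(metaboliteData)

  # Fit BPCA through the pca() wrapper, as this page recommends.
  pc <- pca(t(metaboliteData), method = "bpca", nPcs = 2)

  # Loadings matrix: one column per component.
  P <- loadings(pc)

  # Gram matrix t(P) %*% P. For orthogonal loadings the off-diagonal
  # entries would be zero; with BPCA they need not be.
  G <- crossprod(P)
  print(round(G, 4))
}
```

Comparing G with the same matrix computed from a conventional PCA fit (for example prcomp on complete data) shows how far the BPCA loadings depart from an orthogonal basis.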
The authors also state that the difference between real and predicted eigenvalues becomes larger when the number of observations is smaller, because it reflects the lack of information needed to accurately determine the true factor loadings from limited and noisy data. As a result, the weights of the factors used to predict missing values are not the same as with conventional PCA, but the missing value estimation is improved.
BPCA works iteratively, and its complexity grows with O(n^3), where n is the number of components, because several matrix inversions are required. The size of the matrices to invert depends on the number of components used for re-estimation.
Finding the optimal number of components for estimation is not a trivial task; the best choice depends on the internal structure of the data. A method called kEstimate is provided to estimate the optimal number of components via cross-validation. In general, a few components are sufficient for reasonable estimation accuracy. See also the package documentation for further discussion of the kinds of data for which PCA-based missing value estimation makes sense.
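A hedged sketch of how kEstimate might be combined with the pca() wrapper follows. The argument names evalPcs, segs, nruncv and em, and the bestNPcs element of the result, are assumptions based on the pcaMethods help pages and should be checked against ?kEstimate. The block is skipped when the package is not installed, and the cross-validation can take a while with method = "bpca".

```r
if (requireNamespace("pcaMethods", quietly = TRUE)) {
  library(pcaMethods)
  data(metaboliteData)

  # Cross-validate BPCA for 1 to 3 components.
  # Assumed arguments (verify with ?kEstimate): evalPcs, segs, nruncv, em.
  esti <- kEstimate(t(metaboliteData), method = "bpca", evalPcs = 1:3,
                    segs = 3, nruncv = 1, em = "q2", verbose = FALSE)

  # Assumed result element: the component count with the best CV score.
  print(esti$bestNPcs)

  # Refit with the selected number of components.
  pc <- pca(t(metaboliteData), method = "bpca", nPcs = esti$bestNPcs)
}
```

Restricting evalPcs to a small range keeps the run time manageable, which fits the remark above that a few components are usually sufficient.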
It is not recommended to use this function directly; use the pca() wrapper function instead.
Details about the probabilistic model underlying BPCA are found in Oba et al. 2003. The algorithm uses an expectation-maximization approach together with a Bayesian model to approximate the principal axes (the eigenvectors of the covariance matrix in PCA). The estimation is done iteratively; the algorithm terminates when either the maximum number of iterations is reached or the estimated increase in precision falls below 1e-4.
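The two stopping rules can be illustrated with a much simpler iterative imputer. The sketch below is not the BPCA algorithm: a plain SVD reconstruction stands in for the Bayesian update, and the largest change in the imputed cells stands in for the "increase in precision". The function impute_iter and its arguments are hypothetical names chosen to mirror maxSteps and threshold.

```r
# Iterative low-rank imputation with two stopping rules.
# NOTE: plain SVD reconstruction is a stand-in for the BPCA update step.
impute_iter <- function(X, nPcs = 1, maxSteps = 100, threshold = 1e-4) {
  miss <- is.na(X)
  if (!any(miss)) return(list(completeObs = X, steps = 0, change = 0))

  # Start from the column means of the observed values.
  Xc <- X
  Xc[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]

  k <- seq_len(nPcs)
  steps <- 0
  repeat {
    steps <- steps + 1
    # Rank-nPcs reconstruction of the centered, currently completed data.
    mu <- colMeans(Xc)
    s <- svd(sweep(Xc, 2, mu))
    recon <- s$u[, k, drop = FALSE] %*% diag(s$d[k], nPcs) %*%
      t(s$v[, k, drop = FALSE])
    recon <- sweep(recon, 2, mu, "+")

    # Update only the missing cells and measure how much they moved.
    change <- max(abs(recon[miss] - Xc[miss]))
    Xc[miss] <- recon[miss]

    # Stop on convergence or when the step budget is exhausted.
    if (change < threshold || steps >= maxSteps) break
  }
  list(completeObs = Xc, steps = steps, change = change)
}

# Usage on made-up, nearly rank-one data with two missing cells.
set.seed(1)
X <- outer(1:8, c(1, 2, 3)) + matrix(rnorm(24, sd = 0.05), 8, 3)
X[2, 3] <- NA
X[5, 1] <- NA
res <- impute_iter(X, nPcs = 1)
res$steps
```

Observed cells are never modified; only the missing cells are re-estimated in each step, which is also how the completed observations of a BPCA fit are meant to be read.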
Complexity: The relatively high complexity of the method is a result of the several matrix inversions required in each step. In the case where the maximum number of iteration steps is needed, the approximate complexity is given by the term
maxSteps * row_miss * O(n^3)
where row_miss is the number of rows containing missing values and O(n^3) is the complexity of inverting a matrix of size components. Components is the number of components used for re-estimation.
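As a rough worked example of that cost term, the following base R snippet counts row_miss for a small made-up matrix and evaluates maxSteps * row_miss * nPcs^3 for two component counts. The helper bpca_cost is a hypothetical name, and the result is only an order-of-magnitude operation count, not a run-time prediction.

```r
# Made-up data: 6 rows, 4 columns, missing values in rows 2 and 5.
X <- matrix(1, nrow = 6, ncol = 4)
X[2, 1] <- NA
X[2, 3] <- NA
X[5, 4] <- NA

# Number of rows that contain at least one missing value.
row_miss <- sum(rowSums(is.na(X)) > 0)

# Hypothetical helper: worst-case cost term with n = number of components.
bpca_cost <- function(maxSteps, row_miss, nPcs) maxSteps * row_miss * nPcs^3

row_miss                                                 # 2
bpca_cost(maxSteps = 100, row_miss = row_miss, nPcs = 3) # 5400
bpca_cost(maxSteps = 100, row_miss = row_miss, nPcs = 6) # 43200
```

Doubling the number of components multiplies the term by eight, while doubling the number of incomplete rows only doubles it.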
----------Value----------
Standard PCA result object used by all PCA-based methods of this package. Contains scores, loadings, data mean and
----------Note----------
Requires MASS.
----------Author(s)----------
Wolfram Stacklies
----------References----------
Morito Monden, Ken-ichi Matsubara and Shin Ishii. A Bayesian missing value estimation method for gene expression profile
----------See Also----------
ppca, svdImpute, prcomp, nipalsPca, pca,
----------Examples----------
## Load the package that provides pca(), loadings(), scores(), completeObs() and slplot()
library(pcaMethods)
data(metaboliteData)
## Perform Bayesian PCA with 2 components
pc <- pca(t(metaboliteData), method="bpca", nPcs=2)
## Get the estimated principal axes (loadings)
loadings <- loadings(pc)
## Get the estimated scores
scores <- scores(pc)
## Get the estimated complete observations
cObs <- completeObs(pc)
## Now make a scores and loadings plot
slplot(pc)