nearestCentroidPredictor(WGCNA)
nearestCentroidPredictor() belongs to R package: WGCNA
Nearest centroid predictor
Translator: 生物统计家园网, robot LoveR
----------Description----------
Nearest centroid predictor for binary (i.e., two-outcome) data. Implements a range of options and improvements, such as accounting for within-class heterogeneity using sample networks and several methods of feature selection and weighting.
----------Usage----------
nearestCentroidPredictor(
    # Input training and test data
    x, y,
    xtest = NULL,

    # Feature weights and selection criteria
    featureSignificance = NULL,
    assocFnc = "cor", assocOptions = "use = 'p'",
    assocCut.hi = NULL, assocCut.lo = NULL,
    nFeatures.hi = 10, nFeatures.lo = 10,
    weighFeaturesByAssociation = 0,
    scaleFeatureMean = TRUE, scaleFeatureVar = TRUE,

    # Predictor options
    centroidMethod = c("mean", "eigensample"),
    simFnc = "cor", simOptions = "use = 'p'",
    useQuantile = NULL,
    sampleWeights = NULL,
    weighSimByPrediction = 0,

    # What should be returned
    CVfold = 0, returnFactor = FALSE,

    # General options
    randomSeed = 12345,
    verbose = 2, indent = 0)
----------Arguments----------
Argument: x
Training features (predictive variables). Each column corresponds to a feature and each row to an observation.
Argument: y
The response variable. Can be a single vector or a matrix with arbitrarily many columns. The number of rows (observations) must equal the number of rows (observations) in x.
Argument: xtest
Optional test set data. A matrix with the same number of columns (i.e., features) as x. If test set data are not given, only the prediction on the training data will be returned.
Argument: featureSignificance
Optional vector of feature significance for the response variable. If given, it is used for feature selection (see Details). Should preferably be signed, that is, features can have high negative significance.
Argument: assocFnc
Character string specifying the association function. The association function should behave roughly like cor in that it takes two arguments (a matrix and a vector) plus options and returns the vector of associations between the columns of the matrix and the vector. The associations may be signed (i.e., negative or positive).
Argument: assocOptions
Character string specifying options to the association function.
Argument: assocCut.hi
Association (or featureSignificance) threshold for including features in the predictor. Features with association higher than assocCut.hi will be included. If not given, the threshold method will not be used; instead, a fixed number of features will be included as specified by nFeatures.hi and nFeatures.lo.
Argument: assocCut.lo
Association (or featureSignificance) threshold for including features in the predictor. Features with association lower than assocCut.lo will be included. If not given, defaults to -assocCut.hi. If assocCut.hi is NULL, the threshold method will not be used; instead, a fixed number of features will be included as specified by nFeatures.hi and nFeatures.lo.
Argument: nFeatures.hi
Number of highest-associated features (or features with highest featureSignificance) to include in the predictor. Only used if assocCut.hi is NULL.
Argument: nFeatures.lo
Number of lowest-associated features (or features with lowest featureSignificance) to include in the predictor. Only used if assocCut.hi is NULL.
Argument: weighFeaturesByAssociation
(Optional) power used to down-weight features that are less associated with the response. See Details.
Argument: scaleFeatureMean
Logical: should the training features be scaled to mean zero? Unless there are good reasons not to scale, the features should be scaled.
Argument: scaleFeatureVar
Logical: should the training features be scaled to unit variance? Again, unless there are good reasons not to scale, the features should be scaled.
Argument: centroidMethod
One of "mean" and "eigensample"; specifies how the centroid should be calculated. "mean" takes the mean across all samples (or all samples within a sample module, if sample networks are used), whereas "eigensample" calculates the first principal component of the feature matrix and uses that as the centroid.
Argument: simFnc
Character string giving the similarity function for measuring the similarity between test samples and centroids. This function should behave roughly like the function cor in that it takes two arguments (x, y) and calculates the pairwise similarities between the columns of x and y. For convenience, the value "dist" is treated specially: the Euclidean distance between the columns of x and y is calculated and its negative is returned (so that the smallest distance corresponds to the highest similarity). Since the values of this function are only used for ranking centroids, they are not restricted to be positive or to lie within certain bounds.
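The behavior described for "dist" can be illustrated with a small base-R sketch. The helper name negDist is hypothetical and this is not the WGCNA implementation; it only shows a function that, like cor, compares the columns of two matrices and returns the negative Euclidean distance, so that larger values mean more similar.

```r
# Hypothetical helper: negative Euclidean distance between columns of a and b
negDist <- function(a, b) {
  a <- as.matrix(a); b <- as.matrix(b)
  # Squared distances via |a_i|^2 + |b_j|^2 - 2 * (a_i . b_j)
  d2 <- outer(colSums(a^2), colSums(b^2), "+") - 2 * crossprod(a, b)
  # Guard against tiny negative values from rounding, then negate
  -sqrt(pmax(d2, 0))
}

# Two samples (columns of a) compared with two centroids (columns of b)
a <- cbind(s1 = c(0, 0), s2 = c(3, 4))
b <- cbind(c0 = c(0, 0), c1 = c(3, 0))
sim <- negDist(a, b)
print(sim)
```

Rows of the result correspond to the columns of the first argument and columns to those of the second, which matches the shape that cor(a, b) would return.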
Argument: simOptions
Character string specifying the options to the similarity function.
Argument: useQuantile
If non-NULL, the "nearest quantiloid" will be used instead of the nearest centroid. See Details.
Argument: sampleWeights
Optional specification of sample weights. Useful, for example, if one wants to explore boosting.
Argument: weighSimByPrediction
(Optional) power used to down-weight features that are not well predicted between training and test sets. See Details.
Argument: CVfold
Non-negative integer specifying cross-validation. Zero means no cross-validation will be performed. Values above zero specify the number of samples to be considered test data for each step of cross-validation.
Argument: returnFactor
Logical: should a factor be returned?
Argument: randomSeed
Integer specifying the seed for the random number generator. If NULL, the seed will not be set. See set.seed.
Argument: verbose
Integer controlling how verbose the diagnostic messages should be. Zero means silent.
Argument: indent
Indentation for the diagnostic messages. Zero means no indentation; each unit adds two spaces.
----------Details----------
The nearest centroid predictor works by forming a representative profile (centroid) across features for each class from the training data, then assigning each test sample to the class of the nearest representative profile. The representative profile can be formed either as the mean or as the first principal component ("eigensample"); this choice is governed by the option centroidMethod.
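The basic idea can be sketched in a few lines of base R. This is an illustration on simulated data, not the WGCNA code: it uses class means as centroids (the "mean" option) and Pearson correlation as the similarity, and all variable names are made up for the example.

```r
set.seed(1)
n <- 20; p <- 5
y <- rep(c(0, 1), each = n / 2)
# Class 1 has an increasing profile across features, class 0 a decreasing one
pattern <- seq(-1, 1, length.out = p)
x <- matrix(rnorm(n * p, sd = 0.3), n, p) + outer(2 * y - 1, pattern)

# Centroid = mean feature profile within each class (one column per class)
centroids <- sapply(c(0, 1), function(cl) colMeans(x[y == cl, , drop = FALSE]))
colnames(centroids) <- c("0", "1")

# Similarity of each sample (a column of t(x)) to each centroid
sim <- cor(t(x), centroids)   # n x 2 matrix

# Assign each sample to the class of the most similar centroid
predicted <- as.numeric(colnames(centroids)[max.col(sim, ties.method = "first")])
print(table(predicted, y))
```

Here the prediction is made on the training samples themselves (back-substitution); with a separate test matrix one would compute cor(t(xtest), centroids) instead.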
When the number of features is large and only a small fraction is likely to be associated with the outcome, feature selection can be used to restrict the features that actually enter the centroid. Feature selection can be based either on the features' association with the outcome, calculated from the training data using assocFnc, or on user-supplied feature significance (e.g., derived from literature; argument featureSignificance). In either case, features can be selected by high and low association thresholds or by taking a fixed number of highest- and lowest-associated features.
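Both selection modes can be sketched in base R as follows. The data are simulated and the variable names (nHi, nLo, cutHi, cutLo) are stand-ins for nFeatures.hi, nFeatures.lo, assocCut.hi and assocCut.lo; this shows the selection logic only, not the package internals.

```r
set.seed(2)
n <- 40; p <- 50
y <- rep(c(0, 1), each = n / 2)
x <- matrix(rnorm(n * p), n, p)
x[, 1:3] <- x[, 1:3] + 3 * y   # features positively associated with y
x[, 4:6] <- x[, 4:6] - 3 * y   # features negatively associated with y

# Signed association of every feature with the outcome
assoc <- as.vector(cor(x, y, use = "p"))

# Fixed-number selection: the nHi highest and nLo lowest associations
nHi <- 3; nLo <- 3
ord <- order(assoc, decreasing = TRUE)
selectedFixed <- sort(c(head(ord, nHi), tail(ord, nLo)))

# Threshold selection: association above cutHi or below cutLo
cutHi <- 0.5; cutLo <- -cutHi
selectedCut <- which(assoc > cutHi | assoc < cutLo)

print(selectedFixed)
```

The threshold mode returns a data-dependent number of features, whereas the fixed-number mode always returns nHi + nLo of them.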
As an alternative to centroids, the predictor can also assign test samples based on a given quantile of the distances from the training samples in each class (argument useQuantile). This may be advantageous if the samples in each class form irregular clusters. Note that setting useQuantile=0 (i.e., using the minimum distance in each class) essentially gives a nearest neighbor predictor: each test sample will be assigned to the class of its nearest training neighbor.
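A rough base-R sketch of the quantile idea is given below, using plain Euclidean distances on simulated two-dimensional data. The helper quantileDist is hypothetical, and the real function works with the similarity chosen via simFnc, so this only illustrates the principle.

```r
set.seed(3)
train <- rbind(matrix(rnorm(20, mean = 0), 10, 2),
               matrix(rnorm(20, mean = 4), 10, 2))
yTrain <- rep(c(0, 1), each = 10)
test <- rbind(c(0, 0), c(4, 4))

# Distances from each test sample (rows) to each training sample (columns)
d <- as.matrix(dist(rbind(test, train)))[1:2, -(1:2)]

# For each class, summarize the distances to its training samples by a quantile
quantileDist <- function(q) {
  sapply(c(0, 1), function(cl)
    apply(d[, yTrain == cl, drop = FALSE], 1, quantile, probs = q))
}

# q = 0 uses the minimum distance per class, i.e. a nearest neighbor rule
predNN  <- c(0, 1)[max.col(-quantileDist(0))]
# q = 0.5 uses the median distance to the training samples of each class
predMed <- c(0, 1)[max.col(-quantileDist(0.5))]
print(rbind(predNN, predMed))
```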
If features exhibit non-trivial correlations among themselves (as, for example, in gene expression data), one can attempt to down-weight features that do not exhibit the same correlation in the test set. This is done by using essentially the same predictor to predict _features_ from all other features in the test data (using the training data to train the feature predictor). Because test features are known, the prediction accuracy can be evaluated. If a feature is predicted badly (meaning the error in the test set is much larger than the error in the cross-validation prediction in the training data), it may mean that its quality in the training or test data is low (for example, due to excessive noise or outliers). Such features can be down-weighted using the argument weighSimByPrediction. The extra factor is min(1, ((root mean square cross-validation prediction error in the training data)/(root mean square prediction error in the test set))^weighSimByPrediction), that is, it is never bigger than 1.
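The weighting factor can be illustrated with made-up error values. The sketch assumes that the intended ratio is the cross-validation error divided by the test error, so that features predicted worse in the test set than in cross-validation receive weights below 1; the numbers and variable names are hypothetical.

```r
# Hypothetical per-feature root mean square errors (made-up numbers)
rmseCV   <- c(0.50, 0.40, 0.60)   # cross-validation error in the training data
rmseTest <- c(0.45, 0.80, 2.40)   # prediction error in the test data
power <- 1                        # plays the role of weighSimByPrediction

# The weight is capped at 1; badly predicted features get a weight below 1
validationWeight <- pmin(1, (rmseCV / rmseTest)^power)
print(validationWeight)
```

With power set to 0 every weight equals 1, so the weighting is effectively switched off; larger powers penalize poorly predicted features more strongly.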
Unless the features' mean and variance can be ascribed a clear meaning, the (training) features should be scaled to mean 0 and variance 1 before the centroids are formed.
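In base R this kind of scaling is available through scale(). The sketch below is illustrative only; in particular, applying the training centers and scales to the test data is an assumption made for the example, not a statement about what the function does internally.

```r
set.seed(5)
x     <- matrix(rnorm(30, mean = 10, sd = 3), 10, 3)
xtest <- matrix(rnorm(15, mean = 10, sd = 3), 5, 3)

# Scale training features to mean 0 and variance 1 (column-wise)
xs <- scale(x, center = TRUE, scale = TRUE)

# Assumption: reuse the training centers and scales for the test data
xtestS <- scale(xtest,
                center = attr(xs, "scaled:center"),
                scale  = attr(xs, "scaled:scale"))

print(round(colMeans(xs), 10))
print(apply(xs, 2, var))
```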
The function implements a basic option for removal of spurious effects in the training and test data by removing a fixed number of leading principal components from the features. This sometimes leads to better prediction accuracy but should be used with caution.
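What removing leading principal components means can be sketched with a singular value decomposition in base R. The simulated nuisance factor (batch) and the variable names are hypothetical, and this is not the package's own code.

```r
set.seed(6)
n <- 30; p <- 8
batch <- rnorm(n)   # hypothetical nuisance factor affecting all features
x <- matrix(rnorm(n * p, sd = 0.5), n, p) + outer(batch, rep(3, p))

# Center the columns, then decompose: xc = U diag(d) V^T
xc <- scale(x, center = TRUE, scale = FALSE)
s <- svd(xc)

# Subtract the contribution of the k leading principal components
k <- 1
lead <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
  t(s$v[, 1:k, drop = FALSE])
xClean <- xc - lead

# Fraction of the total variation that was removed
print(s$d[1:k]^2 / sum(s$d^2))
```

Because the leading components may also carry genuine class signal, removing them can hurt as well as help, which is consistent with the caution advised above.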
If samples within each class are heterogeneous, a single centroid may not represent each class well. This function can deal with within-class heterogeneity by clustering samples (separately in each class), then using one representative (mean, eigensample) or quantile for each cluster in each class to assign test samples. Various similarity measures, specified by adjFnc, can be used to construct the sample network adjacency. Similarly, the user can specify a clustering function using clusteringFnc. The requirements on the clustering function are described in a separate section below.
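The idea of one centroid per within-class cluster can be sketched in base R. Here hclust on Euclidean distances is only a stand-in for the sample network and clustering function, the helper clusterCentroids is hypothetical, and the number of clusters per class is fixed by hand.

```r
set.seed(7)
# Class 0 is homogeneous; class 1 consists of two distinct subgroups
x0 <- matrix(rnorm(20 * 2, mean = 0, sd = 0.5), 20, 2)
x1 <- rbind(matrix(rnorm(10 * 2, mean = 5, sd = 0.5), 10, 2),
            cbind(rnorm(10, -5, 0.5), rnorm(10, 5, 0.5)))
x <- rbind(x0, x1)
y <- rep(c(0, 1), each = 20)

# Cluster the samples of one class and return one mean centroid per cluster
clusterCentroids <- function(xc, k) {
  labels <- cutree(hclust(dist(xc), method = "average"), k = k)
  t(sapply(split(as.data.frame(xc), labels), colMeans))
}
cents <- rbind(clusterCentroids(x[y == 0, ], 1),
               clusterCentroids(x[y == 1, ], 2))
centClass <- c(0, 1, 1)   # class to which each centroid belongs

# Assign each test sample to the class of its nearest cluster centroid
test <- rbind(c(0.2, -0.1), c(-4.8, 5.1))
d <- as.matrix(dist(rbind(test, cents)))[1:2, -(1:2)]
predicted <- centClass[max.col(-d)]
print(predicted)
```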
----------Value----------
A list with the following components:
Component: predicted
The back-substitution prediction in the training set.
Component: predictedTest
Prediction in the test set.
Component: featureSignificance
A vector of feature significance calculated by assocFnc, or a copy of the input featureSignificance if the latter is non-NULL.
Component: selectedFeatures
A vector giving the indices of the features that were selected for the predictor.
Component: centroidProfile
The representative profiles of each class (or cluster). Only returned if useQuantile is NULL.
Component: testSample2centroidSimilarities
A matrix of calculated similarities between the test samples and class/cluster centroids.
Component: featureValidationWeights
A vector of validation weights (see Details) for the selected features. If weighSimByPrediction is 0, a vector of ones is used and returned.
Component: CVpredicted
Cross-validation prediction on the training data. Present only if CVfold is non-zero.
Component: sampleClusterLabels
A list with two components (one per class). Each component is a vector of sample cluster labels for samples in the class.
----------Author(s)----------
Peter Langfelder
----------See Also----------
votingLinearPredictor
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).