randomGLMpredictor(WGCNA)
randomGLMpredictor() belongs to the R package WGCNA
Random generalized linear model predictor
Description
An ensemble predictor based on bootstrap aggregation (bagging) of generalized linear models whose covariates are selected by forward stepwise regression according to the AIC criterion.
Usage
randomGLMpredictor(
  x, y, xtest = NULL,
  classify = TRUE,
  nBags = 100,
  replace = TRUE,
  nObsInBag = if (replace) nrow(x) else as.integer(0.632 * nrow(x)),
  nFeaturesInBag = ceiling(ifelse(ncol(x) <= 10, ncol(x),
      ifelse(ncol(x) <= 300, (0.68 - 0.0016 * ncol(x)) * ncol(x), ncol(x) / 5))),
  nCandidateCovariates = 50,
  candidateCorFnc = cor,
  candidateCorOptions = list(method = "pearson", use = "p"),
  mandatoryCovariates = NULL,
  randomSeed = 12345,
  verbose = 1)
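The two size defaults in the signature are easier to read when evaluated on concrete dimensions. The sketch below copies the default expressions from the Usage block into two helper functions; the helper names defaultNObsInBag and defaultNFeaturesInBag are hypothetical and are not part of WGCNA.

```r
# Hypothetical helpers that reproduce the default expressions from Usage.
defaultNObsInBag <- function(nObs, replace = TRUE) {
  # With replacement: a full-size bootstrap sample.
  # Without replacement: 63.2% of the observations, truncated to an integer.
  if (replace) nObs else as.integer(0.632 * nObs)
}

defaultNFeaturesInBag <- function(nFeatures) {
  # All features when there are at most 10, a shrinking fraction up to 300,
  # and one fifth of the features beyond that.
  ceiling(ifelse(nFeatures <= 10, nFeatures,
          ifelse(nFeatures <= 300,
                 (0.68 - 0.0016 * nFeatures) * nFeatures,
                 nFeatures / 5)))
}

defaultNObsInBag(100, replace = TRUE)    # 100
defaultNObsInBag(100, replace = FALSE)   # 63
defaultNFeaturesInBag(c(4, 1000))        # 4 200
defaultNFeaturesInBag(50)                # about 60% of the 50 features
```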
Arguments
Argument: x
a matrix whose rows correspond to observations and whose columns correspond to features (covariates).
Argument: y
class outcome (factor variable) or quantitative outcome (numeric variable).
Argument: xtest
an optional matrix (whose columns correspond to those in x) that contains test (validation) data. The number of rows will typically differ from that of x.
Argument: classify
logical: should the response be treated as a binary variable (TRUE) or as a continuous variable (FALSE)?
Argument: nBags
number of bags in the ensemble predictor.
Argument: replace
logical that determines whether the observations for each bag (bootstrap data) are sampled with or without replacement.
Argument: nObsInBag
number of observations selected for each bag. Typically, a bootstrap sample (bag) has the same number of observations as the original data (i.e. the number of rows of x).
Argument: nFeaturesInBag
number of features selected into each bag. Features are randomly selected without replacement.
Argument: nCandidateCovariates
number of features, ranked by highest absolute correlation with the outcome, that are kept in each bag. These features/covariates become the candidates for forward stepwise regression.
Argument: candidateCorFnc
the correlation function used to select candidate covariates. Either cor or bicor.
Argument: candidateCorOptions
list of arguments passed to the correlation function. If bicor is chosen for a class outcome, make sure to include "robustY=F".
Argument: mandatoryCovariates
indices of features that are forced into all regression models across bags. By default, no feature is mandatory.
Argument: randomSeed
NULL or integer. The seed for the random number generator. If NULL, the seed will not be set. If non-NULL and the random generator has been initialized prior to the function call, the generator's state is saved and restored upon exit.
Argument: verbose
integer that determines the level of verbosity. Zero means silent; higher values make the output progressively more verbose.
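The sampling-related arguments above (replace, nObsInBag, nFeaturesInBag, nCandidateCovariates, candidateCorOptions) can be illustrated for a single bag with base R only. This is a minimal sketch on made-up data, not the package's code; in particular, passing the options list to the correlation function through do.call is an assumption about how candidateCorOptions is used.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Toy data (hypothetical): 40 observations, 12 features, numeric outcome.
x <- matrix(rnorm(40 * 12), nrow = 40,
            dimnames = list(NULL, paste0("f", 1:12)))
y <- x[, 1] - 2 * x[, 2] + rnorm(40)

replace <- TRUE
nObsInBag <- if (replace) nrow(x) else as.integer(0.632 * nrow(x))
nFeaturesInBag <- 8
nCandidateCovariates <- 3
candidateCorOptions <- list(method = "pearson", use = "p")

# Observations are drawn with or without replacement; features without.
bagObs <- sample(nrow(x), nObsInBag, replace = replace)
oobObs <- setdiff(seq_len(nrow(x)), bagObs)  # out-of-bag observations
bagFeatures <- sample(ncol(x), nFeaturesInBag, replace = FALSE)

# Rank the bagged features by absolute correlation with the bagged outcome
# (assumed use of candidateCorOptions: extra arguments to the cor call).
xBag <- x[bagObs, bagFeatures, drop = FALSE]
corArgs <- c(list(xBag, y[bagObs]), candidateCorOptions)
absCor <- abs(do.call(cor, corArgs))[, 1]
candidates <- names(sort(absCor, decreasing = TRUE))[seq_len(nCandidateCovariates)]
candidates
```

Only the features named in candidates would then be offered to forward stepwise regression in this bag.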
Details
The randomGLMpredictor function requires the R package MASS because it uses the function stepAIC. For each bag, randomGLMpredictor first draws a bootstrap sample of observations and a random subset of features, then restricts the analysis to the features that are most highly correlated with the outcome. Within each bag, prediction is based on forward stepwise regression (logistic for binary outcomes, linear for quantitative outcomes). The overall prediction is obtained by averaging the results from all bags. Setting nCandidateCovariates > 100 is generally not recommended because the forward selection process is time-consuming. If "nBags=1, replace=F, nObsInBag=nrow(x)" is used, the function becomes a stepwise generalized linear model predictor without bagging.
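The procedure described above can be sketched with glm and MASS::stepAIC. This is a simplified illustration on made-up data, not the package's implementation: it skips the feature subsampling and correlation screening steps, and averaging only over the bags in which an observation was out of bag is an assumption about how the out-of-bag prediction is formed. The code is meant to be run at top level, since stepAIC refits models by looking up the data object by name.

```r
library(MASS)  # provides stepAIC

set.seed(1)  # arbitrary seed for this sketch

# Toy data (hypothetical): binary outcome driven by the first two features.
n <- 120
x <- as.data.frame(matrix(rnorm(n * 6), nrow = n))
names(x) <- paste0("f", 1:6)
y <- rbinom(n, 1, plogis(1.5 * x$f1 - 1.5 * x$f2))

nBags <- 20
oobSum <- numeric(n)    # running sum of out-of-bag predicted probabilities
oobCount <- numeric(n)  # number of bags in which each observation was out of bag
upperScope <- reformulate(names(x))  # ~ f1 + f2 + ... + f6

for (bag in seq_len(nBags)) {
  bagObs <- sample(n, n, replace = TRUE)
  oobObs <- setdiff(seq_len(n), bagObs)
  bagData <- data.frame(y = y[bagObs], x[bagObs, , drop = FALSE])

  # Forward selection by AIC, starting from the intercept-only logistic model.
  nullModel <- glm(y ~ 1, data = bagData, family = binomial)
  fit <- stepAIC(nullModel, scope = list(lower = ~ 1, upper = upperScope),
                 direction = "forward", trace = 0)

  pred <- predict(fit, newdata = x[oobObs, , drop = FALSE], type = "response")
  oobSum[oobObs] <- oobSum[oobObs] + pred
  oobCount[oobObs] <- oobCount[oobObs] + 1
}

# Average the predictions over bags, then threshold to obtain a class call.
predictedOOB.cont <- oobSum / oobCount
predictedOOB <- as.integer(predictedOOB.cont > 0.5)
mean(predictedOOB == y, na.rm = TRUE)  # out-of-bag accuracy
```

For a quantitative outcome the same loop would use a linear model (for example family = gaussian) and skip the thresholding step.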
Value
The function returns a list with the following components:
Component: predictedOOB
the predicted classification of the input data based on out-of-bag samples. Only for binary outcomes.
Component: predictedOOB.cont
For a binary outcome, the predicted probability of each outcome specified by y, based on out-of-bag samples. For a continuous outcome, the predicted value based on out-of-bag samples.
Component: predictedTest
if a test set is given, the predicted classification for the test data. Only for binary outcomes.
Component: predictedTest.cont
if a test set is given and the outcome is binary, the predicted probability of each outcome specified by y for the test data. For a continuous outcome, the predicted value for the test set.
Component: bagObsIndx
a matrix with nBags rows and nObsInBag columns, giving the indices of observations selected for each bag.
Component: datSelectedAsCandidates
a (0,1) matrix with nBags rows and columns corresponding to features, indicating which features are selected as candidate regression covariates in each bag.
Component: datSelectedByForwardRegression
a (0,1) matrix with nBags rows and columns corresponding to features, indicating which features/covariates are selected into the final regression model in each bag.
Component: datCoefOfForwardRegression
a matrix with nBags rows and columns corresponding to features, giving the final generalized linear model coefficients for features in each bag.
Component: timesSelectedByForwardRegression
a variable importance measure, giving the number of times each feature is selected into the final models across all bags.
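The relationship between the per-bag selection indicators and the importance count can be shown on a small hand-made matrix. The values below are hypothetical, and treating the importance measure as the column sums of the indicator matrix is an assumption based on the component descriptions above.

```r
# Hypothetical selection indicators for 4 bags and 3 features, laid out as
# described for datSelectedByForwardRegression (rows = bags, columns = features).
datSelectedByForwardRegression <- matrix(
  c(1, 0, 1,
    1, 0, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("bag", 1:4), c("f1", "f2", "f3")))

# Column sums count how many bags selected each feature.
timesSelected <- colSums(datSelectedByForwardRegression)
sort(timesSelected, decreasing = TRUE)
# f1 f3 f2
#  3  2  1
```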
Author(s)
Lin Song
Examples
## binary outcome prediction
library(WGCNA)
# data generation: keep the first 100 rows (two species) so the outcome is binary
data(iris)
iris = iris[1:100, ]
iris$Species = as.factor(as.character(iris$Species))
set.seed(1)
# random split into 67 training rows and 33 test rows
indx = sample(100, 67, replace = FALSE)
alldat1 = iris[indx, ]
alldat2 = iris[-indx, ]
dat1 = alldat1[, -5]
y1 = alldat1[, 5]
dat2 = alldat2[, -5]
y2 = alldat2[, 5]
# predict with a small number of bags - normally nBags should be at least 100.
RGLM = randomGLMpredictor(dat1, y1, dat2, nCandidateCovariates = ncol(dat1), nBags = 30)
y2predict = RGLM$predictedTest
# cross-tabulate predicted against true test labels
table(y2predict, y2)
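A confusion table like the one printed above can be reduced to a single accuracy figure. The sketch below uses small hypothetical vectors in place of y2predict and y2 so that it runs without WGCNA; it does not reproduce the output of the example.

```r
# Hypothetical stand-ins for the true test labels and the predicted labels.
y2 <- factor(c("setosa", "setosa", "versicolor", "versicolor", "versicolor"))
y2predict <- factor(c("setosa", "setosa", "versicolor", "setosa", "versicolor"),
                    levels = levels(y2))

# Rows are predictions, columns are true labels; the diagonal holds the hits.
confusion <- table(y2predict, y2)
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy  # 0.8 (4 of 5 correct)
```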