R GSEAlm package: dfbetasPerGene() function help documentation (Chinese-English parallel text)

loveR · posted 2012-2-25 21:17:55

dfbetasPerGene(GSEAlm)
dfbetasPerGene() belongs to the R package: GSEAlm

Linear-Model Deletion Diagnostics for Gene Expression (or similar)

Translator: LoveR, the translation bot of 生物统计家园网

----------Description----------

This extends standard linear-model diagnostics to gene-expression datasets, in which the same model is fitted simultaneously to each row of a response matrix.

----------Usage----------

dfbetasPerGene(lmobj)

CooksDPerGene(lmobj)

dffitsPerGene(lmobj)

Leverage(lmobj)

----------Arguments----------

Argument: lmobj
An object produced by lmPerGene.

----------Details----------

Deletion diagnostics gauge the influence of each observation on the model fit by recomputing quantities after removing that observation and comparing them with the complete-data version.

DFFITS_i measures the distance, on the response scale, between the fitted values at point i with and without observation y_i. The distance is normalized by the regression standard error and the point's leverage (see below).

Cook's D_i is the squared distance, in parameter space, between the parameter estimates with and without observation y_i, normalized and rescaled by standard errors and by a factor depending on leverage.

DFBETAS_{i,j} breaks the square root of Cook's D into its Euclidean components, one for each parameter j, but uses a somewhat different scaling from Cook's D.
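
For orientation, the three measures above can be written out using the usual single-response textbook definitions. This is a sketch of the standard forms, not a statement of GSEAlm's exact per-gene scaling, which the text here does not spell out. The notation is an assumption introduced for illustration: a subscript (-i) marks a quantity computed with observation i removed, s is the residual standard error, h_{ii} is the leverage of observation i, and p is the number of parameters.

```latex
\[
\mathrm{DFFITS}_i
  = \frac{\hat{y}_i - \hat{y}_{i(-i)}}{s_{(-i)}\sqrt{h_{ii}}},
\qquad
D_i
  = \frac{(\hat{\beta} - \hat{\beta}_{(-i)})^{\top} X^{\top} X \,
          (\hat{\beta} - \hat{\beta}_{(-i)})}{p\, s^{2}},
\qquad
\mathrm{DFBETAS}_{i,j}
  = \frac{\hat{\beta}_j - \hat{\beta}_{j(-i)}}
         {s_{(-i)}\sqrt{\left[(X^{\top} X)^{-1}\right]_{jj}}}
\]
```

Written this way, the difference in scaling is visible: Cook's D uses the full-data s and a single factor p for the whole parameter vector, whereas DFBETAS uses the leave-one-out s_{(-i)} and a separate standard-error term for each parameter j.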

The leverage is the diagonal of the "hat matrix" X(X'X)^{-1}X'. It gives the relative weight of observation y_i in the fitted value y-hat_i. Typically, observations with extreme X values (or belonging to smaller groups, if the model variables are categorical) have high leverage.
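
The leverage definition can be checked with base R alone. The following is a minimal sketch on simulated data; the names n, score, type and X are illustrative and not GSEAlm objects. It builds the hat matrix explicitly and compares its diagonal with hatvalues() from the stats package.

```r
# Simulated design: intercept, one continuous covariate, one two-level factor.
# All names here are illustrative assumptions, not part of GSEAlm.
set.seed(1)
n <- 12
score <- runif(n)
type <- factor(rep(c("Case", "Control"), times = c(4, 8)))
X <- model.matrix(~ score + type)

# Hat matrix H = X (X'X)^{-1} X'; the leverage is its diagonal.
H <- X %*% solve(crossprod(X)) %*% t(X)
lev <- diag(H)

# Base R computes the same quantity via hatvalues() on a fitted lm.
# The response values do not matter: leverage depends on X only.
y <- rnorm(n)
fit <- lm(y ~ score + type)
all.equal(unname(lev), unname(hatvalues(fit)))

# The leverages sum to the number of model parameters (here 3), up to rounding.
sum(lev)
```

Because the leverage depends only on the design matrix, a single vector of length n serves every row of the response matrix.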

All these functions exist for standard regression; see influence.measures.

The functions described here are extensions for the case in which the response is a matrix and the same linear model is fitted to each row separately.

For more details, see the references below.

All functions are implemented in matrix form, so they run quite fast.
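
To illustrate what "matrix form" can mean here, the sketch below computes Cook's distance for every row of a simulated response matrix at once and checks the result against one lm() fit per row. It uses base R only and is not GSEAlm's actual implementation; the names G, n, X, Y, E and CD are assumptions made for the example.

```r
set.seed(2)
G <- 5; n <- 10
x <- rnorm(n)
X <- cbind(1, x)                      # n x p design, shared by all rows
p <- ncol(X)
Y <- matrix(rnorm(G * n), G, n)       # G "genes" by n samples

# One hat matrix serves every row because the design is shared.
H <- X %*% solve(crossprod(X), t(X))
h <- diag(H)

# Residuals for all rows at once: E = Y (I - H).
E <- Y %*% (diag(n) - H)
s2 <- rowSums(E^2) / (n - p)          # per-row residual variance

# Cook's D[g, i] = e[g, i]^2 * h[i] / (p * s2[g] * (1 - h[i])^2), a G x n matrix.
CD <- sweep(E^2, 2, h / (1 - h)^2, `*`) / (p * s2)

# Check against one lm() fit per row.
CD_loop <- t(apply(Y, 1, function(y) cooks.distance(lm(y ~ x))))
all.equal(CD, CD_loop, check.attributes = FALSE)
```

The vectorized version needs one matrix inversion in total, whereas the loop refits the model G times.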

----------Value----------

dfbetasPerGene: A G x n x p array, where G and n are the numbers of rows and columns of the input's expression matrix, respectively, and p is the number of parameters in the linear model (including the intercept).

CooksDPerGene: A G x n matrix.

dffitsPerGene: A G x n matrix.

Leverage: A vector of length n, corresponding to the diagonal of the "hat matrix".
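
The G x n x p layout of the DFBETAS result can be mimicked with base R, which may help when indexing the array. The sketch below stacks per-row dfbetas() matrices from the stats package into an array of that shape. It is an illustration on simulated data, and whether GSEAlm's scaling matches dfbetas() exactly is not confirmed here. The names G, n, p, grp and DFB are illustrative.

```r
set.seed(3)
G <- 4; n <- 8
x <- rnorm(n)
grp <- factor(rep(c("a", "b"), each = n / 2))
Y <- matrix(rnorm(G * n), G, n)       # G rows (genes) by n columns (samples)

# dfbetas() on a single fit returns an n x p matrix;
# stacking one per row gives a G x n x p array.
p <- 3                                # intercept, x, grp
DFB <- array(NA_real_, dim = c(G, n, p))
for (g in seq_len(G)) {
  DFB[g, , ] <- dfbetas(lm(Y[g, ] ~ x + grp))
}

dim(DFB)          # G, n, p
DFB[, , 2]        # G x n matrix: influence of each sample on the x coefficient
```

Fixing the third index, as in DFB[, , 2], yields a G x n matrix for one parameter, which is the slicing used in the example plot of score against type effects.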

----------Note----------

The commonly cited alert thresholds for diagnostic measures such as Cook's D and DFBETAS, found in older references, appear to be out of date. See LaMotte (1999) and Jensen (2001) for a more recent discussion. Our suggested practice is to

----------Author(s)----------

Robert Gentleman, Assaf Oron

----------References----------

Diagnostics. New York: Wiley.
Regression. London: Chapman and Hall.
the deviance and single case deletions. Applied Statistics 36, 181-191.
Methods. Sage.
regression analysis. Metrika 50, 109-119.
regression. Statistics and Probability Letters 51, 377-388.

----------See Also----------

influence.measures for the analogous simple

----------Examples----------

library(Biobase)   # provides the sample.ExpressionSet example data
library(GSEAlm)    # provides lmPerGene() and the per-gene diagnostics
data(sample.ExpressionSet)
layout(1)
lm1 <- lmPerGene(sample.ExpressionSet, ~score + type)
CD <- CooksDPerGene(lm1)

### How does the distribution of Cook's distances look across samples?
boxplot(log2(CD) ~ col(CD), names = colnames(CD),
        ylab = "Log Cook's Distance", xlab = "Sample")
### There are a few gross individual-observation outliers (which is why we plot on the log
### scale), but otherwise no single sample pops out as problematic. Here's
### one commonly-used alert level for problems:
lines(c(-5, 30), rep(log2(2 / sqrt(26)), 2), col = 2)

DFB <- dfbetasPerGene(lm1)

### Looking for simultaneous two-effect outliers - 500 genes times 26
### samples makes 13000 data points on this plot
plot(DFB[, , 2], DFB[, , 3], main = "DFBETAS for Score and Type (all genes)",
     xlab = "Score Effect Offset (normalized units)",
     ylab = "Type Effect Offset (normalized units)", pch = '+', cex = .5)
lines(c(-100, 100), rep(0, 2), col = 2)
lines(rep(0, 2), c(-100, 100), col = 2)

DFF <- dffitsPerGene(lm1)
summary(apply(DFF, 2, mean))

Lev <- Leverage(lm1)
table(Lev)
### Only two unique values would be expected for a dichotomous one-factor model (e.g. ~type);
### with the score term also in the model, more distinct values may appear

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

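Notes:
Note 1: To make learning easier, this document was translated by LoveR, the translation bot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is automatic, some inaccuracies are unavoidable. Comparing the Chinese and English text carefully while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply to this post and it will be revised over time.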
