
R language, vsn package: help documentation for the vsn.old() function (Chinese-English bilingual)

Posted on 2012-2-26 16:01:02

Variance stabilization and calibration for microarray data.

Translator: 生物统计家园网, robot LoveR


Robust estimation of variance-stabilizing and calibrating transformations for microarray data. This function has been superseded by vsn2. The function vsn remains in the package for backward compatibility.


    lts.quantile = 0.5,
    verbose      = interactive(),
    niter        = 10,
    cvg.check    = NULL,
    describe.preprocessing = TRUE,


An object that contains intensity values from a microarray experiment. The intensities are assumed to be the raw scanner data, summarized over the spots by an image analysis program, and possibly "background subtracted". The intensities must not be logarithmically or otherwise transformed, and not thresholded or "floored". NAs are not accepted. See details.

Numeric. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1. A value of 1 corresponds to ordinary least sum of squares regression.

Logical. If TRUE, some messages are printed.

Integer. The number of iterations to be used in the least trimmed sum of squares regression.

List. If non-NULL, this allows finer control of the iterative least trimmed sum of squares regression. See details.

Array. If not missing, the user can specify start values for the iterative parameter estimation algorithm. See vsnh for details.

Logical. If TRUE, calibration and transformation parameters, plus some other information are stored in the preprocessing slot of the returned object. See details.

Integer. If specified, the model parameters are estimated from a subsample of the data only, the transformation is then applied to all data. This can be useful for performance reasons.

Integer vector. Its length must be the same as nrow(intensities). This parameter allows the calibration and error model parameters to be stratified within each array, e.g. to take into account probe sequence properties, print-tip or plate effects. If strata is not specified, one pair of parameters is fitted for every sample (i.e. for every column of intensities). If strata is specified, a pair of parameters is fitted for every stratum within every sample. The strata are coded by the different integer values. The integer vector strata can be obtained from a factor fac through as.integer(fac), or from a character vector str through as.integer(factor(str)).
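
The two conversions mentioned for strata can be sketched in base R. The print-tip labels below are made up for illustration and are not taken from any real array.

```r
## Hypothetical print-tip labels for six spots (illustration only)
tip_chr <- c("tipB", "tipA", "tipC", "tipA", "tipB", "tipC")

## From a character vector: go through factor() first
strata_from_chr <- as.integer(factor(tip_chr))

## From a factor: as.integer() returns the level codes directly
tip_fac <- factor(tip_chr)
strata_from_fac <- as.integer(tip_fac)

## Both routes give one integer code per row of the intensity matrix
print(strata_from_chr)
print(strata_from_fac)
```

Note that factor() orders its levels alphabetically by default, so the integer codes follow that order rather than the order of first appearance.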



Overview:  The function calibrates for sample-to-sample variations through shifting and scaling, and transforms the intensities to a scale where the variance is approximately independent of the mean intensity. The variance stabilizing transformation is equivalent to the natural logarithm in the high-intensity range, and to a linear transformation in the low-intensity range. In an intermediate range, the arsinh function interpolates smoothly between the two. For details on the transformation, please see the help page for vsnh. The parameters are estimated through a robust variant of maximum likelihood. This assumes that for the majority of genes the expression levels are not much different across the samples, i.e., that only a minority of genes (less than a fraction 1-lts.quantile) is differentially expressed.
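
The two limiting regimes described in the overview can be checked numerically with base R's asinh. This is only a sketch: it uses the plain arsinh without any calibration parameters, which is an assumption made for illustration, whereas vsnh also applies the fitted offsets and scale factors.

```r
## Plain arsinh, without calibration parameters (illustrative assumption)
y_low  <- c(0.001, 0.01, 0.1)
y_high <- c(1e3, 1e4, 1e5)

## Low-intensity range: asinh(y) is close to y itself (linear behaviour)
low_err <- abs(asinh(y_low) - y_low)

## High-intensity range: asinh(y) is close to log(2 * y) = log(y) + log(2)
high_err <- abs(asinh(y_high) - log(2 * y_high))

print(cbind(y_low, low_err))
print(cbind(y_high, high_err))
```

The constant log(2) is the same for every column, so it cancels when differences between columns are taken.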

Even if most genes on an array are differentially expressed, it may still be possible to use the estimator: if a set of non-differentially expressed genes is known, e.g. because they are external controls or reliable 'house-keeping genes', the transformation parameters can be fitted with vsn from the data of these genes, then the transformation can be applied to all data with vsnh.

Format: The format of the matrix of intensities is as follows: for the two-color printed array technology, each row corresponds to one spot, and the columns to the different arrays and wave-lengths (usually red and green, but could be any number). For example, if there are 10 arrays, the matrix would have 20 columns, columns 1...10 containing the green intensities, and 11...20 the red ones. In fact, the ordering of the columns does not matter to vsn, but it is your responsibility to keep track of it for subsequent analyses. For one-color arrays, each row corresponds to a probe, and each column to an array.
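
The column layout from the 10-array example can be sketched as follows. The intensities are simulated and the column names and the layout table are hypothetical bookkeeping aids, not something vsn requires.

```r
## Simulated two-color data: 5 spots, 10 arrays (random values, illustration only)
set.seed(1)
n_spots  <- 5
n_arrays <- 10
green <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)
red   <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)

## Columns 1..10 hold the green channel, columns 11..20 the red channel
intensities <- cbind(green, red)
colnames(intensities) <- c(paste0("green_", 1:n_arrays),
                           paste0("red_", 1:n_arrays))

## Keep track of which column belongs to which channel and array
layout <- data.frame(column  = seq_len(ncol(intensities)),
                     channel = rep(c("green", "red"), each = n_arrays),
                     array   = rep(1:n_arrays, times = 2))
print(dim(intensities))
print(head(layout, 3))
```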

Performance: This function is slow. That is due to the nested iteration loops of the numerical optimization of the likelihood function and the heuristic that identifies the non-outlying data points in the least trimmed squares regression. For large arrays with many tens of thousands of probes, you may want to consider random subsetting: that is, only use a subset of the e.g. 10-20,000 rows of the data matrix intensities to fit the parameters, then apply the transformation to all the data, using vsnh. An example for this can be seen in the function normalize.AffyBatch.vsn, whose code you can inspect by typing normalize.AffyBatch.vsn on the R command line.
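
The row subsetting step can be sketched in base R as below. The matrix is simulated and the subset size of 15,000 is an arbitrary choice within the range mentioned above. The fitting and transformation calls themselves are left out so that the sketch runs without the vsn package.

```r
## Simulated intensity matrix (illustration only): 50,000 probes, 4 arrays
set.seed(42)
intensities <- matrix(rexp(50000 * 4, rate = 1 / 300), ncol = 4)

## Draw a random subset of rows (without replacement) for parameter fitting
n_fit    <- 15000
fit_rows <- sample(nrow(intensities), n_fit)
fit_data <- intensities[fit_rows, , drop = FALSE]

## The parameters would be fitted on fit_data (e.g. with vsn), and the
## resulting transformation then applied to the full matrix (e.g. with vsnh).
print(dim(fit_data))
```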

Iteration control:  By default, if cvg.check is NULL, the function will run the fixed number niter of iterations in the least trimmed sum of squares regression. More fine-grained control can be obtained by passing a list with elements eps and n. If the maximum change between transformed data values is smaller than eps for n subsequent iterations, then the iteration terminates.
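
The stopping rule can be sketched with a small base R helper. The function iterate_with_check and the toy update halve are hypothetical and are not part of vsn. They only mimic the logic described above: a fixed number of iterations when cvg.check is NULL, and early termination once the maximum change stays below eps for n consecutive iterations.

```r
## Hypothetical helper illustrating the stopping rule (not part of vsn)
iterate_with_check <- function(step, x, niter = 10, cvg.check = NULL) {
  streak <- 0
  for (i in seq_len(niter)) {
    x_new <- step(x)
    if (!is.null(cvg.check)) {
      ## count consecutive iterations whose maximum change is below eps
      if (max(abs(x_new - x)) < cvg.check$eps) {
        streak <- streak + 1
      } else {
        streak <- 0
      }
      if (streak >= cvg.check$n) return(list(x = x_new, iterations = i))
    }
    x <- x_new
  }
  list(x = x, iterations = niter)
}

## Toy update that halves every value, so the changes shrink each iteration
halve <- function(x) x / 2

fixed <- iterate_with_check(halve, c(8, 4), niter = 10)
early <- iterate_with_check(halve, c(8, 4), niter = 50,
                            cvg.check = list(eps = 0.1, n = 3))
print(fixed$iterations)
print(early$iterations)
```

In the actual function, the corresponding argument would be passed in the same list form, for example cvg.check = list(eps = 0.1, n = 3); the values here are arbitrary.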

Estimated transformation parameters: If describe.preprocessing is TRUE, the transformation parameters are returned in the preprocessing slot of the experimentData slot of the resulting ExpressionSet object, in the form of a list with three elements:

vsnParams: the parameter array (see vsnh  for details)

vsnParamsIter: an array with dimensions c(dim(vsnParams), niter) that contains the parameter trajectory during the iterative fit process (see also vsnPlotPar).

vsnTrimSelection: a logical vector that for each row of the intensities matrix reports whether it was below (TRUE) or above (FALSE) the trimming threshold.

If intensities has class ExpressionSet,  and its experimentData slot has class MIAME, then this list is appended to any existing entries in the preprocessing slot. Otherwise, the experimentData object and its preprocessing slot are created.  


An object of class ExpressionSet. Differences between the columns of the transformed intensities are  "generalized log-ratios", which are shrinkage estimators of the natural logarithm of the fold change. For the transformation parameters, please see the Details.
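
The shrinkage behaviour of generalized log-ratios can be illustrated numerically. This sketch uses the plain arsinh as a stand-in for the fitted transformation, which is an assumption made for illustration; the helper glog_ratio is hypothetical and the intensities are made up.

```r
## Plain arsinh stands in for the fitted transformation (illustrative assumption)
glog_ratio <- function(a, b) asinh(a) - asinh(b)

## Two-fold change at high intensity: close to log(2)
high <- glog_ratio(2000, 1000)

## The same two-fold change at low intensity: shrunk towards zero
low <- glog_ratio(2, 1)

print(c(log_fold_change = log(2), high = high, low = low))
```

At high intensities the generalized log-ratio is practically the natural log of the fold change, while at low intensities, where ratios are dominated by noise, it is pulled towards zero.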


Wolfgang Huber


calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.
of microarray data,  Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann,  Annemarie Poustka, and Martin Vingron;   Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3.

----------See Also----------

vsnh, vsnPlotPar,  ExpressionSet-class,  MIAME-class,


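library(vsn)   ## provides vsn, vsnh and the kidney example data
data(kidney)

## natural log, with non-positive values mapped to NA
log.na = function(x) log(ifelse(x>0, x, NA))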

plot(log.na(exprs(kidney)), pch=".", main="log-log")

vsnkid = vsn(kidney)   ## transform and calibrate
plot(exprs(vsnkid), pch=".", main="h-h")

## this should always hold true
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

