chisq.stat(siggenes)
chisq.stat() belongs to the R package siggenes
SAM Analysis for Categorical Data
----------Description----------
Generates the statistics required for a Significance Analysis of Microarrays (SAM) of categorical data such as SNP data.
Should not be called directly, but via sam(..., method = chisq.stat).
Replaces cat.stat.
----------Usage----------
chisq.stat(data, cl, approx = NULL, B = 100, n.split = 1,
check.for.NN = FALSE, lev = NULL, B.more = 0.1,
B.max = 50000, n.subset = 10, rand = NA)
----------Arguments----------
Argument: data
a matrix, data frame, or list. If a matrix or data frame, each row must correspond to a variable (e.g., a SNP) and each column to a sample (i.e., an observation). If the number of observations is very large, it is better to specify data as a list of matrices, where each matrix represents one group and summarizes how many observations in this group show which level at which variable. These matrices can be generated using the function rowTables from the package scrime. For details on how to specify this list, see the Examples section of this help page and the help for rowChisqMultiClass in the package scrime.
Argument: cl
a numeric vector of length ncol(data) indicating the class to which each sample belongs. Must consist of the integers between 1 and c, where c is the number of different groups. Only needs to be specified if data is a matrix or a data frame.
Argument: approx
should the null distribution be approximated by a chi-square distribution? Currently only available if data is a matrix or data frame. If not specified, approx = FALSE is used, and the null distribution is estimated by a permutation method.
Argument: B
the number of permutations used in the estimation of the null distribution, and hence in the computation of the expected d-values.
Argument: n.split
number of chunks into which the variables are split when computing the values of the test statistic. Currently only available if approx = TRUE and data is a matrix or data frame. By default, the test scores of all variables are calculated simultaneously. If the number of variables or observations is large, setting n.split to a value larger than 1 can help to avoid memory problems.
Argument: check.for.NN
if TRUE, it is checked whether any of the genotypes is equal to "NN". This can be very time-consuming when the data set is high-dimensional.
Argument: lev
numeric or character vector specifying the codings of the levels of the variables/SNPs. Can only be specified if data is a matrix or a data frame. Only needs to be specified if the variables are not coded by the integers between 1 and the number of levels. Can also be a list. In this case, each element of the list must be a numeric or character vector specifying the codings, and all elements must have the same length.
Argument: B.more
a numeric value. If the number of all possible permutations is smaller than or equal to (1 + B.more) * B, full permutation will be done. Otherwise, B permutations are used.
Argument: B.max
a numeric value. If the number of all possible permutations is smaller than or equal to B.max, B randomly selected permutations will be used in the computation of the null distribution. Otherwise, B random draws of the group labels are used.
Argument: n.subset
a numeric value indicating how many permutations are considered simultaneously when computing the expected d-values.
Argument: rand
numeric value. If specified, i.e. not NA, the random number generator will be set to a reproducible state.
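The interplay of B, B.more, and B.max described above can be sketched as a small decision rule. This is only an illustration of the documented behaviour, not code from siggenes: permutation_strategy is a hypothetical helper, and it assumes that "all possible permutations" means the number of distinct assignments of the group labels (a multinomial coefficient).

```r
# Hypothetical helper (not part of siggenes): reports which permutation
# strategy the documented rules for B, B.more and B.max would select.
permutation_strategy <- function(cl, B = 100, B.more = 0.1, B.max = 50000) {
  # Assumption: the number of possible permutations is the number of
  # distinct label assignments, n! / (n_1! * ... * n_c!).
  # Work on the log scale so that large sample sizes do not overflow.
  group_sizes <- as.vector(table(cl))
  log_n_perms <- lgamma(length(cl) + 1) - sum(lgamma(group_sizes + 1))
  if (log_n_perms <= log((1 + B.more) * B)) {
    "full permutation"
  } else if (log_n_perms <= log(B.max)) {
    "B randomly selected permutations"
  } else {
    "B random draws of the group labels"
  }
}

permutation_strategy(rep(1:2, each = 3))   # 3 cases, 3 controls
permutation_strategy(rep(1:2, each = 5))   # 5 cases, 5 controls
permutation_strategy(rep(1:2, each = 20))  # 20 cases, 20 controls
```

With the default settings, only very small designs would be permuted exhaustively under this reading; the 20 versus 20 design used in the examples section would fall into the last branch.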
----------Details----------
For each SNP (or, more generally, each categorical variable), Pearson's chi-square statistic is computed to test whether the distribution of the SNP differs between several groups. Since only one null distribution is estimated for all SNPs, as proposed in the original SAM procedure of Tusher et al. (2001), all SNPs must have the same number of levels/categories.
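To make the test statistic concrete, the following base-R sketch computes Pearson's chi-square statistic for every row of a small categorical matrix. It is not the siggenes implementation: row_chisq and its n.lev parameter are hypothetical, and the sketch assumes that every level occurs at least once in each variable, so that no expected count is zero.

```r
# Hypothetical helper (not the siggenes implementation): Pearson's
# chi-square statistic for every row of a categorical matrix.
row_chisq <- function(data, cl, n.lev = 3) {
  apply(data, 1, function(x) {
    # groups x levels contingency table; factor() fixes the set of levels
    # so that every variable yields a table of the same shape
    observed <- table(factor(cl), factor(x, levels = seq_len(n.lev)))
    expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
    sum((observed - expected)^2 / expected)
  })
}

set.seed(1)
mat <- matrix(sample(3, 200, TRUE), 5)  # 5 variables, 40 observations
cl <- rep(1:2, each = 20)               # 20 cases, 20 controls
stats <- row_chisq(mat, cl)
stats
```

For a single variable, the same value should be returned by chisq.test(table(cl, mat[1, ]), correct = FALSE)$statistic, which is a convenient cross-check.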
----------Value----------
A list containing the statistics required by sam.
----------Warning----------
This procedure only works correctly if all SNPs/variables have the same number of levels/categories. Therefore, it stops when the number of levels differs between the variables.
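Because of this requirement, it can be useful to inspect the data before calling sam. The following base-R sketch counts the distinct non-missing levels per variable; levels_per_variable is a hypothetical helper and not part of siggenes, and it only looks at the levels that are actually observed in the data.

```r
# Hypothetical pre-check (not part of siggenes): count the distinct
# non-missing levels per variable and see whether they all agree.
levels_per_variable <- function(data) {
  apply(data, 1, function(x) length(unique(x[!is.na(x)])))
}

# Both variables show three levels
ok <- matrix(c(1, 2, 3, 1,
               3, 2, 1, 2), nrow = 2, byrow = TRUE)
# The second variable shows only two levels
bad <- matrix(c(1, 2, 3, 1,
                1, 2, 1, 2), nrow = 2, byrow = TRUE)

length(unique(levels_per_variable(ok))) == 1
length(unique(levels_per_variable(bad))) == 1
```

Variables that fail this check would have to be removed or analysed separately before the SAM analysis.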
----------Author(s)----------
Holger Schwender, holger.schw@gmx.de
----------References----------
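Schwender, H. (2005). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. In Weihs, C. and Gaul, W. (eds.), Classification – The Ubiquitous Challenge. Springer, Heidelberg, 370-377.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.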
----------See Also----------
SAM-class, sam, chisq.ebam, trend.stat
----------Examples----------
library(siggenes)
# Generate a random 1000 x 40 matrix consisting of the values
# 1, 2, and 3, and representing 1000 variables and 40 observations.
mat <- matrix(sample(3, 40000, TRUE), 1000)
# Assume that the first 20 observations are cases, and the
# remaining 20 are controls.
cl <- rep(1:2, each = 20)
# Then a SAM analysis for categorical data can be done by
out <- sam(mat, cl, method = chisq.stat, approx = TRUE)
out
# approx is set to TRUE to approximate the null distribution
# by the chi-square distribution (usually, for such a small
# number of observations this might not be a good idea,
# as the assumptions behind this approximation might not
# be fulfilled).
# The same results can also be obtained by employing
# contingency tables, i.e. by specifying data as a list.
# For this, we need to generate the tables summarizing
# groupwise how many observations show which level at
# which variable. These tables can be obtained by
library(scrime)
cases <- rowTables(mat[, cl == 1])
controls <- rowTables(mat[, cl == 2])
ltabs <- list(cases, controls)
# And the same SAM analysis as above can then be
# performed by
out2 <- sam(ltabs, method = chisq.stat, approx = TRUE)
out2
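For readers without scrime at hand, the groupwise tables can be pictured with a few lines of base R. This is a sketch under assumptions: count_levels is a hypothetical stand-in, and the layout (one row per variable, one column per level) is inferred from the description of the data argument rather than taken from the scrime source.

```r
# Hypothetical base-R stand-in for a per-group table of level counts
# (assumed layout: one row per variable, one column per level).
count_levels <- function(x, levels = 1:3) {
  counts <- t(apply(x, 1, function(row) {
    tabulate(match(row, levels), nbins = length(levels))
  }))
  colnames(counts) <- levels
  counts
}

set.seed(1)
mat <- matrix(sample(3, 400, TRUE), 10)  # 10 variables, 40 observations
cl <- rep(1:2, each = 20)
tabs <- list(count_levels(mat[, cl == 1]), count_levels(mat[, cl == 2]))
tabs[[1]][1:3, ]     # level counts of the first three variables among cases
rowSums(tabs[[1]])   # every row sums to the group size
```

Storing only these counts keeps the input size independent of the number of observations, which is the motivation given for the list form of data.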