samr(samr)
samr() belongs to the R package: samr
Significance analysis of microarrays
Translator: biostatistic.net, robot LoveR
Description
Correlates a large number of features (e.g., genes) with an outcome variable, such as a group indicator, a quantitative variable or a survival time. NOTE: for most users, the interface function SAM (which calls samr) will be more convenient for array data, and the interface function SAMseq (which also calls samr) will be more convenient for sequencing data.
Usage
samr(data, resp.type=c("Quantitative","Two class unpaired",
"Survival","Multiclass", "One class", "Two class paired",
"Two class unpaired timecourse", "One class timecourse",
"Two class paired timecourse", "Pattern discovery"),
assay.type=c("array","seq"), s0=NULL, s0.perc=NULL, nperms=100,
center.arrays=FALSE, testStatistic=c("standard","wilcoxon"),
time.summary.type=c("slope","signed.area"),
regression.method=c("standard","ranks"), return.x=FALSE,
knn.neighbors=10, random.seed=NULL, nresamp=20,nresamp.perm=NULL,
xl.mode=c("regular","firsttime","next20","lasttime"),
xl.time=NULL, xl.prevfit=NULL)
Arguments
data
Data object (a list) with components: x, a p by n matrix of features, one observation per column (missing values allowed); y, an n-vector of outcome measurements; censoring.status, an n-vector of censoring indicators (1 = died or event occurred, 0 = survived or event was censored), needed for a censored survival outcome.
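The list structure above can be sketched in base R as follows. The helper `check_samr_data` is hypothetical (it is not part of samr). It only checks the dimensions and codings described here before the object would be passed to samr, and it assumes an outcome y is present (pattern discovery has none).

```r
# Hypothetical helper (not part of samr): basic shape checks on a
# samr-style data object before calling samr().
check_samr_data <- function(data) {
  stopifnot(is.matrix(data$x))
  n <- ncol(data$x)                        # one observation per column
  stopifnot(length(data$y) == n)           # one outcome per observation
  if (!is.null(data$censoring.status)) {   # only for survival outcomes
    stopifnot(length(data$censoring.status) == n,
              all(data$censoring.status %in% c(0, 1)))
  }
  invisible(TRUE)
}

set.seed(1)
p <- 200; n <- 12
x <- matrix(rnorm(p * n), nrow = p, ncol = n)  # p features by n samples
x[sample(length(x), 10)] <- NA                 # missing values are allowed
d <- list(x = x,
          y = rexp(n),                         # survival times
          censoring.status = rbinom(n, 1, 0.7))  # 1 = event, 0 = censored
check_samr_data(d)
```

An object built this way would then be used as in the survival example, `samr(d, resp.type="Survival")`.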
resp.type
Problem type: "Quantitative" for a continuous outcome (array and sequencing data); "Two class unpaired" (array and sequencing data); "Survival" for a censored survival outcome (array and sequencing data); "Multiclass" for more than 2 groups (array and sequencing data); "One class" for a single group (array data only); "Two class paired" for two classes with paired observations (array and sequencing data); "Two class unpaired timecourse" (array data only); "One class timecourse" (array data only); "Two class paired timecourse" (array data only); or "Pattern discovery" (array data only).
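The outcome codings used in the examples for several response types are collected below. The sketch also includes a hypothetical checker, `is_valid_paired_y`, which is not a samr function. It tests the paired coding, in which -k and k mark the two members of pair k.

```r
# Outcome codings as used in the examples for several resp.type values
y_unpaired   <- c(rep(1, 10), rep(2, 10))          # "Two class unpaired"
y_paired     <- c(-(1:10), 1:10)                   # "Two class paired"
y_multiclass <- c(rep(1, 3), rep(2, 3), rep(3, 4)) # "Multiclass"
y_oneclass   <- rep(1, 20)                         # "One class"

# Hypothetical helper (not part of samr): checks the paired coding
is_valid_paired_y <- function(y) {
  y <- as.integer(y)
  if (any(y == 0)) return(FALSE)
  k <- abs(y)
  # each pair label must occur exactly twice, once negative and once positive
  all(table(k) == 2) && all(tapply(sign(y), k, sum) == 0)
}

is_valid_paired_y(y_paired)        # TRUE
is_valid_paired_y(c(-1, 1, 2, 2))  # FALSE: pair 2 has no negative member
```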
assay.type
Assay type: "array" for microarray data, "seq" for counts from sequencing
s0
Exchangeability factor for the denominator of the test statistic. The default is an automatic choice. Only used for array data.
s0.perc
Percentile of the standard deviation values to use for s0. The default is an automatic choice. -1 means s0=0 (different from s0.perc=0, which means s0 = the zeroth percentile of the standard deviation values, i.e., the minimum sd value). Only used for array data.
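Here is a minimal sketch of the s0.perc rule as described. It assumes s0.perc is given on a 0 to 100 percentile scale and that standard deviations are taken per feature (row). The helper `s0_from_perc` is hypothetical, and samr's automatic choice of s0 is not reproduced.

```r
# Hypothetical helper (not samr's code): s0 from a percentile of the
# per-feature standard deviations, following the s0.perc description.
s0_from_perc <- function(sd.values, s0.perc) {
  if (s0.perc == -1) return(0)   # -1 means s0 = 0
  # assumed 0-100 scale: 0 gives the minimum sd, 100 the maximum
  unname(quantile(sd.values, probs = s0.perc / 100, na.rm = TRUE))
}

set.seed(2)
x <- matrix(rnorm(1000 * 20), ncol = 20)
gene.sd <- apply(x, 1, sd)                 # one sd per feature (row)
s0_from_perc(gene.sd, -1)                  # 0
s0_from_perc(gene.sd, 0) == min(gene.sd)   # TRUE
s0_from_perc(gene.sd, 50)                  # median of the gene sds
```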
nperms
Number of permutations used to estimate false discovery rates
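This is a simplified sketch, not samr's algorithm, of how permutations can estimate a false discovery rate. It uses a SAM-style two-class statistic d = (mean2 - mean1)/(se + s0) with a symmetric cutoff on |d|, rather than samr's delta-based rule. The FDR is estimated as the median number of permuted statistics beyond the cutoff divided by the number of observed ones, which takes the null proportion as 1. The function names are hypothetical.

```r
# Simplified sketch (not samr's algorithm): permutation estimate of the FDR
# for the rule |d| >= cutoff, with d a SAM-style two-class statistic.

# d = (mean of class 2 - mean of class 1) / (pooled standard error + s0)
two_class_stat <- function(x, y, s0 = 0) {
  g1 <- y == 1
  g2 <- y == 2
  n1 <- sum(g1)
  n2 <- sum(g2)
  m1 <- rowMeans(x[, g1, drop = FALSE])
  m2 <- rowMeans(x[, g2, drop = FALSE])
  ss <- rowSums((x[, g1, drop = FALSE] - m1)^2) +
    rowSums((x[, g2, drop = FALSE] - m2)^2)
  se <- sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
  (m2 - m1) / (se + s0)
}

perm_fdr <- function(x, y, cutoff, nperms = 100, s0 = 0) {
  d <- two_class_stat(x, y, s0)
  called <- sum(abs(d) >= cutoff)
  # features beyond the cutoff in each dataset with permuted class labels
  false.calls <- replicate(nperms,
    sum(abs(two_class_stat(x, sample(y), s0)) >= cutoff))
  fdr <- if (called > 0) min(1, median(false.calls) / called) else NA_real_
  list(called = called, median.false = median(false.calls), fdr = fdr)
}

# Usage on data simulated as in the two class unpaired example
set.seed(100)
x <- matrix(rnorm(1000 * 20), ncol = 20)
dd <- sample(1:1000, size = 100)
u <- matrix(2 * rnorm(100), ncol = 10, nrow = 100)
x[dd, 11:20] <- x[dd, 11:20] + u
y <- c(rep(1, 10), rep(2, 10))
res <- perm_fdr(x, y, cutoff = 3, nperms = 50)
res$called
res$fdr
```

Raising the cutoff calls fewer features and, in general, lowers the estimated FDR. This is the trade-off that the delta argument of samr.plot and samr.compute.delta.table explores in the examples.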
center.arrays
Should the data for each sample (array) be median centered at the outset? Default=FALSE. Only used for array data.
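Median centering each array, as described for center.arrays=TRUE, amounts to subtracting each column's median. The following is a minimal base-R sketch; the helper name is hypothetical and this is not samr's own code.

```r
# Hypothetical helper (not samr's code): subtract each column's median.
median_center_arrays <- function(x) {
  col.medians <- apply(x, 2, median, na.rm = TRUE)
  sweep(x, 2, col.medians, FUN = "-")
}

set.seed(3)
x <- matrix(rnorm(50 * 4, mean = 5), ncol = 4)   # 50 features, 4 arrays
xc <- median_center_arrays(x)
apply(xc, 2, median)   # all 0 up to rounding
```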
testStatistic
Test statistic to use in the two class unpaired case: either "standard" (t-statistic) or "wilcoxon" (two-sample Wilcoxon or Mann-Whitney test). Only used for array data.
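The sketch below computes the rank-based alternative: the per-feature Wilcoxon rank-sum statistic and its usual normal-approximation standardization. It assumes no ties, and the standardization samr itself applies may differ. The helper names are hypothetical.

```r
# Hypothetical helpers (not samr's code): per-feature Wilcoxon rank-sum
# statistic for a two-class design, plus its no-ties normal standardization.
rank_sum_stat <- function(x, y) {
  r <- t(apply(x, 1, rank))            # rank the samples within each feature
  rowSums(r[, y == 2, drop = FALSE])   # sum of ranks in class 2
}
rank_sum_z <- function(x, y) {
  n1 <- sum(y == 1); n2 <- sum(y == 2); n <- n1 + n2
  (rank_sum_stat(x, y) - n2 * (n + 1) / 2) / sqrt(n1 * n2 * (n + 1) / 12)
}

set.seed(4)
x <- matrix(rnorm(5 * 8), nrow = 5)
y <- rep(1:2, each = 4)
rank_sum_z(x, y)
# base R cross-check: wilcox.test's W is the class-2 rank sum minus n2(n2+1)/2
w <- sapply(1:5, function(i) wilcox.test(x[i, y == 2], x[i, y == 1])$statistic)
all.equal(unname(w), unname(rank_sum_stat(x, y) - 4 * 5 / 2))   # TRUE
```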
time.summary.type
Summary measure for each time course: "slope" or "signed.area". Only used for array data.
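Here is a sketch of the two kinds of summary for a single time course v observed at times tm: the least-squares slope, and a trapezoidal area under the piecewise-linear curve that is negative where v is negative. Whether samr subtracts a baseline or normalizes the area is not stated here, so treat this as an illustration under that assumption. The helper names are hypothetical.

```r
# Hypothetical helpers (not samr's code) summarizing one time course v
# observed at times tm.
course_slope <- function(v, tm) {
  unname(coef(lm(v ~ tm))[2])   # least-squares slope
}
course_signed_area <- function(v, tm) {
  # trapezoidal area under the piecewise-linear curve; negative where v < 0
  sum(diff(tm) * (head(v, -1) + tail(v, -1)) / 2)
}

tm <- c(1, 2, 3, 4, 5)
v <- 3 * c(0, 1, 2, 3, 4)   # the linear trend added in the timecourse example
course_slope(v, tm)         # 3 (up to rounding)
course_signed_area(v, tm)   # 24
```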
regression.method
Regression method for the quantitative case: "standard" (linear least squares) or "ranks" (linear least squares on ranked data). Only used for array data.
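Under "ranks", each feature's values and the outcome are replaced by their ranks before the least-squares fit. The sketch below computes the resulting per-feature slope of rank(x) on rank(y). The helper name is hypothetical, and a test statistic would further divide such a slope by a standard error. Dropping the rank() calls gives the "standard" version.

```r
# Hypothetical helper (not samr's code): least-squares slope of each
# feature's ranks on the ranks of y, i.e. the "ranks" regression idea.
rank_regression_slopes <- function(x, y) {
  ry <- rank(y)
  ryc <- ry - mean(ry)                 # centered outcome ranks
  rx <- t(apply(x, 1, rank))           # ranks within each feature (row)
  drop((rx - rowMeans(rx)) %*% ryc) / sum(ryc^2)
}

set.seed(84048)
x <- matrix(rnorm(1000 * 9), ncol = 9)
y <- c(3, 2, 1, 0, 0, 0, 1, 2, 3)      # outcome from the quantitative example
slopes <- rank_regression_slopes(x, y)
# cross-check one feature against lm() on the ranked data
all.equal(slopes[1], unname(coef(lm(rank(x[1, ]) ~ rank(y)))[2]))   # TRUE
```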
return.x
Should the matrix of feature values be returned? Only useful for time course data, where x contains summaries of the features over time. Otherwise x is the same as the input data data$x.
knn.neighbors
Number of nearest neighbors to use for imputation of missing feature values. Only used for array data.
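The following base-R sketch shows k-nearest-neighbour imputation in the spirit of knn.neighbors. Each missing entry of a feature is filled with the average, in that sample, of the k features closest to it over the samples both have observed. The distance (root mean squared difference), the tie handling and the helper name are assumptions, not necessarily what samr uses.

```r
# Hypothetical k-nearest-neighbour imputation (not samr's implementation).
# x: features (rows) by samples (columns), with NAs.
knn_impute <- function(x, k = 10) {
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    obs <- !is.na(x[i, ])
    cand <- setdiff(seq_len(nrow(x)), i)
    # root mean squared difference over samples observed in both features
    d <- vapply(cand, function(g) {
      both <- obs & !is.na(x[g, ])
      if (!any(both)) Inf else sqrt(mean((x[i, both] - x[g, both])^2))
    }, numeric(1))
    for (j in which(!obs)) {
      usable <- !is.na(x[cand, j]) & is.finite(d)
      if (!any(usable)) next           # no donor: leave the NA in place
      nb <- cand[usable][order(d[usable])][seq_len(min(k, sum(usable)))]
      out[i, j] <- mean(x[nb, j])      # average of the k nearest donors
    }
  }
  out
}

x <- rbind(c(1, 2, 3),
           c(1.1, 2.1, 3.1),
           c(10, 20, 30),
           c(1.05, NA, 3.05))
knn_impute(x, k = 2)[4, 2]   # 2.05: mean of features 1 and 2 in sample 2
```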
random.seed
Optional initial seed for random number generator (integer)
nresamp
For assay.type="seq", number of resamples used to construct test statistic. Default 20. Only used for sequencing data.
nresamp.perm
For assay.type="seq", number of resamples used to construct test statistic for permutations. Default is equal to nresamp and it must be at most nresamp. Only used for sequencing data.
xl.mode
Used by Excel interface
xl.time
Used by Excel interface
xl.prevfit
Used by Excel interface
Details
Carries out a SAM analysis. Applicable to microarray data, sequencing data, and other data with a large number of features. This is the R package that is called by the "official" SAM Excel package v2.0. The format of the response vector y and the calling sequence are illustrated in the examples below. A more complete description is given in the SAM manual.
Value
A list with components:
n: Number of observations.
x: Data matrix, p by n (p = number of genes or features). Equal to the matrix data$x in the original call to samr, except for (1) time course analysis, where it contains the summarized data, or (2) a quantitative outcome with rank regression, where it contains the data transformed to ranks. Hence it is NULL except in time course analysis.
y: Vector of n outcome values. Equal to the values data$y in the original call to samr, except for (1) time course analysis, where it contains the summarized y, or (2) a quantitative outcome with rank regression, where it contains the y values transformed to ranks.
argy: The values data$y in the original call to samr.
censoring.status: Censoring status indicators, if applicable.
testStatistic: Test statistic used.
nperms: Number of permutations requested.
nperms.act: Number of permutations actually used. It is smaller than nperms when fewer than nperms distinct permutations exist; if the number of possible permutations is at most nperms, all of them are used.
tt: tt = numer/sd, the vector of p test statistics for the original data.
numer: Numerators for tt.
sd: Denominators for tt, equal to the standard deviation for each feature plus s0.
s0: Computed exchangeability factor.
s0.perc: Computed percentile of standard deviation values; s0 is the s0.perc percentile of the gene standard deviations.
eva: p-vector of expected values for tt under permutation sampling.
perms: nperms.act by n matrix of the permutations used. Each row is a permutation of 1, 2, ..., n.
permsy: nperms.act by n matrix of the permutations used. Each row is a permutation of y1, y2, ..., yn. Only one of perms and permsy is non-NULL, depending on resp.type.
all.perms.flag: Were all possible permutations used?
ttstar: p by nperms.act matrix of test statistics from permuted data. Each column is sorted in descending order.
ttstar0: p by nperms.act matrix of test statistics from permuted data. Columns are in the order of the data.
eigengene.number: The number of the eigengene (e.g., 1, 2, ...) that was requested for pattern discovery.
eigengene: Computed eigengene.
pi0: Estimated proportion of null features (genes).
foldchange: p-vector of fold changes for the original data.
foldchange.star: p by nperms.act matrix of estimated fold changes from permuted data.
sdstar.keep: p by nperms.act matrix of standard deviations from each permutation.
censoring.status.star.keep: n by nperms.act matrix of censoring.status indicators from each permutation.
resp.type: The response type used. Same as resp.type.arg, except for time course data, where the time data are summarized and then treated as non-time course data; e.g., if resp.type.arg="oneclass.timecourse" then resp.type="oneclass".
resp.type.arg: The response type requested in the call to samr.
stand.contrasts: For multiclass data, p by nclass matrix of standardized differences between each class mean and the overall mean.
stand.contrasts.star: For multiclass data, p by nclass by nperms.act array of standardized contrasts for the permuted datasets.
stand.contrasts.95: For multiclass data, 2.5 and 97.5 percentiles of the standardized contrasts. Useful for determining which class contrasts are large for significant genes.
depth: For assay.type="seq", estimated sequencing depth for each sample.
call: The calling sequence.
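The relation between nperms, nperms.act, all.perms.flag and permsy can be illustrated for a small two-class design. The sketch assumes that the distinct relabelings are the choose(n, n1) ways of choosing the class-1 samples; samr may count or enumerate them differently. The helper names are hypothetical.

```r
# Hypothetical helpers (not samr's code): count and enumerate the distinct
# two-class relabelings of y, mimicking the shape of permsy.
n.possible <- function(y) choose(length(y), sum(y == 1))

all_two_class_perms <- function(y) {
  n <- length(y); n1 <- sum(y == 1)
  idx <- combn(n, n1)   # each column: the samples assigned to class 1
  t(apply(idx, 2, function(cls1) ifelse(seq_len(n) %in% cls1, 1, 2)))
}

y.small <- c(1, 1, 1, 2, 2, 2)
nperms <- 100
n.possible(y.small)                            # 20
all.perms.flag <- n.possible(y.small) <= nperms
permsy <- if (all.perms.flag) all_two_class_perms(y.small) else NULL
dim(permsy)                                    # 20 6: one relabeling per row
```

Here only 20 relabelings exist, so the analogue of nperms.act would be 20 rather than the requested 100.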
Author(s)
Jun Li, Balasubramanian Narasimhan and Robert Tibshirani
References
Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98: 5116-5121 (Apr 24). http://www-stat.stanford.edu/~tibs/SAM
Li, J. and Tibshirani, R. (2011). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.
Examples
######### two class unpaired comparison
# y must take values 1,2
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))
data=list(x=x,y=y, geneid=as.character(1:nrow(x)),
genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE)
samr.obj<-samr(data, resp.type="Two class unpaired", nperms=100)
delta=.4
samr.plot(samr.obj,delta)
delta.table <- samr.compute.delta.table(samr.obj)
siggenes.table<-samr.compute.siggenes.table(samr.obj,delta, data, delta.table)
# sequencing data (counts)
set.seed(3)
x<-abs(100*matrix(rnorm(1000*20),ncol=20))
x=trunc(x)
y<- c(rep(1,10),rep(2,10))
x[1:50,y==2]=x[1:50,y==2]+50
data=list(x=x,y=y, geneid=as.character(1:nrow(x)),
genenames=paste("g",as.character(1:nrow(x)),sep=""))
samr.obj<-samr(data, resp.type="Two class unpaired",assay.type="seq", nperms=100)
delta=5
samr.plot(samr.obj,delta)
delta.table <- samr.compute.delta.table(samr.obj)
siggenes.table<-samr.compute.siggenes.table(samr.obj,delta, data, delta.table)
########### two class paired
# y must take values -1, 1, -2, 2, etc., with (-k, k) being a pair
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y=c(-(1:10),1:10)
d=list(x=x,y=y, geneid=as.character(1:nrow(x)),
genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE)
samr.obj<-samr(d, resp.type="Two class paired", nperms=100)
############# quantitative response
# y must take numeric values
set.seed(84048)
x=matrix(rnorm(1000*9),ncol=9)
mu=c(3,2,1,0,0,0,1,2,3)
b=runif(100)+.5
x[1:100,]=x[1:100,]+ b
y=mu
d=list(x=x,y=y,
geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x))))
samr.obj =samr(d, resp.type="Quantitative", nperms=50)
########### oneclass
# y is a vector of ones
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,20))
data=list(x=x,y=y, geneid=as.character(1:nrow(x)),
genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE)
samr.obj<-samr(data, resp.type="One class", nperms=100)
########### survival data
# y is numeric; censoring.status = 1 for failures and 0 for censored observations
set.seed(84048)
x=matrix(rnorm(1000*50),ncol=50)
x[1:50,26:50]= x[1:50,26:50]+2
x[51:100,26:50]= x[51:100,26:50]-2
y=abs(rnorm(50))
y[26:50]=y[26:50]+2
censoring.status=sample(c(0,1),size=50,replace=TRUE)
d=list(x=x,y=y,censoring.status=censoring.status,
geneid=as.character(1:1000),genenames=paste("gene", as.character(1:1000)))
samr.obj=samr(d, resp.type="Survival", nperms=20)
################ multi-class example
# y takes values 1,2,3,...,k where k = number of classes
set.seed(84048)
x=matrix(rnorm(1000*10),ncol=10)
x[1:50,6:10]= x[1:50,6:10]+2
x[51:100,6:10]= x[51:100,6:10]-2
y=c(rep(1,3),rep(2,3),rep(3,4))
d=list(x=x,y=y,geneid=as.character(1:1000),
genenames=paste("gene", as.character(1:1000)))
samr.obj <- samr(d, resp.type="Multiclass")
#################### timecourse data
# elements of y are of the form kTimet, where k is the class label and t
# is the time; in addition, the suffixes Start or End indicate the first
# and last observation in a given time course.
# The class label can be that for a two class unpaired, one class or
# two class paired problem.
set.seed(8332)
y=paste(c(rep(1,15),rep(2,15)),"Time",rep(c(1,2,3,4,5,1.1,2.5, 3.7, 4.1,5.5),3),
sep="")
start=c(1,6,11,16,21,26)
for(i in start){
y[i]=paste(y[i],"Start",sep="")
}
for(i in start+4){
y[i]=paste(y[i],"End",sep="")
}
x=matrix(rnorm(1000*30),ncol=30)
x[1:50,16:20]=x[1:50,16:20]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,21:25]=x[1:50,21:25]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,26:30]=x[1:50,26:30]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,16:20]=x[51:100,16:20]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,21:25]=x[51:100,21:25]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,26:30]=x[51:100,26:30]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
data=list(x=x,y=y, geneid=as.character(1:nrow(x)),
genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE)
samr.obj<- samr(data, resp.type="Two class unpaired timecourse",
nperms=100, time.summary.type="slope")
##################### pattern discovery
# here there is no outcome y; the desired eigengene is indicated by
# the argument eigengene.number in the data object
set.seed(32)
x=matrix(rnorm(1000*9),ncol=9)
mu=c(3,2,1,0,0,0,1,2,3)
b=3*runif(100)+.5
x[1:100,]=x[1:100,]+ b
d=list(x=x,eigengene.number=1,
geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x))))
samr.obj=samr(d, resp.type="Pattern discovery", nperms=50)
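When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).
Notes:
Note 1: For the convenience of learners, this document was translated by the biostatistic.net robot LoveR. It is provided for personal reference in learning R only; biostatistic.net retains the copyright.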