SAM(samr)
SAM() belongs to R package: samr
Significance analysis of microarrays - simple user interface
----------Description----------
Correlates a large number of features (e.g., genes) with an outcome variable, such as a group indicator, quantitative variable, or survival time. This is a simple user interface for the samr function applied to array data. For sequencing data applications, see the function SAMseq.
----------Usage----------
SAM(x,y=NULL,censoring.status=NULL,
resp.type=c("Quantitative","Two class unpaired","Survival","Multiclass",
"One class", "Two class paired","Two class unpaired timecourse",
"One class timecourse","Two class paired timecourse", "Pattern discovery"),
geneid = NULL,
genenames = NULL,
s0=NULL,
s0.perc=NULL,
nperms=100,
center.arrays=FALSE,
testStatistic=c("standard","wilcoxon"),
time.summary.type=c("slope","signed.area"),
regression.method=c("standard","ranks"),
return.x=TRUE,
knn.neighbors=10,
random.seed=NULL,
logged2 = FALSE,
fdr.output = 0.20,
eigengene.number = 1)
----------Arguments----------
Argument: x
Feature matrix: p (number of features) by n (number of samples), one observation per column (missing values allowed).
Argument: y
n-vector of outcome measurements.
Argument: censoring.status
n-vector of censoring status (1 = died or event occurred, 0 = survived or event was censored); needed for a censored survival outcome.
Argument: resp.type
Problem type: "Quantitative" for a continuous parameter; "Two class unpaired"; "Survival" for a censored survival outcome; "Multiclass" for more than 2 groups; "One class" for a single group; "Two class paired" for two classes with paired observations; "Two class unpaired timecourse"; "One class timecourse"; "Two class paired timecourse"; "Pattern discovery".
Argument: geneid
Optional character vector of gene IDs for output.
Argument: genenames
Optional character vector of gene names for output.
Argument: s0
Exchangeability factor for the denominator of the test statistic; default is automatic choice. Only used for array data.
Argument: s0.perc
Percentile of standard deviation values to use for s0; default is automatic choice. A value of -1 means s0=0 (different from s0.perc=0, which means s0 = zeroth percentile of the standard deviation values = minimum of the sd values). Only used for array data.
Argument: nperms
Number of permutations used to estimate false discovery rates.
Argument: center.arrays
Should the data for each sample (array) be median centered at the outset? Default = FALSE. Only used for array data.
Argument: testStatistic
Test statistic to use in the two class unpaired case. Either "standard" (t-statistic) or "wilcoxon" (two-sample Wilcoxon or Mann-Whitney test). Only used for array data.
Argument: time.summary.type
Summary measure for each time course: "slope" or "signed.area". Only used for array data.
Argument: regression.method
Regression method for the quantitative case: "standard" (linear least squares) or "ranks" (linear least squares on ranked data). Only used for array data.
Argument: return.x
Should the matrix of feature values be returned? Only useful for time course data, where x contains summaries of the features over time. Otherwise x is the same as the input data (data$x).
Argument: knn.neighbors
Number of nearest neighbors to use for imputation of missing feature values. Only used for array data.
Argument: random.seed
Optional initial seed for the random number generator (integer).
Argument: logged2
Has the data been transformed by log (base 2)? This information is used only for computing fold changes.
Argument: fdr.output
(Approximate) false discovery rate cutoff for output in the significant genes table.
Argument: eigengene.number
Eigengene to be used (only for resp.type="Pattern discovery").
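Several of the arguments above are easier to read with a small worked example. The following is a minimal base R sketch, not code from samr: it shows the y encodings for the unpaired and paired two class cases, what median centering of arrays (our reading of center.arrays=TRUE) amounts to, and how an exchangeability factor s0 enters a statistic of the form d = r / (s + s0). All variable names are illustrative, and the choice of s0 as the median of the s values is an assumption made only to mimic s0.perc=50.

```r
# Sketch only: base R, no samr calls. Names below are illustrative.
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)  # 6 features by 4 samples

# Two class unpaired: y takes values 1 and 2, one entry per column of x
y_unpaired <- c(1, 1, 2, 2)

# Two class paired: y takes values -k and k for the two members of pair k
y_paired <- c(-1, -2, 1, 2)

# Median centering of each array (column)
col_medians <- apply(x, 2, median, na.rm = TRUE)
x_centered <- sweep(x, 2, col_medians)

# Two class statistic with an exchangeability factor: d = r / (s + s0)
g1 <- x[, y_unpaired == 1, drop = FALSE]
g2 <- x[, y_unpaired == 2, drop = FALSE]
n1 <- ncol(g1)
n2 <- ncol(g2)
r <- rowMeans(g2) - rowMeans(g1)                      # difference in group means
ss <- rowSums((g1 - rowMeans(g1))^2) + rowSums((g2 - rowMeans(g2))^2)
s <- sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))     # pooled standard error
s0 <- unname(quantile(s, 0.5))                        # assumed: 50th percentile of s
d <- r / (s + s0)
```

With s0 = 0 the statistic reduces to the ordinary two-sample t-statistic; a positive s0 shrinks the scores of features whose standard error is very small.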
----------Details----------
This is a simple, user-friendly interface to the samr package for use on array data. It calls samr, samr.compute.delta.table and samr.compute.siggenes.table. samr detects differential expression for microarray data, sequencing data, and other data with a large number of features. samr is the R package that is called by the "official" SAM Excel add-in. The format of the response vector y and the calling sequence are illustrated in the examples below. A more complete description is given in the SAM manual.
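To make the three-step sequence concrete, here is a hedged sketch of calling the three functions by hand on simulated data. The way del is chosen from the delta table (smallest delta whose median FDR is at most 0.20) is an assumption for illustration; the rule used inside SAM may differ. The code runs the samr calls only if the package is installed.

```r
# Sketch only: the three steps that SAM() is described as wrapping.
set.seed(100)
x <- matrix(rnorm(1000 * 20), ncol = 20)
x[1:100, 11:20] <- x[1:100, 11:20] + 2      # shift 100 features in class 2
y <- c(rep(1, 10), rep(2, 10))

# samr() takes its inputs bundled in a list
data <- list(x = x, y = y,
             geneid = as.character(1:nrow(x)),
             genenames = paste("gene", 1:nrow(x)),
             logged2 = TRUE)

if (requireNamespace("samr", quietly = TRUE)) {
  samr.obj <- samr::samr(data, resp.type = "Two class unpaired", nperms = 100)
  delta.table <- samr::samr.compute.delta.table(samr.obj)

  # Assumption: pick the smallest delta whose median FDR is at most 0.20
  fdr <- delta.table[, "median FDR"]
  ok <- which(!is.na(fdr) & fdr <= 0.20)
  del <- if (length(ok) > 0) {
    delta.table[min(ok), "delta"]
  } else {
    max(delta.table[, "delta"])
  }

  siggenes.table <- samr::samr.compute.siggenes.table(samr.obj, del, data,
                                                      delta.table)
  print(head(siggenes.table$genes.up))
}
```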
----------Value----------
A list with components:
samr.obj: Output of samr. See documentation for samr for details.
siggenes.table: Table of significant genes, output of samr.compute.siggenes.table. This has components genes.up, a matrix of significant genes having positive correlation with the outcome, and genes.lo, a matrix of significant genes having negative correlation with the outcome. For survival data, genes.up are those genes having positive correlation with risk; that is, increased expression corresponds to higher risk (shorter survival). genes.lo are those whose increased expression corresponds to lower risk (longer survival).
delta.table: Output of samr.compute.delta.table.
del: Value of delta (distance from the 45 degree line in the SAM plot) used for creating delta.table and siggenes.table. Changing the input value fdr.output will change the resulting del.
call: The calling sequence.
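As a rough illustration of how the returned list might be inspected, the sketch below fits SAM on simulated two class data and reads the components named above. The variable names fit, up, lo and expected are illustrative, and the samr calls run only if the package is installed.

```r
# Sketch only: inspect the components of the list returned by SAM().
set.seed(100)
x <- matrix(rnorm(1000 * 20), ncol = 20)
x[1:100, 11:20] <- x[1:100, 11:20] + 2
y <- c(rep(1, 10), rep(2, 10))

# Component names as listed in the Value section
expected <- c("samr.obj", "siggenes.table", "delta.table", "del", "call")

if (requireNamespace("samr", quietly = TRUE)) {
  fit <- samr::SAM(x, y, resp.type = "Two class unpaired",
                   nperms = 100, fdr.output = 0.20)
  print(expected %in% names(fit))   # which documented components are present
  print(fit$del)                    # delta chosen for the requested fdr.output

  up <- fit$siggenes.table$genes.up # positively correlated with the outcome
  lo <- fit$siggenes.table$genes.lo # negatively correlated with the outcome
  if (!is.null(up)) print(head(up))
  if (!is.null(lo)) print(head(lo))
}
```

Rerunning with a different fdr.output should change fit$del, as described for the del component.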
----------Author(s)----------
Jun Li, Balasubramanian Narasimhan, and Robert Tibshirani
----------References----------
Tusher, V., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98: 5116-5121 (Apr 24). http://www-stat.stanford.edu/~tibs/SAM
Li, Jun and Tibshirani, R. (2011). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.
----------Examples----------
######### two class unpaired comparison
# y must take values 1,2
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))
samfit<-SAM(x,y,resp.type="Two class unpaired")
# examine significant gene list
print(samfit)
# plot results
plot(samfit)
########### two class paired
# y must take values -1, 1, -2, 2 etc., with (-k,k) being a pair
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y=c(-(1:10),1:10)
samfit<-SAM(x,y, resp.type="Two class paired",fdr.output=.25)
############# quantitative response
set.seed(30)
p=1000
x<-matrix(rnorm(p*20),ncol=20)
y<-rnorm(20)
x[1:20,y>0]=x[1:20,y>0]+4
a<-SAM(x,y,resp.type="Quantitative",nperms=50,fdr.output=.5)
########### survival data
# y is numeric; censoring.status=1 for failures, and 0 for censored
set.seed(84048)
x=matrix(rnorm(1000*50),ncol=50)
x[1:50,26:50]= x[1:50,26:50]+2
x[51:100,26:50]= x[51:100,26:50]-2
y=abs(rnorm(50))
y[26:50]=y[26:50]+2
censoring.status <- sample(c(0,1),size=50,replace=TRUE)
a<-SAM(x,y,censoring.status=censoring.status,resp.type="Survival",
nperms=20)
################ multi-class example
# y takes values 1,2,3,...k where k = number of classes
set.seed(84048)
x=matrix(rnorm(1000*10),ncol=10)
y=c(rep(1,3),rep(2,3),rep(3,4))
x[1:50,y==3]=x[1:50,y==3]+5
a <- SAM(x,y,resp.type="Multiclass",nperms=50)
##################### pattern discovery
# here there is no outcome y; the desired eigengene is indicated by
# the argument eigengene.number
set.seed(32)
x=matrix(rnorm(1000*9),ncol=9)
mu=c(3,2,1,0,0,0,1,2,3)
b=3*runif(100)+.5
x[1:100,]=x[1:100,]+ b
a <- SAM(x, resp.type="Pattern discovery", nperms=50, eigengene.number=1,
geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x))))
#################### timecourse data
# elements of y are of the form kTimet where k is the class label and t
# is the time; in addition, the suffixes Start or End indicate the first
# and last observation in a given time course
# the class label can be that for a two class unpaired, one class or
# two class paired problem
set.seed(8332)
y=paste(c(rep(1,15),rep(2,15)),"Time",rep(c(1,2,3,4,5,1.1,2.5, 3.7, 4.1,5.5),3),
sep="")
start=c(1,6,11,16,21,26)
for(i in start){
y[i]=paste(y[i],"Start",sep="")
}
for(i in start+4){
y[i]=paste(y[i],"End",sep="")
}
x=matrix(rnorm(1000*30),ncol=30)
x[1:50,16:20]=x[1:50,16:20]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,21:25]=x[1:50,21:25]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,26:30]=x[1:50,26:30]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,16:20]=x[51:100,16:20]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,21:25]=x[51:100,21:25]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,26:30]=x[51:100,26:30]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
a<- SAM(x,y, resp.type="Two class unpaired timecourse",
nperms=100, time.summary.type="slope")