
R language samr package: help documentation for the SAM() function

Posted on 2012-9-29 22:10:53
SAM(samr)
SAM() belongs to R package: samr

                                        Significance analysis of microarrays - simple user interface

                                         Translator: 生物统计家园网 (biostatistic.net) robot LoveR

----------Description----------

Correlates a large number of features (e.g. genes) with an outcome variable, such as a group indicator, quantitative variable or survival time. This is a simple user interface for the samr function applied to array data. For sequencing data applications, see the function SAMseq.


----------Usage----------


SAM(x,y=NULL,censoring.status=NULL,
resp.type=c("Quantitative","Two class unpaired","Survival","Multiclass",
"One class", "Two class paired","Two class unpaired timecourse",
"One class timecourse","Two class paired timecourse", "Pattern discovery"),
geneid = NULL,
genenames = NULL,
s0=NULL,
s0.perc=NULL,
nperms=100,
center.arrays=FALSE,
testStatistic=c("standard","wilcoxon"),
time.summary.type=c("slope","signed.area"),
regression.method=c("standard","ranks"),
return.x=TRUE,
knn.neighbors=10,
random.seed=NULL,
logged2 = FALSE,
fdr.output = 0.20,
eigengene.number = 1)



----------Arguments----------

Argument: x
Feature matrix: p (number of features) by n (number of samples), one observation per column (missing values allowed).


Argument: y
n-vector of outcome measurements.


Argument: censoring.status
n-vector of censoring status (1 = died or event occurred, 0 = survived or event was censored); needed for a censored survival outcome.


Argument: resp.type
Problem type: "Quantitative" for a continuous parameter; "Two class unpaired"; "Survival" for a censored survival outcome; "Multiclass" for more than 2 groups; "One class" for a single group; "Two class paired" for two classes with paired observations; "Two class unpaired timecourse"; "One class timecourse"; "Two class paired timecourse"; "Pattern discovery".


Argument: geneid
Optional character vector of gene IDs for output.


Argument: genenames
Optional character vector of gene names for output.


Argument: s0
Exchangeability factor for the denominator of the test statistic. Default is automatic choice. Only used for array data.


Argument: s0.perc
Percentile of standard deviation values to use for s0. Default is automatic choice. -1 means s0 = 0 (different from s0.perc = 0, which means s0 = zeroth percentile of the standard deviation values = minimum of the sd values). Only used for array data.
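
The difference between s0.perc = -1 and s0.perc = 0 can be made concrete with a small base R sketch. It is an illustration only, not samr's internal code: row-wise standard deviations stand in here for the per-feature "standard deviation values", and the variable names (sd_values, s0_zeroth, s0_none) are made up for this example.

```r
# Illustration only: row-wise sd is used as a stand-in for the per-feature
# standard deviation values that appear in the test statistic's denominator.
set.seed(1)
x <- matrix(rnorm(200 * 10), ncol = 10)   # 200 features, 10 samples
sd_values <- apply(x, 1, sd)

# s0.perc = 0: s0 is the zeroth percentile, i.e. the smallest sd value,
# which is still strictly positive for this data.
s0_zeroth <- unname(quantile(sd_values, probs = 0))

# s0.perc = -1: s0 is set to exactly 0, so nothing is added to the denominator.
s0_none <- 0

c(s0_zeroth = s0_zeroth, s0_none = s0_none)
```

The point of the sketch is that the zeroth percentile is a small positive number, whereas -1 switches the exchangeability factor off entirely.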


Argument: nperms
Number of permutations used to estimate false discovery rates.


Argument: center.arrays
Should the data for each sample (array) be median centered at the outset? Default = FALSE. Only used for array data.
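
As a rough picture of what median centering each sample means, the base R sketch below subtracts each column's median from that column. This is an assumption about the general technique, not a copy of what SAM does internally when center.arrays = TRUE; the names col_medians and x_centered are illustrative.

```r
# Columns are samples (arrays); rows are features.
set.seed(2)
x <- matrix(rnorm(50 * 6, mean = rep(c(0, 1, 2), each = 100)), ncol = 6)
x[3, 2] <- NA   # missing values are allowed in x

# Median of each sample, ignoring missing values.
col_medians <- apply(x, 2, median, na.rm = TRUE)

# Subtract each column's median from every entry in that column.
x_centered <- sweep(x, 2, col_medians, FUN = "-")

# After centering, every sample has median (approximately) zero.
round(apply(x_centered, 2, median, na.rm = TRUE), 10)
```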


Argument: testStatistic
Test statistic to use in the two class unpaired case. Either "standard" (t-statistic) or "wilcoxon" (two-sample Wilcoxon or Mann-Whitney test). Only used for array data.
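
To see the two kinds of statistic side by side, the sketch below computes an ordinary two-sample t-statistic and a Wilcoxon rank-sum statistic for each feature with base R's t.test and wilcox.test. It is only an approximation of the idea: the "standard" statistic in SAM also involves the exchangeability factor s0 in its denominator, which a plain t-test does not include.

```r
set.seed(3)
x <- matrix(rnorm(5 * 12), ncol = 12)   # 5 features, 12 samples
y <- c(rep(1, 6), rep(2, 6))            # two unpaired classes
x[1, y == 2] <- x[1, y == 2] + 3        # feature 1 is shifted upward in class 2

# Ordinary pooled-variance t-statistic per feature (class 2 versus class 1).
t_stat <- apply(x, 1, function(g)
  t.test(g[y == 2], g[y == 1], var.equal = TRUE)$statistic)

# Wilcoxon rank-sum (Mann-Whitney) statistic per feature.
w_stat <- apply(x, 1, function(g)
  wilcox.test(g[y == 2], g[y == 1])$statistic)

cbind(t_stat, w_stat)
```

The rank-based statistic depends only on the ordering of the values, so it is less sensitive to outliers than the t-statistic.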


Argument: time.summary.type
Summary measure for each time course: "slope" or "signed.area". Only used for array data.


Argument: regression.method
Regression method for the quantitative case: "standard" (linear least squares) or "ranks" (linear least squares on ranked data). Only used for array data.


Argument: return.x
Should the matrix of feature values be returned? Only useful for time course data, where x contains summaries of the features over time. Otherwise x is the same as the input data data$x.


Argument: knn.neighbors
Number of nearest neighbors to use for imputation of missing feature values. Only used for array data.


Argument: random.seed
Optional initial seed for the random number generator (integer).


Argument: logged2
Has the data been transformed by log (base 2)? This information is used only for computing fold changes.
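
Why the log base matters for fold changes can be shown with a small worked example. The sketch below is a generic illustration, not necessarily the exact computation samr performs: on log2 data a fold change is 2 raised to a difference of means, while on raw data it is a ratio of means. The toy matrix and the names fold_logged and fold_raw are invented for this example.

```r
# Two features, four samples, already on the log2 scale.
x_log2 <- rbind(geneA = c(1, 1, 3, 3),
                geneB = c(2, 2, 2, 2))
y <- c(1, 1, 2, 2)

# Logged data: difference of class means, then back-transform.
m1 <- rowMeans(x_log2[, y == 1, drop = FALSE])
m2 <- rowMeans(x_log2[, y == 2, drop = FALSE])
fold_logged <- 2^(m2 - m1)

# Unlogged data: ratio of class means on the raw scale.
x_raw <- 2^x_log2
fold_raw <- rowMeans(x_raw[, y == 2, drop = FALSE]) /
  rowMeans(x_raw[, y == 1, drop = FALSE])

rbind(fold_logged, fold_raw)   # geneA: 4, geneB: 1 on both rows
```

The two rows agree here only because the values within each class are identical; in general, treating logged data as unlogged (or the reverse) gives misleading fold changes, which is why the flag should match the data.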


Argument: fdr.output
(Approximate) false discovery rate cutoff for output in the significant genes table.


Argument: eigengene.number
Eigengene to be used (only for resp.type="Pattern discovery").


----------Details----------

This is a simple, user-friendly interface to the samr package for use on array data. It calls samr, samr.compute.delta.table and samr.compute.siggenes.table. samr detects differential expression for microarray data, sequencing data, and other data with a large number of features. samr is the R package that is called by the "official" SAM Excel Addin. The format of the response vector y and the calling sequence are illustrated in the examples below. A more complete description is given in the SAM manual.


----------Value----------

A list with components:

samr.obj: Output of samr. See the documentation for samr for details.
siggenes.table: Table of significant genes, output of samr.compute.siggenes.table. It has two components: genes.up, a matrix of significant genes having positive correlation with the outcome, and genes.lo, a matrix of significant genes having negative correlation with the outcome. For survival data, genes.up are those genes having positive correlation with risk, that is, increased expression corresponds to higher risk (shorter survival); genes.lo are those whose increased expression corresponds to lower risk (longer survival).
delta.table: Output of samr.compute.delta.table.
del: Value of delta (distance from the 45 degree line in the SAM plot) used for creating delta.table and siggenes.table. Changing the input value fdr.output will change the resulting del.
call: The calling sequence.
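
The components listed above can be inspected as ordinary list elements. The following is a minimal sketch that assumes the samr package is installed (it is skipped otherwise); the simulated data mirrors the two class unpaired example in this document, and nperms = 50 is an arbitrary choice to keep the run short.

```r
# Runs only if samr is available; otherwise nothing is executed.
if (requireNamespace("samr", quietly = TRUE)) {
  library(samr)

  set.seed(100)
  x <- matrix(rnorm(1000 * 20), ncol = 20)
  dd <- sample(1:1000, size = 100)
  x[dd, 11:20] <- x[dd, 11:20] + matrix(2 * rnorm(100), nrow = 100, ncol = 10)
  y <- c(rep(1, 10), rep(2, 10))

  samfit <- SAM(x, y, resp.type = "Two class unpaired", nperms = 50)

  # Component names of the returned list.
  print(names(samfit))

  # Delta chosen for the requested fdr.output.
  print(samfit$del)

  # First rows of the up- and down-regulated gene tables
  # (either may be NULL if no genes are called significant).
  print(head(samfit$siggenes.table$genes.up))
  print(head(samfit$siggenes.table$genes.lo))
}
```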


----------Author(s)----------


Jun Li and Balasubramanian Narasimhan and Robert Tibshirani



----------References----------

Tusher, V., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98: 5116-5121 (Apr 24). http://www-stat.stanford.edu/~tibs/SAM
Li, Jun and Tibshirani, R. (2011). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.

----------Examples----------



library(samr)

######### two class unpaired comparison
# y must take values 1,2

set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)

u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))

samfit<-SAM(x,y,resp.type="Two class unpaired")

# examine significant gene list

print(samfit)

# plot results
plot(samfit)

########### two class paired

# y must take values -1, 1, -2, 2 etc, with (-k,k) being a pair

set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)

u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u

y=c(-(1:10),1:10)



samfit<-SAM(x,y, resp.type="Two class paired",fdr.output=.25)




############# quantitative response


set.seed(30)
p=1000
x<-matrix(rnorm(p*20),ncol=20)
y<-rnorm(20)
x[1:20,y>0]=x[1:20,y>0]+4
a<-SAM(x,y,resp.type="Quantitative",nperms=50,fdr.output=.5)






########### survival data
# y is numeric; censoring.status=1 for failures, and 0 for censored

set.seed(84048)
x=matrix(rnorm(1000*50),ncol=50)
x[1:50,26:50]= x[1:50,26:50]+2
x[51:100,26:50]= x[51:100,26:50]-2

y=abs(rnorm(50))
y[26:50]=y[26:50]+2
censoring.status <- sample(c(0,1),size=50,replace=TRUE)

a<-SAM(x,y,censoring.status=censoring.status,resp.type="Survival",
nperms=20)



################ multi-class example
# y takes values 1,2,3,...k where k = number of classes

set.seed(84048)
x=matrix(rnorm(1000*10),ncol=10)

y=c(rep(1,3),rep(2,3),rep(3,4))
x[1:50,y==3]=x[1:50,y==3]+5

a <- SAM(x,y,resp.type="Multiclass",nperms=50)






##################### pattern discovery
# here there is no outcome y; the desired eigengene is indicated by
# the argument eigengene.number (default 1)

set.seed(32)
x=matrix(rnorm(1000*9),ncol=9)
mu=c(3,2,1,0,0,0,1,2,3)
b=3*runif(100)+.5
x[1:100,]=x[1:100,]+ b



d=list(x=x,eigengene.number=1,
geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x))))


a <- SAM(x, resp.type="Pattern discovery", nperms=50)


#################### timecourse data

# elements of y are of the form kTimet where k is the class label and t
# is the time; in addition, the suffixes Start or End indicate the first
# and last observation in a given time course
# the class label can be that for a two class unpaired, one class or
# two class paired problem

set.seed(8332)
y=paste(c(rep(1,15),rep(2,15)),"Time",rep(c(1,2,3,4,5,1.1,2.5, 3.7, 4.1,5.5),3),
sep="")
start=c(1,6,11,16,21,26)
for(i in start){
y[i]=paste(y[i],"Start",sep="")
}
for(i in  start+4){
y[i]=paste(y[i],"End",sep="")
}
x=matrix(rnorm(1000*30),ncol=30)
x[1:50,16:20]=x[1:50,16:20]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,21:25]=x[1:50,21:25]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[1:50,26:30]=x[1:50,26:30]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)

x[51:100,16:20]=x[51:100,16:20]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,21:25]=x[51:100,21:25]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)
x[51:100,26:30]=x[51:100,26:30]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE)

a<- SAM(x,y,  resp.type="Two class unpaired timecourse",
nperms=100, time.summary.type="slope")



If reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

