
R language phenoTest package findCopyNumber() function Chinese help documentation (Chinese-English bilingual)

Posted on 2012-2-26 11:04:02

Find copy number regions using expression data in a way similar to ACE.

Translator: 生物统计家园网 robot LoveR


Given enrichment scores between two groups of samples and the chromosomal positions of those enrichment scores, this function finds areas where the enrichment is higher or lower than expected if the positions were assigned at random. Plots of the regions and positions of the enriched regions are provided.


findCopyNumber(x, minGenes = 15, B = 100, p.adjust.method = "BH",
  pvalcutoff = 0.05, exprScorecutoff = NA, mc.cores = 1, useAllPerm = F,
  genome = "hg19", chrLengths, sampleGenome = TRUE, useOneChr = FALSE,
  useIntegrate = TRUE, plot = TRUE)
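As a rough illustration of the input that the usage above expects, the sketch below builds a toy data.frame with the es, chr and pos columns described under the arguments. The probe names and all values are made up; in practice x would normally come from getEsPositions on an epheno object.

```r
# Hypothetical toy input: identifiers and numbers are invented for illustration.
set.seed(42)
n <- 60
x <- data.frame(
  es  = rnorm(n),                          # enrichment score, e.g. a log fold change
  chr = rep(c("1", "2"), each = n / 2),    # chromosome of each probe
  pos = c(sort(runif(n / 2, 0, 240)),      # position within the chromosome, in megabases
          sort(runif(n / 2, 0, 240))),
  row.names = paste0("probe_", seq_len(n)),
  stringsAsFactors = FALSE
)
str(x)
```

An object shaped like this is what the first argument is assumed to look like; the call to findCopyNumber itself is left out here because it needs the phenoTest package.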


x: An object of class data.frame with gene or probe identifiers as row names and the following columns: es (the enrichment score), chr (the chromosome the gene or probe belongs to) and pos (position in the chromosome in megabases). It can be obtained from an epheno object with the function getEsPositions.

minGenes: Minimum number of consecutive genes that have to be enriched for the region to be marked as enriched. Has to be bigger than 2.

B: Number of permutations computed to calculate p-values. If useAllPerm is FALSE this value has to be bigger than 100. If useAllPerm is TRUE the computations are much more expensive, so using a B bigger than 100 is not recommended.

p.adjust.method: P-value adjustment method to be used. p.adjust.methods provides a list of available methods.

pvalcutoff: All genes with an adjusted p-value lower than this parameter will be considered enriched.

exprScorecutoff: Genes whose smoothed score is not bigger than the specified value (or not lower, if the given number is negative) will not be considered significant.

mc.cores: Number of cores to be used in the computation. If mc.cores is bigger than 1 the multicore library has to be loaded.

useAllPerm: If FALSE, for each gene only permutations of genes located in an area with similar density (a similar number of genes close to them) are used to compute p-values. If TRUE, all permutations are used for each gene. We recommend FALSE, having observed that the enrichment can depend on the number of genes in the area. We recommend TRUE if the positions of the enrichment scores are equidistant. Take into account that TRUE is much slower and needs fewer permutations, so a smaller B is preferred. See details for more info.

genome: Genome that will be used to draw cytobands.

chrLengths: An object of class numeric with chromosome names as names. These names have to be the same as the ones used in x$chr. If missing, the last position is used.

sampleGenome: Whether positions are sampled over the whole genome (across chromosomes) or within each chromosome. This is TRUE by default.

useOneChr: Use only one chromosome to build the distribution under the null hypothesis that genes/probes are not enriched. By default this is FALSE. The chromosome used is chosen as follows: after removing small chromosomes, we select the one closest to the median quadratic distance to 0. Setting this parameter to TRUE decreases processing time.

useIntegrate: Whether to use integrate or pnorm to compute p-values. The first does not assume any particular distribution under the null hypothesis; the second assumes it is normally distributed.

plot: If FALSE the function will make no plots.
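The useIntegrate argument above contrasts integrate and pnorm. The sketch below shows one way that difference could look in base R; it is not taken from the phenoTest source. The simulated null scores, the observed value of 2.5 and the use of a kernel density estimate as the integrand are all assumptions made for illustration.

```r
# Illustrative only: simulated stand-in for permuted smoothed scores.
set.seed(1)
null_scores <- rnorm(2000)
observed <- 2.5   # hypothetical observed smoothed score

# Parametric route: assume the null distribution is normal.
p_norm <- pnorm(observed, mean = mean(null_scores), sd = sd(null_scores),
                lower.tail = FALSE)

# Distribution-free route: integrate a kernel density estimate of the null.
dens <- density(null_scores)
dens_fun <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)
p_int <- integrate(dens_fun, lower = observed, upper = max(dens$x),
                   subdivisions = 2000L)$value

c(pnorm = p_norm, integrate = p_int)
```

When the null really is close to normal, as in this simulation, the two upper-tail probabilities should be similar; they would be expected to diverge when the permuted scores are skewed or heavy-tailed.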



Enrichment scores can be log fold changes, log hazard ratios, log variability ratios or any other score.

Within each chromosome a smoothed score for each gene is obtained via generalized additive models, the smoothing parameter for each chromosome being chosen via cross-validation. The obtained smoothing parameter of each chromosome will be used in permutations.
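The smoothing step can be pictured with the sketch below. Note the substitution: it uses smooth.spline with leave-one-out cross-validation from base R as a stand-in, not the generalized additive model that the function is described as using, and the positions and scores are simulated.

```r
# Stand-in for the per-chromosome smoothing: a cross-validated smoothing spline.
set.seed(7)
pos <- sort(runif(200, 0, 150))                               # positions in Mb on one chromosome
es  <- 0.8 * (pos > 60 & pos < 90) + rnorm(200, sd = 0.5)      # noisy scores with one shifted region

fit <- smooth.spline(pos, es, cv = TRUE)   # smoothing parameter chosen by cross-validation
smoothed <- predict(fit, pos)$y            # one smoothed score per gene
lambda <- fit$lambda                       # the chosen parameter, which could be reused in permutations
```

Keeping lambda around mirrors the statement that the smoothing parameter obtained for each chromosome is reused when smoothing the permuted scores.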

We assessed statistical significance by permuting the positions through the whole genome. If useAllPerm is FALSE, for each gene only the permutations of genes located in an area with similar density (distance to the tenth gene) are used to compute p-values. We observed that genes with similar densities tend to have similar smoothed scores. If we set 1000 permutations (B=1000), scores are permuted through the whole genome 10 times (1000/100). For each smoothed score, the permutations of the 100 smoothed scores with the most similar density (distance to the tenth gene) are used. Therefore each smoothed score is compared to 1000 smoothed scores obtained from permutations.
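The density matching described above can be sketched as follows. This is an interpretation rather than the package's code: density is taken to be the distance to the tenth nearest gene on the same chromosome, the positions are simulated, and whether a gene counts among its own 100 neighbours is an assumption.

```r
# Simulated positions (in Mb) on one hypothetical chromosome.
set.seed(3)
pos <- sort(runif(500, 0, 200))
k <- 10

# Distance from each gene to its k-th nearest gene (element 1 of the sorted
# distances is the gene itself at distance 0, hence k + 1).
dist_to_kth <- vapply(seq_along(pos),
                      function(i) sort(abs(pos - pos[i]))[k + 1],
                      numeric(1))

# The 100 genes whose density is most similar to that of gene i.
i <- 1
similar <- order(abs(dist_to_kth - dist_to_kth[i]))[1:100]

# With B = 1000 the genome is permuted B / 100 = 10 times, so each gene is
# compared against 10 * 100 = 1000 permuted smoothed scores.
B <- 1000
rounds <- B / 100
null_size <- rounds * length(similar)
null_size
```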

If scores are equally spaced along the genome (for instance when we have one score every fixed number of bases) the option useAllPerm=TRUE is recommended. In this case every smoothed score is compared to all smoothed scores obtained via permutations. For example, with 20,000 genes and the parameter B=10, the scores are permuted 10 times through the whole genome, giving 200,000 permuted smoothed scores. Each observed smoothed score is tested against the distribution of those 200,000 permuted smoothed scores.
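A scaled-down sketch of the useAllPerm=TRUE case is given below. It uses 2,000 simulated scores instead of 20,000 to keep it fast, and a 5-point circular moving average as a placeholder smoother (an assumption, not the package's smoother); the point is only the pooling of all permuted smoothed scores into one null distribution.

```r
set.seed(11)
n_genes <- 2000   # scaled down; 20,000 genes with B = 10 would give 200,000 null scores
B <- 10
scores <- rnorm(n_genes)   # simulated equidistant scores

# Placeholder smoother: 5-point moving average, circular so no NAs at the ends.
smooth5 <- function(s) as.numeric(stats::filter(s, rep(1 / 5, 5), circular = TRUE))

observed_smoothed <- smooth5(scores)

# Permute the scores B times, smooth each permutation, and pool everything.
null_pool <- as.vector(replicate(B, smooth5(sample(scores))))
length(null_pool)   # n_genes * B

# Upper-tail empirical p-value of every observed smoothed score against the pool.
null_cdf <- ecdf(null_pool)
p_upper <- 1 - null_cdf(observed_smoothed)
```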

Only regions with at least minGenes consecutive statistically significant genes (p-value lower than pvalcutoff after adjusting p-values with the method specified in p.adjust.method) will be selected as enriched. If exprScorecutoff is different from NA, a gene will additionally need a smoothed score bigger (lower if exprScorecutoff is negative) than the specified value to be considered statistically significant.
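The selection rule above can be sketched in base R as shown below. The p-values and smoothed scores are made up, genes are assumed to be ordered by position, and the variable names are hypothetical; only p.adjust and rle are real functions.

```r
# Made-up per-gene values, ordered by chromosomal position.
raw_p    <- c(0.40, 0.001, 0.002, 0.003, 0.001, 0.60, 0.004,
              0.001, 0.045, 0.002, 0.001, 0.001, 0.002)
smoothed <- c(0.1, 0.9, 1.1, 1.2, 0.8, 0.2, 0.3,
              0.9, 0.1, 1.0, 1.3, 1.1, 0.9)
min_genes    <- 3      # plays the role of minGenes
pval_cutoff  <- 0.05   # plays the role of pvalcutoff
score_cutoff <- 0.5    # plays the role of exprScorecutoff (NA would disable it)

adj_p  <- p.adjust(raw_p, method = "BH")
is_sig <- adj_p < pval_cutoff
if (!is.na(score_cutoff)) {
  passes_score <- if (score_cutoff >= 0) smoothed > score_cutoff else smoothed < score_cutoff
  is_sig <- is_sig & passes_score
}

# Find runs of consecutive significant genes and keep the long enough ones.
runs   <- rle(is_sig)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
keep   <- runs$values & runs$lengths >= min_genes
regions <- data.frame(first_gene = starts[keep], last_gene = ends[keep])
regions
```

In this toy input gene 9 has a raw p-value below 0.05 but fails after BH adjustment, gene 7 passes the p-value cutoff but not the score cutoff, and the isolated significant gene 8 is dropped for forming a run shorter than min_genes. Two regions remain: genes 2 to 5 and genes 10 to 13.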


Plots all chromosomes and marks the enriched regions. Also returns a data.frame containing the positions of the enriched regions. This output can be passed to the genesInArea function to obtain the names of the genes that are in each region.


Evarist Planet

----------See Also----------

getEsPositions, genesInArea


#mypos <- getEsPositions(epheno,'Relapse')
#mypos$chr <- '1' #we set all probes to chr one for illustration purposes
#                 #(we want a minimum number of probes per chromosome)
#regions <- findCopyNumber(mypos,B=10,plot=FALSE)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

