nsFilter(genefilter)
nsFilter() belongs to the R package: genefilter
Filtering of Features in an ExpressionSet
Translator: 生物统计家园网, robot LoveR
----------Description----------
The function nsFilter tries to provide a one-stop shop for different options of filtering (removing) features from an ExpressionSet. Filtering out features that exhibit little variation, or a consistently low signal, across samples can be advantageous for the subsequent data analysis (Bourgon et al.). Furthermore, one may decide that there is little value in considering features with insufficient annotation.
----------Usage----------
nsFilter(eset, require.entrez=TRUE,
require.GOBP=FALSE, require.GOCC=FALSE,
require.GOMF=FALSE, require.CytoBand=FALSE,
remove.dupEntrez=TRUE, var.func=IQR,
var.cutoff=0.5, var.filter=TRUE,
filterByQuantile=TRUE, feature.exclude="^AFFX", ...)
varFilter(eset, var.func=IQR, var.cutoff=0.5, filterByQuantile=TRUE)
featureFilter(eset, require.entrez=TRUE,
require.GOBP=FALSE, require.GOCC=FALSE,
require.GOMF=FALSE, require.CytoBand=FALSE,
remove.dupEntrez=TRUE, feature.exclude="^AFFX")
----------Arguments----------
Argument: eset
an ExpressionSet object
Argument: var.func
The function used as the per-feature filtering statistic. This function should return a numeric vector of length one when given a numeric vector as input.
Argument: var.filter
A logical indicating whether to perform filtering based on var.func.
Argument: filterByQuantile
A logical indicating whether var.cutoff is to be interpreted as a quantile of all var.func values (the default), or as an absolute value.
Argument: var.cutoff
A numeric value. If var.filter is TRUE, features whose value of var.func is less than a threshold will be removed. The threshold is the var.cutoff-quantile of all var.func values if filterByQuantile is TRUE, or var.cutoff itself if filterByQuantile is FALSE.
Argument: require.entrez
If TRUE, filter out features without an Entrez Gene ID annotation. If using an annotation package where an identifier system other than Entrez Gene IDs is used as the central ID, then that ID will be required instead.
Argument: require.GOBP, require.GOCC, require.GOMF
If TRUE, filter out features whose target genes are not annotated to at least one GO term in the BP, CC or MF ontology, respectively.
Argument: require.CytoBand
If TRUE, filter out features whose target genes have no mapping to cytoband locations.
Argument: remove.dupEntrez
If TRUE and there are several features mapping to the same Entrez Gene ID (or equivalent), then the feature with the largest value of var.func will be retained and the other(s) removed.
Argument: feature.exclude
A character vector of regular expressions. Feature identifiers (i.e. the values of featureNames(eset)) that match one of the specified patterns will be filtered out. The default value is intended to filter out Affymetrix quality control probe sets.
Argument: ...
Unused, but available for specializing methods.
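The pattern matching described for feature.exclude can be illustrated with a small base-R sketch. The feature identifiers below are hypothetical, and the code only mimics the documented behaviour (drop any identifier that matches at least one pattern); it is not taken from the genefilter source.

```r
# Hypothetical feature identifiers; the AFFX-prefixed ones stand in for
# Affymetrix quality control probe sets.
ids <- c("AFFX-BioB-5_at", "1000_at", "1001_at", "AFFX-hum_alu_at", "31307_at")
patterns <- c("^AFFX")  # the default value of feature.exclude

# A feature is dropped if it matches at least one of the patterns.
drop <- Reduce(`|`, lapply(patterns, grepl, x = ids), rep(FALSE, length(ids)))
kept <- ids[!drop]
print(kept)
```

Supplying several patterns, for example c("^AFFX", "_x_at$"), would drop identifiers matching either one.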
----------Details----------
This section explains the effect of filtering on type I error rate estimation and control in subsequent hypothesis testing. See also the paper by Bourgon et al.
Marginal type I errors: Filtering on the basis of a statistic that is independent of the test statistic used for detecting differential gene expression can increase the detection rate at the same marginal type I error. This is clearly the case for filter criteria that do not depend on the data, such as the annotation-based criteria provided by the nsFilter and featureFilter functions. However, marginal type I error can also be controlled for certain types of data-dependent criteria. Call U^1 the stage 1 filter statistic: a function that is applied feature by feature, that depends only on the data for that feature and not on any other feature, and whose value determines whether the feature passes to stage 2. Call U^2 the stage 2 test statistic for differential expression. Sufficient conditions for marginal type I error control are:
U^1 the overall (across all samples) variance or mean, U^2 the t-statistic (or any other scale and location invariant statistic), data normally distributed and exchangeable across samples.
U^1 the overall mean, U^2 the moderated t-statistic (as in limma's eBayes function), data normally distributed and exchangeable.
U^1 a function that is independent of the sample class labels (e.g. overall mean, median, variance, IQR), U^2 the Wilcoxon rank sum statistic, data exchangeable.
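The first of these conditions can be checked empirically with a small simulation. The sketch below is illustrative only: the sample sizes, the 50% cutoff and the 0.05 level are arbitrary choices, and every simulated feature is a true null (normal data, no group effect). If the overall variance filter is independent of the t-statistic, the fraction of nulls rejected should stay close to the nominal level after filtering.

```r
set.seed(1)
n_features <- 10000
n_per_group <- 5
group <- factor(rep(c("a", "b"), each = n_per_group))

# All features are true nulls: i.i.d. standard normal values, no group effect.
x <- matrix(rnorm(n_features * 2 * n_per_group), nrow = n_features)

# Stage 1 (U^1): overall variance across all samples, ignoring group labels.
u1 <- apply(x, 1, var)
pass <- u1 >= quantile(u1, 0.5)

# Stage 2 (U^2): equal-variance two-sample t-test p-value for each feature.
pval <- apply(x, 1, function(r) t.test(r ~ group, var.equal = TRUE)$p.value)

# Fraction of true nulls rejected at level 0.05, before and after filtering.
rate_all <- mean(pval < 0.05)
rate_pass <- mean(pval[pass] < 0.05)
print(c(all = rate_all, filtered = rate_pass))
```

Replacing the stage 1 statistic with one that uses the group labels (for instance the absolute difference in group means) would be expected to inflate the rate among the retained features, which is why the conditions above require a label-independent filter.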
Experiment-wide type I error: Marginal type I error control provided by the conditions above is sufficient for control of the family-wise error rate (FWER). Note, however, that common false discovery rate (FDR) methods depend not only on the marginal behaviour of the test statistics under the null hypothesis, but also on their joint distribution. The joint distribution can be affected by filtering, even when this filtering leaves the marginal distributions of true-null test statistics unchanged. Filtering might, for example, change the correlation structure. In practice the effect is negligible in many cases, but this depends on the dataset and the filter used, and assessing it is the responsibility of the data analyst.
Annotation Based Filtering: The arguments require.entrez, require.GOBP, require.GOCC, require.GOMF and require.CytoBand filter based on available annotation data. The annotation package is determined by calling annotation(eset).
Variance Based Filtering: The var.filter, var.func, var.cutoff and filterByQuantile arguments control numerical cutoff-based filtering. Probes for which var.func returns NA are removed. The default var.func is IQR, which we here define as rowQ(eset, ceiling(0.75 * ncol(eset))) - rowQ(eset, floor(0.25 * ncol(eset))); this choice is motivated by the observation that unexpressed genes are detected most reliably through low variability of their features across samples. Additionally, IQR is robust to outliers (see the Note below). The default var.cutoff is 0.5 and is motivated by a rule of thumb that in many tissues only 40% of genes are expressed. Please adapt this value to your data and question.
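The rank-based IQR definition above can be reproduced on a plain matrix with base R. This is a sketch under assumptions: row_order_stat is a hypothetical stand-in for rowQ (it returns the k-th smallest value of each row), the data are made up, and at least four samples are assumed so that floor(0.25 * n) is a valid rank.

```r
# Hypothetical base-R stand-in for rowQ on a plain matrix:
# the k-th smallest value in each row.
row_order_stat <- function(x, k) apply(x, 1, function(r) sort(r)[k])

# IQR as defined in the text, for a features-by-samples matrix.
iqr_by_rank <- function(x) {
  n <- ncol(x)
  row_order_stat(x, ceiling(0.75 * n)) - row_order_stat(x, floor(0.25 * n))
}

# Two hypothetical features measured on 8 samples.
x <- rbind(flat     = c(5.0, 5.1, 5.0, 5.2, 5.1, 5.0, 5.1, 5.2),
           variable = c(2, 9, 4, 7, 1, 8, 3, 6))
print(iqr_by_rank(x))
```

Because this definition uses order statistics directly, its values can differ somewhat from stats::IQR, which by default interpolates between order statistics.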
By default the numerical-filter cutoff is interpreted as a quantile, so with the default settings, 50% of the genes are removed.
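The difference between the two interpretations of var.cutoff can be seen on a handful of made-up statistics. The sketch below follows the wording of the Arguments section (features below the threshold are removed); the feature names and values are hypothetical, and the handling of values exactly equal to the threshold is an assumption.

```r
# Hypothetical per-feature statistics (e.g. IQR values) for six features.
stat <- c(f1 = 0.1, f2 = 0.4, f3 = 0.6, f4 = 0.9, f5 = 1.5, f6 = 2.0)
var.cutoff <- 0.5

# filterByQuantile = TRUE: the threshold is the var.cutoff-quantile
# of all statistics (here the median).
thr_quantile <- quantile(stat, probs = var.cutoff)
keep_quantile <- names(stat)[stat >= thr_quantile]

# filterByQuantile = FALSE: var.cutoff itself is the threshold.
keep_absolute <- names(stat)[stat >= var.cutoff]

print(keep_quantile)
print(keep_absolute)
```

With the quantile interpretation the fraction removed is fixed by var.cutoff regardless of the scale of the statistic, whereas the absolute interpretation depends on the units of var.func.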
Variance filtering is performed last, so that (if filterByQuantile=TRUE and remove.dupEntrez=TRUE) the final number of genes does indeed exclude precisely the var.cutoff fraction of the unique genes remaining after all other filters were passed.
The stand-alone function varFilter does only var.func-based filtering (and no annotation-based filtering). featureFilter does only annotation-based filtering and duplicate removal; it always performs duplicate removal to retain the highest-IQR probe for each gene.
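The duplicate-removal rule (retain, for each gene, the probe with the largest statistic) can be sketched in base R as follows. The probe names, gene IDs and statistic values are hypothetical, and the code illustrates the rule as documented rather than the package's internal implementation.

```r
# Hypothetical probe-level statistics and probe-to-gene mapping.
probe_stat <- c(p1 = 0.8, p2 = 1.3, p3 = 0.2, p4 = 0.9, p5 = 0.5)
gene_id    <- c(p1 = "100", p2 = "100", p3 = "200", p4 = "300", p5 = "300")

# Within each gene ID, keep the name of the probe with the largest statistic.
best <- vapply(split(names(probe_stat), gene_id),
               function(p) p[which.max(probe_stat[p])],
               character(1))
kept <- unname(best)
print(best)
```

Here genes "100" and "300" each have two probes, so only one probe per gene survives, while the single probe for gene "200" is kept as is.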
----------Value----------
For nsFilter, a list consisting of:
Component: eset
the filtered ExpressionSet
Component: filter.log
a list giving details of how many probe sets were removed for each filtering step performed.
For both varFilter and featureFilter, the filtered ExpressionSet.
----------Note----------
IQR is a reasonable variance-filter choice when the dataset is split into two roughly equal and relatively homogeneous phenotype groups. If your dataset has important groups smaller than 25% of the overall sample size, or if you are interested in unusual individual-level patterns, then IQR may not be sensitive enough for your needs. In such cases, you should consider using less robust and more sensitive measures of variance (the simplest of which would
----------Author(s)----------
Seth Falcon (somewhat revised by Assaf Oron)
----------References----------
R. Bourgon, R. Gentleman, W. Huber. Independent filtering increases power for detecting differentially expressed genes, Technical Report.
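----------Examples----------
## annotation package used by sample.ExpressionSet
library("hgu95av2.db")
library("Biobase")
library("genefilter")
data(sample.ExpressionSet)
## run all filters with the default settings
ans <- nsFilter(sample.ExpressionSet)
ans$eset        # the filtered ExpressionSet
ans$filter.log  # number of features removed at each step
## skip variance-based filtering
ans <- nsFilter(sample.ExpressionSet, var.filter=FALSE)
## variance-based filtering only
a1 <- varFilter(sample.ExpressionSet)
## annotation-based filtering and duplicate removal only
a2 <- featureFilter(sample.ExpressionSet)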
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).