R package WGCNA: userListEnrichment() function help documentation

loveR · posted 2012-10-1 22:25:29

userListEnrichment(WGCNA)
userListEnrichment() belongs to R package: WGCNA

                                       Measure enrichment between inputted and user-defined lists

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

This function measures enrichment between input lists of genes and files containing user-defined lists of genes. The significance of each enrichment is assessed with a hypergeometric test. A pre-made collection of brain-related lists can also be loaded. The function writes the significant enrichments to a file and also returns all overlapping genes across all comparisons.
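To make the hypergeometric test concrete, here is a minimal base-R sketch of an upper-tail overlap p-value for one labeled list against one reference category. It is not the WGCNA implementation: the helper name `overlapPvalue` is hypothetical, and treating the full set of identifiers (geneR) as the universe is an assumption based on the description of geneR below.

```r
# Hypothetical helper (not part of WGCNA): upper-tail hypergeometric p-value
# for the overlap between one labeled list (e.g., a module) and one category.
overlapPvalue <- function(listGenes, categoryGenes, allGenes) {
  allGenes      <- unique(allGenes)                  # assumed universe: all genes in geneR
  listGenes     <- intersect(listGenes, allGenes)
  categoryGenes <- intersect(categoryGenes, allGenes)
  k    <- length(intersect(listGenes, categoryGenes))  # observed overlap
  m    <- length(categoryGenes)                        # category genes in the universe
  n    <- length(allGenes) - m                         # all other genes
  size <- length(listGenes)                            # number of "draws": size of the labeled list
  # P(X >= k) = P(X > k - 1)
  phyper(k - 1, m, n, size, lower.tail = FALSE)
}

overlapPvalue(paste0("g", 1:10), paste0("g", 1:10), paste0("g", 1:100))
```

With a perfect overlap of 10 genes out of a 100-gene universe, the p-value equals 1 / choose(100, 10); with no overlap it is 1.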

----------Usage----------

userListEnrichment(geneR, labelR, fnIn = NULL, catNmIn = fnIn, nameOut = "enrichment.csv",
                   useBrainLists = FALSE, useBloodAtlases = FALSE, omitCategories = "grey",
                   outputCorrectedPvalues = TRUE, useStemCellLists = FALSE, outputGenes = FALSE,
                   minGenesInCategory = 1, useBrainRegionMarkers = FALSE, useImmunePathwayLists = FALSE)

----------Arguments----------

Argument: geneR
A vector of gene (or other) identifiers. This vector should include ALL genes in your analysis (i.e., the genes corresponding to your labeled lists AND the remaining background reference genes).

Argument: labelR
A vector of labels (for example, module assignments) corresponding to the geneR list. NOTE: For all background reference genes that have no corresponding label, use the label "background" (or any label included in the omitCategories parameter).
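A minimal sketch of how geneR and labelR fit together: the two vectors are parallel, and genes without a module get the label "background". The gene names and module colors below are hypothetical toy values, and the duplicate check is a precaution of ours, not a documented requirement.

```r
# Hypothetical toy input: five module genes plus three unlabeled background genes
moduleGenes     <- c("RPS4", "RPS8", "NDUF1", "NDUF3", "MAPT")
moduleLabels    <- c("blue", "blue", "red", "red", "green")
backgroundGenes <- c("GAPDH", "ACTB", "TUBB")

# geneR holds ALL genes; labelR is parallel to it, with "background" for unlabeled genes
geneR  <- c(moduleGenes, backgroundGenes)
labelR <- c(moduleLabels, rep("background", length(backgroundGenes)))

# Sanity checks (our precaution): one label per gene, no duplicated identifiers
stopifnot(length(geneR) == length(labelR), !anyDuplicated(geneR))
table(labelR)
```

The resulting vectors would then be passed as the first two arguments of userListEnrichment().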

Argument: fnIn
A vector of names of files containing user-defined lists. These files must be in one of three specific formats (see the Details section). The default (NULL) may only be used if one of the "use_____" parameters (useBrainLists, useBloodAtlases, useStemCellLists, useBrainRegionMarkers, or useImmunePathwayLists) is TRUE.

Argument: catNmIn
A vector of category names, one for each element of fnIn. This name is appended to each overlap corresponding to that file name. By default, the category names are the corresponding file names.

Argument: nameOut
Name of the file to which the output enrichment information is written. (This file contains only a subset of what the function returns.)

Argument: useBrainLists
If TRUE, a pre-made set of brain-derived enrichment lists is added to any user-defined lists for enrichment comparison. The default is FALSE. See the References section for related references.

Argument: useBloodAtlases
If TRUE, a pre-made set of blood-derived enrichment lists is added to any user-defined lists for enrichment comparison. The default is FALSE. See the References section for related references.

Argument: omitCategories
Any labelR entries corresponding to these categories are ignored. The default ("grey") ignores unassigned genes in a standard WGCNA network.
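The effect of omitCategories on which labeled lists get tested can be sketched in base R as below. This only illustrates the filtering idea under stated assumptions: the labels are hypothetical, including "background" in omitCategories is our choice, and the sketch says nothing about how WGCNA handles these genes internally beyond skipping their labels.

```r
# Hypothetical labels, including unassigned ("grey") and background genes
labelR <- c("blue", "grey", "red", "blue", "grey", "background")
geneR  <- paste0("gene", seq_along(labelR))

# Drop genes whose label is in omitCategories, then group the rest by label
omitCategories <- c("grey", "background")  # assumption for this illustration
keep <- !(labelR %in% omitCategories)
testedLists <- split(geneR[keep], labelR[keep])
testedLists
```

Only the "blue" and "red" lists remain to be compared against the reference categories.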

Argument: outputCorrectedPvalues
If TRUE (the default), only p-values that remain significant after Bonferroni correction for multiple comparisons are written to nameOut. Otherwise, the uncorrected p-values are written to the file. Both sets of p-values for all comparisons are reported in the returned pValues component.
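For reference, Bonferroni correction multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below uses base R's p.adjust on four hypothetical raw p-values; the 0.05 significance cutoff is an assumption for illustration, not a value taken from this page.

```r
# Hypothetical raw p-values from four list-by-category comparisons
rawP <- c(0.0004, 0.01, 0.02, 0.5)

# Bonferroni correction with base R ...
corrected <- p.adjust(rawP, method = "bonferroni")
# ... and the same thing by hand: p * (number of comparisons), capped at 1
manual <- pmin(1, rawP * length(rawP))

# Assumed cutoff of 0.05 on the corrected values
significant <- corrected < 0.05
corrected
```

Here the corrected values are 0.0016, 0.04, 0.08 and 1, so only the first two comparisons pass the assumed cutoff.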

Argument: useStemCellLists
If TRUE, a pre-made set of stem cell (SC)-derived enrichment lists is added to any user-defined lists for enrichment comparison. The default is FALSE. See the References section for related references.

Argument: outputGenes
If TRUE, the output includes a list of all genes in each returned category, as well as the number of genes in each category. The default is FALSE.

Argument: minGenesInCategory
Significant categories with fewer than minGenesInCategory genes are omitted (the default is 1).

Argument: useBrainRegionMarkers
If TRUE, a pre-made set of enrichment lists for human brain regions is added to any user-defined lists for enrichment comparison. The default is FALSE. These lists are derived from Allen Human Brain Atlas data (http://human.brain-map.org/). See the References section for more details.

Argument: useImmunePathwayLists
If TRUE, a pre-made set of enrichment lists for immune system pathways is added to any user-defined lists for enrichment comparison. The default is FALSE. These lists come from the lab of Daniel R Saloman. See the References section for more details.

----------Details----------

User-supplied files for fnIn can be in one of three formats:

1) Text files (must end in ".txt") with one list per file, where the first line is the list descriptor and each remaining line is one gene name belonging to that list. For example:
   Ribosome
   RPS4
   RPS8
   ...
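A minimal base-R sketch of format 1: write a list file with the descriptor on the first line and one gene per line, then read it back. The file location (a temporary directory) and its contents are hypothetical.

```r
# Write a hypothetical format-1 list file: descriptor first, then one gene per line
listFile <- file.path(tempdir(), "Ribosome.txt")
writeLines(c("Ribosome", "RPS4", "RPS8"), listFile)

# Read it back: line 1 is the list descriptor, the remaining lines are genes
lines      <- readLines(listFile)
descriptor <- lines[1]
genes      <- lines[-1]
```

The file name ends in ".txt", as this format requires.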

2) Gene / category files (must be csv files), where the first line holds the column headers for genes and lists, and each remaining line assigns one gene to one list, for any number of genes and lists. For example:
   Gene, Category
   RPS4, Ribosome
   RPS8, Ribosome
   ...
   NDUF1, Mitochondria
   NDUF3, Mitochondria
   ...
   MAPT, AlzheimersDisease
   PSEN1, AlzheimersDisease
   PSEN2, AlzheimersDisease
   ...
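A minimal base-R sketch of format 2: a two-column gene/category csv file split into one gene vector per category. The file contents are hypothetical, and the sketch omits the spaces after commas shown above, so that read.csv produces clean column names and values without extra options.

```r
# Write a hypothetical format-2 file: header line, then one gene-category pair per line
csvFile <- file.path(tempdir(), "categories.csv")
writeLines(c("Gene,Category",
             "RPS4,Ribosome",
             "RPS8,Ribosome",
             "NDUF1,Mitochondria",
             "MAPT,AlzheimersDisease"), csvFile)

# Read the table and group genes by category
tab   <- read.csv(csvFile, stringsAsFactors = FALSE)
lists <- split(tab$Gene, tab$Category)
lists
```

Each element of `lists` corresponds to one user-defined category.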

3) Module membership (kME) tables in csv format. Currently only the module assignment is used, so as long as the Gene column is 2nd and the Module column is 3rd, the contents of the other columns do not matter. For example:
   PSID, Gene, Module, <other columns>
   <psid>, RPS4, blue, <other columns>
   <psid>, NDUF1, red, <other columns>
   <psid>, RPS8, blue, <other columns>
   <psid>, NDUF3, red, <other columns>
   <psid>, MAPT, green, <other columns>
   ...
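Because only the 2nd (gene) and 3rd (module) columns of a kME table matter, the module lists can be recovered by position, whatever the header names are. In this sketch the probe IDs, the extra `kMEblue` column and its values are hypothetical placeholders.

```r
# Write a hypothetical format-3 kME table; PSID and kMEblue stand in for "other columns"
kmeFile <- file.path(tempdir(), "kME.csv")
writeLines(c("PSID,Gene,Module,kMEblue",
             "p1,RPS4,blue,0.91",
             "p2,NDUF1,red,0.12",
             "p3,RPS8,blue,0.88"), kmeFile)

kme <- read.csv(kmeFile, stringsAsFactors = FALSE)
# Select columns by position (2 = gene, 3 = module), ignoring all others
moduleLists <- split(kme[[2]], kme[[3]])
moduleLists
```

Selecting by position rather than by name mirrors the rule that column order, not column naming, is what matters.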

----------Value----------

Component: pValues
A matrix giving, for every pair of lists tested for overlap, the number of overlapping genes and both the uncorrected and the Bonferroni-corrected p-values.

Component: ovGenes
A list of character vectors containing the overlapping genes for every pair of lists tested for overlap. A specific overlap can be accessed with <variableName>$ovGenes$'<labelR> -- <comparisonCategory>'. See the example below.
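The naming scheme for ovGenes can be mimicked in base R to see what each element holds. This is a sketch, not the package code: the gene sets are hypothetical, and the "<descriptor>__<catNmIn>" form of the category name (e.g., "TESTLIST1__TEST1") is inferred from the Examples section below.

```r
# Hypothetical labeled list and reference category
labelGenes    <- list(black = c("RPS4", "RPS8", "NDUF1"))
categoryGenes <- list(TESTLIST1__TEST1 = c("RPS8", "NDUF1", "MAPT"))

ovGenes <- list()
for (lab in names(labelGenes)) {
  for (ct in names(categoryGenes)) {
    key <- paste(lab, "--", ct)  # builds "<labelR> -- <comparisonCategory>"
    ovGenes[[key]] <- intersect(labelGenes[[lab]], categoryGenes[[ct]])
  }
}
ovGenes$"black -- TESTLIST1__TEST1"
```

The element name contains spaces around "--", which is why it has to be quoted after `$`.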

Component: sigOverlaps
The same information that is written to nameOut: the p-values (corrected or uncorrected, depending on outputCorrectedPvalues) for all significant enrichments.

----------Author(s)----------

Jeremy Miller

----------References----------

References for the pre-defined brain lists (useBrainLists=TRUE, in alphabetical order by category descriptor) are as follows:

1. Blalock => Blalock E, Geddes J, Chen K, Porter N, Markesbery W, Landfield P (2004) Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. PNAS 101:2173-2178.
2. Colangelo => Colangelo V, Schurr J, Ball M, Pelaez R, Bazan N, Lukiw W (2002) Gene expression profiling of 12633 genes in Alzheimer hippocampal CA1: transcription and neurotrophic factor down-regulation and up-regulation of apoptotic and pro-inflammatory signaling. J Neurosci Res 70:462-473.
3. Liang => Liang WS, et al. (2008) Altered neuronal gene expression in brain regions differentially affected by Alzheimer's disease: a reference data set. Physiological Genomics 33:240-256.

1. Ginsberg => Ginsberg SD, Che S (2005) Expression profile analysis within the human hippocampus: comparison of CA1 and CA3 pyramidal neurons. J Comp Neurol 487:107-118.
2. Lein => Lein E, Zhao X, Gage F (2004) Defining a molecular atlas of the hippocampus using DNA microarrays and high-throughput in situ hybridization. J Neurosci 24:3879-3889.
3. Newrzella => Newrzella D, et al. (2007) The functional genome of CA1 and CA3 neurons under native conditions and in response to ischemia. BMC Genomics 8:370.
4. Torres => Torres-Munoz JE, Van Waveren C, Keegan MG, Bookman RJ, Petito CK (2004) Gene expression profiles in microdissected neurons from human hippocampal subregions. Brain Res Mol Brain Res 127:105-114.
5. GorLorT => In the Ginsberg, Lein, or Torres list.

1. GSE772 => Gan L, et al. (2004) Identification of cathepsin B as a mediator of neuronal death induced by Abeta-activated microglial cells using a functional genomics approach. J Biol Chem 279:5565-5572.
2. GSE1910 => Albright AV, Gonzalez-Scarano F (2004) Microarray analysis of activated mixed glial (microglia) and monocyte-derived macrophage gene expression. J Neuroimmunol 157:27-38.
3. AitGhezala => Ait-Ghezala G, Mathura VS, Laporte V, Quadros A, Paris D, Patel N, et al. (2005) Genomic regulation after CD40 stimulation in microglia: relevance to Alzheimer's disease. Brain Res Mol Brain Res 140(1-2):73-85.
4. 3treatments_Thomas => Thomas DM, Francescutti-Verbeem DM, Kuhn DM (2006) Gene expression profile of activated microglia under conditions associated with dopamine neuronal damage. The FASEB Journal 20:515-517.

1. 2+_26Mar08 => Genetics-based disease genes in two or more studies from http://www.alzforum.org/ (compiled by Mike Oldham).
2. Bachoo => Bachoo RM, et al. (2004) Molecular diversity of astrocytes with implications for neurological disorders. PNAS 101:8384-8389.
3. Foster => Foster LJ, de Hoog CL, Zhang Y, Zhang Y, Xie X, Mootha VK, Mann M (2006) A mammalian organelle map by protein correlation profiling. Cell 125(1):187-199.
4. Morciano => Morciano M, et al. (2005) Immunoisolation of two synaptic vesicle pools from synaptosomes: a proteomics analysis. J Neurochem 95:1732-1745.
5. Sugino => Sugino K, et al. (2006) Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci 9:99-107.

NOTE: Original data came from this neuronal-cell-type-selection experiment in mouse: Sugino K, Hempel C, Miller M, Hattox A, Shapiro P, Wu C, Huang J, Nelson S (2006). Molecular taxonomy of major neuronal classes in the adult mouse forebrain. Nat Neurosci 9:99-107

References for the pre-defined blood atlases (useBloodAtlases=TRUE, in alphabetical order by category descriptor) are as follows:
1. Abbas AB, Baldwin D, Ma Y, Ouyang W, Gurney A, et al. (2005) Immune response in silico (IRIS): immune-specific genes identified from a compendium of microarray expression data. Genes Immun 6(4):319-331.
2. Grigoryev YA, Kurian SM, Avnur Z, Borie D, Deng J, et al. (2010) Deconvoluting post-transplant immunity: cell subset-specific mapping reveals pathways for activation and expansion of memory T, monocytes and B cells. PLoS One 5(10):e13358.
3. Watkins NA, Gusnanto A, de Bono B, De S, Miranda-Saavedra D, et al. (2009) A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood 113(19):e1-9.

References for the pre-defined stem cell (SC) lists (useStemCellLists=TRUE, in alphabetical order by category descriptor) are as follows:
hematopoietic stem cells/progenitor cells (CD133+ cells), from: Cui K, Zang C, Roh TY, Schones DE, Childs RW, Peng W, Zhao K. (2009). Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation. Cell Stem Cell 4:80-93
TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono K, et al. (2006) Control of developmental regulators by polycomb in human embryonic stem cells. Cell 125:301-313
References and more information for the pre-defined human brain region lists (useBrainRegionMarkers=TRUE):
Three categories of marker genes are presented:
1. globalMarker(top200) = top 200 global marker genes for 22 large brain structures. Genes are ranked by fold-change enrichment (expression in the region vs. expression in the rest of the brain), and the ranks are averaged between brains 2001 and 2002 (human.brain-map.org).
2. localMarker(top200) = top 200 local marker genes for 90 large brain structures. Same as 1, except that fold change is defined as expression in the region vs. expression in the larger region that contains it (format: <region>_IN_<largerRegion>). For example, enrichment in CA1 is relative to the other subcompartments of the hippocampus.
3. localMarker(FC>2) = same as 2, but only local marker genes with fold change > 2 in both brains are included. Regions with fewer than 10 marker genes are omitted.
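The rank-averaging step for globalMarker(top200) can be sketched as follows. The fold changes are hypothetical, only two markers are kept instead of 200, and details such as tie handling are assumptions; the actual pipeline used to build these lists is not described here.

```r
# Hypothetical fold changes (region vs. rest of brain) in the two donor brains
fc2001 <- c(GeneA = 8.0, GeneB = 2.5, GeneC = 5.0, GeneD = 1.1)
fc2002 <- c(GeneA = 3.0, GeneB = 4.0, GeneC = 6.0, GeneD = 0.9)

# Rank genes within each brain (rank 1 = largest fold change), aligned by gene name
r2001 <- rank(-fc2001)
r2002 <- rank(-fc2002[names(fc2001)])

# Average the ranks across brains and keep the top N (200 in the real lists)
avgRank <- (r2001 + r2002) / 2
topN    <- 2
markers <- names(sort(avgRank))[seq_len(topN)]
markers
```

GeneC ranks 2nd and 1st (average 1.5) and GeneA ranks 1st and 3rd (average 2), so they are the two markers selected here.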
More information for the pre-defined immune pathways lists (useImmunePathwayLists=TRUE):

----------Examples----------

# Example: first, read in some gene names and split them into categories
library(WGCNA)
data(BrainLists)
listGenes = unique(as.character(BrainLists[,1]))
set.seed(100)
geneR = sort(sample(listGenes,10000))
categories = sort(rep(standardColors(10),1000))
categories[sample(1:10000,1000)] = "grey"
# Write two test lists in format 1 (descriptor on the first line, then one gene per line)
write(c("TESTLIST1", geneR[700:1500]), "TESTLIST1.txt")
write(c("TESTLIST2", geneR[1600:2400]), "TESTLIST2.txt")

# Now run the function!
system.time({testResults = userListEnrichment(geneR, labelR=categories,
    fnIn=c("TESTLIST1.txt","TESTLIST2.txt"), catNmIn=c("TEST1","TEST2"),
    nameOut = "testEnrichment.csv", useBrainLists=FALSE, omitCategories ="grey")})

# To see a list of all significant enrichments, either open the file "testEnrichment.csv" in the current directory, or type:
testResults$sigOverlaps

# To see all of the overlapping genes between two categories (whether or not the p-value is significant), type testResults$ovGenes$"<labelR> -- <comparisonCategory>". For example:
testResults$ovGenes$"black -- TESTLIST1__TEST1"
# The next category appears to come from the pre-made brain lists, so it is only available after running with useBrainLists=TRUE:
# testResults$ovGenes$"red -- salmon_M12_Ribosome__HumanMeta"

# More detailed overlap information is in the pValues output. For example:
head(testResults$pValues)

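Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only for personal reference while learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation is automatic, inaccuracies are unavoidable. When using it, compare the Chinese and English content carefully; doing so can help in learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post, and we will gradually make corrections.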
