getGEO(GEOquery)
getGEO() belongs to R package: GEOquery
Get a GEO object from NCBI or file
Translator: 生物统计家园网 robot LoveR
----------Description----------
This function is the main user-level function in the GEOquery package. It directs the download (if no filename is specified) and parsing of a GEO SOFT format file into an R data structure designed to make each of the important parts of the GEO SOFT format easily accessible.
----------Usage----------
getGEO(GEO = NULL, filename = NULL, destdir = tempdir(), GSElimits = NULL,
       GSEMatrix = TRUE, AnnotGPL = FALSE)
----------Arguments----------
Argument: GEO
A character string representing a GEO object for download and parsing (e.g., 'GDS505', 'GSE2', 'GSM2', 'GPL96').
Argument: filename
The filename of a previously downloaded GEO SOFT format file or its gzipped representation (in which case the filename must end in .gz). Either GEO or filename may be specified, but not both. GEO series matrix files are also handled. Note that because a single file is being parsed, the return value when a GSE matrix file is parsed is a single eset (ExpressionSet), not a list of esets.
Argument: destdir
The destination directory for any downloads. Defaults to the architecture-dependent tempdir. You may want to specify a different directory if you want to save the file for later use. Doing so is a good idea if you have a slow connection, as some of the GEO files are HUGE!
Argument: GSElimits
This argument can be used to load only a contiguous subset of the GSMs from a GSE. It should be specified as a vector of length 2 specifying the start and end (inclusive) GSMs to load. This could be useful for splitting up large GSEs into more manageable parts, for example.
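As a rough illustration of splitting a large GSE into parts, the sketch below computes consecutive (start, end) windows that could each be passed as GSElimits. The helper `gse_chunk_limits` is hypothetical and not part of GEOquery, and the sample count of 6 is made up for the example. The guarded call uses GSEMatrix = FALSE on the assumption that the limits apply when the GSE SOFT file is parsed.

```r
# Hypothetical helper (not part of GEOquery): split n_samples GSMs into
# consecutive c(start, end) windows of at most `size` samples each.
gse_chunk_limits <- function(n_samples, size) {
  stopifnot(n_samples >= 1, size >= 1)
  starts <- seq(1, n_samples, by = size)
  ends <- pmin(starts + size - 1, n_samples)   # clip the last window
  Map(function(s, e) c(s, e), starts, ends)
}

# Assumed sample count, for illustration only.
limits <- gse_chunk_limits(n_samples = 6, size = 2)

# Illustrative only: needs GEOquery and a network connection, and assumes
# the series really has at least as many samples as the window covers.
if (interactive() && requireNamespace("GEOquery", quietly = TRUE)) {
  first_part <- GEOquery::getGEO("GSE2", GSEMatrix = FALSE,
                                 GSElimits = limits[[1]])
}
```

Each element of `limits` is a length-2 vector with inclusive start and end positions, which matches the form described for this argument.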
Argument: GSEMatrix
A boolean telling GEOquery whether or not to use GSE Series Matrix files from GEO. Parsing these files can be many orders of magnitude faster than parsing the GSE SOFT format files. Defaults to TRUE, meaning that the SOFT format parsing will not occur; set to FALSE if you need other columns from the GSE records.
Argument: AnnotGPL
A boolean, defaulting to FALSE, indicating whether or not to use the Annotation GPL information. These files are nice to use because they contain up-to-date information remapped from Entrez Gene on a regular basis. However, they do not exist for all GPLs; in general, they are only available for GPLs referenced by a GDS.
----------Details----------
getGEO downloads and parses information available from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo). Here are some details about what is available from GEO. All entity types are handled by getGEO, and essentially any information in the GEO SOFT format is reflected in the resulting data structure.
From the GEO website:
The Gene Expression Omnibus (GEO) from NCBI serves as a public repository for a wide range of high-throughput experimental data. These data include single and dual channel microarray-based experiments measuring mRNA, genomic DNA, and protein abundance, as well as non-array techniques such as serial analysis of gene expression (SAGE), and mass spectrometry proteomic data. At the most basic level of organization of GEO, there are three entity types that may be supplied by users: Platforms, Samples, and Series. Additionally, there is a curated entity called a GEO dataset.
A Platform record describes the list of elements on the array (e.g., cDNAs, oligonucleotide probesets, ORFs, antibodies) or the list of elements that may be detected and quantified in that experiment (e.g., SAGE tags, peptides). Each Platform record is assigned a unique and stable GEO accession number (GPLxxx). A Platform may reference many Samples that have been submitted by multiple submitters.
A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique and stable GEO accession number (GSMxxx). A Sample entity must reference only one Platform and may be included in multiple Series.
A Series record defines a set of related Samples considered to be part of a group, how the Samples are related, and if and how they are ordered. A Series provides a focal point and description of the experiment as a whole. Series records may also contain tables describing extracted data, summary conclusions, or analyses. Each Series record is assigned a unique and stable GEO accession number (GSExxx).
GEO DataSets (GDSxxx) are curated sets of GEO Sample data. A GDS record represents a collection of biologically and statistically comparable GEO Samples and forms the basis of GEO's suite of data display and analysis tools. Samples within a GDS refer to the same Platform, that is, they share a common set of probe elements. Value measurements for each Sample within a GDS are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the dataset. Information reflecting experimental design is provided through GDS subsets.
----------Value----------
An object of the appropriate class (GDS, GPL, GSM, or GSE) is returned. If the GSEMatrix option is used, then a list of ExpressionSet objects is returned, one for each SeriesMatrix file associated with the GSE accession. If the filename argument is used in combination with a GSEMatrix file, then the return value is a single ExpressionSet.
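The return types described above can be summarised in a small lookup. The function `expected_return` below is a hypothetical helper, not part of GEOquery; it only restates the rules from this section for an accession passed through the GEO argument and does not contact NCBI.

```r
# Hypothetical helper (not part of GEOquery): report what getGEO() is
# documented to return for a given accession, based on its three-letter prefix.
expected_return <- function(accession, GSEMatrix = TRUE) {
  prefix <- toupper(substr(accession, 1, 3))
  if (!prefix %in% c("GDS", "GPL", "GSM", "GSE")) {
    stop("unrecognised GEO accession: ", accession)
  }
  # A GSE fetched with GSEMatrix = TRUE yields a list of ExpressionSet objects,
  # one per Series Matrix file; everything else yields an object of its own class.
  if (prefix == "GSE" && GSEMatrix) {
    return("list of ExpressionSet")
  }
  prefix
}

expected_return("GDS505")
expected_return("GSE2")
expected_return("GSE2", GSEMatrix = FALSE)
```

Because a GSE with GSEMatrix = TRUE comes back as a list, code that expects a single ExpressionSet typically has to pick one element, for example the first.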
----------Warning----------
Some of the files that are downloaded, particularly those associated with GSE entries from GEO are absolutely ENORMOUS and parsing them can take quite some time and memory. So, particularly when working with large GSE entries, expect that you may need a good
----------Author(s)----------
Sean Davis
----------See Also----------
getGEOfile
----------Examples----------
# gds <- getGEO("GDS10")
# gds
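The destdir and filename arguments can be combined to avoid repeated downloads over a slow connection. The following is a minimal sketch under stated assumptions: `find_cached_soft` and `load_geo_cached` are hypothetical helpers, not part of GEOquery, and the file naming pattern `<accession>.soft` or `<accession>.soft.gz` is an assumption about how a downloaded file is stored in destdir.

```r
# Hypothetical helper: look for a previously downloaded SOFT file for an
# accession in `dir`. The "<accession>.soft(.gz)" naming is an assumption.
find_cached_soft <- function(accession, dir) {
  hits <- list.files(dir,
                     pattern = paste0("^", accession, "\\.soft(\\.gz)?$"),
                     full.names = TRUE)
  if (length(hits) > 0) hits[1] else NULL
}

# Hypothetical wrapper: parse the local copy if one exists, otherwise
# download into `dir` so the file is kept for later use.
load_geo_cached <- function(accession, dir) {
  cached <- find_cached_soft(accession, dir)
  if (is.null(cached)) {
    GEOquery::getGEO(accession, destdir = dir)
  } else {
    GEOquery::getGEO(filename = cached)
  }
}

# Illustrative only: needs GEOquery and a network connection on first use.
if (interactive() && requireNamespace("GEOquery", quietly = TRUE)) {
  geo_dir <- file.path(tempdir(), "geo_cache")
  dir.create(geo_dir, showWarnings = FALSE)
  gds <- load_geo_cached("GDS10", geo_dir)
}
```

In practice a directory outside tempdir() would be used so that the files survive across R sessions.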