
R chopsticks package: read.HapMap.data() function help documentation

Posted on 2012-2-25 14:59:36
read.HapMap.data(chopsticks)
R package containing read.HapMap.data(): chopsticks

                                         Function to import HapMap genotype data as a snp.matrix

                                         Translator: 生物统计家园网 robot LoveR

----------Description----------

Given a URL for HapMap genotype data, read.HapMap.data downloads the genotype data, converts it into a snp.matrix class object, and saves the SNP support information into an associated data.frame.


----------Usage----------


read.HapMap.data(url, verbose=FALSE, save=NULL, ...)



----------Arguments----------

Argument: url
URL for the HapMap data. Web data is specified with the prefix "http://", FTP data with the prefix "ftp://", and a local file with the prefix "file://".


Argument: verbose
Where the dbSNPalleles annotation is ambiguous, output more detailed information about how and why the assignment is made. See Note below.


Argument: save
File name under which to save the download. If unspecified, a temporary file is created and removed afterwards.


Argument: ...
Placeholder for further switches; currently ignored.
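
The three URL prefixes described above can be illustrated with a short sketch. The helper `has_supported_prefix` and the path in `local_path` are hypothetical names introduced only for this illustration; they are not part of chopsticks. The call itself uses only the arguments listed in Usage and is guarded so that it runs only if the package and the file are actually present.

```r
# Hypothetical helper (not part of chopsticks): checks that a URL uses one
# of the three prefixes named in the url argument description.
has_supported_prefix <- function(url) {
  grepl("^(http|ftp|file)://", url)
}

# Hypothetical local path; replace with a real downloaded HapMap file.
local_path <- "/tmp/genotypes_chr1_CEU_r21_nr_fwd.txt.gz"
local_url <- paste0("file://", local_path)

# Only attempt the import when the package and the file both exist.
if (requireNamespace("chopsticks", quietly = TRUE) && file.exists(local_path)) {
  result <- chopsticks::read.HapMap.data(local_url, verbose = TRUE)
  if (!is.null(result)) {
    print(dim(result$snp.support))
  }
}
```

Leaving `save` unset here avoids making a second permanent copy of a file that is already local.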


----------Details----------

During the conversion, if the dbSNPAlleles entry is exactly of the form "X/Y", where X and Y are each one of A, C, G, or T, it is used directly to assign allele 1 and allele 2.

However, about 1 in 1000 entries are more complicated and may, for example, involve a deletion, e.g. "-/A/G" or "-/A/AGT/G/T". In such cases some heuristics are used: the observed genotypes for the specific SNP in the current batch are examined in two passes. The first pass determines which bases are present, excluding "N".

If more than 2 bases are observed in the batch specified in the url, the routine aborts, although so far this possibility has not arisen in tests. If there are exactly two, allele 1 and allele 2 are assigned in alphabetical order (dbSNPAlleles entries seem to always be in dictionary order, so the assignment should agree with a shortened version of the dbSNPAlleles entry). Likewise, if only "A" or only "T" is observed, it is automatically known to be the first allele (assigned as "A/.") or the last allele (assigned as "./T") of a hypothetical pair, without looking at the dbSNPAlleles entry. For other cases where a single base is observed, the routine goes further and looks at the dbSNPAlleles entry to see whether it begins with "-/X/" or ends with "/X", with X a single base, and compares X with the observed base to decide whether that base should be allele 1 (same as the beginning, or different from the end) or allele 2 (same as the end, or different from the beginning). If no decision can be made for a particular SNP entry, the routine aborts with an appropriate message. (For zero observed bases the assignment is "./.", and all observed genotypes of that SNP are therefore converted to the equivalent of NA.)

(This heuristic does not cover all cases, but in practice it seems to work. See Note below.)
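
The assignment rules described in Details can be sketched in plain R. This is an illustration written from the description above, not the package's actual code: `assign_alleles` is a hypothetical function name, and it takes the observed bases as a character vector. For the single-base case it handles only the situations where the observed base matches the beginning or the end of the entry and stops otherwise, which is narrower than what the real routine does.

```r
# Hypothetical sketch of the documented heuristic (not the chopsticks source).
# dbsnp:    the dbSNPAlleles entry, e.g. "A/G" or "-/A/AGT/G/T"
# observed: character vector of bases seen for this SNP in the batch
assign_alleles <- function(dbsnp, observed) {
  # A plain "X/Y" entry is used directly.
  if (grepl("^[ACGT]/[ACGT]$", dbsnp)) return(dbsnp)

  # First pass: which bases are present, ignoring "N" and anything else.
  bases <- sort(unique(observed[observed %in% c("A", "C", "G", "T")]))

  if (length(bases) > 2) stop("more than two bases observed for ", dbsnp)
  if (length(bases) == 2) return(paste(bases, collapse = "/"))
  if (length(bases) == 0) return("./.")

  # Exactly one base observed.
  b <- bases
  if (b == "A") return("A/.")   # "A" is always first in dictionary order
  if (b == "T") return("./T")   # "T" is always last in dictionary order

  # Look for a single base after a leading "-/" and at the end of the entry.
  parts <- strsplit(dbsnp, "/", fixed = TRUE)[[1]]
  first <- NA_character_
  if (length(parts) >= 2 && parts[1] == "-" && nchar(parts[2]) == 1) {
    first <- parts[2]
  }
  last <- parts[length(parts)]
  if (nchar(last) != 1) last <- NA_character_

  if (!is.na(first) && b == first) return(paste0(b, "/."))
  if (!is.na(last) && b == last) return(paste0("./", b))
  stop("no decision possible for ", dbsnp, " with observed base ", b)
}

assign_alleles("-/A/G", c("A", "G", "N"))
assign_alleles("-/C/G", c("C", "C"))
```

The sketch deliberately stops on entries such as "A/C/T" with only "C" observed, because the description leaves room for more than one answer in that situation.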


----------Value----------

Returns a list containing the following two items when successful; otherwise returns NULL:


Item: snp.data
A snp.matrix-class object containing the SNP data.


Item: snp.support
A data.frame containing the dbSNPalleles, Chromosome, Position, and Strand entries from the HapMap genotype file, together with the actual Assignment used for allele 1 and allele 2 during the conversion (see Details above and Note below).


----------Note----------

Using both "file://" for url and save duplicates the file (i.e. by default the routine makes a copy of the url in any case, but tidies up afterwards if run without save).

Sometimes the assignment may not be unique, e.g. when the dbSNPAlleles entry is "A/C/T" and only "C" is observed: this can be assigned "A/C" or "C/T" (currently the routine does the former). One needs to be especially careful when joining two sets of SNP data, and it is imperative to compare the assignment supplementary data to check that they are compatible. For example, for an "A/C/T" entry, one data set may contain only "C" and thus have assignment "A/C" with all genotypes called as allele 2 homozygotes, whereas another data set contains both "C" and "T", so the first set needs to be modified before joining.
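
A compatibility check of this kind can be sketched as follows. The two data frames are made-up stand-ins for the snp.support item of two runs; the SNP identifiers and values are invented for illustration and are not real HapMap output. Only the Assignment column and the use of SNP names as row names are taken from the documentation above.

```r
# Toy stand-ins for result1$snp.support and result2$snp.support
# (invented values, for illustration only).
support1 <- data.frame(Assignment = c("A/C", "A/G", "C/T"),
                       row.names = c("snp_a", "snp_b", "snp_c"),
                       stringsAsFactors = FALSE)
support2 <- data.frame(Assignment = c("C/T", "A/G"),
                       row.names = c("snp_a", "snp_b"),
                       stringsAsFactors = FALSE)

# SNPs present in both data sets.
shared <- intersect(rownames(support1), rownames(support2))

# Shared SNPs whose allele assignment differs between the two sets.
a1 <- as.character(support1[shared, "Assignment"])
a2 <- as.character(support2[shared, "Assignment"])
mismatch <- shared[a1 != a2]

print(mismatch)
```

Any SNP listed in `mismatch` would need its genotypes recoded in one of the sets before the two are joined.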

A typical run (chromosome 1 for CEU) contains about 400,000 SNPs and about 100 samples. The snp.matrix object is about 60MB (40 million bytes for the SNPs plus overhead), and the support data is of similar size (i.e. about 2x in total). The run takes about 30 seconds and requires about 4x at peak memory usage. The actual download is about 20MB, compressed from about 200MB.
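
The figures above can be checked with a little arithmetic. The sketch assumes one byte per genotype call, which is what the "40 million bytes" figure implies, and reads the "2x" and "4x" as multiples of the roughly 60MB snp.matrix object; both readings are assumptions rather than statements from the package.

```r
n_snps <- 400000      # approximate SNP count for chromosome 1, CEU
n_samples <- 100      # approximate sample count

# Assumes one byte per genotype call.
genotype_bytes <- n_snps * n_samples

matrix_mb <- 60                 # quoted size of the snp.matrix object
total_mb <- 2 * matrix_mb       # snp.matrix plus support data ("2x")
peak_mb <- 4 * matrix_mb        # peak usage, reading "4x" relative to 60MB

print(c(genotype_bytes = genotype_bytes, total_mb = total_mb, peak_mb = peak_mb))
```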


----------Author(s)----------


Hin-Tak Leung <htl10@users.sourceforge.net>




----------Examples----------


## Not run:

## ** Please be aware that the HapMap project generates new builds from time
## ** to time and the build number in the URL changes.

> library(chopsticks)
> testurl <- "http://www.hapmap.org/genotypes/latest/fwd_strand/non-redundant/genotypes_chr1_CEU_r21_nr_fwd.txt.gz"
> result1 <- read.HapMap.data(testurl)
> sum1 <- summary(result1$snp.data)

> head(sum1[is.finite(sum1$z.HWE),], n=10)
           Calls Call.rate         MAF      P.AA       P.AB      P.BB      z.HWE
rs1933024     87 0.9666667 0.005747126 0.0000000 0.01149425 0.9885057 0.05391549
rs11497407    89 0.9888889 0.005617978 0.0000000 0.01123596 0.9887640 0.05329933
rs12565286    88 0.9777778 0.056818182 0.0000000 0.11363636 0.8863636 0.56511033
rs11804171    83 0.9222222 0.030120482 0.0000000 0.06024096 0.9397590 0.28293272
rs2977656     90 1.0000000 0.005555556 0.9888889 0.01111111 0.0000000 0.05299907
rs12138618    89 0.9888889 0.050561798 0.0000000 0.10112360 0.8988764 0.50240136
rs3094315     88 0.9777778 0.136363636 0.7272727 0.27272727 0.0000000 1.48118392
rs17160906    89 0.9888889 0.106741573 0.0000000 0.21348315 0.7865169 1.12733108
rs2519016     85 0.9444444 0.047058824 0.0000000 0.09411765 0.9058824 0.45528615
rs12562034    90 1.0000000 0.088888889 0.0000000 0.17777778 0.8222222 0.92554468

## ** Please be aware that the HapMap project generates new builds from time
## ** to time and the build number in the URL changes.

## This URL is split in two to fit the page width;
## there is no need to do so in actual usage:
> testurl2 <- paste("http://www.hapmap.org/genotypes/latest/",
                  "fwd_strand/non-redundant/genotypes_chr1_JPT_r21_nr_fwd.txt.gz", sep="")
> result2 <- read.HapMap.data(testurl2)

> head(result2$snp.support)
           dbSNPalleles Assignment Chromosome Position Strand
rs10399749          C/T        C/T       chr1    45162      +
rs2949420           A/T        A/T       chr1    45257      +
rs4030303           A/G        A/G       chr1    72434      +
rs4030300           A/C        A/C       chr1    72515      +
rs3855952           A/G        A/G       chr1    77689      +
rs940550            C/T        C/T       chr1    78032      +

## End(Not run)

When reposting, please credit 生物统计家园网 (http://www.biostatistic.net).

