
R snpStats package: help documentation for the read.long() function

Posted 2012-2-26 14:46:02
read.long(snpStats)
R package containing read.long(): snpStats

Read SNP genotype data in long format

Translated by: 生物统计家园网 robot LoveR

Description

This function reads SNP genotype data from a file in which each line refers to a single genotype call. It replaces the earlier function read.snps.long.


Usage


read.long(file, samples, snps,
            fields = c(snp = 1, sample = 2, genotype = 3, confidence = 4,
                       allele.A = NA, allele.B = NA),
            split = "\t| +", gcodes, no.call = "", threshold = NULL,
            lex.order = FALSE, verbose = FALSE)



Arguments

Argument: file
Name(s) of the file(s) to be read (may be gzipped).


Argument: samples
Either a vector of sample identifiers or the number of samples to be read. If a single file is to be read and this argument is omitted, the file is first scanned and all samples are included.


Argument: snps
Either a vector of SNP identifiers or the number of SNPs to be read. If a single file is to be read and this argument is omitted, the file is first scanned and all SNPs are included.


Argument: fields
A named vector giving the locations of the required fields. See Details below.


Argument: split
A regular expression specifying how each input line is split into fields. The default value separates fields at a TAB character or at one or more blanks.
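To see what the default split pattern does, a small base-R sketch may help. The input lines are made up for illustration; only `strsplit` and the default pattern come from the help text.

```r
# Default `split` pattern: a single TAB, or a run of one or more spaces.
split_pattern <- "\t| +"

# Hypothetical input line mixing a TAB and repeated spaces as separators.
line <- "rs123\tsampleA  AB 0.98"
fields_found <- strsplit(line, split_pattern)[[1]]
print(fields_found)

# Two consecutive TABs are two separate matches, so an empty field appears.
with_empty <- strsplit("rs123\t\tAB", split_pattern)[[1]]
print(with_empty)
```

Note that a run of spaces collapses into one separator, whereas each TAB counts on its own, so TAB-delimited files can carry empty fields.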


Argument: gcodes
When the genotype is read as a single field, this argument specifies how it is handled. See Details below.


Argument: no.call
The string which indicates "no call" for either a genotype or (when the genotype is read as two allele fields) an allele.


Argument: threshold
A vector of length 2 giving the lower and upper acceptable limits for the confidence score.
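As a rough illustration of what a two-element threshold means, the sketch below filters some invented confidence scores. Whether read.long treats the limits as inclusive is not stated in the help text, so inclusiveness here is an assumption.

```r
# Hypothetical confidence scores for five genotype calls.
confidence <- c(0.99, 0.42, 0.91, 1.00, 0.88)

# Lower and upper acceptable limits, in the form the argument expects.
threshold <- c(0.9, 1.0)

# Assumption: both limits are inclusive.
accepted <- confidence >= threshold[1] & confidence <= threshold[2]
print(accepted)
```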


Argument: lex.order
If TRUE, the alleles at each locus will be in lexicographical order. Otherwise the ordering of alleles is arbitrary, depending on the order in which they are encountered.
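The difference between the two orderings can be sketched in base R. The allele sequence is invented, and the idea that the first allele in the ordering is the one labelled A is an assumption rather than something the help text spells out.

```r
# Alleles for one hypothetical SNP, in the order they appear in the file.
alleles_seen <- c("T", "C", "T", "T", "C")

# lex.order = FALSE: alleles are kept in order of first appearance.
encounter_order <- unique(alleles_seen)

# lex.order = TRUE: alleles are sorted alphabetically.
lexicographic <- sort(unique(alleles_seen))

print(encounter_order)
print(lexicographic)
```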


Argument: verbose
If TRUE, the function reports its progress. Otherwise only error and warning messages are produced.


Details

Each line of the input file represents a single call and is split into fields using the function strsplit. The required fields are extracted according to the fields argument. This must contain the locations of the sample and SNP identifier fields, and either the location of a genotype field or the locations of two allele fields.
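The split-then-extract step described above can be sketched in base R. This illustrates the idea rather than the package's actual code; the input lines are invented, and the fields vector is the default from the Usage section.

```r
# Default field locations from the Usage section; NA marks an unused field.
fields <- c(snp = 1, sample = 2, genotype = 3, confidence = 4,
            allele.A = NA, allele.B = NA)

# Hypothetical long-format lines, one genotype call per line.
lines <- c("rs1\tS1\tAA\t0.99",
           "rs1\tS2\tAB\t0.95",
           "rs2\tS1\tBB\t0.80")

parts <- strsplit(lines, "\t| +")
used <- fields[!is.na(fields)]  # keep only the fields that are present

# Pull each requested column position out of every split line.
calls <- as.data.frame(
  lapply(used, function(pos) vapply(parts, `[`, character(1), pos)),
  stringsAsFactors = FALSE
)
print(calls)
```

All columns are character at this stage. A real reader would still need to convert the confidence field to numeric before applying any threshold.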

If the samples and snps arguments contain vectors of character strings, a SnpMatrix is created with these row and column names and the genotype values are "cherry-picked" from the input file. If either or both of these arguments are specified simply as numbers, then these numbers determine the dimensions of the SnpMatrix created. In this case samples and/or SNPs are included in the SnpMatrix on a first-come-first-served basis. If either or both of these arguments are omitted, a preliminary scan of the input file is carried out to find the missing sample and/or SNP identifiers. In this scan, when a sample or SNP identifier differs from that in the previous line but is identical to one previously found, all the relevant identifiers are assumed to have been found. This implies that the file must be sorted, in some consistent order, by sample and by SNP (although either one of these may vary fastest).
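The stopping rule of the preliminary scan can be sketched as follows. The function name `scan_ids` and the data are hypothetical, and the package's own implementation may differ in detail.

```r
# Collect distinct identifiers in file order, stopping at the first
# identifier that differs from the previous line but was already seen.
scan_ids <- function(ids) {
  found <- character(0)
  previous <- NA_character_
  for (id in ids) {
    if (!identical(id, previous)) {
      if (id %in% found) break  # a repeat: assume all identifiers are found
      found <- c(found, id)
    }
    previous <- id
  }
  found
}

# Hypothetical file sorted by SNP, with sample varying fastest.
sample_col <- c("S1", "S2", "S3", "S1", "S2", "S3")
snp_col    <- c("rs1", "rs1", "rs1", "rs2", "rs2", "rs2")

samples_found <- scan_ids(sample_col)
snps_found <- scan_ids(snp_col)
print(samples_found)
print(snps_found)
```

The fastest-varying identifier triggers the stop after one cycle, while the slower one never repeats and is only complete once the whole column has been read. An unsorted file would end the scan too early, which is why the sort order matters.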

If the genotype is read as a single field, its handling is specified by the gcodes argument. If this is absent, NULL, or NA, then the genotype is assumed to be represented by a two-character field (the two characters representing the two alleles). If gcodes is a single string, then this is assumed to be a regular expression which will split the genotype field into two allele fields. Otherwise, gcodes must be an array of length three, specifying the three genotype codes in the order "AA", "AB", "BB".
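The three gcodes cases can be mimicked with a small helper. `split_genotype` is a hypothetical name, and returning an index of 1, 2 or 3 for coded genotypes is an illustrative choice, not a claim about how snpStats stores calls.

```r
split_genotype <- function(genotype, gcodes = NULL) {
  if (is.null(gcodes) || (length(gcodes) == 1 && is.na(gcodes))) {
    # Two-character field: each character is one allele.
    return(c(substr(genotype, 1, 1), substr(genotype, 2, 2)))
  }
  if (length(gcodes) == 1) {
    # Single string: a regular expression separating the two alleles.
    return(strsplit(genotype, gcodes)[[1]])
  }
  # Three codes in the order AA, AB, BB: return the matching position.
  stopifnot(length(gcodes) == 3)
  match(genotype, gcodes)
}

two_char <- split_genotype("AG")
by_regex <- split_genotype("A/G", gcodes = "/")
coded    <- split_genotype("het", gcodes = c("homA", "het", "homB"))
print(two_char)
print(by_regex)
print(coded)
```

The first two cases yield allele codes, which is presumably why the function then also returns an allele table. The third case yields only a genotype category.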


Value

If the genotype is read as a single field matching one of three specified codes, the function returns an object of class SnpMatrix. Otherwise it returns a list whose first element is the SnpMatrix object and whose second element is a data frame containing the allele codes, with the SNP identifiers as row names. Note that allele codes only appear in this data frame if they occur in a genotype which was accepted. Thus, monomorphic SNPs have allele.B coded as NA, and SNPs which never pass the confidence score filter have both alleles coded as NA.
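Because the return value has two possible shapes, calling code may want to normalise it. The sketch below uses a hypothetical helper `unpack_read_long` and a mock result built from plain base-R objects. The column names `allele.A` and `allele.B` are assumed from the text above.

```r
# Normalise the two possible return shapes into one list.
unpack_read_long <- function(res) {
  if (inherits(res, "SnpMatrix")) {
    list(genotypes = res, alleles = NULL)
  } else {
    list(genotypes = res[[1]], alleles = res[[2]])
  }
}

# Stand-in for the list form (a plain matrix, not a real SnpMatrix).
mock <- list(
  matrix(0L, nrow = 2, ncol = 2,
         dimnames = list(c("S1", "S2"), c("rs1", "rs2"))),
  data.frame(allele.A = c("A", "C"), allele.B = c("G", NA),
             row.names = c("rs1", "rs2"), stringsAsFactors = FALSE)
)

out <- unpack_read_long(mock)

# Monomorphic SNPs: allele.A observed, allele.B never observed.
monomorphic <- rownames(out$alleles)[!is.na(out$alleles$allele.A) &
                                     is.na(out$alleles$allele.B)]
print(monomorphic)
```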


Note

Unlike read.snps.long, this function is written entirely in R and may not be particularly fast. However, it imposes no restrictions on the allele codes recognized.

Homozygous genotypes are assumed to be represented in the input file by coding both alleles to the same value. No special provision is made to read XSnpMatrix objects; such data should first be read as a SnpMatrix and then coerced to an XSnpMatrix using new or as.


Author(s)

David Clayton <david.clayton@cimr.cam.ac.uk>




See Also

SnpMatrix-class, XSnpMatrix-class


Examples

## No example supplied yet
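Since the help page supplies no example, here is a minimal sketch of how a call might look, based only on the Usage and Details sections above. The file contents are invented, and the call is guarded so that it only runs when snpStats is installed. The behaviour described in the comments is what the documentation suggests and has not been checked against the package.

```r
# Write a small hypothetical long-format file: SNP, sample, genotype, confidence.
# It is sorted by SNP with sample varying fastest, as the preliminary scan requires.
long_file <- tempfile(fileext = ".txt")
writeLines(c("rs1\tS1\tAA\t0.99",
             "rs1\tS2\tAB\t0.97",
             "rs2\tS1\tBB\t0.95",
             "rs2\tS2\tAB\t0.50"),
           long_file)

if (requireNamespace("snpStats", quietly = TRUE)) {
  # samples and snps are omitted, so the file is scanned for identifiers first.
  # Genotypes match one of three codes, so a SnpMatrix is expected back.
  # Calls with confidence outside the given limits should not be accepted.
  geno <- snpStats::read.long(long_file,
                              gcodes = c("AA", "AB", "BB"),
                              threshold = c(0.9, 1))
  print(dim(geno))
}
```

Passing explicit `samples` and `snps` vectors instead would skip the preliminary scan and fix the row and column order of the result.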

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


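Notes:
Note 1: For the convenience of learners, the Chinese version of this document was translated by the 生物统计家园网 robot LoveR. It is for personal reference when learning R only; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are inevitable; comparing the Chinese and English text carefully can help when learning R.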