read.snps.long(snpStats)
Function read.snps.long() from the R package snpStats
Read SNP data in long format (deprecated)
----------Description----------
Reads SNP data organized in free format with one call per line. Apart from the one-call-per-line requirement, there is considerable flexibility: multiple input files can be read, the input fields can appear in any order on the line, and irrelevant fields can be skipped. The samples and SNPs to be read must be pre-specified; they define the rows and columns of an output object of class "SnpMatrix". In versions 1.3 and later this function has been replaced by the more flexible function read.long.
----------Usage----------
read.snps.long(files, sample.id = NULL, snp.id = NULL, diploid = NULL,
               fields = c(sample = 1, snp = 2, genotype = 3, confidence = 4),
               codes = c("0", "1", "2"), threshold = 0.9, lower = TRUE,
               sep = " ", comment = "#", skip = 0, simplify = c(FALSE, FALSE),
               verbose = FALSE, in.order = TRUE, every = 1000)
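As a rough illustration of the usage above, the following sketch writes a tiny long-format file and reads it back. The sample and SNP identifiers, the genotype and confidence values, and the temporary file are all invented for the example, and the call to read.snps.long only runs if snpStats is installed.

```r
# Toy long-format data: one call per line (sample, SNP, genotype, confidence).
# All identifiers and values are made up for illustration.
samples <- c("S1", "S2")
snps <- c("rs1", "rs2", "rs3")

# expand.grid varies its first factor fastest, so SNP varies fastest here.
calls <- expand.grid(snp = snps, sample = samples, stringsAsFactors = FALSE)
calls$genotype <- c("0", "1", "2", "1", "1", "0")
calls$confidence <- c(0.99, 0.95, 0.97, 0.50, 0.93, 0.98)

# Write the records space-separated, without header or quotes.
long_file <- tempfile(fileext = ".txt")
write.table(calls[, c("sample", "snp", "genotype", "confidence")],
            file = long_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

if (requireNamespace("snpStats", quietly = TRUE)) {
  # Field positions match the column order written above.
  sm <- snpStats::read.snps.long(
    long_file, sample.id = samples, snp.id = snps,
    fields = c(sample = 1, snp = 2, genotype = 3, confidence = 4),
    codes = c("0", "1", "2"), threshold = 0.9, lower = TRUE)
  print(dim(sm))
}
```

With lower = TRUE and threshold = 0.9, the record for sample S2 and SNP rs1 (confidence 0.5) falls below the calling threshold.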
----------Arguments----------
Argument: files
A character vector giving the names of the input files
Argument: sample.id
A character vector giving the identifiers of the samples to be read
Argument: snp.id
A character vector giving the names of the SNPs to be read
Argument: diploid
A logical array of the same length as sample.id, required if reading data into an XSnpMatrix rather than a SnpMatrix. This vector gives the expected ploidy for each row. If the same value suffices for all rows, a scalar may be supplied
Argument: fields
An integer vector with named elements specifying the positions of the required fields in the input record. The fields are identified by the names sample and snp for the sample and SNP identifier fields, confidence for a call confidence score (if present), and either genotype if genotype calls occur as a single field, or allele1 and allele2 if the two alleles are coded in different fields
Argument: codes
Either the single string "nucleotide", denoting coding in terms of nucleotides (A, C, G or T, case insensitive), or a character vector giving genotype or allele codes (see Details)
Argument: threshold
A numerical value for the calling threshold on the confidence score
Argument: lower
If TRUE, then threshold represents a lower bound. Otherwise it is an upper bound
Argument: sep
The delimiting character separating fields in the input record
Argument: comment
A character denoting that any remaining input on a line is to be ignored
Argument: skip
An integer value specifying how many lines are to be skipped at the beginning of each data file
Argument: simplify
If TRUE, sample and SNP identifying strings will be shortened by removal of any common leading or trailing sequences when they are used as row and column names of the output SnpMatrix
Argument: verbose
If TRUE, a progress report is generated each time the number of lines given by the every argument has been read
Argument: in.order
If TRUE, input lines are assumed to be in the correct order (see Details)
Argument: every
The number of input lines between progress reports; see verbose
----------Details----------
If nucleotide coding is not used, the codes argument should be a character array giving the valid codes. For genotype coding of autosomal SNPs, this should be an array of length 3 giving the codes for the three genotypes, in the order homozygous (AA), heterozygous (AB), homozygous (BB). All other codes will be treated as "no call". The default codes are "0", "1", "2". For X SNPs, males are assumed to be coded as homozygous, unless an additional two codes are supplied (representing the AY and BY genotypes). For allele coding, the codes array should be of length 2 and should specify the codes for the two alleles. Again, any other code is treated as "missing" and, for X SNPs, males should be coded either as homozygous or by omission of the second allele.
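The coding rules above can be mimicked in a few lines of base R. This is only a sketch of the mapping for autosomal SNPs, not the package's implementation; the helper names decode_genotype and decode_alleles are hypothetical.

```r
# Hypothetical helper: map single-field genotype codes to AA/AB/BB.
# Any code not listed in `codes` becomes NA, i.e. "no call".
decode_genotype <- function(x, codes = c("0", "1", "2")) {
  c("AA", "AB", "BB")[match(x, codes)]
}

# Hypothetical helper: map two allele fields to AA/AB/BB.
# `codes` has length 2; any other allele code makes the genotype missing.
decode_alleles <- function(a1, a2, codes = c("A", "B")) {
  b_count <- (match(a1, codes) - 1) + (match(a2, codes) - 1)
  c("AA", "AB", "BB")[b_count + 1]
}

genotype_calls <- decode_genotype(c("0", "2", "1", "-", "N"))
allele_calls <- decode_alleles(c("A", "B", "B", "?"), c("A", "A", "B", "A"))
print(genotype_calls)
print(allele_calls)
```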
For nucleotide coding, nucleotides are assigned to the nominal alleles in alphabetic order. Thus, for a SNP with "T" and "A" nucleotides in the variant position, the nominal genotypes AA, AB and BB will refer to A/A, A/T and T/T.
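The alphabetic assignment can be sketched for a single SNP as follows. The helper nominal_genotype is hypothetical and assumes that exactly two distinct nucleotides are observed for the SNP; it illustrates the rule and is not the package's code.

```r
# Hypothetical helper: nominal genotypes for ONE SNP from two allele columns.
# Assumes exactly two distinct nucleotides are observed for this SNP.
nominal_genotype <- function(a1, a2) {
  a1 <- toupper(a1)                      # nucleotide codes are case insensitive
  a2 <- toupper(a2)
  alleles <- sort(unique(c(a1, a2)))     # alphabetic order: first is "A" allele
  stopifnot(length(alleles) == 2)
  b_count <- (a1 == alleles[2]) + (a2 == alleles[2])
  c("AA", "AB", "BB")[b_count + 1]
}

# A SNP with nucleotides A and T: A is the nominal A allele, T the nominal B allele.
nominal <- nominal_genotype(c("t", "A", "T"), c("T", "a", "A"))
print(nominal)
```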
Although the function allows for reading into an object of class "XSnpMatrix" directly, it is usually preferable to read such data as a "SnpMatrix" (i.e. as autosomal) and to coerce it to an object of class "XSnpMatrix" later using as(..., "XSnpMatrix") or new("XSnpMatrix", ..., diploid=...). If diploid would be NA for any subject, the data must be read as a "SnpMatrix" and coerced afterwards, since NAs are not accepted in the diploid argument of this function.
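A minimal sketch of the later coercion, assuming snpStats is installed. The sex vector, sample names, SNP names and genotype values are hypothetical, and the small SnpMatrix is built by hand here only to keep the example self-contained; in practice it would come from reading the data.

```r
# Hypothetical sex information for two samples.
sex <- c(S1 = "male", S2 = "female")
# For X-chromosome SNPs, females are diploid and males haploid.
diploid <- unname(sex == "female")

if (requireNamespace("snpStats", quietly = TRUE)) {
  library(snpStats)
  # SnpMatrix stores raw codes: 00 = no call, 01 = AA, 02 = AB, 03 = BB.
  raw_calls <- matrix(as.raw(c(1, 2, 3, 3)), nrow = 2,
                      dimnames = list(names(sex), c("rs1", "rs2")))
  sm <- new("SnpMatrix", raw_calls)

  # Option 1: coerce and let ploidy be inferred from the genotype data.
  xs_inferred <- as(sm, "XSnpMatrix")

  # Option 2: supply the ploidy of each row explicitly.
  xs_explicit <- new("XSnpMatrix", sm, diploid = diploid)
}
```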
If the in.order argument is set TRUE, then the vectors sample.id and snp.id must be in the same order as they vary on the input file(s), and this ordering must be consistent. However, there is no requirement that either SNP or sample should vary fastest, as this is detected from the input. If in.order is FALSE, then no assumptions are made about the ordering of the input file, and SNP and sample identifiers are looked up in hash tables as they are read. This option can therefore be expected to be somewhat slower. Each file may represent a separate sample or SNP, in which case the appropriate .id argument can be omitted; row or column names are then taken from the file names.
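One way to prepare sample.id and snp.id for in.order = TRUE is to take the identifiers in order of first appearance in the file and then check that the records follow a consistent grid order. The sketch below does this in base R on an invented data frame standing in for the parsed input; it is a pre-check a user might run, not something the function itself exposes.

```r
# Invented records standing in for the lines of a long-format file.
calls <- data.frame(sample = rep(c("S2", "S1"), each = 3),
                    snp = rep(c("rs3", "rs1", "rs2"), times = 2),
                    stringsAsFactors = FALSE)

# Identifiers in order of first appearance in the input.
sample_id <- unique(calls$sample)
snp_id <- unique(calls$snp)

# Position of every record in the sample x SNP grid.
pos_sample <- match(calls$sample, sample_id)
pos_snp <- match(calls$snp, snp_id)

# Records are consistently ordered if they walk the grid with either
# SNP varying fastest or sample varying fastest.
snp_fastest <- !is.unsorted((pos_sample - 1) * length(snp_id) + pos_snp,
                            strictly = TRUE)
sample_fastest <- !is.unsorted((pos_snp - 1) * length(sample_id) + pos_sample,
                               strictly = TRUE)
consistent <- snp_fastest || sample_fastest
print(c(snp_fastest = snp_fastest, sample_fastest = sample_fastest))
```

If consistent is FALSE, setting in.order = FALSE is the safer choice, at the cost of the hash-table lookups described above.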
----------Value----------
An object of class "SnpMatrix" or "XSnpMatrix".
----------Note----------
The function will read gzipped files.
If in.order is TRUE, every combination of sample and SNP listed in the sample.id and snp.id arguments must be present in the input file(s). Otherwise the function will search for any missing observation until reaching the end of the data, ignoring everything else on the way.
----------Author(s)----------
David Clayton <david.clayton@cimr.cam.ac.uk>
----------See Also----------
read.plink