R GWASTools package: help documentation for the ncdfAddData() function

loveR · posted 2012-2-25 21:26:22

ncdfAddData(GWASTools)
ncdfAddData() belongs to the R package GWASTools

                                       Write genotypic calls and/or associated metrics to a netCDF file

                                       Translator: LoveR, the 生物统计家园网 (biostatistic.net) robot

----------Description----------

Genotypic calls and/or associated quantitative variables (e.g. quality score, intensities) are read from text files and written to an existing netCDF file in which those variables were previously defined.

----------Usage----------

ncdfAddData(path = "", ncdf.filename,
         snp.annotation, scan.annotation,
         sep.type, skip.num, col.total, col.nums,
         scan.name.in.file, scan.start.index = 1,
         diagnostics.filename = "ncdfAddData.diagnostics.RData",
         verbose = TRUE)

ncdfAddIntensity(path = "",  ncdf.filename,
               snp.annotation, scan.annotation,
               scan.start.index = 1, n.consecutive.scans = -1,
               diagnostics.filename = "ncdfAddIntensity.diagnostics.RData",
               verbose = TRUE)

ncdfCheckGenotype(path = "", ncdf.filename,
               snp.annotation, scan.annotation,
               sep.type, skip.num, col.total, col.nums,
               scan.name.in.file, check.scan.index, n.scans.loaded,
               diagnostics.filename = "ncdfCheckGenotype.diagnostics.RData",
               verbose = TRUE)

ncdfCheckIntensity(path = "", intenpath = "", ncdf.filename,
               snp.annotation, scan.annotation,
               sep.type, skip.num, col.total, col.nums,
               scan.name.in.file, check.scan.index,
               n.scans.loaded, affy.inten = FALSE,
               diagnostics.filename = "ncdfCheckIntensity.diagnostics.RData",
               verbose = TRUE)

----------Arguments----------

Argument: path
Path to the raw text files.

Argument: intenpath
Path to the raw text files containing intensity, if "inten.file" is given in scan.annotation.

Argument: ncdf.filename
Name of the netCDF file in which to write the data.

Argument: snp.annotation
SNP annotation data.frame containing SNPs in the same order as those in the snp dimension of the netCDF file. Column names must be "snpID" (integer ID) and "snpName", where snpName matches the SNP ids inside the raw genotypic data files.

Argument: scan.annotation
Scan annotation data.frame with columns "scanID" (integer id of scan in the netCDF file), "scanName" (sample name inside the raw data file) and "file" (corresponding raw data file name).

Argument: sep.type
Field separator in the raw text files.

Argument: skip.num
Number of rows to skip, which should be all rows preceding the genotypic or quantitative data (including the header).

Argument: col.total
Total number of columns in the raw text files.

Argument: col.nums
An integer vector indicating which columns of the raw text file contain variables for input. names(col.nums) must be a subset of c("snp", "sample", "geno", "a1", "a2", "qs", "x", "y", "rawx", "rawy", "r", "theta", "ballelefreq", "logrratio"). The element "snp" is the column of SNP ids, "sample" is sample ids, "geno" is diploid genotype (in AB format), "a1" and "a2" are alleles 1 and 2 (in AB format), "qs" is quality score, "x" and "y" are normalized intensities, "rawx" and "rawy" are raw intensities, "r" is the sum of normalized intensities, "theta" is angular polar coordinate, "ballelefreq" is the B allele frequency, and "logrratio" is the Log R Ratio.

Argument: scan.name.in.file
An indicator for the presence of the sample name within the file. A value of 1 indicates a column with repeated values of the sample name (Illumina format), -1 indicates a sample name embedded in a column heading (Affymetrix format) and 0 indicates no sample name inside the raw data file.

Argument: scan.start.index
A numeric value containing the index of the sample dimension of the netCDF file at which to begin writing.

Argument: n.consecutive.scans
The number of consecutive "sampleID" indices for which to write intensity values, beginning at scan.start.index. This number equals the number of "ALLELE_SUMMARY" files to process. When n.consecutive.scans=-1, all samples from scan.start.index to the last sample are processed.

Argument: check.scan.index
An integer vector containing the indices of the sample dimension of the netCDF file to check.

Argument: n.scans.loaded
Number of scans loaded in the netCDF file.

Argument: affy.inten
Logical value indicating whether Affy intensities are in separate files from quality scores. If TRUE, intenpath must also be specified.

Argument: diagnostics.filename
Name of the output file to save diagnostics.

Argument: verbose
Logical value specifying whether to show progress information.
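
As an illustration of the col.nums argument described above, the following base-R sketch builds a named integer vector and checks it against the documented set of names. The helper check_col_nums() is hypothetical (it is not part of GWASTools), and the column positions are only an illustrative layout.

```r
# Names that the help text lists as valid for names(col.nums)
allowed <- c("snp", "sample", "geno", "a1", "a2", "qs", "x", "y",
             "rawx", "rawy", "r", "theta", "ballelefreq", "logrratio")

# check_col_nums() is a hypothetical helper, not a GWASTools function
check_col_nums <- function(col.nums, col.total) {
  stopifnot(is.integer(col.nums), !is.null(names(col.nums)))
  bad <- setdiff(names(col.nums), allowed)
  if (length(bad) > 0) {
    stop("unrecognized names in col.nums: ", paste(bad, collapse = ", "))
  }
  if (any(col.nums < 1L | col.nums > col.total)) {
    stop("col.nums must lie between 1 and col.total")
  }
  invisible(TRUE)
}

# Illustrative layout: SNP id in column 1, sample id in column 2,
# alleles 1 and 2 in columns 12 and 13 of a 21-column file
col.nums <- c(snp = 1L, sample = 2L, a1 = 12L, a2 = 13L)
check_col_nums(col.nums, col.total = 21L)
```

Running such a check before calling ncdfAddData may catch a misspelled name or an out-of-range column early, before any raw file is read.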

----------Details----------

These functions read genotypic and associated data from raw text files. The files to be read and processed are specified in the sample annotation. ncdfAddData expects one file per sample, with each file having one row of data per SNP probe. The col.nums argument allows the user to select and identify specific fields for writing to the netCDF file. Illumina text files and Affymetrix ".CHP" files can be used here (but not Affymetrix "ALLELE_SUMMARY" files).

A SNP annotation data.frame is a prerequisite for this function. It has the same number of rows (one per SNP) as the raw text file and a column of SNP names matching those within the raw text file. It also has a column of integer SNP ids matching the values (in order) of the "snp" dimension of the netCDF file.

A sample annotation data.frame is also a prerequisite. It has one row per sample, with columns corresponding to the sample name (as it occurs within the raw text file), the name of the raw text file for that sample, and an integer sample id (to be written as the "sampleID" variable in the netCDF file).
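
The two annotation data.frames described above could look like the toy objects below. This is only a sketch: the SNP names, sample names and file names are invented, check_annotation() is a hypothetical helper rather than a GWASTools function, and the uniqueness test on the ids is an extra sanity check assumed here, not something the help page states.

```r
# Toy annotations with made-up SNP names, sample names and file names
snp.annotation <- data.frame(
  snpID   = 1:3,                              # integer ids, in netCDF order
  snpName = c("rs0001", "rs0002", "rs0003"),  # names as in the raw files
  stringsAsFactors = FALSE
)
scan.annotation <- data.frame(
  scanID   = c(101L, 102L),                   # integer sample ids
  scanName = c("sampleA", "sampleB"),         # names inside the raw files
  file     = c("sampleA.csv", "sampleB.csv"), # one raw file per sample
  stringsAsFactors = FALSE
)

# check_annotation() is a hypothetical helper, not a GWASTools function
check_annotation <- function(annot, id.col, required) {
  missing.cols <- setdiff(required, names(annot))
  if (length(missing.cols) > 0) {
    stop("missing columns: ", paste(missing.cols, collapse = ", "))
  }
  ids <- annot[[id.col]]
  if (!is.numeric(ids) || any(ids != round(ids))) {
    stop(id.col, " must be integer-valued")
  }
  # assumption of this sketch: ids should not repeat
  if (anyDuplicated(ids) > 0) stop(id.col, " must be unique")
  invisible(TRUE)
}

check_annotation(snp.annotation, "snpID", c("snpID", "snpName"))
check_annotation(scan.annotation, "scanID", c("scanID", "scanName", "file"))
```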

The genotype calls in the raw text file may be either one column of diploid calls or two columns of allele calls. The function takes calls in AB format and converts them to a numeric code indicating the number of "A" alleles in the genotype (i.e. AA=2, AB=1, BB=0 and missing=-1).
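
The coding rule above (AA=2, AB=1, BB=0, missing=-1) can be sketched in base R as follows. Both helpers are hypothetical illustrations, not the package's own code, and treating "BA" like "AB" is an assumption of this sketch.

```r
# ab_to_count() is a hypothetical helper illustrating the coding rule
ab_to_count <- function(geno) {
  # "BA" mapped to 1 is an assumption, not stated in the help page
  code <- c(AA = 2L, AB = 1L, BA = 1L, BB = 0L)
  out <- unname(code[as.character(geno)])
  out[is.na(out)] <- -1L   # any other string counts as missing
  out
}

# Two allele columns are pasted into a diploid call first
alleles_to_count <- function(a1, a2) ab_to_count(paste0(a1, a2))

ab_to_count(c("AA", "AB", "BB", "--"))
# 2 1 0 -1
alleles_to_count(c("A", "A", "B", "-"), c("A", "B", "B", "-"))
# 2 1 0 -1
```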

While each raw text file is being read, the functions check for errors and irregularities and record the results in a list of vectors. If any problem is detected, that raw text file is skipped.
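
The read, check and skip pattern described above can be sketched roughly as follows. This is not the package's implementation: the in-memory "files", their contents and the single SNP-name check are invented for illustration, and only the diagnostic names read.file, row.num and chk come from the help page.

```r
# Two in-memory "raw files" stand in for files on disk (contents invented)
raw <- list(
  good = "snp,sample,geno\nrs1,s1,AA\nrs2,s1,AB",
  bad  = "snp,sample,geno\nrs1,s2,AA"      # one SNP row is missing
)
expected.snps <- c("rs1", "rs2")

# One vector element per raw file, as in the documented diagnostics
diag <- list(read.file = integer(0), row.num = integer(0), chk = integer(0))
for (i in seq_along(raw)) {
  dat <- tryCatch(read.csv(text = raw[[i]], stringsAsFactors = FALSE),
                  error = function(e) NULL)
  diag$read.file[i] <- as.integer(!is.null(dat))
  diag$row.num[i]   <- if (is.null(dat)) NA_integer_ else nrow(dat)
  ok <- !is.null(dat) && identical(dat$snp, expected.snps)
  diag$chk[i] <- as.integer(ok)
  if (!ok) next   # skip this file; nothing would be written for it
  # ... a real implementation would write dat to the netCDF file here ...
}
diag
```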

ncdfAddIntensity uses scan.start.index and n.consecutive.scans to identify the set of integer sample ids for input (from the netCDF file). It then uses the sample annotation data.frame to identify the corresponding sample names and "ALLELE_SUMMARY" file names to read. The "ALLELE_SUMMARY" files have two rows per SNP, one for X (A allele) and one for Y (B allele). These are reformatted to one row per SNP and ordered according to the SNP integer id in the netCDF file. The correspondence between SNP names in the "ALLELE_SUMMARY" file and the SNP integer ids is made using the SNP annotation data.frame.
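
The two steps described above (choosing the scan indices, then reshaping two rows per SNP into one) can be sketched in base R. Everything here is an assumption made for illustration: the helper names, the toy column names, and the "-A"/"-B" suffix convention for probe names are not taken from the help page.

```r
# resolve_scans() is a hypothetical helper: -1 means "through the last scan"
resolve_scans <- function(scan.start.index, n.consecutive.scans, n.total) {
  if (n.consecutive.scans == -1) {
    scan.start.index:n.total
  } else {
    scan.start.index:(scan.start.index + n.consecutive.scans - 1)
  }
}

# Toy stand-in for one sample's "ALLELE_SUMMARY" table: two rows per SNP.
# The "-A"/"-B" suffixes and the column names are assumptions.
allele.summary <- data.frame(
  probeset_id = c("SNP_2-A", "SNP_2-B", "SNP_1-A", "SNP_1-B"),
  intensity   = c(800, 150, 300, 900),
  stringsAsFactors = FALSE
)
snp.annotation <- data.frame(snpID = 1:2, snpName = c("SNP_1", "SNP_2"),
                             stringsAsFactors = FALSE)

# reshape_allele_summary() is a hypothetical helper, not a GWASTools function
reshape_allele_summary <- function(allele.summary, snp.annotation) {
  allele  <- sub(".*-([AB])$", "\\1", allele.summary$probeset_id)
  snpName <- sub("-[AB]$", "", allele.summary$probeset_id)
  is.a <- allele == "A"
  is.b <- allele == "B"
  wide <- data.frame(snpName = snpName[is.a],
                     X = allele.summary$intensity[is.a],
                     stringsAsFactors = FALSE)
  # pair each A row with the B row of the same SNP
  wide$Y <- allele.summary$intensity[is.b][match(wide$snpName, snpName[is.b])]
  # attach integer ids from the annotation and order by them
  wide$snpID <- snp.annotation$snpID[match(wide$snpName, snp.annotation$snpName)]
  wide <- wide[order(wide$snpID), c("snpID", "snpName", "X", "Y")]
  rownames(wide) <- NULL
  wide
}

resolve_scans(2, -1, 5)
reshape_allele_summary(allele.summary, snp.annotation)
```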

ncdfCheckGenotype and ncdfCheckIntensity check the contents of netCDF files against raw text files.

These functions use the ncdf library, which provides an interface between R and netCDF.

----------Value----------

The netCDF file specified in argument ncdf.filename is populated with genotype calls and/or associated quantitative variables. A list of diagnostics with the following components is returned. Each vector has one element per raw text file processed.

Component: read.file
A vector indicating whether (1) or not (0) each file was read successfully.

Component: row.num
A vector of the number of rows read from each file. These should all be the same and equal to the number of rows in the SNP annotation data.frame.

Component: samples
A list of vectors containing the unique sample names in the sample column of each raw text file. Each vector should have just one element.

Component: sample.match
A vector indicating whether (1) or not (0) the sample name inside the raw text file matches that in the sample annotation data.frame.

Component: missg
A list of vectors containing the unique character string(s) for missing genotypes (i.e. not AA, AB or BB) for each raw text file.

Component: snp.chk
A vector indicating whether (1) or not (0) the raw text file has the expected set of SNP names (i.e. matching those in the SNP annotation data.frame).

Component: chk
A vector indicating whether (1) or not (0) all previous checks were successful and the data were written to the netCDF file.

ncdfCheckGenotype returns the following additional list items.

Component: snp.order
A vector indicating whether (1) or not (0) the SNP ids are in the same order in each file.

Component: geno.chk
A vector indicating whether (1) or not (0) the genotypes in the netCDF match the text file.

ncdfCheckIntensity returns the following additional list items.

Component: qs.chk
A vector indicating whether (1) or not (0) the quality scores in the netCDF match the text file.

Component: read.file.inten
A vector indicating whether (1) or not (0) each intensity file was read successfully (if intensity files are separate).

Component: sample.match.inten
A vector indicating whether (1) or not (0) the sample name inside the raw text file matches that in the sample annotation data.frame (if intensity files are separate).

Component: rows.equal
A vector indicating whether (1) or not (0) the number of rows read from each file is the same and equal to the number of rows in the SNP annotation data.frame (if intensity files are separate).

Component: snp.chk.inten
A vector indicating whether (1) or not (0) the raw text file has the expected set of SNP names (i.e. matching those in the SNP annotation data.frame) (if intensity files are separate).

Component: inten.chk
A vector for each intensity variable indicating whether (1) or not (0) the intensities in the netCDF match the text file.
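
One way to inspect a returned diagnostics list is to tabulate it per file. The sketch below uses a mock list whose component names follow the description above; the values are invented, and summarize_diagnostics() is a hypothetical helper, not part of GWASTools.

```r
# Mock diagnostics list shaped like the documented return value
# (values invented for illustration; three raw files, the second one failed)
res <- list(
  read.file    = c(1, 1, 1),
  row.num      = c(3300, 3299, 3300),
  samples      = list("sampleA", c("sampleB", "sampleX"), "sampleC"),
  sample.match = c(1, 0, 1),
  missg        = list("--", "--", character(0)),
  snp.chk      = c(1, 0, 1),
  chk          = c(1, 0, 1)
)

# summarize_diagnostics() is a hypothetical helper, not a GWASTools function
summarize_diagnostics <- function(res, n.snps) {
  flags <- c("read.file", "sample.match", "snp.chk", "chk")
  data.frame(
    file.index = seq_along(res$chk),
    rows.ok    = res$row.num == n.snps,        # row count equals SNP count
    one.sample = lengths(res$samples) == 1,    # exactly one sample name
    as.data.frame(lapply(res[flags], function(v) v == 1))
  )
}

report <- summarize_diagnostics(res, n.snps = 3300)
report[!report$chk, ]   # files that were not written to the netCDF file
```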

----------Note----------

These functions were modeled after similar code written by Thomas Lumley.

----------Author(s)----------

Cathy Laurie

----------See Also----------

ncdf, ncdfCreate, ncdfSubset

----------Examples----------

library(GWASTools)
library(GWASdata)

#############
# Illumina - genotype file
#############
# first create empty netCDF
data(illumina_snp_annot)
snpAnnot <- illumina_snp_annot
data(illumina_scan_annot)
scanAnnot <- illumina_scan_annot[1:3,] # subset of samples for testing
ncfile <- tempfile()
ncdfCreate(snpAnnot, ncfile, variables="genotype",
            n.samples=nrow(scanAnnot))

# add data
path <- system.file("extdata", "illumina_raw_data", package="GWASdata")
snpAnnot <- snpAnnot[,c("snpID", "rsID")]
names(snpAnnot) <- c("snpID", "snpName")
scanAnnot <- scanAnnot[,c("scanID", "genoRunID", "file")]
names(scanAnnot) <- c("scanID", "scanName", "file")
col.nums <- as.integer(c(1,2,12,13))
names(col.nums) <- c("snp", "sample", "a1", "a2")
diagfile <- tempfile()
res <- ncdfAddData(path, ncfile, snpAnnot, scanAnnot, sep.type=",",
                  skip.num=11, col.total=21, col.nums=col.nums,
                  scan.name.in.file=1, diagnostics.filename=diagfile)

file.remove(diagfile)
file.remove(ncfile)

#############
# Affymetrix - genotype file
#############
# first create empty netCDF
data(affy_snp_annot)
snpAnnot <- affy_snp_annot
data(affy_scan_annot)
scanAnnot <- affy_scan_annot[1:3,] # subset of samples for testing
ncfile <- tempfile()
ncdfCreate(snpAnnot, ncfile, variables="genotype",
            n.samples=nrow(scanAnnot))

# add data
path <- system.file("extdata", "affy_raw_data", package="GWASdata")
snpAnnot <- snpAnnot[,c("snpID", "probeID")]
names(snpAnnot) <- c("snpID", "snpName")
scanAnnot <- scanAnnot[,c("scanID", "genoRunID", "chpFile")]
names(scanAnnot) <- c("scanID", "scanName", "file")
col.nums <- as.integer(c(2,3)); names(col.nums) <- c("snp", "geno")
diagfile <- tempfile()
res <- ncdfAddData(path, ncfile, snpAnnot, scanAnnot, sep.type="\t",
                  skip.num=1, col.total=6, col.nums=col.nums,
                  scan.name.in.file=-1, diagnostics.filename=diagfile)
file.remove(diagfile)

# check
diagfile <- tempfile()
res <- ncdfCheckGenotype(path, ncfile, snpAnnot, scanAnnot, sep.type="\t",
                     skip.num=1, col.total=6, col.nums=col.nums,
                     scan.name.in.file=-1, check.scan.index=1:3,
                     n.scans.loaded=3, diagnostics.filename=diagfile)
file.remove(diagfile)
file.remove(ncfile)

#############
# Affymetrix - intensity file
#############
# first create empty netCDF
snpAnnot <- affy_snp_annot
scanAnnot <- affy_scan_annot[1:3,] # subset of samples for testing
ncfile <- tempfile()
ncdfCreate(snpAnnot, ncfile, variables=c("quality","X","Y"),
            n.samples=nrow(scanAnnot))

# add sampleID and quality
path <- system.file("extdata", "affy_raw_data", package="GWASdata")
snpAnnot <- snpAnnot[,c("snpID", "probeID")]
names(snpAnnot) <- c("snpID", "snpName")
scanAnnot1 <- scanAnnot[,c("scanID", "genoRunID", "chpFile")]
names(scanAnnot1) <- c("scanID", "scanName", "file")
col.nums <- as.integer(c(2,4)); names(col.nums) <- c("snp", "qs")
diagfile <- tempfile()
res <- ncdfAddData(path, ncfile, snpAnnot, scanAnnot1, sep.type="\t",
                  skip.num=1, col.total=6, col.nums=col.nums,
                  scan.name.in.file=-1, diagnostics.filename=diagfile)
file.remove(diagfile)

# add intensity
scanAnnot2 <- scanAnnot[,c("scanID", "genoRunID", "alleleFile")]
names(scanAnnot2) <- c("scanID", "scanName", "file")
diagfile <- tempfile()
res <- ncdfAddIntensity(path, ncfile, snpAnnot, scanAnnot2,
                     diagnostics.filename=diagfile)
file.remove(diagfile)

# check
intenpath <- system.file("extdata", "affy_raw_data", package="GWASdata")
scanAnnot <- scanAnnot[,c("scanID", "genoRunID", "chpFile", "alleleFile")]
names(scanAnnot) <- c("scanID", "scanName", "file", "inten.file")
diagfile <- tempfile()
res <- ncdfCheckIntensity(path, intenpath, ncfile, snpAnnot, scanAnnot, sep.type="\t",
                     skip.num=1, col.total=6, col.nums=col.nums,
                     scan.name.in.file=-1, check.scan.index=1:3,
                     n.scans.loaded=3, affy.inten=TRUE,
                     diagnostics.filename=diagfile)

file.remove(diagfile)
file.remove(ncfile)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Note: this document was machine-translated by LoveR, the 生物统计家园网 robot, and is intended for personal R study and reference only; 生物统计家园 reserves the copyright.
