R language, maanova package: read.madata() function help documentation in Chinese (Chinese-English parallel)

loveR · posted 2012-2-25 23:56:14

read.madata(maanova)
R package containing read.madata(): maanova

                                    Read Microarray data

                                       Translator: LoveR, the biostatistic.net (Biostatistics Home) robot

----------Description----------

This function reads microarray experiment data from a TAB-delimited text file or a matrix object.

----------Usage----------

read.madata(datafile=datafile, designfile=designfile, covM = covM,
arrayType=c("oneColor", "twoColor"),header=TRUE, spotflag=FALSE, n.rep=1, avgreps=0,
  log.trans=FALSE, metarow, metacol, row, col, probeid, intensity, matchDataToDesign=FALSE, ...)

----------Arguments----------

Argument: datafile
A matrix R object, or the name of the data file (with its path) as a string.

Argument: designfile
A matrix or data.frame R object, or the name of the design file (with its path) as a string.

Argument: covM
Gene-specific covariate matrix. Specify this only if you have a gene-specific covariate matrix.

Argument: arrayType
Specifies whether the array is a one-color or a two-color array. The default is one-color.

Argument: header
A logical value indicating whether the input files (data file, design file or covariate matrix) have a column header, when they are TAB-delimited files.

Argument: spotflag
A logical value indicating whether the input file contains flags for bad spots.

Argument: n.rep
An integer giving the number of replicates.

Argument: avgreps
An integer indicating whether, and how, to average (collapse) the replicates: 0 means no averaging; 1 means take the mean of the replicates; 2 means take the median of the replicates.
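The averaging that avgreps controls can be pictured with the short base-R sketch below. It is not maanova's internal code: collapse_replicates() is a hypothetical helper that assumes, as the data-file section requires, that the n.rep replicate spots of a probe sit in adjacent rows. It collapses each block of rows by its mean (avgreps = 1) or median (avgreps = 2).

```r
# Hypothetical helper (not part of maanova): mimics avgreps = 0, 1 or 2 by
# collapsing each block of n.rep adjacent rows (the replicates of one probe).
collapse_replicates <- function(m, n.rep, avgreps = 1) {
  if (avgreps == 0) return(m)                     # 0: keep every replicate
  stopifnot(nrow(m) %% n.rep == 0)                # rows must form complete blocks
  f <- if (avgreps == 1) mean else median         # 1: mean, 2: median
  group <- rep(seq_len(nrow(m) / n.rep), each = n.rep)
  out <- matrix(NA_real_, nrow = nrow(m) / n.rep, ncol = ncol(m),
                dimnames = list(NULL, colnames(m)))
  for (j in seq_len(ncol(m))) out[, j] <- tapply(m[, j], group, f)
  out
}

# Two probes, each spotted twice (n.rep = 2), on two arrays
m <- matrix(c(1, 3, 10, 20,
              2, 4, 30, 50), ncol = 2,
            dimnames = list(NULL, c("A1", "A2")))
collapse_replicates(m, n.rep = 2, avgreps = 1)
# probe 1 -> A1 = 2, A2 = 3; probe 2 -> A1 = 15, A2 = 40
```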

Argument: log.trans
A logical value indicating whether to log2-transform the raw data. It is FALSE by default. If it is TRUE, the TransformMethod field will be set to "log2".

Argument: metarow
For 2-dye arrays: the column number for the meta row. The default values are 1s.

Argument: metacol
For 2-dye arrays: the column number for the meta column. The default values are 1s.

Argument: row
For 2-dye arrays: the column number for the row. The default value is NA.

Argument: col
For 2-dye arrays: the column number for the column. The default value is NA.

Argument: probeid
The number of the column storing the probe (clone) ID. When datafile is a matrix R object, the row names of the data are assumed to be the probe IDs; if the data have no row names, 1, 2, ... are used as probe IDs. For a TAB-delimited file, if probeid is not provided, the first column is assumed to store the probe ID. If you do not have probe IDs, set probeid = 0.

Argument: intensity
The number of the column where the intensity data start. By default, intensities are assumed to start in the first column for a matrix R object and in the second column for a TAB-delimited file.

Argument: matchDataToDesign
Defaults to FALSE. If set to TRUE, the data file's column headers (or colnames(datafile) when datafile is a matrix) are matched to the design file's Array column. This lets you ignore the input order of the array data, as long as the data file's header values exactly match the design file's Array values.
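With matchDataToDesign = TRUE the correspondence is made by name rather than by position. The base-R sketch below only illustrates that idea with match(); the objects and their values are hypothetical, and this is not how read.madata() is implemented internally.

```r
# Hypothetical one-color example: the data columns are in a different order
# from the rows of the design table.
datafile <- matrix(1:6, nrow = 2,
                   dimnames = list(c("p1", "p2"), c("array2", "array3", "array1")))
design <- data.frame(Array = c("array1", "array2", "array3"),
                     Strain = c("B6", "B6", "AJ"))

# Reorder the data columns so they follow design$Array exactly;
# match() returns NA for any Array value without a matching column header.
idx <- match(design$Array, colnames(datafile))
stopifnot(!anyNA(idx))
aligned <- datafile[, idx, drop = FALSE]
colnames(aligned)   # "array1" "array2" "array3"
```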

Argument: ...
Other gene information in the data file.

----------Value----------

An object of class madata, which is a list with the following components:

Component: n.gene
Total number of genes in the experiment.

Component: n.rep
Number of replicates in the experiment, if .

Component: n.spot
Number of spots for each gene.

Component: data
The data field. It holds either the log2-transformed data (if log.trans=TRUE) or the original data (if log.trans=FALSE).

Component: n.array
Number of arrays in the experiment.

Component: n.dye
Number of dyes.

Component: flag
A matrix of spot flags, with one element per spot. 0 means a normal spot; any other value means a bad spot.

Component: metarow
Meta row for each spot.

Component: metacol
Meta column for each spot.

Component: row
Row for each spot.

Component: col
Column for each spot.

Component: ArrayName
A list of strings representing the names of the intensity data.

Component: design
An object representing the experimental design.

Component: Others
Other experiment information listed in the data file and specified by the user.

----------Preparing data file----------

Before using the package, users need to prepare the input data file.

1) The data file can be a matrix-type R object, such as the output of exprs() on data from the affy or beadarray package. Intensities are assumed to start in the first column, and the row names are assumed to be probe IDs. Otherwise, the column numbers containing the probe ID and the intensities should be specified.
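If the probe IDs and intensities are stored in ordinary columns, a matrix in the default layout (probe IDs as row names, intensities from column 1) can be built in base R. The table, its column names and its values below are made up for illustration.

```r
# Hypothetical table: probe ID and gene symbol first, then one intensity column per array
probe_table <- data.frame(ProbeID = c("probe_001", "probe_002"),
                          Symbol  = c("GeneA", "GeneB"),
                          array1  = c(7.9, 10.2),
                          array2  = c(8.1, 10.0))

# Keep only the intensity columns and use the probe IDs as row names,
# so that intensities start in column 1, as read.madata() assumes by default.
datafile <- as.matrix(probe_table[, c("array1", "array2")])
rownames(datafile) <- probe_table$ProbeID
datafile
```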

2) The data file can be a TAB-delimited text file in which each row corresponds to a gene. In the first columns you can put gene-specific information, e.g., the probe ID, GenBank ID, etc., and the grid location of the spot. Most importantly, the intensity data must follow these columns. Most microarray gridding software generates one file per slide, so you need to combine these files manually into one data file. You also need to decide which values to use in the analysis, e.g., mean versus median, background-subtracted or not. For an N-dye array, the intensity data should have N columns for each array, and these N columns must be adjacent. You can put a spot-flag column after each array's intensity data (note that with flags you will have N+1 columns per array). If you have replicates, replicated measurements of the same probe (clone) on the same array should appear in adjacent rows.
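Because each array's N intensity columns (plus an optional flag column) are adjacent, the column numbers for every array follow directly from the start column given by intensity. The hypothetical helper below computes them; it only restates the layout rule above and is not a maanova function.

```r
# Hypothetical helper: column numbers of each array's intensities (and flag)
# in a TAB-delimited data file laid out as described above.
array_columns <- function(intensity, n.array, n.dye, spotflag = FALSE) {
  width <- n.dye + as.integer(spotflag)          # N columns, or N + 1 with a flag
  lapply(seq_len(n.array), function(a) {
    first <- intensity + (a - 1) * width
    list(intensity = first:(first + n.dye - 1),
         flag = if (spotflag) first + n.dye else NA_integer_)
  })
}

# Four 2-dye arrays with flags, intensities starting in column 7
cols <- array_columns(intensity = 7, n.array = 4, n.dye = 2, spotflag = TRUE)
cols[[1]]   # intensity 7 8, flag 9
cols[[4]]   # intensity 16 17, flag 18
```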

For example, suppose you have a 2-dye cDNA array with four slides scanned by GenePix, giving four files. First open your favorite spreadsheet editor, e.g., MS Excel. Copy your probe ID and cluster ID into the first 2 columns. Then open one of the files generated by GenePix and copy the grid location into the next 4 columns (you only need to do this once, because it is the same for all four slides). Then, for each of the four files, copy the two columns of foreground median values (if you want to use them) and one column of flags into the file, in the order Cy5, Cy3, flag. Finally, select the whole file and sort the rows by probe ID. Save the file as a TAB-delimited text file and you are done.
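The same assembly can be scripted in R instead of a spreadsheet. This is a sketch under assumptions: the four per-slide tables are random stand-ins for the GenePix exports (already sorted identically by probe ID), and the column names Cy5, Cy3, Flag, MetaRow, MetaCol, Row, Col are placeholders, not the exact GenePix headers.

```r
# Hypothetical stand-ins for four GenePix result files, one per slide,
# each already sorted by probe ID (random values, for illustration only).
set.seed(1)
n <- 5
slides <- lapply(1:4, function(s)
  data.frame(Cy5 = round(runif(n, 100, 5000)),
             Cy3 = round(runif(n, 100, 5000)),
             Flag = sample(c(0, -50), n, replace = TRUE)))   # 0 = normal spot

# Gene information (2 columns) and grid location (4 columns), shared by all slides
info <- data.frame(ProbeID = sprintf("probe_%02d", 1:n),
                   ClusterID = sprintf("cluster_%02d", 1:n),
                   MetaRow = 1, MetaCol = 1, Row = 1, Col = 1:n)

# For every slide append Cy5, Cy3, flag in that order, sort by probe ID,
# and write a TAB-delimited file with a header row and no row names.
per_slide <- do.call(cbind, lapply(seq_along(slides), function(s)
  setNames(slides[[s]], paste0(c("Cy5_", "Cy3_", "Flag_"), s))))
combined <- cbind(info, per_slide)
combined <- combined[order(combined$ProbeID), ]
path <- file.path(tempdir(), "mydata.txt")
write.table(combined, path, sep = "\t", quote = FALSE, row.names = FALSE)

# A matching call might then be (not run here; "mydesign.txt" is hypothetical):
# read.madata(path, designfile = "mydesign.txt", arrayType = "twoColor",
#             spotflag = TRUE, probeid = 1, metarow = 3, metacol = 4,
#             row = 5, col = 6, intensity = 7)
```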

The data file must be "full", that is, all rows must have the same number of fields. If your data file has missing data, you need to check the data or use fill.missing to fill in the missing values.
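One way to check that a TAB-delimited file is "full" before reading it is base R's count.fields(), which reports the number of fields on each line. The helper name is_full_file() and the sample file contents are hypothetical; the sketch only detects the problem, it does not fill anything in the way fill.missing does.

```r
# Returns TRUE when every line of a TAB-delimited file has the same number of fields.
is_full_file <- function(path) {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = "",
                    blank.lines.skip = TRUE)
  length(unique(n)) == 1
}

good <- tempfile(fileext = ".txt")
writeLines(c("ProbeID\tarray1\tarray2", "p1\t7.9\t8.1", "p2\t10.2\t10.0"), good)
bad <- tempfile(fileext = ".txt")
writeLines(c("ProbeID\tarray1\tarray2", "p1\t7.9\t8.1", "p2\t10.2"), bad)  # p2 lacks a value

is_full_file(good)  # TRUE
is_full_file(bad)   # FALSE
```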

Depending on the operating system, leading and trailing TABs in the text file can sometimes cause problems, so be careful about them.
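A quick way to find such lines is to search for a TAB at the start or end of each line. The helper stray_tab_lines() below is a hypothetical base-R sketch; it only reports line numbers and leaves the decision about fixing them to you.

```r
# Hypothetical check: which lines start or end with a TAB character?
stray_tab_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  which(grepl("^\t", lines) | grepl("\t$", lines))
}

f <- tempfile(fileext = ".txt")
writeLines(c("ProbeID\tarray1\tarray2", "p1\t7.9\t8.1\t", "p2\t10.2\t10.0"), f)
stray_tab_lines(f)   # 2: the second line ends with a TAB
```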

----------Preparing design file----------

The design file can be a data.frame or matrix R object, or a TAB-delimited text file. Its number of rows equals the number of arrays times N, the number of dyes (plus one for the column header if the design file is a TAB-delimited file and header = T). The rows of the design file *MUST* follow the order of the data file unless the matchDataToDesign parameter is set to TRUE. For example, if the data file stores the intensities of array1, array11, array2, ..., then the rows of the design file must follow this order. The number of columns depends on the experimental design; for example, you can have "Strain", "Diet", "Sex", etc. in your design file. You *MUST* have a column named "Array" in the design file. For a two-color array, in addition to the "Array" column, you must have "Sample" and "Dye" columns (case sensitive). "Sample" should contain integers representing biological individuals, and reference samples should have Sample number zero (0). The reference sample is always treated as a fixed factor in the mixed model and is not involved in any test.
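As a sketch, the design table for a hypothetical two-color reference design with four arrays (test sample on Cy5, reference on Cy3) could be built as a data.frame. The Strain values, the file name mydesign.txt and the Cy5-then-Cy3 row order are assumptions; the row order must match the intensity columns of your own data file.

```r
n.array <- 4
n.dye <- 2
design <- data.frame(
  Array  = rep(1:n.array, each = n.dye),                      # required column
  Dye    = rep(c("Cy5", "Cy3"), times = n.array),             # required for two-color arrays
  Sample = as.vector(rbind(1:n.array, 0)),                    # 0 marks the reference sample
  Strain = as.vector(rbind(c("AJ", "AJ", "B6", "B6"), "Ref")) # hypothetical factor
)
# One row per array and dye, in the same order as the data file's intensity columns
stopifnot(nrow(design) == n.array * n.dye)
design

# Saved as a TAB-delimited design file with a column header (header = TRUE):
write.table(design, file.path(tempdir(), "mydesign.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```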

You must NOT have "Spot", "Label" or "covM" columns. They are reserved for spotting, labeling and covariance effects.
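The column rules for the design table can be checked before calling read.madata(). check_design() below is a hypothetical base-R helper, not part of maanova; it encodes only the requirements stated in this section (required columns, case-sensitive names, reserved names).

```r
# Hypothetical helper (not part of maanova): checks the design-column rules above.
check_design <- function(design, arrayType = c("oneColor", "twoColor")) {
  arrayType <- match.arg(arrayType)
  required <- if (arrayType == "twoColor") c("Array", "Sample", "Dye") else "Array"
  reserved <- c("Spot", "Label", "covM")
  problems <- c(
    sprintf("missing required column '%s'", setdiff(required, names(design))),
    sprintf("reserved column '%s' must not be used", intersect(reserved, names(design))))
  if (length(problems) == 0) TRUE else problems
}

ok  <- data.frame(Array = 1:2, Dye = c("Cy5", "Cy3"), Sample = c(1, 0))
bad <- data.frame(Array = 1:2, dye = c("Cy5", "Cy3"), Sample = c(1, 0), Label = 1)
check_design(ok, "twoColor")    # TRUE
check_design(bad, "twoColor")   # 'Dye' missing (names are case sensitive); 'Label' reserved
```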

Note that you DO NOT have to use all factors in the design file. You can put all factors in the design file and turn them on or off in the formula in fitmaanova.

----------Preparing covariate file----------

If you have an array-specific covariate, it should be included in the design matrix. If you have a gene-specific covariate, you need to prepare a matrix-type R object or a TAB-delimited text file, "covM". The size of "covM" equals the size of the intensity data (a TAB-delimited text file must have a column header if header = T, but NO row names). Specify covM only if you have a gene-specific covariate variable. The covariate variable must be numeric and needs to be specified in fitmaanova.
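A minimal sketch of a covM input, assuming made-up values. It encodes only the rules stated above: covM has the same size as the intensity data, is numeric, and as a TAB-delimited file has a column header but no row names. The names intensity_data, cov1 to cov4 and covM.txt are hypothetical.

```r
# Hypothetical intensity data: 3 genes x (2 arrays x 2 dyes) = 4 intensity columns
intensity_data <- matrix(c(100, 220, 340, 150, 260, 310,
                           90, 200, 400, 120, 240, 380), nrow = 3)

# Gene-specific numeric covariate with exactly the same size as the intensity data
covM <- matrix(runif(length(intensity_data)), nrow = nrow(intensity_data),
               dimnames = list(NULL, paste0("cov", seq_len(ncol(intensity_data)))))
stopifnot(identical(dim(covM), dim(intensity_data)), is.numeric(covM))

# As a TAB-delimited file: column header (for header = TRUE) but no row names
cov_path <- file.path(tempdir(), "covM.txt")
write.table(covM, cov_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)
```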

----------Author(s)----------

Hao Wu

----------Examples----------

# Note that .CEL files are not distributed with the package, so the following
# code does not run as-is. It shows how to read data from the affy (or beadarray)
# package when a TAB-delimited design file is ready.

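## Not run:
library(affy)      # ReadAffy(), rma(); also attaches Biobase for exprs() and pData()
library(maanova)   # read.madata()
beforeRma <- ReadAffy()             # reads the .CEL files in the working directory
rmaData <- rma(beforeRma)           # RMA-normalized expression set (log2 scale)
datafile <- exprs(rmaData)          # matrix: rows = probe sets, columns = arrays
abf1 <- read.madata(datafile = datafile, designfile = "design.txt")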

# Make the design (a data.frame R object) in R and read it
design.table <- data.frame(Array = row.names(pData(beforeRma)))
Strain <- rep(c('Aj', 'B6', 'B6xAJ'), each = 6)   # assumes 18 arrays
Sample <- rep(c(1:9), each = 2)                   # two arrays per biological sample
designfile <- cbind(design.table, Strain, Sample)
abf1 <- read.madata(datafile, designfile = designfile)

# Read a TAB-delimited file with spot flags, for a two-color array.
# You HAVE TO SPECIFY that the data come from a two-color array.
kidney.raw <- read.madata("kidney.txt", designfile = "kidneydesign.txt",
                          metarow = 1, metacol = 2, col = 3, row = 4, probeid = 6,
                          intensity = 7, arrayType = 'twoColor', log.trans = TRUE,
                          spotflag = TRUE)
## End(Not run)

When reposting, please credit the source: Biostatistics Home (http://www.biostatistic.net).

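Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the Biostatistics Home robot. It is provided for personal R learning and reference only; Biostatistics Home retains the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable. Carefully comparing the Chinese and English content while using it will help you learn R.
Note 3: If you find an inaccuracy, please reply at the end of this post, and we will gradually revise it.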


