R lumi package: help documentation for the lumiR() function

loveR · posted 2012-2-25 23:47:51

lumiR(lumi)
lumiR() belongs to the R package: lumi

Read in Illumina expression data

Translator: LoveR (robot of 生物统计家园网)

----------Description----------

Reads in Illumina expression data. The data is assumed to be saved in a comma-separated or tab-separated text file.

----------Usage----------

lumiR(fileName, sep = NULL, detectionTh = 0.01, na.rm = TRUE, convertNuID = TRUE,
      lib.mapping = NULL, dec = '.', parseColumnName = FALSE, checkDupId = TRUE, QC = TRUE,
      columnNameGrepPattern = list(exprs = 'AVG_SIGNAL', se.exprs = 'BEAD_STD',
                                   detection = 'DETECTION', beadNum = 'Avg_NBEADS'),
      inputAnnotation = TRUE,
      annotationColumn = c('ACCESSION', 'SYMBOL', 'PROBE_SEQUENCE', 'PROBE_START',
                           'CHROMOSOME', 'PROBE_CHR_ORIENTATION', 'PROBE_COORDINATES',
                           'DEFINITION'),
      verbose = TRUE, ...)

----------Arguments----------

Argument: fileName
the name of the data file

Argument: sep
the separator character used in the text file. With the default NULL, a tab or comma separator is detected automatically (see Details).

Argument: detectionTh
the p-value threshold used to decide whether an expression value is detectable. See lumiQ for more details.

Argument: na.rm
determines whether to remove NA values

Argument: convertNuID
determines whether to convert the probe identifiers to nuIDs

Argument: lib.mapping
an Illumina ID mapping package, e.g., lumiHumanIDMapping, used by addNuID2lumi

Argument: dec
the character used in the file for decimal points

Argument: parseColumnName
determines whether to parse the column names and retrieve the sample information (the sample information is assumed to be separated by "_")

Argument: checkDupId
determines whether to check for duplicated TargetIDs or ProbeIds. Duplicated entries are averaged.

Argument: QC
determines whether to run a quality control assessment after reading in the data

Argument: columnNameGrepPattern
the grep patterns (strings) used to identify the columns that correspond to each slot

Argument: inputAnnotation
determines whether to read the annotation information output by BeadStudio, if it exists

Argument: annotationColumn
the column names of the annotation information output by BeadStudio

Argument: verbose
a boolean that decides whether to print out messages

Argument: ...
other parameters used by the read.table function

----------Details----------

The function automatically detects the separator when it is a tab or a comma. Otherwise, the user must specify the separator manually. If an annotation library is provided, the Illumina ID is replaced with the nuID, which is the index ID used by the lumi annotation packages. If no annotation library is provided, lumiR tries to convert the probe sequences directly to nuIDs, provided they are included in the BeadStudio output file.
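The separator detection described above can be sketched in base R. The helper guess_sep is hypothetical and is not part of lumi; it only illustrates one plausible way to choose between tab and comma by counting occurrences in the first few lines, and the real lumiR logic may differ.

```r
# Hypothetical helper (not part of lumi): guess whether a file is
# tab- or comma-separated by counting both characters in its first lines.
guess_sep <- function(path, n = 5) {
  lines <- readLines(path, n = n, warn = FALSE)
  count_char <- function(ch) {
    sum(lengths(regmatches(lines, gregexpr(ch, lines, fixed = TRUE))))
  }
  n_tab <- count_char("\t")
  n_comma <- count_char(",")
  if (n_tab == 0 && n_comma == 0) {
    stop("cannot detect the separator; please specify 'sep' manually")
  }
  if (n_tab >= n_comma) "\t" else ","
}

# Small made-up file standing in for a BeadStudio export.
tmp <- tempfile(fileext = ".txt")
writeLines(c("TargetID\tS1.AVG_Signal\tS1.Detection",
             "GENE_A\t120.5\t0.001"), tmp)

detected_sep <- guess_sep(tmp)
dat <- read.table(tmp, sep = detected_sep, header = TRUE,
                  stringsAsFactors = FALSE)
```

When the file uses any other separator, passing sep explicitly to lumiR avoids relying on detection at all.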

The parameter "columnNameGrepPattern" is intended for advanced users. It defines the grep patterns used to identify the columns that correspond to each slot. For example, the "exprs" slot of the LumiBatch object is built from the columns whose names include "AVG_SIGNAL". To save memory, the user may not want to read the columns related to "detection" and "beadNum"; in that case, set "detection" and "beadNum" to NA in "columnNameGrepPattern". If "se.exprs" is set to NA, or the corresponding columns are not available, lumiR creates an ExpressionSet object instead of a LumiBatch object.
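As a rough base-R illustration of pattern-based column selection: the column names below are made up, the helper select_columns is hypothetical, and case-insensitive matching is an assumption of this sketch rather than a documented property of lumiR.

```r
# Made-up column names for two samples, A and B.
column_names <- c("TargetID",
                  "A.AVG_Signal", "A.BEAD_STDEV", "A.Detection", "A.Avg_NBEADS",
                  "B.AVG_Signal", "B.BEAD_STDEV", "B.Detection", "B.Avg_NBEADS")

# Same shape as columnNameGrepPattern; NA means "do not read this slot".
patterns <- list(exprs = "AVG_SIGNAL", se.exprs = "BEAD_STD",
                 detection = NA, beadNum = NA)

# Hypothetical helper: indices of the columns that match one pattern.
select_columns <- function(pattern, col_names) {
  if (is.na(pattern)) return(integer(0))
  grep(pattern, col_names, ignore.case = TRUE)  # assumption: case-insensitive
}

slot_columns <- lapply(patterns, select_columns, col_names = column_names)
```

Here slot_columns$exprs and slot_columns$se.exprs hold the matching column indices, while the detection and beadNum entries are empty because their patterns are NA.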

The parameter "parseColumnName" controls whether the column names are parsed to retrieve the sample information. The sample information is assumed to be separated by "_", and the last element after "_" is taken as the sample label (the sample names of the LumiBatch object). If the parsed sample labels are not unique, the entire string is used as the sample label. For example, suppose "1881436055_A_STA 27aR" is part of one of the column names in the BeadStudio output file. The program first treats "STA 27aR" as the sample label. If that is not unique across the samples, "1881436055_A_STA 27aR" becomes the sample label. If it is still not unique, the program reports warning messages. All the parsed information is kept in the phenoData slot. By default, "parseColumnName" is FALSE. We suggest that users enable it only when they know what they are doing.
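The fallback rule for sample labels can be sketched as follows. The function parse_sample_labels is hypothetical and written only for illustration; it mirrors the rule stated above, not the actual lumi source.

```r
# Hypothetical helper: take the last "_"-separated element as the label,
# fall back to the full string if labels collide, warn if still not unique.
parse_sample_labels <- function(sample_ids) {
  parts <- strsplit(sample_ids, "_", fixed = TRUE)
  labels <- vapply(parts, function(p) p[length(p)], character(1))
  if (anyDuplicated(labels) > 0) {
    labels <- sample_ids
  }
  if (anyDuplicated(labels) > 0) {
    warning("sample labels are not unique")
  }
  labels
}

# Unique short labels: the last element is used.
unique_case <- parse_sample_labels(c("1881436055_A_STA 27aR",
                                     "1881436055_B_STA 40aR"))

# Colliding short labels: the entire string is used instead.
fallback_case <- parse_sample_labels(c("1881436055_A_STA 27aR",
                                       "1881436056_A_STA 27aR"))
```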

The current version of lumiR can adaptively read the output of BeadStudio Version 1 and Version 3. The Version 3 format made quite a few changes compared with previous versions. One change is the detection value. In the Version 1 format, a probe was called detectable when its detection value was close to one. In Version 3, however, the detection value became a p-value. As a result, detectionTh is changed automatically based on the version: a detectionTh of 0.01 for Version 3 becomes a detectionTh of 0.99 for Version 1. Another big change is that Version 3 separately outputs the control probe (gene) information and a "Samples Table". As a result, the controlData slot was added to the LumiBatch class to keep the control probe (gene) information, together with a QC slot to keep the quality control information, including the "Samples Table" output by BeadStudio Version 3.
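The version-dependent threshold can be illustrated with a small sketch. The function is_detectable is hypothetical, and the choice of non-strict comparisons (<= and >=) is an assumption; the text above only states that 0.01 for Version 3 corresponds to 0.99 for Version 1.

```r
# Hypothetical helper: apply detectionTh according to the BeadStudio
# output version. Version 3 stores p-values (small means detectable);
# Version 1 stores values close to one for detectable probes.
is_detectable <- function(detection, detectionTh = 0.01, version = 3) {
  if (version == 1) {
    detection >= 1 - detectionTh   # 0.01 becomes 0.99 for Version 1
  } else {
    detection <= detectionTh
  }
}

v3_calls <- is_detectable(c(0.001, 0.2), version = 3)
v1_calls <- is_detectable(c(0.995, 0.5), version = 1)
```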

Recent versions of BeadStudio can also output the annotation information together with the expression data. If users also want to read the annotation information, they can set the parameter "inputAnnotation" to TRUE (the default shown in Usage). They can also specify which columns to read by setting the parameter "annotationColumn". The BeadStudio annotation columns include: SPECIES, TRANSCRIPT, ILMN_GENE, UNIGENE_ID, GI, ACCESSION, SYMBOL, PROBE_ID, ARRAY_ADDRESS_ID, PROBE_TYPE, PROBE_START, PROBE_SEQUENCE, CHROMOSOME, PROBE_CHR_ORIENTATION, PROBE_COORDINATES, DEFINITION, ONTOLOGY_COMPONENT, ONTOLOGY_PROCESS, ONTOLOGY_FUNCTION, SYNONYMS, OBSOLETE_PROBE_ID. Because the annotation data is huge, only the following columns are read by default (see the "annotationColumn" default in Usage): ACCESSION, SYMBOL, PROBE_SEQUENCE, PROBE_START, CHROMOSOME, PROBE_CHR_ORIENTATION, PROBE_COORDINATES, DEFINITION. This annotation information is kept in the featureData slot of ExpressionSet and can be retrieved with pData(featureData(x.lumi)), where x.lumi is the LumiBatch object. Because some annotation information may be outdated, we recommend using Bioconductor annotation packages to retrieve the annotation information.
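A minimal base-R sketch of restricting the annotation to a chosen set of columns. The data frame and its values are made up, and this only mimics the idea behind "annotationColumn"; it does not call lumi itself.

```r
# Made-up table standing in for a BeadStudio export with annotation columns.
raw <- data.frame(PROBE_ID = c("ILMN_0001", "ILMN_0002"),
                  SPECIES = c("Homo sapiens", "Homo sapiens"),
                  SYMBOL = c("GENE_A", "GENE_B"),
                  DEFINITION = c("example gene A", "example gene B"),
                  stringsAsFactors = FALSE)

# Requested annotation columns; ACCESSION is absent from this export.
wanted <- c("ACCESSION", "SYMBOL", "DEFINITION")

# Keep only the requested columns that actually exist in the file.
present <- intersect(wanted, colnames(raw))
annotation <- raw[, present, drop = FALSE]
```

With a real LumiBatch object, the corresponding table is what pData(featureData(x.lumi)) returns, as described above.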

BeadStudio may output either STDEV or STDERR (standard error of the mean) columns. Because the variance stabilization (see the vst function) requires the standard deviation rather than the standard error of the mean, a value correction is required. The lumiR function automatically checks whether the BeadStudio output file includes STDEV or STDERR columns. If it contains STDERR columns, lumiR converts STDERR to STDEV. The corrected value is x * sqrt(N), where x is the STDERR value (standard error of the mean) and N is the number of beads corresponding to the probe. (Thanks to Sebastian Balbach and Gordon Smyth, who kindly provided this information.) This correction was previously implemented in the lumiT function.
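The correction x * sqrt(N) is simple enough to show directly. The function name stderr_to_stdev and the numbers are illustrative only.

```r
# Convert the standard error of the mean to a standard deviation:
# STDEV = STDERR * sqrt(N), with N the number of beads for the probe.
stderr_to_stdev <- function(se, n_beads) {
  se * sqrt(n_beads)
}

se_values <- c(2, 1.5)    # made-up STDERR values for two probes
bead_counts <- c(25, 16)  # made-up bead numbers for the same probes
sd_values <- stderr_to_stdev(se_values, bead_counts)  # 10 and 6
```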

----------Value----------

Returns a LumiBatch object (or an ExpressionSet object when the "se.exprs" columns are set to NA or unavailable; see Details).

----------Author(s)----------

Simon Lin, Pan Du

----------See Also----------

LumiBatch, addNuID2lumi

----------Examples----------

## specify the file name
# fileName <- 'Barnes_gene_profile.txt' # Not Run
## load the data
# x.lumi <- lumiR(fileName)

## load the data with empty detection and beadNum slots
# x.lumi <- lumiR(fileName, columnNameGrepPattern=list(detection=NA, beadNum=NA))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

