R package scrime: help documentation for the simulateSNPglm() function

loveR · posted 2012-9-29 23:04:55

simulateSNPglm(scrime)
simulateSNPglm() belongs to the R package: scrime

                                    Simulation of SNP data

                                       Translator: LoveR, the translation robot of biostatistic.net

----------Description----------

Simulates SNP data. Interactions of some of the simulated SNPs are then used to specify either a binary or a quantitative response by a logistic or linear regression model, respectively.

----------Usage----------

simulateSNPglm(n.obs = 1000, n.snp = 50, list.ia = NULL, list.snp = NULL,
beta0 = -0.5, beta = 1.5, maf = 0.25, sample.y = TRUE, p.cutoff = 0.5,
err.fun = NULL, rand = NA, ...)

----------Arguments----------

Argument: n.obs
number of observations that should be generated.

Argument: n.snp
number of SNPs that should be generated.

Argument: list.ia
a list consisting of numeric vectors (or values) specifying the genotypes of the SNPs that should be explanatory for the response. Each of these vectors must be composed of some of the numbers -3, -2, -1, 1, 2, 3, where 1 denotes the homozygous reference genotype, 2 the heterozygous genotype, and 3 the homozygous variant genotype. A minus sign before a number means that the corresponding SNP should not be of this genotype. If, e.g., one of the vectors is given by c(1, -1, -3) and the corresponding vector in list.snp is c(5, 7, 8), then the corresponding interaction used in the regression model to specify the response is

(SNP5 == 1) & (SNP7 != 1) & (SNP8 != 3).

For more details, see Details. list.ia must be specified if list.snp is specified. If both list.ia and list.snp are NULL, the interactions shown in the Details section are used.
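The mapping from list.ia and list.snp to an interaction term can be sketched in base R. The helper ia_indicator() below is hypothetical and is not a scrime function. The sketch assumes a genotype matrix coded 1/2/3 as described above. It reproduces the c(1, -1, -3) / c(5, 7, 8) example, but scrime's internal code may work differently.

```r
# Hypothetical helper (not part of scrime): evaluate one interaction term.
# x   : genotype matrix, rows = observations, columns = SNPs, coded 1/2/3
# ia  : genotype codes, e.g. c(1, -1, -3); a negative code means "not this genotype"
# snp : column indices of the SNPs involved, e.g. c(5, 7, 8)
ia_indicator <- function(x, ia, snp) {
  stopifnot(length(ia) == length(snp), all(abs(ia) %in% 1:3))
  out <- rep(TRUE, nrow(x))
  for (k in seq_along(ia)) {
    g <- x[, snp[k]]
    # positive code: SNP must equal it; negative code: SNP must differ from it
    cond <- if (ia[k] > 0) g == ia[k] else g != -ia[k]
    out <- out & cond
  }
  as.integer(out)  # 1 if the interaction holds, 0 otherwise
}

# Three toy observations on 8 SNPs (only SNPs 5, 7 and 8 matter here)
x <- matrix(1L, nrow = 3, ncol = 8)
x[2, 7] <- 2L   # obs 2: SNP5 == 1, SNP7 == 2, SNP8 == 1 -> interaction holds
x[3, 7] <- 3L
x[3, 8] <- 3L   # obs 3: SNP8 == 3 violates (SNP8 != 3)

ia_indicator(x, ia = c(1, -1, -3), snp = c(5, 7, 8))   # 0 1 0
```

Observation 1 fails because SNP7 == 1. Only observation 2 satisfies all three conditions.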

Argument: list.snp
a list consisting of numeric vectors specifying the SNPs that compose the interactions used in the regression model. Each of these vectors must have the same length as the corresponding vector in list.ia and must consist of integers between 1 and n.snp, where the integer i corresponds to the ith column of the simulated SNP matrix. If list.ia is specified but list.snp is not, the first n SNPs are used to generate the interactions, where n is the total number of values in list.ia. For the case in which neither list.ia nor list.snp is specified, see Details.
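The documented fallback for when only list.ia is given can be sketched as follows. The helper default_list_snp() is hypothetical. The documentation above only states that the first n SNPs are used. This sketch additionally assumes that those SNPs are assigned consecutively, in the order in which the genotypes appear in list.ia.

```r
# Hypothetical helper (not part of scrime): build list.snp from list.ia by
# assigning SNPs 1..n, n = total number of values in list.ia, in order.
default_list_snp <- function(list.ia) {
  n.per.ia <- lengths(list.ia)   # number of SNPs in each interaction
  n <- sum(n.per.ia)
  # split 1:n into consecutive chunks with the same lengths as list.ia
  unname(split(seq_len(n), rep(seq_along(list.ia), n.per.ia)))
}

default_list_snp(list(c(-2, 1), 3, c(-1, 3)))
# same as list(1:2, 3L, 4:5)
```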

Argument: beta0
a numeric value specifying the intercept of the regression model.

Argument: beta
a non-negative numeric value, or a vector of the same length as list.ia (i.e., one numeric value for each interaction), specifying the parameters in the regression model.

Argument: maf
either a single numeric value, or a numeric vector of length 2 or n.snp, specifying the minor allele frequency. If a single value, all SNPs have the same minor allele frequency. If a vector of length n.snp, each SNP has the minor allele frequency specified in the corresponding entry of maf. If of length 2, maf is interpreted as the range of the minor allele frequencies. In that case, a minor allele frequency is drawn for each SNP from a uniform distribution over the range given by maf.
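The three ways of specifying maf can be illustrated with a small base-R genotype simulator. This is a sketch under a stated assumption, not scrime's generator. Each SNP is drawn independently under Hardy-Weinberg equilibrium, so the genotype code is 1 plus a Binomial(2, maf) count of minor alleles. sim_snps() is a hypothetical name.

```r
# Sketch (assumption: Hardy-Weinberg equilibrium; scrime's own generator may
# differ). Genotype code = 1 + number of minor alleles, so 1 = homozygous
# reference, 2 = heterozygous, 3 = homozygous variant.
sim_snps <- function(n.obs, n.snp, maf) {
  if (length(maf) == 2) {
    # maf is a range: draw one MAF per SNP from U(min, max)
    maf <- runif(n.snp, min(maf), max(maf))
  } else if (length(maf) == 1) {
    maf <- rep(maf, n.snp)          # same MAF for every SNP
  } else if (length(maf) != n.snp) {
    stop("maf must have length 1, 2 or n.snp")
  }
  # one column per SNP, each drawn independently of the others
  x <- sapply(maf, function(p) 1L + rbinom(n.obs, size = 2, prob = p))
  list(x = x, maf = maf)
}

set.seed(1)
sim <- sim_snps(n.obs = 600, n.snp = 25, maf = c(0.1, 0.4))
dim(sim$x)       # 600 rows, 25 SNP columns
range(sim$maf)   # lies within [0.1, 0.4]
```

When n.snp is 2, a maf of length 2 is ambiguous. This sketch treats it as a range, which may not match the real function.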

Argument: sample.y
should the values of the response in the logistic regression model be randomly drawn using each observation's probability of being a case? If FALSE, the response value of an observation is 1 if its probability of being a case is larger than p.cutoff. Otherwise the observation is classified as a control (i.e., the value of the response is 0). Ignored if err.fun is specified.

Argument: p.cutoff
a probability, i.e., a numeric value between 0 and 1, giving the cutoff above which an observation is called a case if sample.y = FALSE. For details, see sample.y. Ignored if sample.y = TRUE or err.fun is specified.

Argument: err.fun
a function for generating the error that is added to the linear model to determine the value of the (quantitative) response. If NULL, a logistic regression model is used; if specified, a linear model is used. This argument therefore distinguishes between the two types of models. The specified function must take as its first argument the number of values to generate and must return a vector of these values. Further arguments can be passed to it through the ... argument of simulateSNPglm. If, e.g., err.fun = rnorm, then rnorm(n.obs) is called to generate n.obs observations from a standard normal distribution.
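The contract that err.fun must satisfy can be checked with a small wrapper. draw_errors() is a hypothetical stand-in for the internal call, and t_err() is an illustrative user-defined generator. Only rnorm() and rt() are standard R functions.

```r
# Sketch of the err.fun contract: first argument = number of values to
# generate, return value = numeric vector of that length, extra arguments
# forwarded through '...'. draw_errors() is hypothetical.
draw_errors <- function(n.obs, err.fun, ...) {
  err <- err.fun(n.obs, ...)
  stopifnot(is.numeric(err), length(err) == n.obs)
  err
}

set.seed(123)

# Built-in generator: this calls rnorm(1000, sd = 2)
e1 <- draw_errors(1000, rnorm, sd = 2)

# User-defined generator: t-distributed errors, 'df' passed via '...'
t_err <- function(n, df) rt(n, df = df)
e2 <- draw_errors(1000, t_err, df = 3)
```

Through the ... mechanism described above, the same user-defined generator should be usable directly as simulateSNPglm(err.fun = t_err, df = 3).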

Argument: rand
a numeric value for setting the random number generator in a reproducible state.

Argument: ...
further arguments passed to the function specified by err.fun.


----------Details----------

simulateSNPglm first simulates a matrix consisting of n.obs observations and n.snp SNPs, where the minor allele frequencies of these SNPs are given by maf.

Note that all SNPs are currently simulated independently of each other, so they are unlinked.

Afterwards, the response is determined by a regression model using the specifications of list.ia, list.snp, beta0 and beta. Depending on whether err.fun is specified or not, a linear or a logistic regression model is used, respectively, i.e., the response Y is continuous or binary.

By default, a logistic regression model

logit(Prob(Y = 1)) = beta0 + beta[1] * L1 + beta[2] * L2 + ...

is used, since err.fun = NULL by default.

If both list.ia and list.snp are NULL, interactions similar to those considered, e.g., in Nunkesser et al. (2007) or Schwender (2007) are used, namely

L1 = (SNP6 != 1) & (SNP7 == 1) and

L2 = (SNP3 == 1) & (SNP9 == 1) & (SNP10 == 1),

by setting list.ia = list(c(-1, 1), c(1, 1, 1)) and list.snp = list(c(6, 7), c(3, 9, 10)).
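As a worked illustration of the default logistic model, the probabilities can be computed with plogis(), the inverse logit in base R. Three cases are shown: an observation with neither interaction, with only L1, and with both L1 and L2. The sketch assumes the scalar default beta = 1.5 applies to every interaction, as the description of beta suggests.

```r
# Default settings: intercept beta0 = -0.5; the scalar beta = 1.5 is assumed
# to apply to each interaction term, i.e. to both L1 and L2.
beta0 <- -0.5
beta  <- c(1.5, 1.5)

# Rows: neither interaction, only L1, both L1 and L2 (each L is 0 or 1)
L <- rbind(c(0, 0), c(1, 0), c(1, 1))

eta  <- beta0 + drop(L %*% beta)   # linear predictor on the logit scale
prob <- plogis(eta)                # inverse logit: 1 / (1 + exp(-eta))

round(eta, 2)    # -0.5  1.0  2.5
round(prob, 3)   # 0.378 0.731 0.924
```

With sample.y = FALSE and the default p.cutoff = 0.5, an observation with neither interaction would be a control. Any observation with at least one interaction would be a case, because only L2 present also gives 0.731. With sample.y = TRUE, these probabilities instead serve as Bernoulli success probabilities.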

Using the above model, Prob(Y = 1) is computed for each observation. The observation's response value is then determined in one of two ways. If sample.y = TRUE, it is a random draw from a Bernoulli distribution with this probability. If sample.y = FALSE, it is 1 exactly when Prob(Y = 1) > p.cutoff.
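The two ways of turning Prob(Y = 1) into a binary response can be sketched as follows. make_response() is a hypothetical helper, and the example probabilities 0.378, 0.731 and 0.924 are simply illustrative values.

```r
# Sketch: convert case probabilities into a 0/1 response.
# make_response() is hypothetical, not a scrime function.
make_response <- function(prob, sample.y = TRUE, p.cutoff = 0.5) {
  if (sample.y) {
    # one Bernoulli draw per observation with success probability prob
    rbinom(length(prob), size = 1, prob = prob)
  } else {
    # deterministic classification: case if prob exceeds the cutoff
    as.integer(prob > p.cutoff)
  }
}

make_response(c(0.378, 0.731, 0.924), sample.y = FALSE)   # 0 1 1

set.seed(1)
y <- make_response(rep(0.731, 10000))   # random 0/1; mean close to 0.731
```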

If err.fun is specified, then the linear model

Y = beta0 + beta[1] * L1 + beta[2] * L2 + ... + error

is used to determine the values of the response Y, where the values of error are given by the output of a call to err.fun.
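A minimal sketch of the linear-model branch follows. It assumes the interaction terms have already been evaluated into a 0/1 matrix L. linear_response() and no_err() are hypothetical helpers. Passing rnorm with sd = 2 mirrors the last call in the Examples section.

```r
# Sketch: Y = beta0 + beta[1] * L1 + beta[2] * L2 + ... + error.
# linear_response() is hypothetical, not a scrime function.
linear_response <- function(L, beta0, beta, err.fun, ...) {
  L <- as.matrix(L)
  beta <- rep_len(beta, ncol(L))   # a scalar beta applies to every term
  err <- err.fun(nrow(L), ...)     # e.g. rnorm(n.obs, sd = 2)
  list(y = beta0 + drop(L %*% beta) + err, err = err)
}

L <- cbind(L1 = c(0, 1, 0, 1), L2 = c(0, 0, 1, 1))

# With a zero-error function the response equals the linear predictor
no_err <- function(n) rep(0, n)
linear_response(L, beta0 = -0.5, beta = 1.5, err.fun = no_err)$y   # -0.5 1.0 1.0 2.5

# Normal errors with standard deviation 2
set.seed(2)
res <- linear_response(L, beta0 = -0.5, beta = 1.5, err.fun = rnorm, sd = 2)
```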

----------Value----------

An object of class simSNPglm consisting of:
x: a matrix with n.obs rows and n.snp columns containing the simulated SNP values.
y: a vector of length n.obs composed of the values of the response.
beta0: the value of the intercept.
beta: the vector of parameters.
ia: a character vector naming the explanatory interactions.
maf: a vector of length n.snp composed of the minor allele frequencies.
prob: a vector of length n.obs consisting of the values of Prob(Y = 1) (if err.fun = NULL).
err: a vector of length n.obs composed of the values of the error (if err.fun is specified).
p.cutoff: the value of p.cutoff (if err.fun = NULL and sample.y = FALSE).
err.call: a character string naming the call of the error function (if err.fun is specified).

----------Author(s)----------

Holger Schwender, holger.schwender@udo.edu

----------References----------

Nunkesser, R., Bernholt, T., Schwender, H., Ickstadt, K. and Wegener, I. (2007). Detecting High-Order Interactions of Single Nucleotide Polymorphisms Using Genetic Programming. Bioinformatics, 23, 3280-3288.
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.

----------See Also----------

simulateSNPs, summary.simSNPglm, simulateSNPcatResponse

----------Examples----------

# The simulated data set described in Details.

sim1 <- simulateSNPglm()
sim1

# A bit more information: table of the probabilities of being a case
# vs. the numbers of cases and controls.

summary(sim1)

# Calling an observation a case if its probability of being
# a case is larger than 0.5 (the default for p.cutoff).

sim2 <- simulateSNPglm(sample.y = FALSE)
summary(sim2)

# If ((SNP4 != 2) & (SNP3 == 1)), (SNP5 == 3) and
# ((SNP12 != 1) & (SNP9 == 3)) should be the three interactions
# (or variables) that are explanatory for the response,
# list.ia and list.snp are specified as follows.

list.ia <- list(c(-2, 1), 3, c(-1, 3))
list.snp <- list(c(4, 3), 5, c(12, 9))

# The binary response and the data set consisting of
# 600 observations and 25 SNPs, where the minor allele
# frequency of each SNP is randomly drawn from a
# uniform distribution with minimum 0.1 and maximum 0.4,
# are then generated by

sim3 <- simulateSNPglm(n.obs = 600, n.snp = 25,
  list.ia = list.ia, list.snp = list.snp, maf = c(0.1, 0.4))
sim3

summary(sim3)

# If the response should be quantitative, err.fun has
# to be specified. To use a normal distribution with mean 0
# (the default in rnorm) and a standard deviation of 2
# as the distribution of the error, call

simulateSNPglm(err.fun = rnorm, sd = 2)

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

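Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the translation robot of biostatistic.net. It is provided only for personal reference in learning R; biostatistic.net retains the copyright.
Note 2: Because this is an automatic machine translation, inaccuracies are unavoidable. Compare the Chinese and English text carefully and repeatedly when reading, which can also help in learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post, and we will revise it gradually.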
