
R language CGEN package: help documentation for the snp.logistic() function

Posted on 2012-2-25 14:29:04
snp.logistic(CGEN)
R package containing snp.logistic(): CGEN

Logistic regression analysis for a single SNP

Translator: 生物统计家园网 robot LoveR

----------Description----------

Performs logistic regression including a particular SNP (G) and a set of covariates (X) that can include environmental covariates and/or other genetic variables. Three analysis options are included:

(i) Unconstrained maximum-likelihood (UML): This method is equivalent to prospective logistic regression analysis and corresponds to maximum-likelihood analysis of case-control data that leaves the joint distribution of all the factors of the model (the SNP of interest and all other covariates) completely unrestricted (non-parametric).

(ii) Constrained maximum-likelihood (CML): This method performs maximum-likelihood analysis of case-control data under the assumptions of Hardy-Weinberg equilibrium (HWE) and independence between the SNP and the other factors of the model. The analysis allows the assumptions of HWE and independence to hold only conditional on certain stratification variables (S), such as self-reported ethnicity or principal components of population stratification.

(iii) Empirical-Bayes (EB): This method uses an empirical-Bayes type "shrinkage estimation" technique to trade off bias and variance between the constrained and unconstrained maximum-likelihood estimators.
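As a rough illustration of what the unconstrained option corresponds to, the sketch below fits an ordinary prospective logistic regression with base R's glm(). The data are simulated (a random sample, not a true case-control sample) and every variable name is hypothetical; it only shows the model form with a SNP main effect, a covariate main effect and their interaction, not the CGEN implementation.

```r
# Simulated data; all variable names here are hypothetical.
set.seed(1)
n <- 500
G <- rbinom(n, 2, 0.3)          # SNP genotype coded 0, 1, 2
X <- rnorm(n)                   # one environmental covariate
eta <- -0.5 + 0.4 * G + 0.3 * X + 0.2 * G * X
D <- rbinom(n, 1, plogis(eta))  # binary response: 0 = control, 1 = case
dat <- data.frame(D, G, X)

# Prospective logistic regression with SNP main effect, covariate main
# effect, and SNP-by-covariate interaction.
fit <- glm(D ~ G + X + G:X, family = binomial, data = dat)
coef(summary(fit))
```

In snp.logistic() terms, G would play the role of snp.var, X would appear in main.vars, and listing X in int.vars would request the G:X term.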


----------Usage----------


snp.logistic(data, response.var, snp.var, main.vars=NULL, int.vars=NULL,
             strata.var=NULL, op=NULL)



----------Arguments----------

Argument: data
Data frame containing all the data. No default.


Argument: response.var
Name of the binary response variable, coded as 0 (controls) and 1 (cases). No default.


Argument: snp.var
Name of the genotype variable, coded as 0, 1, 2 (or 0, 1). This variable will be included as a main effect in the model. No default.
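If genotypes are stored as allele strings, they need to be converted to the 0, 1, 2 coding first. The sketch below is one possible way to do that in base R; the genotype calls, the choice of counted allele and the helper name count_allele are all hypothetical.

```r
# Hypothetical genotype calls for one SNP; "G" is taken as the counted allele.
geno <- c("AA", "AG", "GG", "GA", "AA")

# Count copies of the chosen allele in each genotype string: 0, 1 or 2.
count_allele <- function(g, allele) {
  vapply(strsplit(g, ""), function(a) sum(a == allele), integer(1))
}

snp <- count_allele(geno, "G")
snp
```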


Argument: main.vars
Character vector of variable names or a formula for all covariates of interest that need to be included in the model as main effects. The default is NULL, so that only the SNP variable will be included as a main effect in the model.


Argument: int.vars
Character vector of variable names or a formula for all covariates of interest that will interact with the SNP variable. The default is NULL, so that no interactions will be in the model.


Argument: strata.var
Name of the stratification variable or a formula (see details for more). The default is NULL (1 stratum).


Argument: op
A list with names genetic.model, reltol, maxiter, and optimizer (see details). The default is NULL.


----------Details----------

The data is first fit using standard logistic regression. The estimated parameters from the standard logistic regression are then used as the initial estimates for the constrained model. For this, the optim() function is used to compute the maximum likelihood estimates and the estimated covariance matrix. The empirical Bayes estimates are then computed by combining both sets of estimated parameters (see below).

The "strata" option (strata.var), which is relevant for the CML and EB methods, allows the assumptions of HWE and G-X independence to hold only conditional on a given set of other factors. If a variable name is provided, then the unique levels of the variable are used to define categorical strata. If a formula object is given, then the formula is assumed to describe a parametric model for the variation of the allele frequency of the SNP as a function of the variables included in the formula. No assumption is made about the relationship between X and the factors in S. Typically, S would include self-reported ethnicity, study, center/geographic region and principal components of population stratification. The CML method with the "strata" defined by principal components of population stratification can be viewed as a generalization of the adjusted case-only method described in Bhattacharjee et al. (2010). More details of the individual methods follow.
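To make the formula form of the "strata" option more concrete, the sketch below assumes (as the later definition xi_s = V_s*gamma states) that the log-odds of the allele frequency is linear in the design matrix built from the formula. The principal component values and the gamma vector are made up for illustration; in practice gamma is estimated by the CML fit.

```r
# Hypothetical principal components for five subjects.
strata_dat <- data.frame(PC1 = c(-1.2, -0.3, 0.0, 0.4, 1.1),
                         PC2 = c( 0.5, -0.8, 0.1, 0.9, -0.7))

# Design matrix implied by a formula such as ~ PC1 + PC2.
V <- model.matrix(~ PC1 + PC2, data = strata_dat)

# Hypothetical stratification parameters (intercept, PC1, PC2).
gamma <- c(-1.0, 0.5, -0.25)

# Log-odds of the allele frequency, then the frequency itself.
xi <- drop(V %*% gamma)
f  <- plogis(xi)

# Genotype probabilities for each subject under HWE.
hwe <- cbind(G0 = (1 - f)^2, G1 = 2 * f * (1 - f), G2 = f^2)
round(hwe, 3)
```

With a single categorical stratification variable, the same idea applies with one allele frequency per level instead of a linear model.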

Definition of the likelihood under the gene-environment independence assumption:

Let D = 0, 1 be the case-control status, G = 0, 1, 2 denote the SNP genotype, S denote the stratification variable(s) and X denote the set of all other factors to be included in the regression model. Suppose the risk of the disease (D), given G, X and S, can be described by a logistic regression model of the form

log(Pr(D=1 | G,X,S)/Pr(D=0 | G,X,S)) = alpha + Z*beta

where Z is the entire design matrix (including G, X, possibly S and their interaction with X) and beta is the vector of associated regression coefficients. The CML method assumes Pr(G|X,S)=Pr(G|S), i.e., G and X are conditionally independent given S. The current implementation of the CML method also assumes that the SNP genotype frequency follows HWE given S=s, although this is not necessary in general. Thus, if f_s denotes the allele frequency given S=s, then

Pr(G=0 | S=s) = (1 - f_s)^2
Pr(G=1 | S=s) = 2*f_s*(1 - f_s)
Pr(G=2 | S=s) = f_s^2

If xi_s = log(f_s/(1 - f_s)), then

log(Pr(G=1 | S=s)/Pr(G=0 | S=s)) = log(2) + xi_s

and

log(Pr(G=2 | S=s)/Pr(G=0 | S=s)) = 2*xi_s


Chatterjee and Carroll (2005) showed that under the above constraints, the maximum-likelihood estimate for the beta coefficients under the case-control design can be obtained from a simple conditional likelihood of the form

Pr(D=d, G=g | Z, S=s) = exp(theta_s(d,g)) / SUM[exp(theta_s(d,g))]

where the sum is taken over the 6 combinations of d and g and theta_s(d,g) = d*alpha + d*Z*beta + I(g=1)*log(2) + g*xi_s. If S is a single categorical variable, then a separate xi_s is allowed for each S=s. If S is specified using a formula object, then it is assumed that xi_s=V_s*gamma, where V_s is the design matrix associated with the formula object and gamma is the vector of stratification parameters. If, for example, S is specified as "strata=~PC1+PC2+...PCK", where the PCk denote principal components of population stratification, then it is assumed that the allele frequency of the SNP varies along the directions of the different principal components in a logistic-linear fashion.
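The conditional likelihood contribution of one subject can be sketched in a few lines of base R. This is only an illustration of the formula for theta_s(d,g) given in the text, not the package's code; the function name cond_lik, the helper zb and all numeric values are hypothetical.

```r
# theta_s(d, g) = d*alpha + d*(Z*beta) + I(g = 1)*log(2) + g*xi_s
# zb_fun(g) returns the linear predictor Z*beta for genotype g
# (Z depends on g through the SNP main effect and its interactions).
cond_lik <- function(alpha, zb_fun, xi) {
  grid <- expand.grid(d = 0:1, g = 0:2)       # the 6 (d, g) combinations
  theta <- with(grid,
                d * alpha + d * zb_fun(g) + (g == 1) * log(2) + g * xi)
  grid$prob <- exp(theta) / sum(exp(theta))
  grid
}

# Hypothetical values: SNP log-odds ratio 0.4, covariate x = 1.5 with
# main effect 0.3 and SNP-by-covariate interaction 0.2.
x <- 1.5
zb <- function(g) 0.4 * g + 0.3 * x + 0.2 * g * x
f <- 0.25                                     # allele frequency in stratum s
lik <- cond_lik(alpha = -1, zb_fun = zb, xi = log(f / (1 - f)))
lik

# Among controls (d = 0) the genotype distribution reduces to the HWE
# proportions (1 - f)^2, 2*f*(1 - f), f^2.
ctrl <- lik[lik$d == 0, ]
ctrl$prob / sum(ctrl$prob)
```

The last step is a quick consistency check: setting d = 0 removes alpha and Z*beta from theta_s, so only the HWE terms remain.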

Definition of the empirical Bayes estimates:

Let beta_UML be the parameter estimates from standard logistic regression, and let eta = (beta_CML, xi_CML) be the estimates under the gene-environment independence assumption. Let psi = beta_UML - beta_CML, and phi^2 be the vector of variances of beta_UML. Define diagonal matrices of weights W1 = diag(psi^2/(psi^2 + phi^2)) and W2 = diag(phi^2/(psi^2 + phi^2)), where psi^2 is the elementwise square of the vector psi. The empirical Bayes parameter estimates are then

beta_EB = W1*beta_UML + W2*beta_CML
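The weights W1 and W2 defined above can be computed directly. The sketch below uses made-up estimates and variances (the coefficient names are hypothetical too) and combines the UML and CML estimates with those weights: a coefficient where the two estimators disagree strongly relative to phi^2 stays close to UML, and one where they agree is pulled toward CML.

```r
# Hypothetical estimates for three coefficients.
beta_uml <- c(snp = 0.50, x = 0.30, "snp:x" = 0.40)
beta_cml <- c(snp = 0.45, x = 0.31, "snp:x" = 0.10)
phi2     <- c(snp = 0.04, x = 0.01, "snp:x" = 0.09)  # variances of beta_uml

psi <- beta_uml - beta_cml
W1 <- diag(psi^2 / (psi^2 + phi2))   # weight on the UML estimate
W2 <- diag(phi2  / (psi^2 + phi2))   # weight on the CML estimate

# Weighted combination of the two sets of estimates.
beta_eb <- drop(W1 %*% beta_uml + W2 %*% beta_cml)
names(beta_eb) <- names(beta_uml)
round(cbind(UML = beta_uml, CML = beta_cml, EB = beta_eb), 4)
```

Because W1 + W2 is the identity matrix, each combined estimate lies between the corresponding UML and CML values.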

For the estimated covariance matrix, define the diagonal matrix

where again the exponentiation is the elementwise product of the vectors. If I is the pxp identity matrix and we define the px2p matrix C = (A, I - A), then the estimated covariance matrix is

VAR(beta_EB) = C*COV(beta_UML, beta_CML)*C'

The covariance term COV(beta_UML, beta_CML) is obtained using an influence function method (see Chen YH, Chatterjee N, and Carroll R. in the references for details about the above formulation of the empirical-Bayes method).
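One way to see where a matrix of the form C = (A, I - A) can come from is a first-order (delta method) expansion of the empirical Bayes combination, treating phi^2 as fixed. The sketch below is an assumption-laden illustration and is not taken from the CGEN source: it derives the derivative of the combined estimate with respect to beta_UML, checks it numerically, and plugs it into the sandwich form with a hypothetical joint covariance matrix Sigma ordered as (beta_UML, beta_CML).

```r
# Empirical Bayes combination for each coefficient (phi2 treated as fixed).
eb <- function(b_uml, b_cml, phi2) {
  psi <- b_uml - b_cml
  (psi^2 * b_uml + phi2 * b_cml) / (psi^2 + phi2)
}

# Analytic derivative of eb() with respect to b_uml (assumption: first-order
# delta method; this is not taken from the CGEN source code).
a_fun <- function(psi, phi2) psi^2 * (psi^2 + 3 * phi2) / (psi^2 + phi2)^2

b_uml <- c(0.50, 0.40); b_cml <- c(0.45, 0.10); phi2 <- c(0.04, 0.09)
psi <- b_uml - b_cml

# Numerical check of the derivative by central differences.
h <- 1e-6
num <- (eb(b_uml + h, b_cml, phi2) - eb(b_uml - h, b_cml, phi2)) / (2 * h)
cbind(analytic = a_fun(psi, phi2), numeric = num)

# Sandwich form C * COV * C' with C = (A, I - A); Sigma is a hypothetical
# 2p x 2p joint covariance of (beta_UML, beta_CML).
A <- diag(a_fun(psi, phi2))
C <- cbind(A, diag(2) - A)
Sigma <- diag(c(0.04, 0.09, 0.02, 0.03))
Sigma[1, 3] <- Sigma[3, 1] <- 0.02
Sigma[2, 4] <- Sigma[4, 2] <- 0.03
var_eb <- C %*% Sigma %*% t(C)
var_eb
```

Under this reading the derivative with respect to beta_CML is 1 minus the derivative with respect to beta_UML, which is why the second block of C is I - A.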

Options list:

Below are the names for the options list op. All names have default values if they are not specified.

genetic.model 0-3: The genetic model for the SNP. 0=trend, 1=dominant, 2=recessive, 3=general.

reltol Stopping tolerance. The default is 1e-6.

maxiter Maximum number of iterations. The default is 100.

optimizer One of "BFGS", "CG", "L-BFGS-B", "Nelder-Mead", "SANN". The default is "BFGS".
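An options list with the four names described above might be built as follows. The dominant genetic model is an arbitrary illustrative choice; the other values repeat the stated defaults. The snp.logistic() call is left commented out because it needs the CGEN package and its Xdata example data.

```r
# Options named in the text; values other than the stated defaults are
# illustrative choices.
op <- list(genetic.model = 1,       # 0=trend, 1=dominant, 2=recessive, 3=general
           reltol        = 1e-6,    # stated default
           maxiter       = 100,     # stated default
           optimizer     = "BFGS")  # stated default

# Simple sanity checks before passing op to snp.logistic().
stopifnot(op$genetic.model %in% 0:3,
          op$optimizer %in% c("BFGS", "CG", "L-BFGS-B", "Nelder-Mead", "SANN"))

# With the CGEN package loaded and the Xdata example data available:
# ret <- snp.logistic(Xdata, "case.control", "BRCA.status", op = op)
str(op)
```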


----------Value----------

A list containing sublists with names UML (unconstrained maximum likelihood), CML (constrained maximum likelihood), and EB (empirical Bayes). Each sublist contains the parameter estimates (parms) and covariance matrix (cov). The lists UML and CML also contain the log-likelihood (loglike). The list CML also contains the results for the stratum-specific allele frequencies under the HWE assumption (strata.parms and strata.cov).
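Given the parms and cov components described above, standard errors and Wald statistics follow directly. The sketch below uses a made-up stand-in for one sublist and assumes that parms is a named numeric vector whose order matches the rows of cov; with a real fit, the same lines would be applied to, for example, the UML sublist of the returned object.

```r
# Hypothetical stand-in for one sublist (e.g. ret$UML); numbers are made up.
uml <- list(
  parms = c("(Intercept)" = -1.20, BRCA.status = 0.80, oral.years = -0.05),
  cov   = diag(c(0.010, 0.090, 0.0004))
)

# Standard errors, Wald z statistics and two-sided p-values.
se <- sqrt(diag(uml$cov))
z  <- uml$parms / se
p  <- 2 * pnorm(-abs(z))
round(cbind(estimate = uml$parms, se = se, z = z, p = p), 4)
```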


----------References----------

An empirical Bayes approach to trade-off between bias and efficiency. Biometrics 2008, 64(3):685-94.
type I error, power and designs. Genetic Epidemiology, 2008, 32:615-26.
exploiting gene-environment independence in case-control studies. Biometrika, 2005, 92, 2, pp.399-418.
case-control studies. Journal of the American Statistical Association, 2009, 104: 220-233.
Using Principal Components of Genetic Variation for Robust and Powerful Detection of Gene-Gene Interactions in Case-Control and Case-Only studies. American Journal of Human Genetics, 2010, 86(3):331-342.

----------See Also----------

snp.scan.logistic, snp.matched


----------Examples----------


# Load the package that provides snp.logistic(), getSummary() and getWaldTest()
library(CGEN)

# Use the ovarian cancer data
data(Xdata, package="CGEN")

# Fit using a stratification variable
ret <- snp.logistic(Xdata, "case.control", "BRCA.status",
                     main.vars=c("oral.years", "n.children"),
                     int.vars=c("oral.years", "n.children"),
                     strata.var="ethnic.group")

# Compute a summary table for the models
getSummary(ret)

# Compute a Wald test for the main effect of the SNP and the interactions
getWaldTest(ret, c("BRCA.status", "BRCA.status:oral.years", "BRCA.status:n.children"))

# Fit the same model as above using formulas
ret2 <- snp.logistic(Xdata, "case.control", "BRCA.status",
                     main.vars=~oral.years + n.children,
                     int.vars=~oral.years + n.children,
                     strata.var="ethnic.group")

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

