snp.matched(CGEN)
snp.matched() belongs to R package: CGEN
Robust G-G and G-E Interaction with Finely-Matched Case-Control Data.
----------Description----------
Performs a conditional likelihood-based analysis of matched case-control data, typically modeling a particular SNP and a set of covariates that may include environmental covariates and/or other genetic variables. Three alternative analysis options are included:
(i) Conditional Logistic Regression (CLR): classical CLR, which does not try to exploit G-G or G-E independence and leaves the joint distribution of the covariates in the model completely unrestricted (non-parametric).
(ii) Constrained Conditional Logistic (CCL): performs CLR analysis of case-control data under the assumption of gene-environment (and/or gene-gene) independence, not in the entire population but within finely matched case-control sets.
(iii) Hybrid Conditional Logistic (HCL): suitable if nearest-neighbor matching (see the reference by Bhattacharjee et al. 2010) is performed without regard to case-control status. Like CCL, the likelihood assumes G-G/G-E independence within matched sets, but it additionally borrows some information across matched sets by using a parametric model to account for heterogeneity in disease across strata.
----------Usage----------
snp.matched(data, response.var, snp.vars, main.vars=NULL, int.vars=NULL,
cc.var=NULL, nn.var=NULL, op=NULL)
----------Arguments----------
Argument: data
Data frame containing all the data. No default.
Argument: response.var
Name of the binary response variable coded as 0 (controls) and 1 (cases). No default.
Argument: snp.vars
A vector of variable names or a formula, generally coding a single SNP variable (see details). No default.
Argument: main.vars
Vector of variable names or a formula for all covariates of interest which need to be included in the model as main effects. The default is NULL, so that only the snp.vars will be included as main effect(s) in the model.
Argument: int.vars
Character vector of variable names or a formula for all covariates of interest that will interact with the SNP variable. The default is NULL, so that no interactions will be in the model.
Argument: cc.var
Integer matching variable with at most 10 subjects per stratum (e.g. CC matching using getMatchedSets). Each stratum has one case matched to one or more controls (or one control matched to one or more cases). The default is NULL.
Argument: nn.var
Integer matching variable with at most 8 subjects per stratum (e.g. NN matching using getMatchedSets). Each stratum can have zero or more cases and controls, but the entire data set should have both cases and controls. The default is NULL. At least one of cc.var or nn.var should be provided.
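The stratum-size limits described for cc.var and nn.var can be checked before fitting. The following is a minimal base-R sketch on made-up data; the data frame toy, the column stratum, and the helper check_strata are hypothetical names introduced for illustration and are not part of CGEN.

```r
# Toy data: 'stratum' is a hypothetical integer matching variable.
toy <- data.frame(
  case.control = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0),
  stratum      = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
)

# Hypothetical helper: tabulate stratum sizes and list strata above a limit.
check_strata <- function(strata, max.size) {
  sizes <- table(strata)
  list(sizes = sizes, too.large = names(sizes)[sizes > max.size])
}

nn.check <- check_strata(toy$stratum, max.size = 8)   # documented nn.var limit
cc.check <- check_strata(toy$stratum, max.size = 10)  # documented cc.var limit

# For cc.var, each stratum should hold exactly one case with one or more
# controls, or exactly one control with one or more cases.
n.case <- tapply(toy$case.control == 1, toy$stratum, sum)
n.ctrl <- tapply(toy$case.control == 0, toy$stratum, sum)
cc.ok  <- (n.case == 1 & n.ctrl >= 1) | (n.ctrl == 1 & n.case >= 1)
```

Here stratum 3 holds two cases and three controls, so it would be acceptable as an NN stratum but not as a CC stratum.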
Argument: op
Control options for Newton-Raphson optimizer. List containing members "maxiter" (default 100) and "reltol" (default 1e-5).
----------Details----------
To compute HCL, the data are first fit using standard logistic regression. The estimated parameters from that fit are then used as the initial estimates for Newton-Raphson iterations with exact gradient and Hessian. Similarly, for CCL the data are first fit using clogit with cc.var to obtain the CLR estimate, which serves as the initial estimate, and Newton-Raphson is then used to maximize the likelihood.
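To make the roles of the Newton-Raphson iterations and of the "maxiter" and "reltol" options concrete, here is a minimal sketch for ordinary logistic regression in base R. It is not the CGEN implementation and does not compute the CCL or HCL likelihood; the function newton_logistic is hypothetical, and the stopping rule (relative change in the log-likelihood) is an assumption about how "reltol" might be applied.

```r
# Illustration only: Newton-Raphson with exact gradient and Hessian for
# ordinary logistic regression. 'op' mirrors the documented control list.
newton_logistic <- function(X, y, start = rep(0, ncol(X)), op = NULL) {
  ctrl <- modifyList(list(maxiter = 100, reltol = 1e-5),
                     if (is.null(op)) list() else op)
  loglike <- function(b) {
    eta <- drop(X %*% b)
    sum(y * eta - log1p(exp(eta)))
  }
  hessian <- function(b) {
    p <- plogis(drop(X %*% b))
    -crossprod(X * (p * (1 - p)), X)          # exact Hessian
  }
  beta <- start
  ll.old <- loglike(beta)
  converged <- FALSE
  for (iter in seq_len(ctrl$maxiter)) {
    p    <- plogis(drop(X %*% beta))
    grad <- crossprod(X, y - p)               # exact gradient (score)
    beta <- beta - drop(solve(hessian(beta), grad))  # Newton step
    ll.new <- loglike(beta)
    # Assumed stopping rule: small relative change in the log-likelihood
    if (abs(ll.new - ll.old) <= ctrl$reltol * (abs(ll.old) + ctrl$reltol)) {
      converged <- TRUE
      break
    }
    ll.old <- ll.new
  }
  list(parms = beta, cov = solve(-hessian(beta)), loglike = ll.new,
       iter = iter, converged = converged)
}

# Simulated data, compared against glm() as a reference fit
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind("(Intercept)" = 1, x = x)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))
fit <- newton_logistic(X, y, op = list(maxiter = 50, reltol = 1e-8))
ref <- glm(y ~ x, family = binomial())
```

The returned list borrows the names parms, cov, and loglike only for readability; passing a good starting value through start plays the same role as the initial estimates described above.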
While snp.logistic models the SNP variable parametrically, this function is non-parametric and hence offers somewhat more flexibility. The only constraint on snp.vars is that it be independent of int.vars within homogeneous matched sets. It can be any genetic or non-genetic variable, or a collection of those. For example, 3 SNPs coded as general, dominant, and additive can be specified through a single formula, e.g. "snp.vars = ~ (SNP1 == 1) + (SNP1 == 2) + (SNP2 >= 1) + SNP3". However, when multiple variables are used in snp.vars, results should be interpreted carefully. The summary function snp.effects can only be applied if a single SNP variable is coded.
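The formula in the example above can be inspected with base R's model.matrix to see which columns the general, dominant, and additive codings produce. This is a sketch on made-up genotypes; whether snp.matched builds its design matrix in exactly this way internally is an assumption.

```r
# Made-up genotype counts (0, 1, 2 copies of the minor allele)
geno <- data.frame(
  SNP1 = c(0, 1, 2, 0, 1, 2),
  SNP2 = c(0, 0, 1, 2, 1, 0),
  SNP3 = c(0, 1, 0, 2, 1, 2)
)

# SNP1: general coding (two indicators); SNP2: dominant; SNP3: additive
snp.formula <- ~ (SNP1 == 1) + (SNP1 == 2) + (SNP2 >= 1) + SNP3

design     <- model.matrix(snp.formula, data = geno)
design.snp <- design[, -1, drop = FALSE]   # drop the intercept column
```

The result has four columns: two indicator columns for SNP1, one carrier indicator for SNP2, and the allele count for SNP3.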
Note that int.vars consists of variables that interact with the SNP variable and can be assumed to be independent of snp.vars within matched sets. Interactions for which independence is not assumed can be included in main.vars (as products of the appropriate variables).
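As a sketch of the last point, an interaction for which independence is not assumed can be added as an explicit product column and then listed in main.vars. The data frame toy, the covariate age.group, and the column name BRCA.x.age are hypothetical and used only for illustration.

```r
# Made-up data; 'age.group' stands in for a covariate that is NOT assumed
# to be independent of the SNP variable within matched sets.
toy <- data.frame(
  case.control = c(1, 0, 1, 0),
  BRCA.status  = c(1, 0, 0, 1),
  oral.years   = c(2, 5, 0, 3),
  age.group    = c(1, 2, 2, 1)
)

# Product term for the interaction without the independence assumption
toy$BRCA.x.age <- toy$BRCA.status * toy$age.group

# The product goes into main.vars; int.vars keeps only the covariates for
# which independence from the SNP variable is assumed.
main.vars <- c("oral.years", "age.group", "BRCA.x.age")
int.vars  <- "oral.years"
```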
Both CCL and HCL provide considerable gain in power compared to standard CLR. CCL derives more power by generating pseudo-controls under the assumption of G-G/G-E independence within matched case-control sets. HCL makes the same assumption but, unlike classical case-control matching, allows each matched set to have any number of cases and controls. By comparing across matched sets, it is able to estimate the intercept parameter and to improve the efficiency of estimating main effects compared to CLR and CCL. At the same time, it behaves similarly to CCL for interactions by assuming G-G/G-E independence only within matched sets. For both methods, the power increase for interactions depends on the sizes of the matched sets in nn.var, which are currently limited to 8 to avoid both memory and speed issues.
The authors would like to acknowledge Bijit Kumar Roy for his help in designing the internal data structure and algorithm for HCL/CCL likelihood computations.
----------Value----------
A list containing sublists with names CLR, CCL, and HCL. Each sublist contains the parameter estimates (parms), covariance matrix (cov), and log-likelihood (loglike).
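Given the parms and cov members described above, standard errors and Wald statistics follow directly. The sketch below uses a hand-built list with made-up numbers as a stand-in for one sublist; it is not output from snp.matched. The helper wald_table is hypothetical, and it assumes parms is a named numeric vector.

```r
# Hypothetical stand-in for one sublist (e.g. the CCL fit); numbers are made up.
vars <- c("BRCA.status", "oral.years")
fit <- list(
  parms   = c(BRCA.status = 1.10, oral.years = -0.05),
  cov     = matrix(c(0.04, 0.001, 0.001, 0.0004), 2, 2,
                   dimnames = list(vars, vars)),
  loglike = -250.3
)

# Hypothetical helper: Wald z-statistics and two-sided p-values
wald_table <- function(fit) {
  se <- sqrt(diag(fit$cov))
  z  <- fit$parms / se
  data.frame(estimate = fit$parms, se = se, z = z,
             p.value = 2 * pnorm(-abs(z)))
}

tab <- wald_table(fit)
```

For real fits, the package's own getWaldTest and getSummary functions, used in the examples, serve this purpose.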
----------References----------
Increased power for detecting associations, interactions and joint-effects. Genetic Epidemiology 2005; 28:138-156.
Using Principal Components of Genetic Variation for Robust and Powerful Detection of Gene-Gene Interactions in Case-Control and Case-Only studies. American Journal of Human Genetics 2010, 86(3):331-342.
of case-control studies." 1980, Lyon: IARC Sci Publ;(32):247-279.
----------See Also----------
getMatchedSets, snp.logistic
----------Examples----------
# Use the ovarian cancer data
library(CGEN)
data(Xdata, package="CGEN")
# Fake principal component columns
set.seed(123)
Ydata <- cbind(Xdata, PC1=rnorm(nrow(Xdata)), PC2=rnorm(nrow(Xdata)))
# Match using PC1 and PC2
mx <- getMatchedSets(Ydata, CC=TRUE, NN=TRUE, ccs.var="case.control",
                     dist.vars=c("PC1","PC2"), size = 4)
# Append columns for CC and NN matching to the data
Zdata <- cbind(Ydata, CCStrat=mx$CC, NNStrat=mx$NN)
# Fit using variable names
ret1 <- snp.matched(Zdata, "case.control",
                    snp.vars = "BRCA.status",
                    main.vars=c("oral.years", "n.children"),
                    int.vars=c("oral.years", "n.children"),
                    cc.var="CCStrat", nn.var="NNStrat")
# Compute a Wald test for the main effect of BRCA.status and its interactions
getWaldTest(ret1, c("BRCA.status", "BRCA.status:oral.years", "BRCA.status:n.children"))
# Fit the same model as above using formulas
ret2 <- snp.matched(Zdata, "case.control", snp.vars = ~ BRCA.status,
                    main.vars=~oral.years + n.children,
                    int.vars=~oral.years + n.children,
                    cc.var="CCStrat", nn.var="NNStrat")
# Compute a summary table for the models
getSummary(ret2)