R language, CGEN package: help documentation for the snp.scan.logistic() function

Logistic regression analysis for an array of SNPs

Performs a logistic regression analysis of case-control data with three alternative analysis options:

(i) Unconstrained maximum-likelihood: This method is equivalent to prospective logistic regression analysis and corresponds to maximum-likelihood analysis of case-control data allowing the joint distribution of the covariates in the model to be completely unrestricted (non-parametric).

(ii) Constrained maximum-likelihood: This method performs maximum-likelihood analysis of case-control data under the assumption of gene-environment (and/or gene-gene) independence and Hardy-Weinberg equilibrium for the underlying population. The analysis allows the assumptions to be valid conditional on a stratification variable.

(iii) Empirical-Bayes: This method uses an empirical-Bayes type "shrinkage estimation" technique to trade off bias and variance between the constrained and unconstrained maximum-likelihood estimators.
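Option (i) is described as equivalent to prospective logistic regression. The following is a minimal base-R sketch of that idea using glm() on simulated data; it does not call CGEN, the variable names (snp, env, y) and effect sizes are made up for illustration, and it does not reproduce the constrained or empirical-Bayes estimators.

```r
# Simulated data with hypothetical variable names; not CGEN code.
set.seed(1)
n   <- 500
snp <- rbinom(n, size = 2, prob = 0.3)   # genotype coded 0/1/2 (trend coding)
env <- rnorm(n)                          # a continuous exposure
eta <- -0.5 + 0.4 * snp + 0.3 * env + 0.2 * snp * env
y   <- rbinom(n, size = 1, prob = plogis(eta))

# Prospective logistic regression with a SNP main effect,
# an exposure main effect, and a SNP-by-exposure interaction.
fit <- glm(y ~ snp * env, family = binomial())

uml_est <- coef(fit)   # parameter estimates
uml_cov <- vcov(fit)   # estimated covariance matrix of the estimates
print(round(uml_est, 3))
```

The estimates and covariance matrix extracted here play the same role as the "estimated parameters" and "covariance matrices" that the function is documented to return for each method.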


snp.scan.logistic(snp.list, pheno.list, op=NULL)


snp.list  See the snp.list help page. No default.

pheno.list  See the pheno.list help page. No default.

op  See the details below for this list of options. The default is NULL.



To use this function, the data must be stored in files as defined in snp.list and pheno.list. See the examples for how to create these lists. The genotype data are read from the file(s) snp.list$file, and the variables for the main effects and interactions are read from the file pheno.list$file. The subjects to be included in the model are defined in pheno.list. For each included subject with id sub.id, the same id must be present in the genotype data file(s). The genotype data file(s) can contain more subject ids than pheno.list$file, and the ids do not have to be in any particular order. Once the data are read in, all missing values are removed and the function snp.logistic is called for each SNP in the genotype data file(s). By default, output files are not created and only the analysis from the last SNP is returned from this function; to save the results for all the SNPs, the user must specify op$out.file or op$out.dir.
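The workflow described above (match subject ids, drop missing values, fit one model per SNP) can be sketched in base R as follows. This is an illustration only: the data are simulated, the object names (pheno, geno, results) are hypothetical, and glm() stands in for snp.logistic. Unlike the default behaviour described above, the sketch keeps the result for every SNP in a list.

```r
set.seed(2)

# Hypothetical phenotype table: one row per subject to include.
pheno <- data.frame(id           = sprintf("S%03d", 1:200),
                    case.control = rbinom(200, 1, 0.5),
                    n.children   = rpois(200, 2))

# Hypothetical genotype matrix: SNPs in rows, subjects in columns,
# with extra subjects and a shuffled subject order.
geno_ids <- sample(sprintf("S%03d", 1:250))
geno <- matrix(rbinom(3 * 250, 2, 0.3), nrow = 3,
               dimnames = list(c("rs1", "rs2", "rs3"), geno_ids))
geno["rs2", 1:10] <- NA   # a few missing genotypes

# Every included subject must have a matching id in the genotype data.
stopifnot(all(pheno$id %in% colnames(geno)))

results <- list()
for (snp_name in rownames(geno)) {
  d <- pheno
  # Align genotypes to the phenotype rows by subject id.
  d$SNP <- geno[snp_name, match(pheno$id, colnames(geno))]
  d <- d[complete.cases(d), ]   # remove rows with missing values
  fit <- glm(case.control ~ SNP * n.children, data = d, family = binomial())
  results[[snp_name]] <- list(snp = snp_name, n = nrow(d),
                              parms = coef(fit), cov = vcov(fit))
}

names(results)
```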

Options list op: Below are the names for the options list op. All names have default values if they are not specified.

genetic.model 0-3: The genetic model for the SNP. 0=trend, 1=dominant,  2=recessive, 3=general.

tests List of character vectors that will be used in Wald tests. For example, tests=list(c("x1", "x2"), c("x1", "x4", "x9")) will compute a 2 df Wald test involving the variables x1 and x2, and a 3 df Wald test for the variables x1, x4, and x9. The variable name for the main effect of each SNP is "SNP_", and the variable names that interact with each SNP are of the form "SNP_x1", "SNP_gender", etc. In the output, these tests will be labeled as "test1", "test2", etc. The default is NULL.

tests.1df Character vector of variable names for which to compute 1 degree of freedom Wald tests. The default is NULL.

effects List for joint/stratified effects. The default is NULL. Names in the list must be:

var Variable name to compute the effects with the SNP variable. This variable must be a main effect. No default.

type 1, 2 or c(1, 2), 1 = joint, 2 = stratified. The default is 1.

var.levels (Only for continuous var). Numeric vector of the  levels to be used in the calculation. The default is 0.

var.base (Only for continuous var). Baseline level. The default is 0.

snp.levels A vector containing any of the values 0, 1, 2 to use as the levels of each SNP. The default is 1.

method Character vector containing any of the following: "UML", "CML", "EB". The default is c("UML", "CML", "EB").

out.file NULL or file name to save summary information for each SNP. The output will at least contain the columns "SNP" and "MAF". MAF is the minor allele frequency from the controls. Additional columns in this file are based on the values of tests and  tests.1df. The default is NULL.

out.dir NULL or the output directory to store the output lists for each SNP. A separate file will be created for each SNP in the SNP data set, so this option should only be used for analyzing a small number of SNPs. The file names will be out_<SNP>.rda. The load() function must be used to read these files into R. The object in each file is named "ret". The default is NULL.

reltol Stopping tolerance. The default is 1e-6.

maxiter Maximum number of iterations. The default is 100.

optimizer One of "BFGS", "CG", "L-BFGS-B", "Nelder-Mead", "SANN". The default is "BFGS".
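The genetic.model codes in the options list above (0=trend, 1=dominant, 2=recessive, 3=general) name the models without spelling out the coding. The sketch below assumes the conventional codings of a 0/1/2 minor-allele count; code_genotype() and its column names are hypothetical helpers for illustration, not part of CGEN, and the package's internal coding may differ.

```r
# Hypothetical helper, not a CGEN function: recode minor-allele counts (0/1/2)
# under the conventional definition of each genetic model.
code_genotype <- function(g, genetic.model = 0) {
  stopifnot(all(g %in% c(0, 1, 2, NA)))
  switch(as.character(genetic.model),
         "0" = cbind(SNP = g),                          # trend: allele count
         "1" = cbind(SNP = as.numeric(g >= 1)),         # dominant: 1 or 2 copies
         "2" = cbind(SNP = as.numeric(g == 2)),         # recessive: 2 copies
         "3" = cbind(SNP1 = as.numeric(g == 1),         # general: two indicators
                     SNP2 = as.numeric(g == 2)),
         stop("genetic.model must be 0, 1, 2 or 3"))
}

g <- c(0, 1, 2, 1, 0)
code_genotype(g, genetic.model = 1)
code_genotype(g, genetic.model = 3)
```

Under this assumption the general model contributes two columns per SNP, which is why it uses more degrees of freedom than the other three codings.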


A list from the LAST analysis performed. This list will contain the estimated parameters, covariance matrices, SNP name, and possibly the results of any Wald tests.
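Because only the last analysis is returned, results for all SNPs have to come from the output files. When op$out.dir is set, each SNP is documented to get its own out_<SNP>.rda file holding an object named "ret". The sketch below shows one way such files could be read back into a single list. The files here are stand-ins created with save(), and the contents of each ret list are hypothetical, not the actual structure returned by the package.

```r
# Create stand-in files in a temporary directory (hypothetical contents).
out_dir <- file.path(tempdir(), "snp_out")
dir.create(out_dir, showWarnings = FALSE)
for (snp_name in c("rs1", "rs2")) {
  ret <- list(snp = snp_name, parms = c(SNP_ = 0.1))
  save(ret, file = file.path(out_dir, paste0("out_", snp_name, ".rda")))
}
rm(ret)

# Read every out_<SNP>.rda file back; each one restores an object named "ret".
files <- list.files(out_dir, pattern = "^out_.*\\.rda$", full.names = TRUE)
all_results <- lapply(files, function(f) {
  env <- new.env()
  load(f, envir = env)   # load into a private environment to avoid overwriting
  env$ret
})
names(all_results) <- sub("^out_(.*)\\.rda$", "\\1", basename(files))

names(all_results)
```

Loading into a fresh environment avoids each file's "ret" overwriting the previous one in the workspace.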






# Define the list for the genotype data.
snp.list <- list()
snp.list$file <- system.file("sampleData", "SNPdata.rda", package="CGEN")
snp.list$file.type <- 1   
snp.list$delimiter <- "|"
snp.list$in.miss <- "NA"

# Only process the first 5 SNPs in the file
snp.list$start.vec <- 1
snp.list$stop.vec <- 6

# Define pheno.list
pheno.list <- list()
pheno.list$file <- system.file("sampleData", "Xdata.txt", package="CGEN")
pheno.list$file.type <- 3
pheno.list$delimiter <- "\t"
pheno.list$id.var <- "id"

# Define the variables in the model
pheno.list$response.var <- "case.control"
pheno.list$strata.var <- "ethnic.group"
pheno.list$main.vars <- c("age.group", "oral.years", "n.children")
pheno.list$int.vars <- "n.children"

# Define the list of options
op <- list()

# Omnibus Wald test for the main effect of the SNP and the interaction variables, and
#  a separate Wald test for "age.group" and "oral.years".
op$tests <- list(c("SNP_", "SNP_:n.children"), c("age.group", "oral.years"))

# Specifying out.dir will create a separate .rda file for each SNP
# op$out.dir <- "./"
# Specifying out.file will create one output file
# op$out.file <- "out.txt"

# For this model, all variables are continuous
# temp <- snp.scan.logistic(snp.list, pheno.list, op=op)
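
The op$tests entries in the example request multi-df Wald tests on groups of coefficients. As a rough illustration of what such a test computes, the sketch below applies the standard Wald statistic (the coefficient subvector times the inverse of its covariance submatrix times the subvector, compared to a chi-squared distribution) to a glm() fit on simulated data. wald_test() is a hypothetical helper, not a CGEN function, and the coefficient names follow glm() conventions, not the package's "SNP_" naming.

```r
set.seed(3)
n <- 400
d <- data.frame(SNP = rbinom(n, 2, 0.3), n.children = rpois(n, 2))
d$y <- rbinom(n, 1, plogis(-0.3 + 0.3 * d$SNP + 0.1 * d$n.children))
fit <- glm(y ~ SNP * n.children, data = d, family = binomial())

# Hypothetical helper: Wald test that the coefficients named in `vars` are all 0.
wald_test <- function(beta, cov, vars) {
  b    <- beta[vars]
  V    <- cov[vars, vars, drop = FALSE]
  stat <- as.numeric(t(b) %*% solve(V) %*% b)   # quadratic form b' V^{-1} b
  df   <- length(vars)
  c(stat = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# 2 df omnibus test of the SNP main effect and its interaction.
res <- wald_test(coef(fit), vcov(fit), c("SNP", "SNP:n.children"))
print(res)
```

With a single variable name the statistic reduces to the squared z value from the coefficient table, which corresponds to the 1 df tests requested through tests.1df.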

