getMatchedSets(CGEN)
getMatchedSets() belongs to R package: CGEN
Case-Control and Nearest-Neighbor Matching
Translator: 生物统计家园网 (biostatistic.net) robot LoveR
----------Description----------
Obtain matching of subjects based on a set of covariates (e.g., principal components of population stratification markers). Two types of matching are allowed: 1) case-control (CC) matching and/or 2) nearest-neighbor (NN) matching.
----------Usage----------
getMatchedSets(x, CC, NN, ccs.var=NULL, dist.vars=NULL, strata.var=NULL,
size=2, ratio=1, fixed=FALSE)
----------Arguments----------
Argument: x
Either a data frame containing the variables to be used for matching, an object returned by dist or daisy, or a matrix coercible to class dist. No default.
Argument: CC
Logical. TRUE if case-control matching should be computed, FALSE otherwise. No default.
Argument: NN
Logical. TRUE if nearest-neighbor matching should be computed, FALSE otherwise. No default. At least one of CC and NN should be TRUE.
Argument: ccs.var
Variable name, variable number, or a vector giving the case-control status. If x is a dist object, this must be a vector whose length equals the number of subjects in x. Must be specified if CC=TRUE. The default is NULL.
Argument: dist.vars
Variable numbers or names used to compute the distance matrix on which matching is based. Must be specified if x is a data frame. Ignored if x is a distance object. The default is NULL.
Argument: strata.var
Optional stratification variable (such as study center) for matching within strata. If x is a distance object, this must be a vector of mode integer or a factor. If x is a data frame, a variable name or number is allowed. The default is NULL.
Argument: size
Exact size or maximum allowable size of a matched set. This can be an integer greater than 1, or a vector of such integers that is constant within each level of strata.var. The default is 2.
Argument: ratio
Ratio of cases to controls for CC matching. Currently ignored if fixed=FALSE. This can be a positive number, or a numeric vector that is constant within each level of strata.var. The default is 1.
Argument: fixed
Logical. TRUE if size should be interpreted as the exact size of matched sets, FALSE if it gives their maximal size. The default is FALSE.
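The requirement that vector-valued size and ratio be constant within each level of strata.var can be checked before calling getMatchedSets. The following is a minimal sketch on made-up data; the helper constant_within is hypothetical and is not part of CGEN.

```r
# Hypothetical toy data: stratum labels for six subjects.
strata <- c(1, 1, 2, 2, 3, 3)

# Per-subject vectors derived from the stratum, so they are constant within strata.
size  <- ifelse(strata == 3, 2, 4)
ratio <- c(1/2, 2, 1)[strata]

# Hypothetical helper (not part of CGEN): TRUE if `v` takes a single value
# within every level of `strata`.
constant_within <- function(v, strata) {
  all(tapply(v, strata, function(s) length(unique(s)) == 1))
}

constant_within(size, strata)
constant_within(ratio, strata)
```

Deriving size and ratio from the stratum variable itself, as above, is one way to satisfy the constraint by construction.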
----------Details----------
If a data frame and dist.vars are provided, dist with the Euclidean metric is used to compute distances, assuming continuous variables. For categorical, ordinal, or mixed variables, using a custom distance matrix such as that from daisy is recommended. If strata.var is provided, both case-control (CC) and nearest-neighbor (NN) matching are performed within strata. size can be any integer greater than 1, but currently the matching obtained is usable in snp.matched only if size is 8 or smaller, due to memory and speed limitations.
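The two ways of obtaining a distance object described above can be sketched as follows. The data frame toy and its column values are invented for illustration; dist comes from the stats package and daisy from the cluster package.

```r
library(cluster)  # provides daisy()

set.seed(1)
# Hypothetical toy data: two continuous PCs and one categorical covariate.
toy <- data.frame(
  PC1 = rnorm(6),
  PC2 = rnorm(6),
  age.group = factor(c("a", "b", "a", "c", "b", "c"))
)

# Continuous covariates: Euclidean distance via dist().
d_euclid <- dist(toy[, c("PC1", "PC2")], method = "euclidean")

# Mixed covariates: Gower dissimilarity via daisy().
d_gower <- daisy(toy, metric = "gower")
```

Either object could then be supplied as x, in which case ccs.var and strata.var would need to be passed as vectors rather than column names.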
When fixed=FALSE, NN matching is computed using a modified version of hclust, in which clusters are not allowed to grow beyond the specified size. CC matching is computed similarly, with the further constraint that each cluster must have at least one case and one control. Clusters are then split into 1:k or k:1 matched sets, where k is at most size - 1 (known as full matching). For exactly optimal full matching, use the package optmatch.
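To make the 1:k / k:1 structure concrete, the sketch below checks whether a vector of matched-set labels forms a full matching in the sense described above. The helper is_full_matching is hypothetical (not a CGEN function), and the labels and case indicator are made up.

```r
# Hypothetical helper (not part of CGEN): checks that every matched set is
# 1:k or k:1 with k <= size - 1, i.e. it contains at least one case and one
# control, exactly one case or exactly one control, and at most `size` subjects.
is_full_matching <- function(labels, case, size) {
  n_case <- tapply(case == 1, labels, sum)
  n_ctrl <- tapply(case == 0, labels, sum)
  all(n_case >= 1 & n_ctrl >= 1 &
        (n_case == 1 | n_ctrl == 1) &
        n_case + n_ctrl <= size)
}

case   <- c(1, 0, 0, 1, 1, 0)   # 1 = case, 0 = control
labels <- c(1, 1, 1, 2, 2, 2)   # set 1 is 1:2, set 2 is 2:1
is_full_matching(labels, case, size = 3)
```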
When fixed=TRUE, both CC and NN matching use heuristic fixed-size clustering algorithms. These algorithms start with matches at the periphery of the data space and proceed inward, so prior removal of outliers is recommended. For CC matching, the number of cases in each matched set is obtained by rounding size * [ratio/(1+ratio)] to the nearest integer. The matching algorithms for fixed=TRUE are faster, but with CC matching a large number of cases or controls may be discarded.
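The case count formula for fixed=TRUE can be worked through for a few size and ratio combinations. This is a sketch only: the helper cases_per_set is hypothetical, and using R's round() is an assumption about how the rounding is carried out.

```r
# Number of cases per matched set under fixed = TRUE, following the formula
# size * ratio / (1 + ratio). Using round() is an assumption about the
# rounding rule, not taken from the CGEN source code.
cases_per_set <- function(size, ratio) round(size * ratio / (1 + ratio))

size  <- c(4, 4, 2)
ratio <- c(1/2, 2, 1)
n_case <- cases_per_set(size, ratio)
n_ctrl <- size - n_case
data.frame(size, ratio, n_case, n_ctrl)
```

With these inputs, size 4 and ratio 1/2 gives 4/3, rounded to 1 case and 3 controls; size 4 and ratio 2 gives 8/3, rounded to 3 cases and 1 control; size 2 and ratio 1 gives 1 case and 1 control.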
----------Value----------
A list with names "CC", "tblCC", "NN", and "tblNN". "CC" and "NN" are vectors of integer labels defining the matched sets; "tblCC" and "tblNN" are matrices summarizing the size distribution of matched sets across strata. The i-th row corresponds to matched sets of size i, and the columns represent different strata. The order of strata in the columns may differ from that in strata.var if strata.var was not coded as successive integers starting from 1.
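The layout of the size-distribution tables can be illustrated by rebuilding a similar summary from a label vector. This is a sketch on invented data, and it assumes that each label identifies one matched set within a single stratum and that the table counts sets rather than subjects; it is not the code CGEN uses.

```r
# Hypothetical labels and strata for eight subjects.
labels <- c(1, 1, 2, 2, 2, 3, 3, 4)
strata <- c(1, 1, 1, 1, 1, 2, 2, 2)

# Size of each matched set and the stratum it belongs to.
set_size   <- as.vector(tapply(labels, labels, length))
set_strata <- as.vector(tapply(strata, labels, function(s) s[1]))

# Rows: matched set size i; columns: strata; cells: number of sets.
tbl <- table(size = factor(set_size, levels = 1:max(set_size)),
             stratum = set_strata)
tbl
```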
----------References----------
highlights causal variants. Amer Jour Hum Genet, 2008, 82(2):453-63.
Using Principal Components of Genetic Variation for Robust and Powerful Detection of Gene-Gene Interactions in Case-Control and Case-Only studies. American Journal of Human Genetics, 2010, 86(3):331-342.
----------See Also----------
snp.matched
----------Examples----------
# Use the ovarian cancer data
data(Xdata, package="CGEN")
# Add fake principal component columns
set.seed(123)
Xdata <- cbind(Xdata, PC1 = rnorm(nrow(Xdata)), PC2 = rnorm(nrow(Xdata)))
# Assign matched set size and case/control ratio, stratifying by ethnic group
size <- ifelse(Xdata$ethnic.group == 3, 2, 4)
ratio <- sapply(Xdata$ethnic.group, switch, 1/2 , 2 , 1)
mx <- getMatchedSets(Xdata, CC=TRUE, NN=TRUE, ccs.var="case.control", dist.vars=c("PC1","PC2") , strata.var="ethnic.group",
size = size, ratio = ratio, fixed=TRUE)
mx$NN[1:10]
mx$tblNN
# Example of using a dissimilarity matrix computed from categorical covariates with Gower's distance
library("cluster")
d <- daisy(Xdata[, c("age.group","BRCA.history","gynSurgery.history")] , metric = "gower")
# Specify size = 4 as the maximum matched set size in all strata
mx <- getMatchedSets(d, CC = TRUE, NN = TRUE, ccs.var = Xdata$case.control, strata.var = Xdata$ethnic.group, size = 4,
fixed = FALSE)
mx$CC[1:10]
mx$tblCC