correctBias(yaImpute)
Package: yaImpute
Correct bias by selecting different near neighbors
----------Description----------
Changes the neighbor selections in a yai object so that bias (if any) in the mean of an expression of one or more variables is reduced until that mean falls within a defined confidence interval.
----------Usage----------
correctBias(object,trgVal,trgValCI=NULL,nStdev=1.5,excludeRefIds=NULL,trace=FALSE)
----------Arguments----------
Argument: object
An object of class yai created with k > 1.
Argument: trgVal
An expression defining a variable or combination of variables that is applied to each member of the population (see Details). If passed as a character string, it is coerced into an expression. The expression can refer to one or more X- and Y-variables defined for the reference observations.
Argument: trgValCI
The confidence interval that should contain mean(trgVal). If the mean already falls within this interval, no correction is needed. If NULL, the interval is computed using nStdev.
Argument: nStdev
The number of standard deviations (of the vector of trgVal values) used to compute the confidence interval when it is computed internally; ignored if trgValCI is not NULL.
Argument: excludeRefIds
Identities of reference observations to exclude from the population; if set to "all", all references are excluded (see Details).
Argument: trace
If TRUE, detailed output is produced.
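The conversion of a character trgVal into an expression, and a default interval built from nStdev, can be sketched in base R as follows. This is an illustration under stated assumptions, not the yaImpute source: the toy data frame refData stands in for the reference X- and Y-variables, the helper defaultCI is hypothetical, and its formula (mean plus or minus nStdev standard errors) is an assumption, because this page does not state the exact formula.

```r
# Hypothetical reference data: one X-variable and one Y-variable.
refData <- data.frame(
  Sepal.Length = c(5.1, 4.9, 6.3, 5.8, 7.1, 6.5),
  Petal.Width  = c(0.2, 0.2, 1.8, 1.9, 2.1, 1.5)
)

# A character trgVal is parsed into an expression ...
trgVal <- "Petal.Width"
if (is.character(trgVal)) trgVal <- parse(text = trgVal)

# ... and evaluated with the reference variables as its environment.
trgValData <- eval(trgVal, envir = refData)

# ASSUMPTION: default CI = mean +/- nStdev standard errors of trgValData.
defaultCI <- function(values, nStdev = 1.5) {
  se <- sd(values) / sqrt(length(values))
  mean(values) + c(-1, 1) * nStdev * se
}
trgValCI <- defaultCI(trgValData, nStdev = 1.5)
print(trgValCI)

# The same mechanism handles combinations of variables.
combo <- eval(parse(text = "Sepal.Length * Petal.Width"), envir = refData)
print(mean(combo))
```

Supplying trgValCI directly would skip the defaultCI step entirely, which mirrors the documented rule that nStdev is ignored when trgValCI is not NULL.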
----------Details----------
Imputation as defined in yaImpute can yield biased results. Suppose you have a collection of reference observations that were selected in an unbiased way from a population. In this discussion, the population is a finite set of all individual sample units of interest; the reference plus target observations often represent this population (but this need not be true, see below). If the average of a measured attribute is computed from this random sample, it is an unbiased estimate of the true mean.

When yai is run with k=1, the values of each of several attributes are imputed to a target observation from a single reference observation. Once the imputation is done for all the target observations, the average of any measured attribute can be computed over all the observations in the population. There is no guarantee that this average will fall within a pre-specified confidence interval.

Experience shows that, despite this lack of a guarantee, the results tend to be accurate. This tends to hold when the reference data contain samples that cover the variation of the targets, even when they are not a random sample, and even if some of the reference observations come from sample units outside the target population.

Because there is no guarantee, and because the reference observations might usefully come from sample units beyond those in the population (so as to ensure that every kind of target has a matching reference), it is necessary to test the imputation results for bias. If bias is found, it should be corrected.

The correctBias() function is designed to check for bias and to correct any bias it finds by selecting alternative nearby reference observations to impute to the targets that contribute to the bias. The idea is that even if one reference is closest to a target, its attribute(s) of interest might be greater (or less) than the mean. An alternative neighbor, possibly almost as close, might reduce the overall bias if it were used instead. If so, correctBias() switches the neighbor selections. It makes as many switches as it can until the mean over the population falls within the specified confidence interval. There is no guarantee that this goal will be met.

The details of the method are:

1. An attribute of interest is established by naming it in the call with argument trgVal. This can be a simple variable name enclosed in quotation marks, or it can be an expression of one or more variables. In the former case, it is converted into an expression that is evaluated in the environment of the reference observations (both the X- and Y-variables). A confidence interval is computed for this value under the assumption that the reference observations are an unbiased sample of the target population. This may not be the case. Regardless, a confidence interval is necessary, and it can alternatively be supplied using trgValCI.

2. The data are processed in one or more passes to find neighbor switches that correct the bias. A pass computes the attribute of interest by applying the expression to the values imputed to all the targets, under the assumption that the next neighbor is used in place of the currently used neighbor. The result is a vector with one element per target observation that measures how much a switch at that target would contribute toward reducing the bias. The target observations are then sorted in increasing order of how much the distance from the currently selected reference would increase if the switch took place. Switches are made in this order until the bias is corrected. If the first pass does not correct the bias, another pass is done using the next neighbors. The number of possible passes is k-1, where k is set in the original call to yai. Switches are made among targets only, never among reference observations that may be part of the population; that is, reference observations are always left to represent themselves, as with k=1.

3. The argument excludeRefIds. When computing the mean of the attribute of interest (using the expression), correctBias() must know which observations represent the population. Normally, all the target observations are in this set, but perhaps not all of the reference observations. When excludeRefIds is left NULL, the population is made up of all reference and all target observations. Reference observations that should be left out of the calculations because they are not part of the population can be specified with excludeRefIds, either as a character vector of the row names to leave out or as a numeric vector of the row numbers to leave out. If excludeRefIds="all", all reference observations are excluded.
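Step 2 can be pictured with the following base-R sketch. It is a simplified reading of the description above, not the yaImpute implementation: the function correctBiasSketch, its matrix inputs nbrVals and nbrDist, and the rule that only switches moving the mean toward the interval are considered, are all assumptions made for illustration. The trgVal values are assumed to be precomputed for every candidate neighbor of every target, and the toy numbers are invented.

```r
# Simplified sketch of the pass/switch idea (hypothetical, not package code).
# nbrVals[i, j]: trgVal for target i if its j-th nearest reference is imputed.
# nbrDist[i, j]: distance from target i to its j-th nearest reference.
# refVals: trgVal for reference observations that belong to the population.
correctBiasSketch <- function(nbrVals, nbrDist, refVals, ci) {
  k <- ncol(nbrVals)
  sel <- rep(1L, nrow(nbrVals))       # start from each target's nearest neighbor
  curVals <- nbrVals[, 1]
  curDist <- nbrDist[, 1]
  popMean <- function() mean(c(refVals, curVals))
  inCI <- function(m) m >= ci[1] && m <= ci[2]
  npasses <- 0L
  for (p in seq_len(k - 1L) + 1L) {   # passes use neighbors 2, ..., k
    if (inCI(popMean())) break
    npasses <- npasses + 1L
    m <- popMean()
    delta <- nbrVals[, p] - curVals   # change in value if a target is switched
    # ASSUMPTION: only switches that move the mean toward the CI are candidates.
    helps <- if (m > ci[2]) delta < 0 else delta > 0
    cost <- nbrDist[, p] - curDist    # increase in distance caused by a switch
    for (i in which(helps)[order(cost[helps])]) {  # cheapest switches first
      curVals[i] <- nbrVals[i, p]
      curDist[i] <- nbrDist[i, p]
      sel[i] <- p
      if (inCI(popMean())) break
    }
  }
  list(selected = sel, curVal = popMean(), npasses = npasses)
}

# Invented toy data: 4 targets, k = 3 neighbors each, 3 population references.
nbrVals <- rbind(c(2.0, 1.0, 0.9), c(2.2, 1.4, 1.1),
                 c(1.8, 1.7, 1.2), c(2.1, 0.8, 1.9))
nbrDist <- rbind(c(0.10, 0.30, 0.50), c(0.20, 0.25, 0.60),
                 c(0.15, 0.40, 0.45), c(0.05, 0.50, 0.55))
refVals <- c(1.0, 1.2, 1.1)
res <- correctBiasSketch(nbrVals, nbrDist, refVals, ci = c(1.2, 1.5))
print(res)  # selected: 2 2 1 1; npasses: 1
```

Here the starting mean (about 1.63) is above the interval, so the first pass switches target 2 (smallest distance increase) and then target 1, after which the mean (9.6/7, about 1.37) lies inside the interval and no further passes are needed.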
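The population rule in step 3 can be sketched as a small base-R helper. The function populationIds and its arguments refIds and trgIds are hypothetical names chosen for illustration; the sketch only mirrors the four documented cases for excludeRefIds (NULL, "all", row names, row numbers) and is not the package's code.

```r
# Hypothetical helper: IDs of the observations that form the population.
populationIds <- function(refIds, trgIds, excludeRefIds = NULL) {
  if (is.null(excludeRefIds)) {
    keepRefs <- refIds                                 # all references included
  } else if (identical(excludeRefIds, "all")) {
    keepRefs <- character(0)                           # targets only
  } else if (is.numeric(excludeRefIds)) {
    keepRefs <- setdiff(refIds, refIds[excludeRefIds]) # drop by row number
  } else {
    keepRefs <- setdiff(refIds, excludeRefIds)         # drop by row name
  }
  c(keepRefs, trgIds)                                  # targets always included
}

refIds <- c("r1", "r2", "r3")
trgIds <- c("t1", "t2")
print(populationIds(refIds, trgIds))          # "r1" "r2" "r3" "t1" "t2"
print(populationIds(refIds, trgIds, "all"))   # "t1" "t2"
print(populationIds(refIds, trgIds, "r2"))    # "r1" "r3" "t1" "t2"
print(populationIds(refIds, trgIds, 1))       # "r2" "r3" "t1" "t2"
```

Using setdiff for the row-number case avoids the R pitfall where negative indexing with an empty vector (x[-integer(0)]) drops every element.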
----------Value----------
An object of class yai where k = 1 and the neighbor selections have been changed as described above. In addition, the call element is changed to show both the original call to yai and the call to this function. A new list called biasParameters is added to the yai object, with these tags:
Tag: trgValCI
The target confidence interval.
Tag: curVal
The value of the bias that was achieved.
Tag: npasses
The number of passes through the data taken to achieve the result.
Tag: oldk
The old value of k.
----------Author(s)----------
Nicholas L. Crookston <ncrookston.fs@gmail.com>
----------See Also----------
yai
----------Examples----------
data(iris)
set.seed(12345)
# form some test data
refs=sample(rownames(iris),50)
x <- iris[,1:3] # Sepal.Length Sepal.Width Petal.Length
y <- iris[refs,4:5] # Petal.Width Species
# build an msn run; first build dummy variables for species.
sp1 <- as.integer(iris$Species=="setosa")
sp2 <- as.integer(iris$Species=="versicolor")
y2 <- data.frame(cbind(iris[,4],sp1,sp2),row.names=rownames(iris))
y2 <- y2[refs,]
names(y2) <- c("Petal.Width","Sp1","Sp2")
# find 5 reference neighbors for each target
msn <- yai(x=x,y=y2,method="msn",k=5)
# check for and correct bias in mean "Petal.Width". Neighbor
# selections will be changed as needed to bring the imputed values
# into line with the CI. In this case, no changes are made (npasses
# is returned as zero).
msnCorr = correctBias(msn,trgVal="Petal.Width")
msnCorr$biasParameters