yai(yaImpute)
yai() belongs to R package: yaImpute
Find K nearest neighbors
Description
Given a set of observations, yai
1) separates the observations into reference and target observations,
2) applies the specified method to project X-variables into a Euclidean space (not always, see argument method), and
3) finds the k-nearest neighbors within the reference observations and between the reference and target observations.
An alternative method using randomForest classification and regression trees is provided for steps 2 and 3. Target observations are those with values for X-variables and not for Y-variables, while reference observations are those with no missing values for X- and Y-variables (see Details for the exception).
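The three steps in the Description can be illustrated with a minimal base R sketch. It does not call yai and is not the package's implementation: the toy data, the use of scale() as the projection step, and the helper nearestRefs are all assumptions made for illustration.

```r
# Toy data (hypothetical): Y is known only for the first four observations.
x <- data.frame(x1 = c(1, 2, 3, 4, 1.1, 3.9),
                x2 = c(10, 20, 30, 40, 11, 39),
                row.names = paste0("obs", 1:6))
y <- data.frame(y1 = c(5, 6, 7, 8), row.names = paste0("obs", 1:4))

# Step 1: references have X and Y, targets have X only.
refIds <- intersect(rownames(x), rownames(y))
trgIds <- setdiff(rownames(x), refIds)

# Step 2 (simplified assumption): put the X-variables on a common scale.
xs <- scale(x)

# Step 3: for each target, find the k nearest references.
k <- 1
nearestRefs <- function(id) {
  # Euclidean distance from observation `id` to every reference.
  d <- sqrt(colSums((t(xs[refIds, , drop = FALSE]) - xs[id, ])^2))
  names(sort(d))[seq_len(k)]
}
neighbors <- sapply(trgIds, nearestRefs)
print(neighbors)
```

With k greater than 1, nearestRefs returns several identifiers per target, which is why the neighbor tables in a yai object have k columns.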
Usage
yai(x=NULL,y=NULL,data=NULL,k=1,noTrgs=FALSE,noRefs=FALSE,
nVec=NULL,pVal=.05,method="msn",ann=TRUE,mtry=NULL,ntree=500,
rfMode="buildClasses")
Arguments
Argument: x
1) a matrix or data frame containing the X-variables for all observations, with row names that identify the observations, or 2) a one-sided formula defining the X-variables as a linear formula. If a formula is coded for x, one must be used for y as well, if needed.
Argument: y
1) a matrix or data frame containing the Y-variables for the reference observations, or 2) a one-sided formula defining the Y-variables as a linear formula.
Argument: data
when x and y are formulas, data is a data frame or matrix that contains all the variables, with row names that identify the observations. yai splits the observations into two sets (reference and target).
Argument: k
the number of nearest neighbors; default is 1.
Argument: noTrgs
when TRUE, skip finding neighbors for target observations.
Argument: noRefs
when TRUE, skip finding neighbors for reference observations.
Argument: nVec
number of canonical vectors to use (methods msn and msn2), or number of independent X-variables in the reference data when method is mahalanobis. When NULL, the number is set by the function.
Argument: pVal
significance level for canonical vectors, used when method is msn or msn2.
Argument: method
the strategy used for finding neighbors; the options are the quoted keywords (see Details):
euclidean: distance is computed in a normalized X space.
raw: like euclidean, except no normalization is done.
mahalanobis: distance is computed in its namesake's space.
ica: like mahalanobis, but based on Independent Component Analysis using package fastICA.
msn: distance is computed in a projected canonical space.
msn2: like msn, but with variance weighting (canonical regression rather than correlation).
gnn: distance is computed using a projected ordination of Xs found using canonical correspondence analysis (cca from package vegan).
randomForest: distance is one minus the proportion of randomForest trees where a target observation is in the same terminal node as a reference observation (see randomForest).
random: like raw, except that the X space is a single vector of uniform random [0,1] numbers generated using runif. This results in random assignment of neighbors and forces ann to be FALSE.
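The difference between the raw and euclidean options can be seen in a small base R sketch. The data are hypothetical, and the sketch assumes that "normalized" means centering and scaling each X-variable by the mean and standard deviation of the reference data. That is an illustrative assumption, not a statement of what yai does internally.

```r
# Hypothetical reference X-variables on very different scales.
refs <- rbind(a = c(1, 100), b = c(2, 400), c = c(5, 120))
colnames(refs) <- c("x1", "x2")
target <- c(x1 = 5, x2 = 300)

# raw-style: Euclidean distance on the unscaled values.
rawDist <- sqrt(rowSums(sweep(refs, 2, target)^2))

# euclidean-style (assumed normalization): center and scale first.
ctr <- colMeans(refs)
sdv <- apply(refs, 2, sd)
refsN <- sweep(sweep(refs, 2, ctr), 2, sdv, "/")
targetN <- (target - ctr) / sdv
normDist <- sqrt(rowSums(sweep(refsN, 2, targetN)^2))

# The nearest reference can change once the variables are comparable.
print(names(which.min(rawDist)))
print(names(which.min(normDist)))
```

In the unscaled space x2 dominates the distance, so the nearest reference is chosen almost entirely on x2. After normalization both variables contribute, and a different reference is selected.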
Argument: ann
TRUE if ann is used to find neighbors, FALSE if a slow search is used.
Argument: mtry
the number of X-variables picked at random when method is randomForest (see randomForest); default is sqrt(number of X-variables).
Argument: ntree
the number of classification and regression trees when method is randomForest. When more than one Y-variable is used, the trees are divided among the variables. Alternatively, ntree can be a vector of values corresponding to each Y-variable.
Argument: rfMode
when rfMode is buildClasses and method is randomForest, continuous variables are internally converted to classes, forcing randomForest to build classification trees for the variable. Otherwise, regression trees are built if your version of randomForest is newer than 4.5-18.
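The randomForest distance described under method (one minus the proportion of trees in which two observations share a terminal node) can be sketched in base R. The matrix of terminal-node identifiers below is invented for illustration, and rfDistance is a hypothetical helper, not a function of yaImpute or randomForest.

```r
# Hypothetical terminal-node ids: rows are observations, columns are trees.
nodes <- rbind(ref1 = c(3, 5, 2, 7),
               ref2 = c(4, 5, 2, 6),
               trg1 = c(4, 5, 1, 6))

# Distance = 1 - proportion of trees where both rows land in the same node.
rfDistance <- function(nodes, i, j) {
  1 - mean(nodes[i, ] == nodes[j, ])
}

dRef1 <- rfDistance(nodes, "trg1", "ref1")  # same node in 1 of 4 trees
dRef2 <- rfDistance(nodes, "trg1", "ref2")  # same node in 3 of 4 trees
print(c(ref1 = dRef1, ref2 = dRef2))
```

Here ref2 is the nearer reference for trg1, because the two observations fall into the same terminal node in most trees.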
Details
See the paper at http://www.jstatsoft.org/v23/i10 (it includes examples).
The following information is in addition to the content in the paper.
You need not have any Y-variables to run yai for the following methods: euclidean, raw, mahalanobis, ica, random, and randomForest (in which case unsupervised classification is performed). However, normally yai classifies reference observations as those with no missing values for X- and Y-variables, and target observations as those with values for X-variables and missing data for Y-variables. When y is NULL (there are no Y-variables), all the observations are considered references. See newtargets for an example of how to use yai in this situation.
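The classification rule above can be sketched in base R as follows. The data frame and the classify helper are hypothetical. The sketch also assumes that observations with missing X-variables are dropped, which matches the obsDropped component described under Value but is not confirmed in detail by this page.

```r
# Hypothetical data: x1 and x2 are X-variables, y1 is a Y-variable.
dat <- data.frame(x1 = c(1, 2, NA, 4, 5),
                  x2 = c(5, 4, 3, 2, 1),
                  y1 = c(10, NA, 30, NA, 50),
                  row.names = paste0("obs", 1:5))

classify <- function(dat, xVars, yVars = NULL) {
  xOk <- complete.cases(dat[, xVars, drop = FALSE])
  if (is.null(yVars)) {
    yOk <- rep(TRUE, nrow(dat))  # no Y-variables: nothing can be missing
  } else {
    yOk <- complete.cases(dat[, yVars, drop = FALSE])
  }
  # Assumption: rows with missing X are dropped; otherwise Y decides the role.
  role <- ifelse(!xOk, "dropped", ifelse(yOk, "reference", "target"))
  setNames(role, rownames(dat))
}

withY <- classify(dat, c("x1", "x2"), "y1")
noY <- classify(dat, c("x1", "x2"))
print(withY)
print(noY)
```

When no Y-variables are given, every observation with complete X-variables ends up as a reference, which corresponds to the case described for y equal to NULL.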
Value
An object of class yai, which is a list with the following tags:
Component: call
the call.
Component: yRefs, xRefs
matrices of the Y- and X-variables for just the reference observations (unscaled). The scale factors are attached as attributes.
Component: obsDropped
a list of the row names for observations dropped for various reasons (missing data).
Component: trgRows
a list of the row names for target observations as a subset of all observations.
Component: xall
the X-variables for all observations.
Component: cancor
the object returned by the cancor function when method msn or msn2 is used (NULL otherwise).
Component: ccaVegan
an object of class cca (from package vegan) when method gnn is used.
Component: ftest
a list containing partial F statistics and a vector of Pr>F (pgf) corresponding to the canonical correlation coefficients when method msn or msn2 is used (NULL otherwise).
Component: yScale, xScale
scale data used on yRefs and xRefs as needed.
Component: k
the value of k.
Component: pVal
as input; only used when method msn or msn2 is used.
Component: projector
NULL when not used. For methods msn, msn2, gnn and mahalanobis, this is a matrix that projects normalized X-variables into a space suitable for computing Euclidean distances.
Component: nVec
number of canonical vectors used (methods msn and msn2), or number of independent X-variables in the reference data when method mahalanobis is used.
Component: method
as input, the method used.
Component: ranForest
a list of the forests if method randomForest is used. There is one forest for each Y-variable, or just one forest when there are no Y-variables.
Component: ICA
a list of information from fastICA when method ica is used.
Component: ann
the value of ann, TRUE when ann is used, FALSE otherwise.
Component: xlevels
NULL if no factors are used as predictors; otherwise a list of predictors that have factors and their levels (see lm).
Component: neiDstTrgs
a data frame of distances between a target (identified by its row name) and the k references. There are k columns.
Component: neiIdsTrgs
a data frame of reference identifications that correspond to neiDstTrgs.
Component: neiDstRefs, neiIdsRefs
the counterparts of neiDstTrgs and neiIdsTrgs for reference observations.
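To make the neighbor tables concrete, the following base R sketch builds hypothetical stand-ins for yRefs, neiIdsTrgs and neiDstTrgs (with k = 2) and copies Y-values from the nearest reference to each target. The column names Id.k1 and Dst.k1 and all values are assumptions for illustration. In practice the package's impute function, used in the Examples, is the intended way to obtain imputed values.

```r
# Hypothetical stand-ins for components of a yai object with k = 2.
yRefs <- matrix(c(10, 20, 30), ncol = 1,
                dimnames = list(c("r1", "r2", "r3"), "y1"))
neiIdsTrgs <- data.frame(Id.k1 = c("r2", "r3"), Id.k2 = c("r1", "r2"),
                         row.names = c("t1", "t2"),
                         stringsAsFactors = FALSE)
neiDstTrgs <- data.frame(Dst.k1 = c(0.1, 0.4), Dst.k2 = c(0.3, 0.9),
                         row.names = c("t1", "t2"))

# Nearest-neighbor imputation: take Y from the closest reference (column 1).
imputedY <- yRefs[neiIdsTrgs[, 1], , drop = FALSE]
rownames(imputedY) <- rownames(neiIdsTrgs)
print(imputedY)
```

Each row of neiIdsTrgs lines up with the same row of neiDstTrgs, so the distance to the reference used for target t1 is the first entry of neiDstTrgs for t1.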
Author(s)
Nicholas L. Crookston ncrookston.fs@gmail.com
Andrew O. Finley finleya@msu.edu
Examples
library(yaImpute)
data(iris)
# set the random number seed so that example results are consistent
# normally, leave out this command
set.seed(12345)
# form some test data, y's are defined only for reference
# observations.
refs <- sample(rownames(iris),50)
x <- iris[,1:2] # Sepal.Length Sepal.Width
y <- iris[refs,3:4] # Petal.Length Petal.Width
# build yai objects using 2 methods
msn <- yai(x=x,y=y)
mal <- yai(x=x,y=y,method="mahalanobis")
# running the following examples will load packages vegan
# and randomForest, and is more complicated.
data(MoscowMtStJoe)
# convert polar slope and aspect measurements to cartesian
# (which is the same as Stage's (1976) transformation).
polar <- MoscowMtStJoe[,40:41]
polar[,1] <- polar[,1]*.01 # slope proportion
polar[,2] <- polar[,2]*(pi/180) # aspect radians
cartesian <- t(apply(polar,1,function (x)
{return (c(x[1]*cos(x[2]),x[1]*sin(x[2]))) }))
colnames(cartesian) <- c("xSlAsp","ySlAsp")
x <- cbind(MoscowMtStJoe[,37:39],cartesian,MoscowMtStJoe[,42:64])
y <- MoscowMtStJoe[,1:35]
mal <- yai(x=x, y=y, method="mahalanobis", k=1)
gnn <- yai(x=x, y=y, method="gnn", k=1)
msn <- yai(x=x, y=y, method="msn", k=1)
plot(mal,vars=yvars(mal)[1:16])
# reduce the plant community data for randomForest.
yba <- MoscowMtStJoe[,1:17]
ybaB <- whatsMax(yba,nbig=7) # see help on whatsMax
rf <- yai(x=x, y=ybaB, method="randomForest", k=1)
# build the imputations for the original y's
rforig <- impute(rf,ancillaryData=y)
# compare the results
compare.yai(mal,gnn,msn,rforig)
plot(compare.yai(mal,gnn,msn,rforig))
# build another randomForest case forcing regression
# to be used for continuous variables. The answers differ
# but one is not clearly better than the other.
rf2 <- yai(x=x, y=ybaB, method="randomForest", rfMode="regression")
rforig2 <- impute(rf2,ancillaryData=y)
compare.yai(rforig2,rforig)