spa(spa)
spa() belongs to R package: spa
sequential prediction algorithm
----------Description----------
This function performs the sequential prediction algorithm "spa" in R, as described in the references below. It can fit a graph-only estimate (y=f(G)) or a graph-based semi-parametric estimate (y=Xb+f(G)). Refer to the example below.
The approach distinguishes between inductive prediction (the predict function) and transductive prediction (the update function). This is documented in the user manual and references.
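In addition to the full example in the Examples section, here is a minimal sketch of the two call forms on toy data (all objects below are illustrative and not taken from the package documentation; NA entries in y mark unlabeled cases, as in the Examples):

library(spa)
set.seed(1)
n <- 40; p <- 3
X <- matrix(rnorm(n * p), n, p)                  # n by p predictors
G <- as.matrix(dist(X))                          # n by n dissimilarity matrix
y <- as.numeric(X %*% c(1, -1, 0.5) + rnorm(n))  # continuous response
y[sample(n, 25)] <- NA                           # NA marks the unlabeled cases
fit.g  <- spa(y, graph = G)                      # graph-only estimate y = f(G)
fit.xg <- spa(y, x = X, graph = G)               # semi-parametric estimate y = Xb + f(G)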
----------Usage----------
spa(y, x, graph, type = c("soft", "hard"),
    kernel = function(r, lam = 0.1) { exp(-r/lam) },
    global, control, ...)
----------Arguments----------
Argument: y
response of length m <= n.
Argument: x
n by p predictor data set (it is assumed that XU lies in the space spanned by XL).
Argument: graph
n by n dissimilarity matrix for a graph.
Argument: type
whether to use soft labels (squared error) or hard labels (exponential). Soft is the default.
Argument: kernel
kernel function (default = heat kernel).
Argument: global
(optional) the global estimate to lend weight to (the default is the mean of the known responses).
Argument: control
spa control parameters (refer to spa.control for more information).
Argument: ...
Currently ignored.
----------Details----------
If the response is continuous the algorithm only uses soft labels (hard labels are not appropriate or sensible).
In classification the algorithm distinguishes between hard- and soft-labeled versions. To use hard labels, type="hard" must be set and the response must have exactly two levels (note that it does not have to be a factor; also, classification of a set-aside x data set is not possible). The main difference between the two is whether the PCE is rounded at each iteration (hard = yes, soft = no). If soft labels are used, the base algorithm converges to a closed-form solution, which results in fast approximations for GCV and a direct implementation of that solution as opposed to iteration (currently implemented). For hard labels this is not the case; as a result, approximate GCV and full GCV are not properly developed, and if specified the procedure performs them with the soft version for parameter estimation.
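A hedged toy illustration of the soft/hard distinction (a two-level 0/1 response with NA marking unlabeled cases, as in the Examples below; all objects are illustrative):

library(spa)
set.seed(2)
n <- 60
G <- as.matrix(dist(matrix(rnorm(n * 2), n, 2)))         # n by n dissimilarity matrix
y <- rep(c(0, 1), each = n / 2)                          # two-level response
lab <- c(sample(1:(n / 2), 8), sample((n / 2 + 1):n, 7)) # keep a few labels from each class
y[-lab] <- NA                                            # the rest are unlabeled
fit.soft <- spa(y, graph = G, type = "soft")             # closed-form solution; fast GCV approximations
fit.hard <- spa(y, graph = G, type = "hard")             # PCE rounded each iteration; GCV uses the soft version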
The update function also distinguishes between hard and soft labels. For hard labels the algorithm employs pen=hlasso (the hyperbolic l1 penalty), whereas soft labels employ pen=ridge. One can also use the ridge penalty with hard labels, but it is unclear why this would be desirable.
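A hedged sketch of transductive updating, continuing the toy objects from the previous sketch; the dat=list(k,l) argument follows the Examples section below, and the penalty is chosen automatically from the label type as described above:

U <- which(is.na(y))                                     # indices of unlabeled cases
## soft-label fit: update() uses the ridge penalty
pred.soft <- update(fit.soft, ynew = y, gnew = G,
                    dat = list(k = length(U), l = Inf))
## hard-label fit: update() uses the hyperbolic l1 penalty (hlasso)
pred.hard <- update(fit.hard, ynew = y, gnew = G,
                    dat = list(k = length(U), l = Inf))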
The code provides semi-supervised graph-based support for R.
----------Note----------
To control parameter estimation, the parameters lmin, lmax and ldepth are set through spa.control. For this procedure GCV is used as the criterion, where unlabeled data can influence the GCV; use spa.control to set this as well. Options include agcv for approximate transductive GCV, fgcv for GCV applied to the full smoother, lgcv for labeled-data-only (supervised) GCV, and tgcv for pure transductive GCV (slow). The fgcv flag has been deprecated. Refer to spa.control and the references below for more.
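A hedged sketch of setting these through spa.control, reusing the toy y and G from the earlier sketches; lmin, lmax and ldepth come from this Note, but the argument name gcv for the criterion is an assumption here -- consult ?spa.control for the exact interface and defaults:

ctrl <- spa.control(lmin = 0.01, lmax = 1, ldepth = 10, # illustrative lambda search settings
                    gcv = "lgcv")                       # labeled-only (supervised) GCV; argument name assumed
fit <- spa(y, graph = G, control = ctrl)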
----------References----------
M. Culp (2011). spa: A Semi-Supervised R Package for Semi-Parametric Graph-Based Estimation. Journal of Statistical Software, 40(10), 1-29. URL http://www.jstatsoft.org/v40/i10/.
M. Culp and G. Michailidis (2008). Graph-based Semi-supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1).
----------Examples----------
## SPA in Multi-view Learning -- (Y,X,G) case.
## (refer to the coraAI help page for more information).
## 1) fit model Y=f(G)+epsilon
## 2) fit model Y=XB+f(G)+epsilon
data(coraAI)
y=coraAI$class
x=coraAI$journals
g=coraAI$cite
##remove papers that are not cited
keep<-which(as.vector(apply(g,1,sum)>1))
y<-y[keep]
x<-x[keep,]
g=g[keep,keep]
##set up testing/training data (3.5% stratified for training)
set.seed(100)
n<-dim(x)[1]
Ns<-as.vector(apply(x,2,sum))
Ls<-sapply(1:length(Ns),function(i)sample(which(x[,i]==1),ceiling(0.035*Ns[i])))
L=NULL
for(i in 1:length(Ns)) L=c(L,Ls[[i]])
U<-setdiff(1:n,L)
ord<-c(L,U)
m=length(L)
y1<-y
y1[U]<-NA
##Fit model on G
A1=as.matrix(g)
gc=spa(y1,graph=A1,control=spa.control(dissimilar=FALSE))
gc
##Compute error rate for G only
tab=table(fitted(gc)[U]>0.5,y[U])
1-sum(diag(tab))/sum(tab)
##Note the problem
sum(apply(A1[U,L],1,sum)==0)/(n-m)*100 ##Answer: 39.79849
##39.8% of unlabeled observations have no connection to a labeled one.
##Use transductive prediction with SPA to fix this with parameters k,l
pred=update(gc,ynew=y1,gnew=A1,dat=list(k=length(U),l=Inf))
tab=table(pred[U]>0.5,y[U])
1-sum(diag(tab))/sum(tab)
##Replace the earlier gc fit with the more predictive transductive model
gc=update(gc,ynew=y1,gnew=A1,dat=list(k=length(U),l=Inf),trans.update=TRUE)
gc
## (Y,X,G) case to fit Y=Xb+f(G)+e
gjc<-spa(y1,x,g,control=spa.control(diss=FALSE))
gjc
##Apply SPA transductively to fix the above problem
gjc1=update(gjc,ynew=y1,xnew=x,gnew=A1,dat=list(k=length(U),l=Inf),trans.update=TRUE)
gjc1
##Notice that the unlabeled transductive adjustment provided new estimates
sum((coef(gjc)-coef(gjc1))^2)
##Check testing performance to determine the best model settings
tab=table((fitted(gjc)>0.5)[U],y[U])
1-sum(diag(tab))/sum(tab)
tab=table((fitted(gjc1)>0.5)[U],y[U])
1-sum(diag(tab))/sum(tab)