R BioSeqClass package: classify() function help documentation (Chinese-English)

Posted 2012-02-25 13:40:56
classify(BioSeqClass)
classify() belongs to the R package BioSeqClass

                                        Classification with Specific Features and Cross-Validation

                                         Translator: biostatistic.net robot LoveR
Description

Classification with selected features and cross-validation. It supports 10 classification algorithms, feature selection via Weka, cross-validation, and the leave-one-out test.


Usage


  classify(data,classifyMethod="libsvm",cv=10,
                 features, evaluator, search, n=200,        
                 svm.kernel="linear",svm.scale=FALSE,
                 svm.path, svm.options="-t 0",
                 knn.k=1,
                 nnet.size=2, nnet.rang=0.7, nnet.decay=0, nnet.maxit=100)



Arguments

data
a data frame including the feature matrix and class labels. The last column is a vector of class labels consisting of "-1" or "+1"; the other columns are features.
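A minimal sketch of this layout, using only base R and hypothetical feature values (not data produced by BioSeqClass):

```r
## Toy data frame in the layout classify() expects:
## feature columns first, class label ("+1"/"-1") in the last column.
feat1 <- c(0.2, 0.8, 0.5, 0.1)
feat2 <- c(1, 0, 1, 1)
classLabel <- c("+1", "+1", "-1", "-1")
data <- data.frame(feat1, feat2, classLabel)

## The last column holds the labels; all other columns are features.
labels   <- data[, ncol(data)]
features <- data[, -ncol(data), drop = FALSE]
```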


classifyMethod
a string for the classification method. This must be one of "libsvm", "svmlight", "NaiveBayes", "randomForest", "knn", "tree", "nnet", "rpart", "ctree", "ctreelibsvm", or "bagging".


cv
an integer for the number of cross-validation folds, or the string "leave_one_out" for the jackknife test.
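A rough base-R sketch of what a k-fold split looks like (an illustration of the idea, not BioSeqClass's internal code):

```r
## Assign each of n samples to one of cv folds at random.
n  <- 20
cv <- 5
set.seed(1)
fold <- sample(rep(1:cv, length.out = n))

for (k in 1:cv) {
  test.idx  <- which(fold == k)   # held-out samples for this round
  train.idx <- which(fold != k)   # samples used to fit the model
  ## fit on train.idx, evaluate on test.idx; results are then averaged
}

## cv = "leave_one_out" corresponds to n folds of size 1 (the jackknife).
```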


features
an integer vector of indices of the columns of data that will be used as features to build the classification model.


evaluator
a string for the feature selection method used by WEKA. This must be one of "CfsSubsetEval", "ChiSquaredAttributeEval", "InfoGainAttributeEval", or "SVMAttributeEval".


search
a string for the search method used by WEKA. This must be one of "BestFirst" or "Ranker".


n
an integer for the number of selected features.


svm.kernel
a string for the kernel function of the SVM.


svm.scale
a logical vector indicating the variables to be scaled.


svm.path
a character string giving the path to the SVMlight binaries (required if the path is unknown to the OS).


svm.options
optional parameters passed to SVMlight. For further details see "How to use" on http://svmlight.joachims.org/ (e.g. "-t 2 -g 0.1").


nnet.size
number of units in the hidden layer. Can be zero if there are skip-layer units.


nnet.rang
initial random weights on [-rang, rang]. A value of about 0.5 unless the inputs are large, in which case it should be chosen so that rang * max(|x|) is about 1.
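For example, with hypothetical inputs whose largest magnitude is 20, the rule above gives:

```r
x    <- c(-20, 3, 7)      # hypothetical input values
rang <- 1 / max(abs(x))   # chosen so that rang * max(|x|) == 1
rang                      # 0.05
```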


nnet.decay
parameter for weight decay.


nnet.maxit
maximum number of iterations.


knn.k
number of neighbours considered in the function classifyModelKNN.


Details

classify employs the feature selection methods in Weka and diverse classification models from other R packages to perform classification. "Cross Validation" is controlled by the parameter "cv"; "Feature Selection" is controlled by the parameters "features", "evaluator", "search", and "n"; "Classification Model Building" is controlled by the parameter "classifyMethod".

The parameter "evaluator" supports four feature selection methods provided by WEKA:
"CfsSubsetEval": evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.
"ChiSquaredAttributeEval": evaluates the worth of an attribute by computing the value of the chi-squared statistic with respect to the class.
"InfoGainAttributeEval": evaluates attributes individually by measuring the information gain with respect to the class.
"SVMAttributeEval": evaluates the worth of an attribute by using an SVM classifier. Attributes are ranked by the square of the weight assigned by the SVM. Attribute selection for multiclass problems is handled by ranking attributes for each class separately using a one-vs-all method and then "dealing" from the top of each pile to give a final ranking.
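The per-attribute chi-squared score that "ChiSquaredAttributeEval" is based on can be sketched in base R on toy data (this illustrates the statistic, not WEKA's implementation):

```r
## Toy binary attribute and "+1"/"-1" class labels.
attr  <- c(1, 1, 1, 0, 0, 0, 1, 0)
class <- c("+1", "+1", "+1", "-1", "-1", "-1", "+1", "-1")

## Chi-squared statistic of the attribute with respect to the class;
## a larger value means a stronger attribute/class association.
## (Warnings about small expected counts are suppressed for the toy data.)
score <- suppressWarnings(
  chisq.test(table(attr, class), correct = FALSE)
)$statistic
```

Ranking attributes by this score, largest first, is the essence of chi-squared feature selection.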

The parameter "search" supports two feature subset search methods provided by WEKA:
"BestFirst": searches the space of attribute subsets by greedy hill-climbing augmented with a backtracking facility. Setting the number of consecutive non-improving nodes allowed controls the level of backtracking done. Best-first may start with the empty set of attributes and search forward, start with the full set of attributes and search backward, or start at any point and search in both directions (by considering all possible single-attribute additions and deletions at a given point).
"Ranker": ranks attributes by their individual evaluations.

The parameter "classifyMethod" supports multiple classification models:
"libsvm": employs classifyModelLIBSVM to perform Support Vector Machine classification via LibSVM. Package "e1071" is required.
"svmlight": employs classifyModelSVMLIGHT to perform Support Vector Machine classification via SVMlight. Package "klaR" is required.
"NaiveBayes": employs classifyModelNB to perform Naive Bayes classification. Package "klaR" is required.
"randomForest": employs classifyModelRF to perform random forest classification. Package "randomForest" is required.
"knn": employs classifyModelKNN to perform the k-Nearest-Neighbour algorithm. Package "class" is required.
"tree": employs classifyModelTree to perform tree classification. Package "tree" is required.
"nnet": employs classifyModelNNET to perform the neural network algorithm. Bundle "VR" is required.
"rpart": employs classifyModelRPART to perform Recursive Partitioning and Regression Trees. Package "rpart" is required.
"ctree": employs classifyModelCTREE to perform Conditional Inference Trees. Package "party" is required.
"ctreelibsvm": employs classifyModelCTREELIBSVM to combine Conditional Inference Trees and Support Vector Machines for classification. For each node in the tree, one SVM model is constructed using the training data in that node. Test data are first classified to one node of the tree, and then the corresponding SVM is used for classification. Packages "party" and "e1071" are required.
"bagging": employs classifyModelBAG to perform bagging for classification trees. Package "ipred" is required.
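For instance, the "knn" option relies on the k-Nearest-Neighbour classifier from package "class"; a direct call to that underlying function on toy data looks roughly like this (classifyModelKNN's exact wrapping is not shown here):

```r
library(class)   # provides knn(), the classifier behind classifyMethod = "knn"

## Toy two-feature training set with "+1"/"-1" labels.
train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
cl    <- factor(c("-1", "-1", "+1", "+1"))
test  <- rbind(c(0.2, 0.3), c(5.1, 5.2))

## 1-nearest-neighbour prediction (corresponding to knn.k = 1 in classify()).
pred <- knn(train, test, cl, k = 1)
```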


Author(s)


Hong Li



Examples


  ## Read positive/negative sequences from files.
  tmpfile1 = system.file("example", "acetylation_K.pos40.pep", package="BioSeqClass")
  tmpfile2 = system.file("example", "acetylation_K.neg40.pep", package="BioSeqClass")
  posSeq = as.matrix(read.csv(tmpfile1, header=FALSE, sep="\t", row.names=1))[,1]
  negSeq = as.matrix(read.csv(tmpfile2, header=FALSE, sep="\t", row.names=1))[,1]
  seq = c(posSeq, negSeq)
  classLabel = c(rep("+1", length(posSeq)), rep("-1", length(negSeq)))
  data = data.frame(featureBinary(seq), classLabel)
  
  ## Use LibSVM and 5-fold cross-validation to classify.
  LIBSVM_CV5 = classify(data,classifyMethod="libsvm",cv=5,
                 svm.kernel="linear",svm.scale=FALSE)
  ## Feature selection is done by invoking the "CfsSubsetEval" method in WEKA.
  FS_LIBSVM_CV5 = classify(data,classifyMethod="libsvm",cv=5,evaluator="CfsSubsetEval",
                 search="BestFirst",svm.kernel="linear",svm.scale=FALSE)   
  
  if(interactive()){
  
    KNN_CV5 = classify(data,classifyMethod="knn",cv=5,knn.k=1)  
   
    RF_CV5 = classify(data,classifyMethod="randomForest",cv=5)
   
    TREE_CV5 = classify(data,classifyMethod="tree",cv=5)
   
    NNET_CV5 = classify(data,classifyMethod="nnet",cv=5)
   
    RPART_CV5 = classify(data,classifyMethod="rpart",cv=5,evaluator="")
   
    CTREE_CV5 = classify(data,classifyMethod="ctree",cv=5,evaluator="")  
   
    BAG_CV5 = classify(data,classifyMethod="bagging",cv=5,evaluator="")  
         
  }

Source: biostatistic.net (http://www.biostatistic.net).

