trainClassifier(SlimPLS)
trainClassifier() belongs to R package: SlimPLS
Creates a classification model using a labeled expression matrix and a
Translator: 生物统计家园网, bot LoveR
Description
This function creates a binary classification model using one of several supervised learning algorithms and the set of features selected or extracted by selectFeatures or selectFeaturesSlimPLS, both of which output a featureSet object. The model can later be used to classify new instances using the get_prdeiction function. The function wraps the SVM and random forest classifiers supplied by external packages.
Usage
trainClassifier(learn_method, exp_mat, num_class_a=0, num_class_b=0, class_a="",
class_b="", feature_set, use_components_as_features = FALSE)
Arguments
Argument: learn_method
learn_method can receive one of three strings denoting different classification algorithms: "SVM_LINEAR" - a linear kernel SVM, "SVM_RADIAL" - a radial kernel SVM, "RANDOM_FOREST" - a random forest based collection of decision trees. The SVM variants choose the cost parameter by leave-one-out cross-validation. The random forest that is built contains at most 1500 trees.
Argument: exp_mat
exp_mat is an expression matrix of type expMat, usually created by reading an expression matrix from a file using readExpMat. The matrix is expected to have all samples belonging to class 1 grouped together in the first columns, followed by all samples from class 2, also grouped together.
Argument: num_class_a
num_class_a is the number of samples belonging to class 1.
Argument: num_class_b
num_class_b is the number of samples belonging to class 2.
Argument: class_a
class_a is the class label of class a. It may be used if the expression matrix has labels in its second row.
Argument: class_b
class_b is the class label of class b. It may be used if the expression matrix has labels in its second row.
Argument: feature_set
feature_set is an object of type featureSet, created by invoking either selectFeatures or selectFeaturesSlimPLS.
Argument: use_components_as_features
The use_components_as_features parameter is only applicable to SlimPLS features. If FALSE, the features are used as is. If TRUE, new features are constructed from the basic features. The new features are a set of linear combinations of the basic features.
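As a rough illustration of what using components as features means, the base R sketch below builds two new features as linear combinations of selected genes. The toy matrix, the selected row indices, and the weight matrix are all made up for this example; they do not reproduce the internal structure of a featureSet object or the weights SlimPLS actually computes.

```r
# Toy data: 6 genes (rows) x 5 samples (columns); a plain matrix, not an expMat.
set.seed(42)
toy_mat <- matrix(rnorm(6 * 5), nrow = 6,
                  dimnames = list(paste0("gene", 1:6), paste0("s", 1:5)))

# Assumed selection: four genes, two per component (hypothetical indices).
selected <- c(1, 3, 4, 6)

# Assumed weights: one column per component, one row per selected gene.
# A zero means the gene does not contribute to that component.
weights <- cbind(comp1 = c(0.8, 0.6, 0, 0),
                 comp2 = c(0, 0, 0.5, -0.5))

# Analogue of use_components_as_features = FALSE:
# keep the selected genes as they are (samples x genes).
basic_features <- t(toy_mat[selected, , drop = FALSE])

# Analogue of use_components_as_features = TRUE:
# each new feature is a weighted sum of the selected genes, per sample.
component_features <- basic_features %*% weights

dim(basic_features)      # samples x selected genes
dim(component_features)  # samples x components
```

Either matrix could then be handed to a learner; the second has one column per component instead of one column per gene.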
Details
The method by which feature_set was created is transparent to the trainClassifier function. The feature_set already holds the set of selected features; for SlimPLS-based features it also holds their weights and the information required for constructing new features that are linear combinations (aka components) of the selected features. Learning is based on external packages (e1071 and randomForest). The function simply reduces the dimension of the given expression matrix using the supplied selected features, and invokes a learning algorithm on the reduced matrix. The learning is supervised and based on two classes, so the user must supply a class for every sample. This is done by providing a sorted matrix, where all the samples from class a precede all the samples from class b. In addition to the sorted matrix, the user must provide the number of samples from class a and from class b. Alternatively, the user may provide an unsorted matrix with labels in its second row denoting the class of each sample. In this case the user must also provide two labels to the function: one for class a and one for class b.
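The workflow described above (derive a label per sample, keep only the selected rows, then hand the reduced matrix to a learner) can be sketched in base R, with an optional e1071 step. This is only an approximation under stated assumptions: the toy matrix, the selected row indices, and the cost grid are invented for illustration, and nothing here reproduces SlimPLS's own objects or the cost values it actually searches.

```r
set.seed(7)
n_a <- 6
n_b <- 6

# Toy expression matrix: 20 genes (rows) x 12 samples (columns), sorted so
# that class a samples come first. Plain matrix, not a SlimPLS expMat.
toy_mat <- matrix(rnorm(20 * (n_a + n_b)), nrow = 20)
toy_mat[1:3, 1:n_a] <- toy_mat[1:3, 1:n_a] + 3  # make genes 1-3 informative

# Sorted layout: the labels follow from the two sample counts alone.
labels_sorted <- factor(rep(c("a", "b"), times = c(n_a, n_b)),
                        levels = c("a", "b"))

# Unsorted layout: the labels come from a per-sample label row instead
# (a hypothetical label row for five samples).
label_row <- c("AML", "ALL", "AML", "ALL", "ALL")
labels_unsorted <- factor(label_row, levels = c("AML", "ALL"))

# Dimension reduction: keep only the selected genes (hypothetical indices),
# then transpose so that rows are samples, as most R learners expect.
selected <- c(1, 2, 3)
reduced <- t(toy_mat[selected, , drop = FALSE])

# Optional: linear SVM with the cost chosen by leave-one-out cross-validation.
# Setting cross to the number of samples makes every fold a single sample.
if (requireNamespace("e1071", quietly = TRUE)) {
  costs <- c(0.01, 0.1, 1, 10)  # assumed grid, not taken from SlimPLS
  loo_acc <- vapply(costs, function(cost) {
    e1071::svm(reduced, labels_sorted, kernel = "linear",
               cost = cost, cross = nrow(reduced))$tot.accuracy
  }, numeric(1))
  best_cost <- costs[which.max(loo_acc)]
  fit <- e1071::svm(reduced, labels_sorted, kernel = "linear", cost = best_cost)
}
```

The learner only ever sees `reduced`, which is why the way the features were selected does not matter to the training step.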
Value
The function returns a classificationModel object, which holds the actual model created by the external packages and the feature set used for building the model. The object can later be used to obtain predictions for new instances.
Author(s)
Michael Gutkin, Ofer Lavi
References
In COLT '92: Proceedings of the fifth annual workshop on Computational Learning Theory, pages 144-152, New York, NY, USA, 1992. ACM.
2. Liaw, A. and Wiener, M., Classification and Regression by randomForest, R News 2(3), 18-22, 2002.
See Also
selectFeatures, selectFeaturesSlimPLS, getClassification, readExpMat
Examples
# Reads an expression matrix with class labels into exp_mat2.
## Not run:
exp_mat2 <- readExpMat("golub_leukemia_data_with_classes_training.csv", TRUE)
## End(Not run)
# Selects a set of features into the features2 variable. The matrix we read has a class
# label for each sample in its second row. Labels are either "AML" or "ALL".
# Selection is done using the SlimPLS method. Up to two components with 25 features in
# each component will be selected.
## Not run:
features2 <- selectFeaturesSlimPLS(exp_mat2, class_a="AML", class_b="ALL",
                                   num_features=50, component_size=25,
                                   p_value_threshold=0)
## End(Not run)
# Trains an SVM classifier with a linear kernel on the expression matrix using the
# individual features that are part of the components selected earlier and are now in
# features2.
## Not run:
model_t <- trainClassifier("SVM_LINEAR", exp_mat2, 0, 0, "AML", "ALL", features2,
                           FALSE)
## End(Not run)