selectFeatures(SlimPLS)
selectFeatures() is part of the R package SlimPLS
Selects the best features using various scoring methods from a given expression matrix
Description
The function selects the top-scoring features according to the given scoring method in a supervised manner. Every scoring method computes a score for each feature in the expression matrix with respect to a given labeling of the samples. The selected features can later be used to reduce the dimension of an expression matrix.
Usage
selectFeatures(select_method, exp_mat, num_class_a=0, num_class_b=0, class_a="",
class_b="", num_features)
Arguments
Argument: select_method
select_method is one of the following strings, denoting the scoring function applied to each feature:
"CORREL" - Pearson correlation score of the expression levels and the labeling vector.
"TTEST" - Two-sample Student's t-test score for the set of expression levels from class 1 and the set of expression levels from class 2.
"FC" - Fold change score between the average of the values of samples from class 1 and of samples from class 2.
"GOLUB" - Golub criterion score.
"MI" - Mutual information between the expression levels and the labeling vector. Can be used only if bioDist is installed.
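To make the scoring methods listed above more concrete, here is a minimal base-R sketch of four of them on a small random matrix. It is not the SlimPLS implementation: the toy data, the variable names, the equal-variance t-test, and the definition of fold change as a log2 ratio of class means are all assumptions made for illustration. The Golub criterion is written in its usual signal-to-noise form, the difference of class means divided by the sum of class standard deviations.

```r
# Illustrative sketch only: base-R versions of four scores, not SlimPLS internals.
# Rows are features, columns are samples; the first n_a columns are class 1.
set.seed(1)
n_a <- 4
n_b <- 3
x <- matrix(rexp(5 * (n_a + n_b), rate = 0.1), nrow = 5,
            dimnames = list(paste0("gene", 1:5), NULL))
labels <- c(rep(1, n_a), rep(2, n_b))   # labeling vector
in_a <- labels == 1

# Pearson correlation of each feature with the labeling vector
score_correl <- apply(x, 1, function(v) cor(v, labels))

# Two-sample Student's t statistic (equal variances assumed here)
score_ttest <- apply(x, 1, function(v)
  unname(t.test(v[in_a], v[!in_a], var.equal = TRUE)$statistic))

# Fold change, taken here as the log2 ratio of class means (an assumption)
score_fc <- log2(rowMeans(x[, in_a]) / rowMeans(x[, !in_a]))

# Golub criterion: difference of class means over the sum of class SDs
score_golub <- (rowMeans(x[, in_a]) - rowMeans(x[, !in_a])) /
  (apply(x[, in_a], 1, sd) + apply(x[, !in_a], 1, sd))
```

All four scores are signed in this sketch, so a ranking would normally use their absolute values or pick a direction explicitly.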
Argument: exp_mat
exp_mat is an expression matrix of type expMat, usually created by reading an expression matrix from a file using readExpMat. The matrix is expected to have all samples belonging to class 1 grouped together in the first columns, followed by all samples from class 2, also grouped together.
Argument: num_class_a
num_class_a is the number of samples belonging to class 1.
Argument: num_class_b
num_class_b is the number of samples belonging to class 2.
Argument: class_a
class_a is the class label of class a. It may be used if the expression matrix has labels in its second row.
Argument: class_b
class_b is the class label of class b. It may be used if the expression matrix has labels in its second row.
Argument: num_features
num_features is the total number of features to be selected.
Details
The function selects the num_features features that achieve the best scores under the selected scoring method. The selection is supervised and based on two classes, so the user must supply the class of every sample. This can be done by providing a sorted matrix, in which all samples from class a precede all samples from class b, together with the number of samples in class a and in class b. Alternatively, the user may provide an unsorted matrix with labels in its second row denoting the class of each sample. In this case the user must also pass two labels to the selection function, one for class a and one for class b. For the mutual information selection filter, bioDist needs to be installed first. See http://www.bioconductor.org/packages/release/bioc/html/bioDist.html
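The relationship between the two ways of specifying classes can be sketched in base R. The example below starts from an unsorted matrix with one label per sample, reorders the columns so that class a comes first, derives the two class counts, and then keeps the top-scoring features. The data, the variable names, and the simple absolute-difference score are hypothetical and stand in for whatever SlimPLS does internally.

```r
# Illustrative sketch only: hypothetical data and names, not SlimPLS code.
# An unsorted matrix with one class label per sample (column).
x <- matrix(c(5, 1, 6, 2, 7,
              3, 3, 3, 3, 3,
              1, 9, 2, 8, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:5)))
sample_labels <- c("AML", "ALL", "AML", "ALL", "AML")

# Reorder columns so that class a ("AML") precedes class b ("ALL")
ord <- order(sample_labels != "AML")        # AML columns first, original order kept
x_sorted <- x[, ord]
num_class_a <- sum(sample_labels == "AML")
num_class_b <- sum(sample_labels == "ALL")

# Score each feature (absolute difference of class means) and keep the top k
a_cols <- seq_len(num_class_a)
scores <- abs(rowMeans(x_sorted[, a_cols]) - rowMeans(x_sorted[, -a_cols]))
num_features <- 2
top <- names(sort(scores, decreasing = TRUE))[seq_len(num_features)]
```

Once the columns are sorted this way, the counts num_class_a and num_class_b carry the same information as the per-sample labels, which is why either form is enough to define the two classes.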
Value
Returns a featureSet object, which holds the selected features and can later be used for learning a classification model with trainClassifier.
Author(s)
Michael Gutkin, Ofer Lavi
References
1. Coller H, Loh ML, Downing JR, Caligiuri MA et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531-537.
2. Hamming RW: Coding and Information Theory: Prentice-Hall Inc.; 1980.
3. Everitt BS, Hothorn T: A Handbook of Statistical Analyses Using R:
See Also
selectFeatures, selectFeaturesSlimPLS, getClassification, readExpMat
Examples
# reads an expression matrix with no class labels into exp_mat
## Not run: 
exp_mat <- readExpMat("golub_leukemia_data_training.csv", FALSE)
## End(Not run)
# selects a set of features into the features variable. The matrix we read is sorted by
# classes, and it is known to have 41 samples of class a and 20 samples of class b.
# The selection is done using the best 100 two-sample t-test scores.
## Not run: 
features <- selectFeatures("TTEST", exp_mat, num_class_a=41, num_class_b=20,
                           num_features=100)
## End(Not run)
# reads an expression matrix with class labels into exp_mat2
## Not run: 
exp_mat2 <- readExpMat("golub_leukemia_data_with_classes_training.csv", TRUE)
## End(Not run)
# selects a set of features into the features2 variable. The matrix we read has a class
# label for each sample in its second row. Labels are either "AML" or "ALL".
# Selection is done using the fold change score, selecting the 200 features with the
# largest fold change between the "AML" samples and the "ALL" samples.
## Not run: 
features2 <- selectFeatures("FC", exp_mat2, class_a="AML", class_b="ALL",
                            num_features=200)
## End(Not run)