
R language, flowClust package: help documentation for the flowClust() function

Posted on 2012-2-25 17:37:49
flowClust(flowClust)
R package containing flowClust(): flowClust

                                        Robust Model-based Clustering for Flow Cytometry

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

This function performs automated clustering to identify cell populations in flow cytometry data.  The approach is based on the t mixture model with the Box-Cox transformation, which provides a unified framework for handling outlier identification and data transformation simultaneously.


----------Usage----------


flowClust(x, expName="Flow Experiment", varNames=NULL, K, B=500,
          tol=1e-5, nu=4, lambda=1, nu.est=0, trans=1,
          min.count=10, max.count=10, min=NULL, max=NULL,
          level=0.9, u.cutoff=NULL, z.cutoff=0, randomStart=10,
          B.init=B, tol.init=1e-2, seed=1, criterion="BIC",
          control=NULL)



----------Arguments----------

Argument: x
A numeric vector, matrix, or data frame of observations, or an object of class flowFrame.  Rows correspond to observations and columns correspond to variables.


Argument: expName
A character string giving the name of the experiment.


Argument: varNames
A character vector specifying the variables (columns) to be included in clustering.  When it is left unspecified, all the variables will be used.


Argument: K
An integer vector indicating the numbers of clusters.


Argument: B
The maximum number of EM iterations.


Argument: tol
The tolerance used to assess the convergence of the EM.


Argument: nu
The degrees of freedom used for the t distribution.  Default is 4.  If nu=Inf, the Gaussian distribution will be used.
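To illustrate why a small nu makes the model robust, the following base-R sketch compares the tail density of a univariate t distribution with nu = 4 against the standard normal, and checks that nu = Inf recovers the Gaussian. It uses only functions from the stats package and is an illustration, not code taken from flowClust; the variable names are made up for this example.

```r
# Tail density of a t distribution (nu = 4) versus the standard normal.
x <- c(0, 2, 4, 6)
tail_table <- data.frame(
  x        = x,
  t_nu4    = dt(x, df = 4),
  gaussian = dnorm(x)
)
print(tail_table)

# Far from the centre the t density is larger, so distant points are
# less surprising under the t model and pull the estimates less.
heavier_tail <- dt(4, df = 4) > dnorm(4)

# With infinite degrees of freedom the t density equals the Gaussian.
same_as_gaussian <- isTRUE(all.equal(dt(1.5, df = Inf), dnorm(1.5)))
```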


Argument: lambda
The initial transformation to be applied to the data.


Argument: nu.est
A numeric indicating whether nu is to be estimated or not.  May take 0 (no estimation, default), 1 (estimation) or 2 (cluster-specific estimation).


Argument: trans
A numeric indicating whether the Box-Cox transformation parameter is estimated from the data.  May take 0 (no estimation), 1 (estimation, default) or 2 (cluster-specific estimation).


Argument: min.count
An integer specifying the threshold count for filtering data points from below.  The default is 10, meaning that if 10 or more data points are smaller than or equal to min, they will be excluded from the analysis.  If min is NULL, the minimum of the data for each variable is used.  To suppress filtering, set it to -1.


Argument: max.count
An integer specifying the threshold count for filtering data points from above.  Interpretation is similar to that of min.count.


Argument: min
The lower boundary set for data filtering.  Note that it is a vector of length equal to the number of variables (columns), so a different value can be set for each variable.


Argument: max
The upper boundary set for data filtering.  Interpretation is similar to that of min.
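The filtering rule described by min and min.count can be sketched in base R as follows. This is a reading of the documented behaviour, not the package's implementation: the function keep_from_below and its arguments lower and min_count are hypothetical names (renamed to avoid masking base::min), and treating each variable separately is an assumption.

```r
# Sketch of filtering from below; `lower` and `min_count` mirror the
# min and min.count arguments (names changed to avoid masking base::min).
keep_from_below <- function(y, lower = NULL, min_count = 10) {
  y <- as.matrix(y)
  keep <- rep(TRUE, nrow(y))
  if (min_count < 0) return(keep)                  # -1 suppresses filtering
  if (is.null(lower)) lower <- apply(y, 2, min)    # per-variable minimum
  for (j in seq_len(ncol(y))) {
    at_boundary <- y[, j] <= lower[j]
    # Exclude boundary points only when there are at least min_count of them.
    if (sum(at_boundary) >= min_count) keep <- keep & !at_boundary
  }
  keep
}

# Column a has 12 values piled up at its minimum; column b has only one.
y <- cbind(a = c(rep(0, 12), 1:8), b = 1:20)
keep_default <- keep_from_below(y)
keep_all     <- keep_from_below(y, min_count = -1)
```

Filtering from above with max and max.count would mirror this with the comparison reversed.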


Argument: level
A numeric value between 0 and 1 specifying the threshold quantile level used to call a point an outlier.  The default is 0.9, meaning that any point outside the 90% quantile region will be called an outlier.


Argument: u.cutoff
Another criterion used to identify outliers.  If this is NULL, then level will be used.  Otherwise, this specifies the threshold (e.g., 0.5) for u, a quantity used to measure the degree of "outlyingness" based on the Mahalanobis distance.  Please refer to Lo et al. (2008) for more details.
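The two outlier criteria can be related through the squared Mahalanobis distance. The sketch below assumes the usual t-mixture weight u = (nu + P) / (nu + d²) and uses the fact that, for a P-dimensional t distribution, d²/P follows an F distribution with (P, nu) degrees of freedom. The helper names weight_u and d2_cutoff are hypothetical, and whether flowClust defines its quantile region in exactly this way is an assumption.

```r
# Weight u as a function of the squared Mahalanobis distance d2,
# for a t distribution with nu degrees of freedom in p dimensions.
weight_u <- function(d2, nu, p) (nu + p) / (nu + d2)

# Squared-distance cutoff for a given quantile level: under the model,
# d2 / p follows an F distribution with (p, nu) degrees of freedom.
d2_cutoff <- function(level, nu, p) p * qf(level, df1 = p, df2 = nu)

nu <- 4
p  <- 2
centre <- c(0, 0)
covmat <- diag(2)
pts <- rbind(c(0, 0), c(1, 1), c(2, 2), c(5, 5))

d2 <- mahalanobis(pts, center = centre, cov = covmat)  # squared distances
u  <- weight_u(d2, nu, p)

outlier_by_level <- d2 > d2_cutoff(0.9, nu, p)  # outside the 90% region
outlier_by_u     <- u < 0.5                     # weight below the cutoff
```

Because u decreases as the distance grows, a larger u.cutoff flags more points, whereas a larger level flags fewer.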


Argument: z.cutoff
A numeric value between 0 and 1 defining a criterion that may be used together with level/u.cutoff to identify outliers.  A point whose probability of assignment z (i.e., the posterior probability that the data point belongs to the cluster it is assigned to) is smaller than z.cutoff will be called an outlier.  The default is 0, meaning that assignment will be made no matter how small the associated probability is, and outliers will be identified solely based on the rule set by level or u.cutoff.
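As a rough illustration of how z.cutoff combines with the distance-based rule, the sketch below flags a point when it lies outside the quantile region or when its largest posterior probability falls below the cutoff. The function flag_outliers and the toy inputs are hypothetical; this is one reading of the description above, not code from the package.

```r
# A point is an outlier if it is outside the quantile region (or below
# u.cutoff), or if its assignment probability is smaller than z_cutoff.
flag_outliers <- function(outside_region, z, z_cutoff = 0) {
  assignment_prob <- apply(z, 1, max)   # probability of the assigned cluster
  outside_region | (assignment_prob < z_cutoff)
}

# Toy posterior probabilities for three points and two clusters.
z <- rbind(c(0.90, 0.10),
           c(0.55, 0.45),
           c(0.20, 0.80))
outside_region <- c(FALSE, FALSE, TRUE)

flags_default <- flag_outliers(outside_region, z)                 # z.cutoff = 0
flags_strict  <- flag_outliers(outside_region, z, z_cutoff = 0.6)
```

With the default of 0 the posterior probabilities never add outliers; raising the cutoff to 0.6 additionally flags the second point, whose assignment is ambiguous.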


Argument: randomStart
A numeric value indicating how many times a random partition of the data is generated for initialization.  The default is 10, meaning that 10 random partitions of the data will be generated, each of which is followed by a short EM run.  The partition leading to the highest likelihood value will be adopted as the initial partition for the eventual long EM run.


Argument: B.init
The maximum number of EM iterations following each random partition in random initialization.


Argument: tol.init
The tolerance used as the stopping criterion for the short EM runs in random initialization.


Argument: seed
An integer giving the seed number used when randomStart>0.


Argument: criterion
A character string stating the criterion used to choose the best model.  May take either "BIC" or "ICL".  This argument is only relevant when length(K)>1.
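When several values of K are supplied, the best model is the one that optimizes the chosen criterion. The sketch below shows the idea with BIC written in the "larger is better" form, 2 * logLik - n_par * log(n_obs); that sign convention is an assumption about how the package reports BIC, and the log-likelihoods and parameter counts are made-up numbers used only for illustration.

```r
# BIC in the "larger is better" form: 2 * logLik - n_par * log(n_obs).
bic <- function(log_like, n_par, n_obs) 2 * log_like - n_par * log(n_obs)

# Hypothetical fits for K = 1..4 on 1000 observations (made-up numbers).
fits <- data.frame(
  K        = 1:4,
  log_like = c(-5200, -4900, -4880, -4875),
  n_par    = c(6, 12, 18, 24)
)
fits$BIC <- bic(fits$log_like, fits$n_par, n_obs = 1000)

best_index <- which.max(fits$BIC)   # position in the list of fitted models
best_K     <- fits$K[best_index]
```

Here the gain in log-likelihood from K = 2 to K = 3 is too small to offset the penalty for the extra parameters, so K = 2 is selected.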


Argument: control
An argument reserved for internal use.


----------Details----------

Estimation of the unknown parameters (including the Box-Cox parameter) is done via an Expectation-Maximization (EM) algorithm.  At each EM iteration, Brent's algorithm is used to find the optimal value of the Box-Cox transformation parameter.  Conditional on the transformation parameter, all other estimates can be obtained in closed form.  Please refer to Lo et al. (2008) for more details.
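The role of the one-dimensional search can be illustrated in base R, whose optimize() function is based on Brent's method (golden section search combined with parabolic interpolation). The sketch below estimates a Box-Cox parameter for a single univariate Gaussian rather than a t mixture, and it uses the standard Box-Cox form, which may differ from the variant used inside flowClust; box_cox and profile_loglik are hypothetical helper names.

```r
# Standard Box-Cox transformation for positive data.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for one Gaussian component: the mean
# and variance are plugged in at their closed-form estimates, and the last
# term is the log-Jacobian of the transformation.
profile_loglik <- function(lambda, y) {
  x <- box_cox(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((x - mean(x))^2)) + (lambda - 1) * sum(log(y))
}

set.seed(1)
y <- exp(rnorm(500))   # log-normal data, so lambda near 0 is appropriate

fit <- optimize(profile_loglik, interval = c(-2, 2), y = y, maximum = TRUE)
lambda_hat <- fit$maximum
```

Only lambda needs a numerical search; given lambda, the mean and variance come from the transformed data directly, which mirrors the "closed form" remark above.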

The flowClust package makes extensive use of the GSL as well as BLAS.  If an optimized BLAS library is provided when compiling the package, the flowClust package will be able to run multi-threaded processes.

Various operations have been defined for the object returned from flowClust; the methods listed under See Also (summary, plot, density, hist, Subset, split, ruleOutliers, Map) are among them.

In addition, to ease integration with the flowCore package for processing flow cytometry data, the clustering can be run through a method pair (tmixFilter and filter), so that the methods defined in flowCore can be applied to the object created by the filtering operation.


----------Value----------

If K is of length 1, the function returns an object of class flowClust containing the following slots, where K is the number of clusters, N is the number of observations and P is the number of variables:


Slot: expName
Content of the expName argument.


Slot: varNames
Content of the varNames argument if provided; otherwise generated from the data if available.


Slot: K
An integer showing the number of clusters.


Slot: w
A vector of length K, containing the estimates of the K cluster proportions.


Slot: mu
A matrix of size K x P, containing the estimates of the K mean vectors.


Slot: sigma
An array of dimension K x P x P, containing the estimates of the K covariance matrices.


Slot: lambda
The Box-Cox transformation parameter estimate.


Slot: nu
The degrees of freedom for the t distribution.


Slot: z
A matrix of size N x K, containing the posterior probabilities of cluster memberships.  The probabilities in each row sum to one.


Slot: u
A matrix of size N x K, containing the "weights" (the contribution to computing the cluster mean and covariance matrix) of each data point in each cluster.  Since this quantity decreases monotonically with the Mahalanobis distance, it can also be interpreted as the level of "outlyingness" of a data point.  Note that, when nu=Inf, this slot is used to store the Mahalanobis distances instead.


Slot: label
A vector of size N, showing the cluster membership according to the initial partition (i.e., hierarchical clustering if randomStart=0 or random partitioning if randomStart>0).  Filtered observations will be labelled as NA.  Unassigned observations (which may occur since at most 1500 observations are used for hierarchical clustering) will be labelled as 0.


Slot: uncertainty
A vector of size N, containing the uncertainty about the cluster assignment.  Uncertainty is defined as 1 minus the posterior probability that a data point belongs to the cluster to which it is assigned.
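The relationship between the z matrix, the cluster assignment and the uncertainty can be sketched with a small made-up matrix of posterior probabilities. The matrix below is illustrative only, and assigning each point to its most probable cluster is the usual convention assumed here.

```r
# Toy posterior probabilities: 3 observations (rows), 3 clusters (columns).
z <- rbind(c(0.90, 0.08, 0.02),
           c(0.40, 0.35, 0.25),
           c(0.05, 0.05, 0.90))

# Each observation is assigned to its most probable cluster ...
assigned <- apply(z, 1, which.max)

# ... and the uncertainty is 1 minus that largest probability.
uncertainty <- 1 - apply(z, 1, max)
```

The second observation is assigned to cluster 1 but with an uncertainty of 0.6, which marks it as a borderline case.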


Slot: ruleOutliers
A numeric vector of size 3, storing the rule used to call outliers.  The first element is 0 if the criterion is set by the level argument, or 1 if it is set by u.cutoff.  The second element copies the content of either the level or u.cutoff argument.  The third element copies the content of the z.cutoff argument.  For instance, if points are called outliers when they lie outside the 90% quantile region or have assignment probabilities less than 0.5, then ruleOutliers is c(0, 0.9, 0.5).  If points are called outliers only if their "weights" in the assigned clusters are less than 0.5 regardless of the assignment probabilities, then ruleOutliers becomes c(1, 0.5, 0).
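The encoding of the three-element rule vector can be made concrete with a small decoder. The function describe_rule is a hypothetical helper written for this illustration; it is not part of flowClust.

```r
# Turn a ruleOutliers-style vector into a readable description.
describe_rule <- function(rule) {
  stopifnot(is.numeric(rule), length(rule) == 3)
  primary <- if (rule[1] == 0) {
    # first element 0: second element is a quantile level
    sprintf("outside the %g%% quantile region", 100 * rule[2])
  } else {
    # first element 1: second element is a cutoff on the weight u
    sprintf("weight u below %g", rule[2])
  }
  if (rule[3] > 0) {
    sprintf("%s, or assignment probability below %g", primary, rule[3])
  } else {
    primary
  }
}

rule_a <- describe_rule(c(0, 0.9, 0.5))
rule_b <- describe_rule(c(1, 0.5, 0))
```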


Slot: flagOutliers
A logical vector of size N, showing whether each data point is called an outlier or not based on the rule defined by level/u.cutoff and z.cutoff.


Slot: rm.min
Number of points filtered from below.


Slot: rm.max
Number of points filtered from above.


Slot: logLike
The log-likelihood of the fitted mixture model.


Slot: BIC
The Bayesian Information Criterion for the fitted mixture model.


Slot: ICL
The Integrated Completed Likelihood for the fitted mixture model.
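To show how ICL differs from BIC, the sketch below uses a common approximation in which ICL equals BIC minus twice the entropy of the posterior probabilities, with BIC in the "larger is better" form. Whether flowClust computes ICL in exactly this way is an assumption, and icl_from_bic and the input values are hypothetical.

```r
# ICL approximated as BIC + 2 * sum(z * log(z)), i.e. BIC minus twice
# the classification entropy (terms with z = 0 contribute nothing).
icl_from_bic <- function(bic, z) {
  zlogz <- ifelse(z > 0, z * log(z), 0)
  bic + 2 * sum(zlogz)
}

bic_value <- -1000                      # made-up BIC for illustration

z_crisp <- diag(3)                      # every point assigned with certainty
z_fuzzy <- matrix(1 / 3, nrow = 3, ncol = 3)   # completely ambiguous

icl_crisp <- icl_from_bic(bic_value, z_crisp)
icl_fuzzy <- icl_from_bic(bic_value, z_fuzzy)
```

With perfectly separated clusters the two criteria coincide; overlapping clusters lower ICL, so ICL tends to favour models with well-separated components.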

If K has length greater than 1, the function returns an object of class flowClustList.  Its data part is a list of the same length as K, each element of which is a flowClust object corresponding to a specific number of clusters.  In addition, the resulting flowClustList object contains the following slots:


Slot: index
An integer giving the index of the list element corresponding to the best model as selected by criterion.


Slot: criterion
The criterion used to choose the best model, either "BIC" or "ICL".

Note that when a flowClustList object is used in place of a flowClust object, in most cases the list element corresponding to the best model will be extracted and passed to the method/function call.


----------Author(s)----------

Raphael Gottardo <raph@stat.ubc.ca>, Kenneth Lo <c.lo@stat.ubc.ca>




----------References----------



----------See Also----------

summary, plot, density, hist, Subset, split, ruleOutliers, Map, SimulateMixture


----------Examples----------


library(flowClust)
data(rituximab)

### cluster the data using FSC.H and SSC.H
res1 <- flowClust(rituximab, varNames=c("FSC.H", "SSC.H"), K=1)

### remove outliers before proceeding to the second stage
# %in% operator returns a logical vector indicating whether each
# of the observations lies within the cluster boundary or not
rituximab2 <- rituximab[rituximab %in% res1,]
# a shorthand for the above line
rituximab2 <- rituximab[res1,]
# this can also be done using the Subset method
rituximab2 <- Subset(rituximab, res1)

### cluster the data using FL1.H and FL3.H (with 3 clusters)
res2 <- flowClust(rituximab2, varNames=c("FL1.H", "FL3.H"), K=3)
show(res2)
summary(res2)

# to demonstrate the use of the split method
split(rituximab2, res2)
split(rituximab2, res2, population=list(sc1=c(1,2), sc2=3))

# to show the cluster assignment of observations
table(Map(res2))

# to show the cluster centres (i.e., the mean parameter estimates
# transformed back to the original scale)
getEstimates(res2)$locations

### demonstrate the use of various plotting methods
# a scatterplot
plot(res2, data=rituximab2, level=0.8)
plot(res2, data=rituximab2, level=0.8, include=c(1,2), grayscale=TRUE,
    pch.outliers=2)
# a contour / image plot
res2.den <- density(res2, data=rituximab2)
plot(res2.den)
plot(res2.den, scale="sqrt", drawlabels=FALSE)
plot(res2.den, type="image", nlevels=100)
plot(density(res2, include=c(1,2), from=c(0,0), to=c(400,600)))
# a histogram (1-D density) plot
hist(res2, data=rituximab2, subset="FL1.H")

### to demonstrate the use of the ruleOutliers method
summary(res2)
# change the rule to call outliers
ruleOutliers(res2) <- list(level=0.95)
# augmented cluster boundaries lead to fewer outliers
summary(res2)

# the following line illustrates how to select a subset of data
# to perform cluster analysis through the min and max arguments;
# also note the use of level to specify a rule to call outliers
# other than the default
flowClust(rituximab2, varNames=c("FL1.H", "FL3.H"), K=3, B=100,
    min=c(0,0), max=c(400,800), level=0.95, z.cutoff=0.5)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

