mdqc(mdqc)
mdqc() belongs to R package: mdqc
MDQC: Mahalanobis Distance Quality Control
----------Description----------
MDQC is a multivariate quality assessment method for microarrays based on quality control (QC) reports.
----------Usage----------
mdqc(x, method=c("nogroups", "apriori", "global", "cluster", "loading"),
groups=NULL, k=NULL, pc=NULL,
robust=c("S-estimator","MCD", "MVE"), nsamp=10*nrow(x))
----------Arguments----------
Argument: x
A numeric matrix or data frame containing the quality measures (columns) for each array (rows). The number of rows must exceed the number of columns.
Argument: method
The Mahalanobis Distances (MDs) can be computed on all the quality measures in the QC report (method="nogroups", the default), on the first principal components resulting from a principal component analysis (PCA) of the QC report ("global", with the number of components set by pc), or on subsets of quality measures in the QC report ("apriori": groups defined by the user; "cluster": groups resulting from a cluster analysis; "loading": groups resulting from a cluster analysis in the space of the loadings of a PCA). The "nogroups" and "global" methods compute a single MD for each array, whereas "apriori", "cluster" and "loading" compute one MD within each group of quality measures.
Argument: groups
A list specifying the groups of quality measures when the "apriori" method is chosen. For example, groups = list(c(1,2), c(4,6)) puts columns 1 and 2 in one group and columns 4 and 6 in a second.
Argument: k
An integer specifying the number of clusters (or groups) to be used in the cluster analysis when the "cluster" or "loading" method is chosen.
Argument: pc
An integer specifying the number of principal components retained from the PCA when the "global" or "loading" method is chosen.
Argument: robust
A robust multivariate location/spread estimator (choice of S-estimator, MCD or MVE). The default method uses S-estimators with a 25% breakdown point.
Argument: nsamp
The number of subsamples that the robust estimator should use. This defaults to 10 times the number of rows of x.
----------Details----------
MDQC flags potentially low quality arrays based on the idea of outlier detection, that is, it flags those arrays whose quality attributes jointly depart from those of the bulk of the data.
This function computes a distance measure, the Mahalanobis Distance, to summarize the quality of each array. The use of this distance allows us to perform a multivariate analysis of the information in QC reports taking the correlation structure of the quality measures into account. In addition, by using robust estimators to identify the typical quality measures of good-quality arrays, the evaluation is not affected by the measures of outlying arrays.
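The idea in the paragraph above can be illustrated with a minimal sketch. It is not the internal code of mdqc(): the data are simulated, the robust estimator is the MCD from MASS::cov.rob() rather than the default S-estimator, and the 99% chi-square cutoff is a common convention assumed here for illustration.

```r
# Sketch only: robust Mahalanobis distances with base R and MASS.
# Simulated data and the 99% chi-square cutoff are illustrative assumptions.
library(MASS)  # cov.rob() provides MCD and MVE estimators

set.seed(1)
qc <- matrix(rnorm(30 * 3), nrow = 30, ncol = 3)  # 30 arrays, 3 quality measures
qc[30, ] <- c(8, -8, 8)                           # plant one outlying array

# Robust location and scatter, so the outlier does not distort the estimates
rob <- cov.rob(qc, method = "mcd")

# mahalanobis() returns squared distances, hence the square root
md <- sqrt(mahalanobis(qc, center = rob$center, cov = rob$cov))

# Assumed cutoff: square root of a chi-square quantile with p degrees of freedom
cutoff <- sqrt(qchisq(0.99, df = ncol(qc)))
flagged <- which(md > cutoff)
```

Because the center and covariance come from a robust fit, the planted array cannot mask itself by inflating the estimated spread, which is the point the paragraph makes about outlying arrays.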
MDQC can be based on all the quality measures simultaneously (using method="nogroups"), on subsets of them (using method="apriori", "cluster", or "loading"), or on a transformed space with a lower dimension (using method="global").
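The lower-dimensional variant can be sketched as follows. This is an assumption-laden illustration, not the package's robust PCA: the data are simulated, the principal axes are taken from an eigendecomposition of an MCD covariance estimate (MASS::cov.rob()), and the number of retained components (pc = 2) is chosen arbitrarily.

```r
# Sketch only: one Mahalanobis distance per array in a reduced PC space.
library(MASS)

set.seed(2)
n <- 40
latent <- rnorm(n)
# Six simulated quality measures driven mostly by one latent factor
qc <- sapply(1:6, function(j) latent + rnorm(n, sd = 0.5))
qc[n, ] <- rep(10, 6)  # plant one outlying array

# Robust covariance and its principal axes
rob <- cov.rob(qc, method = "mcd")
eig <- eigen(rob$cov, symmetric = TRUE)

pc <- 2  # number of components kept (illustrative choice)
centered <- sweep(qc, 2, rob$center)
scores <- centered %*% eig$vectors[, 1:pc, drop = FALSE]

# In PC space the covariance is diagonal with the leading eigenvalues,
# so the distance is a sum of squared scores scaled by those eigenvalues
md_global <- sqrt(rowSums(sweep(scores^2, 2, eig$values[1:pc], "/")))
```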
In the "apriori" approach the user forms groups of quality measures on the basis of an a priori interpretation of them and according to the quality aspect they represent. The "cluster" and the "loading" methods are two data-driven methods to form the groups. The former groups the quality measures using cluster analysis, and the latter uses the loadings of a principal component analysis to identify the quality measures that contain similar information and group them. It is important to note that the "apriori", the "cluster", and the "loading" methods create groups of the original quality measures of the report and compute one MD within each group. Finally, the "global" method computes a single MD based on the reduced space of the first principal components from a robust PCA. The number of PCs (argument pc) can be chosen using a scree plot.
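The grouped approaches share one step: a separate MD is computed within each group of columns. A minimal sketch of that step, assuming simulated data, a hypothetical two-group split, and the MCD estimator from MASS::cov.rob() in place of whatever mdqc() uses internally:

```r
# Sketch only: one robust Mahalanobis distance per array within each group.
library(MASS)

set.seed(3)
qc <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)  # 40 arrays, 5 quality measures
qc[40, 4:5] <- c(9, -9)  # array 40 is unusual only in measures 4 and 5

groups <- list(1:3, 4:5)  # hypothetical a priori grouping by column number

md_by_group <- sapply(groups, function(cols) {
  sub <- qc[, cols, drop = FALSE]
  rob <- cov.rob(sub, method = "mcd")
  sqrt(mahalanobis(sub, center = rob$center, cov = rob$cov))
})
# md_by_group has one row per array and one column per group
```

In this toy case array 40 stands out only in the second column of md_by_group, which shows how grouping can point to the quality aspect responsible for a flag.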
More details on each method are given in Cohen Freue et al. (2007).
----------Value----------
An object of class "mdqc" (with associated plot, print and summary methods) with components:
Component: ngroups
Number of groups in which the MDs have been computed.
Component: groups
Column numbers corresponding to the quality measures in each group.
Component: mdqcValues
Mahalanobis Distance(s) for each array.
Component: x
Dataset containing the numeric quality measures in the report.
Component: method
Method used to group or transform the quality measures before computing the MD for each array.
Component: pc
Number of principal components used in the robust PCA.
Component: k
Number of clusters used in the cluster analysis.
----------Note----------
We thank Christopher Croux for providing us a MATLAB code that
----------Author(s)----------
Justin Harrington (harringt@stat.ubc.ca) and Gabriela V. Cohen Freue (gcohen@stat.ubc.ca).
----------References----------
R. and Scherer, A. and McManus, B. and Keown, P. and McMaster, W. R. and Ng, R. T. (2007) ‘MDQC: A New Quality Assessment Method for Microarrays Based on Quality Control Reports’. Bioinformatics 23, 3162–3169.
K. and Cope, L. and Irizarry R. A. and Speed T. P. (2005) ‘Quality assessment of Affymetrix GeneChip data.’ In Gentleman R. and Carey C. J. and Huber W. and Irizarry R. A. and Dudoit S. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer.
T. P. (2007) ‘Quality assessment for short oligonucleotide arrays’. Forthcoming in Technometrics (with Discussion).
Girtman, K. and Williams, W. K. and Liu, H. and Mahfouz, R. and Raimondi, S. C. and Lenny, N. and Patel, A. and Downing, J. R. (2003) ‘Classification of pediatric acute lymphoblastic leukemia by gene expression profiling.’ Blood 102, 2951–9.
----------See Also----------
prcomp.robust, pam
----------Examples----------
library(mdqc)
data(allQC)
## Contains the QC report obtained using Bioconductor's simpleaffy package
## for a subset of arrays from a large acute lymphoblastic leukemia (ALL)
## study (Ross et al., 2003).
## This dataset has also been studied by Bolstad et al. (2005) and
## Brettschneider et al. (2007).
## For further information see allQC.
#### No Groups method
# Figure 2 in Cohen Freue et al. (2007):
# Results of MDQC based on all measures of the QC report.
mdout <- mdqc(allQC, method="nogroups")
plot(mdout)
print(mdout)
summary(mdout)
#### A-Priori grouping method
# Figure 3 in Cohen Freue et al. (2007):
# Results of MDQC using the apriori grouping method.
mdout <- mdqc(allQC, method="apriori", groups=list(1:5, 6:9, 10:11))
plot(mdout)
#### Global PCA method
# Figure 4 in Cohen Freue et al. (2007):
# Results of MDQC using the global PCA method.
mdout <- mdqc(allQC, method="global", pc=4)
plot(mdout)
#### Clustering grouping method
# Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
# Results of MDQC using a cluster analysis to form
# 3 groups of quality measures.
mdout <- mdqc(allQC, method="cluster", k=3)
plot(mdout)
#### Loading grouping method
# Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
# Results of MDQC using a cluster analysis on the first
# pc=4 loading vectors from a robust PCA to form k=3 groups of quality measures.
mdout <- mdqc(allQC, method="loading", k=3, pc=4)
plot(mdout)
### To get the raw Mahalanobis distances
mdout$mdqcValues