R package clippda: help documentation for clippda-package()

loveR · posted 2012-2-25 15:05:40

clippda-package(clippda)
R package containing clippda-package(): clippda

                                    A package for clinical proteomics profiling data analysis

                                       Translator: LoveR, the translation bot of Biostatistics Home (biostatistic.net)

----------Description----------

This package is still under development, but it is intended to provide a range of tools for analysing clinical genomics, methylation and proteomics data with non-standard repeated expression measurements arising from technical replicates. Most of these studies are observational case-control studies by design, so the results of analyses must be appropriately adjusted for confounding factors and imbalances in the data. This regression-type problem differs from the regression problem in limma, in which all the covariates are contrasts of some kind and are therefore all of interest. Our method is specifically suited to analysing single-channel microarray and proteomics data with repeated probe or peak measurements, especially when there is no one-to-one correspondence between cases and controls and the data cannot be analysed as log-ratios.

In the current version (version 0.1.0), we are mainly concerned with sample size calculations for these data sets. Some tools for pre-processing repeated peak data are also included: tools for checking the consistency of the number of replicates across samples and the consistency of the peak information between replicate spectra, and tools for data formatting and averaging.

clippda also implements a routine for evaluating differential expression between cases and controls, especially for data in which each sample is assayed more than once and which come from observational studies or are heterogeneous. An example is a cancer study in which controls are not sampled directly but are obtained from suspected cases that turn out, for example after an operation, to have benign disease; in this case there may be serious demographic imbalances between cases and controls. The test statistics are derived from the methods developed by Nyangoma et al. (2009).

These new methods for evaluating differential expression are compared with the empirical Bayes method in the limma package. To limit the number of false positive discoveries, we control the tail probability of the proportion of false positives (TPPFP). Further details can be found in the package vignette.
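
The replicate checks and the averaging step mentioned above can be illustrated with a short base-R sketch. It is not clippda's own pre-processing code. It assumes a hypothetical long-format table with columns sample, replicate, peak and intensity. The sketch checks that every sample has the same number of replicate spectra and that every spectrum reports the same peaks, and then averages the replicates of each peak within each sample.

```r
# Hypothetical long-format peak data: 3 samples x 2 replicate spectra x 2 peaks.
# The column names are assumptions for illustration, not clippda's data layout.
peaks <- data.frame(
  sample    = rep(c("S1", "S2", "S3"), each = 4),
  replicate = rep(rep(1:2, each = 2), times = 3),
  peak      = rep(c("P1", "P2"), times = 6),
  intensity = c(10, 20, 12, 22, 8, 18, 10, 16, 11, 21, 13, 19)
)

# 1. Consistency of the number of replicate spectra across samples
repsPerSample <- tapply(peaks$replicate, peaks$sample,
                        function(r) length(unique(r)))
stopifnot(length(unique(repsPerSample)) == 1)

# 2. Consistency of peak information: every replicate spectrum lists the same peaks
peakSets <- split(peaks$peak, list(peaks$sample, peaks$replicate))
stopifnot(all(vapply(peakSets, function(p) setequal(p, peakSets[[1]]), logical(1))))

# 3. Average the replicate intensities per sample and peak (samples x peaks matrix)
averaged <- tapply(peaks$intensity, list(peaks$sample, peaks$peak), mean)
print(averaged)
```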

----------Details----------

This package provides a method for calculating the sample size required when planning proteomic profiling studies that use repeated peak measurements. At the planning stage, an experimenter typically has no information yet on how heterogeneous the data will be. Our method makes it possible to input the expected heterogeneity and to adjust the sample size calculations for its effect. Sample sizes are calculated by the sampleSize function. It requires the between- and within-sample variations, the differences in mean between cases and controls, the intraclass correlations between duplicate peak data, and the heterogeneity correction factor. These can be computed from pilot data with the following functions:

- betweensampleVariance: the between-sample variation and the mean differences
- withinsampleVariance: the within-sample variation
- replicateCorrelations: the intraclass correlations
- fisherInformation: the heterogeneity correction factor

Before the data can be analysed with these functions, they must be adequately pre-processed, and this package provides a number of tools for doing so. We also provide a grid of clinically important differences versus protein variances, with superimposed sample size contours. On this grid we have plotted sample sizes computed from the parameters of several real-life proteomic data sets. These data sets cover a range of cancer types, fluid types, cancer stages, and SELDI and MALDI experimental protocols. The plotted values give sample size ranges that may be used to estimate the number of samples required. The example below walks through some of the steps of a sample size calculation with this package.
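
The sketch below shows how the between- and within-sample variances, the number of replicates and the difference to be detected interact. It codes the generic normal-approximation formula for comparing two group means when each sample is measured m times:

n per group = 2 (z[1 - alpha/2] + z[power])^2 (sigma2_between + sigma2_within / m) / delta^2

This is an assumed textbook formula, not necessarily the one implemented in sampleSize, and it omits the heterogeneity correction factor. The function name sampleSizePerGroup and the input values are hypothetical.

```r
# Generic per-group sample size for comparing case and control means when each
# sample is measured m times (technical replicates). Textbook normal
# approximation; NOT necessarily the formula implemented by clippda's sampleSize.
sampleSizePerGroup <- function(delta, sigma2Between, sigma2Within,
                               m = 2, alpha = 0.05, power = 0.80) {
  zAlpha <- qnorm(1 - alpha / 2)                 # two-sided significance level
  zPower <- qnorm(power)                         # quantile for the desired power
  varMean <- sigma2Between + sigma2Within / m    # variance of one sample's replicate mean
  ceiling(2 * (zAlpha + zPower)^2 * varMean / delta^2)
}

# Hypothetical inputs on the log2 scale
sampleSizePerGroup(delta = 0.5, sigma2Between = 0.3, sigma2Within = 0.1)  # 22
```

Raising m only shrinks the within-sample term. Extra technical replicates therefore help little when the between-sample (biological) variance dominates.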

----------Author(s)----------

Stephen Nyangoma

Maintainer: S Nyangoma <s.o.nyangoma@bham.ac.uk>

----------References----------

SELDI-TOF MS Proteomic Data. Stat Appl Genet Mol Biol 2006, 5.1
Billingham LJ: Sample size calculations for planning clinical proteomic profiling studies using mass spectrometry. Bioinformatics (Submitted)
Compound-Related Peaks and Chromatograms from High Frequency Noise, Spikes and Solvent-Based Noise in LC-MS Data Sets. Stat Appl Genet Mol Biol 2007, 6, 1, Article 23
expression in microarray experiments. Bioinformatics 2005, 21, 2067-75
in microarray experiments. Stat Appl Genet Mol Biol 2004, 3, 1, Article 3

----------Examples----------

#########################################################################################
# The routine for calculating the sample size required when planning a clinical proteomic
# profiling study is provided in the sampleSize function. First, this function performs
# computations for the sample size parameters, which include: the biological variance, the
# technical variance, the differences to be estimated, and the intraclass correlation
# (if unknown). These computations are done as follows:
#########################################################################################

#########################################################################################
# The biological variation, the difference to be estimated, and the p-values for
# differential expression are computed using the generic function betweensampleVariance.
# It requires data of class aclinicalProteomicsData as input.
########################################################################

###################################################
# Creating data of class aclinicalProteomicsData
###################################################

library(clippda)

data(liverdata)

data(liver_pheno)

OBJECT=new("aclinicalProteomicsData")

OBJECT@rawSELDIdata=as.matrix(liverdata)

OBJECT@covariates=c("tumor" , "sex")

OBJECT@phenotypicData=as.matrix(liver_pheno)

OBJECT@variableClass=c('numeric','factor','factor')

OBJECT@no.peaks=53

Data=OBJECT

#################################################################################
# Data manipulation carried out internally by the betweensampleVariance function
#################################################################################

rawData <- proteomicsExprsData(Data)

no.peaks <- Data@no.peaks

JUNK_DATA <- sampleClusteredData(rawData,no.peaks)

JUNK_DATA=negativeIntensitiesCorrection(JUNK_DATA)

# we use log2-transformed expression values

LOG_DATA <- log2(JUNK_DATA)
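
# Sketch (an assumption, not the code of negativeIntensitiesCorrection):
# log2() gives NaN for negative intensities and -Inf for zero, so non-positive
# values have to be dealt with before the log transform. One simple,
# hypothetical option is to floor them at a small positive value.
floorNonPositive <- function(x, floorValue = 1) {
  x[!is.na(x) & x <= 0] <- floorValue   # replace zero and negative intensities
  x
}
toyIntensities <- c(250, 0, -3.2, 1024)
log2(floorNonPositive(toyIntensities))   # all values are now finite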

#######################################################################################
# compute the biological variation, the difference to be estimated, and the p-values
#######################################################################################

BiovarDiffSig <- betweensampleVariance(OBJECT)

BiovarDiffSig

#####################
# technical variance
#####################

sample_technicalVariance(Data)
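
# Sketch (hypothetical helper, not withinsampleVariance, betweensampleVariance
# or replicateCorrelations): for a single peak measured the same number of
# times in each sample, a one-way ANOVA gives
#   - the within-sample (technical) variance as the within-sample mean square,
#   - the between-sample (biological) variance as (MS_between - MS_within) / m,
#   - the intraclass correlation as between / (between + within).
varianceComponents <- function(y, sampleId) {
  m   <- length(y) / length(unique(sampleId))     # replicates per sample (balanced design assumed)
  tab <- anova(lm(y ~ factor(sampleId)))
  msBetween <- tab[["Mean Sq"]][1]
  msWithin  <- tab[["Mean Sq"]][2]
  between   <- max((msBetween - msWithin) / m, 0) # truncate negative estimates at zero
  c(between = between, within = msWithin, icc = between / (between + msWithin))
}
toyPeak   <- c(5.0, 5.2, 6.1, 5.9, 4.8, 5.0)     # 3 samples x 2 replicates, log2 scale
toySample <- rep(c("A", "B", "C"), each = 2)
varianceComponents(toyPeak, toySample)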

#######################################################################################
# The heterogeneity correction factor is the second diagonal element of the output
# matrix from the fisherInformation function, i.e. of the expected Fisher information
########################################################################################

Z <- as.vector(fisherInformation(Data)[2,2])/2
Z

###################################################################################
# The outputs of these functions are converted into the statistics used in the
# sample size calculations by the wrapper function sampleSizeParameters, which
# gives the consensus parameter values. You must specify the p-value and
# intraclass correlation cut-offs. How these parameters are chosen is described
# in Nyangoma et al. (2009).
###################################################################################

intraclasscorr <- 0.60  # cut-off for intraclass correlation

signifcut <- 0.05  # significance cut-off

sampleparameters=sampleSizeParameters(Data,intraclasscorr,signifcut)
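
# Sketch of the idea behind consensus parameters (hypothetical data and a
# hypothetical median summary rule, not the code of sampleSizeParameters):
# keep peaks whose intraclass correlation reaches the cut-off and whose
# case-control p-value is below the significance cut-off, then summarise the
# retained peaks by their median parameter values.
iccCut <- 0.60   # same value as intraclasscorr above
pCut   <- 0.05   # same value as signifcut above
peakStats <- data.frame(
  icc     = c(0.85, 0.40, 0.72, 0.91, 0.65),
  pvalue  = c(0.001, 0.020, 0.030, 0.200, 0.004),
  delta   = c(0.9, 1.2, 0.6, 0.3, 0.7),
  biovar  = c(0.40, 0.55, 0.30, 0.20, 0.35),
  techvar = c(0.05, 0.20, 0.08, 0.03, 0.10)
)
keepPeaks <- peakStats$icc >= iccCut & peakStats$pvalue < pCut
consensus <- sapply(peakStats[keepPeaks, c("delta", "biovar", "techvar")], median)
consensus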

#######################################################################################
# SAMPLE SIZE CALCULATIONS
# The function sampleSize calculates the protein variance, the difference to be
# estimated and the technical variance. These parameters are computed from statistics
# of peaks with medium biological variation.
# It also gives sample sizes for beta = c(0.90, 0.80, 0.70) and alpha = c(0.001, 0.01, 0.05).
#######################################################################################

samplesize <- sampleSize(OBJECT,intraclasscorr,signifcut)
samplesize
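
# Sketch: per-group sample sizes over the power values (called beta above)
# 0.90, 0.80, 0.70 and the significance levels 0.001, 0.01, 0.05, using the
# generic formula
#   n = 2 * (z[1 - alpha/2] + z[power])^2 * (sigma2_b + sigma2_w / m) / delta^2
# with hypothetical inputs. This is not the output of sampleSize, which also
# uses the heterogeneity correction factor.
powers <- c(0.90, 0.80, 0.70)
alphas <- c(0.001, 0.01, 0.05)
gridDelta <- 0.5; gridSigma2b <- 0.3; gridSigma2w <- 0.1; gridM <- 2   # hypothetical
nGrid <- outer(powers, alphas, function(pw, a)
  ceiling(2 * (qnorm(1 - a / 2) + qnorm(pw))^2 *
          (gridSigma2b + gridSigma2w / gridM) / gridDelta^2))
dimnames(nGrid) <- list(power = powers, alpha = alphas)
nGrid   # rows: power, columns: alpha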

If you repost this, please credit Biostatistics Home (http://www.biostatistic.net).

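Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the translation bot of Biostatistics Home. It is provided only for personal reference in learning R; Biostatistics Home retains the copyright.
Note 2: Because the translation was done automatically by a bot, some inaccuracies are unavoidable. Carefully comparing the Chinese and English text can help with learning R.
Note 3: If you find any inaccuracies, please reply to this post and we will gradually revise the translation.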
