normalizeFragmentLength(aroma.light)
normalizeFragmentLength() belongs to R package: aroma.light
Normalizes signals for PCR fragment-length effects
Translator: 生物统计家园网 robot LoveR
----------Description----------
Normalizes signals for PCR fragment-length effects. Some or all signals are used to estimate the normalization function. All signals are normalized.
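The idea of estimating the normalization function from a subset of the signals and then applying it to all of them can be sketched in base R. This is only an illustration under assumptions: the simulated data, the variable names and the use of approx() for prediction are made up for this sketch and are not taken from the aroma.light implementation.

```r
# Illustrative sketch only (base R); not the aroma.light implementation.
set.seed(42)
K <- 500
fl <- seq(from=100, to=1000, length.out=K)      # fragment lengths
y <- 12 - fl^(1/4) + rnorm(K, mean=0, sd=0.5)   # simulated log2 signals

# Estimate the fragment-length effect from a subset of the signals...
idx <- seq(from=1, to=K, by=2)
fit <- lowess(fl[idx], y[idx])

# ...then predict the effect for all signals and remove it.
# 'rule=2' extends the curve to lengths outside the fitted range.
mu <- approx(fit$x, fit$y, xout=fl, rule=2)$y
yN <- y - mu

# The smoothed trend of the normalized signals should be close to zero
trend <- lowess(fl, yN)$y
print(range(trend))
```

Subtracting the fitted curve corresponds to normalizing toward a zero baseline on the log scale, which is what the documentation describes for the case where no target functions are given.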
----------Usage----------
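A minimal call, sketched from the arguments documented on this page and from Example 1. It assumes that the aroma.light package is installed and that a plain vector of fragment lengths is accepted for a single enzyme, as in that example; the simulated data are hypothetical.

```r
library(aroma.light)

set.seed(1)
K <- 200
fl <- seq(from=100, to=1000, length.out=K)      # fragment lengths, one enzyme
y <- 12 - fl^(1/4) + rnorm(K, mean=0, sd=0.5)   # simulated log2 signals

# All arguments used here are described in the Arguments section;
# NULL for targetFcns and subsetToFit means "zero baseline" and
# "use all data points", respectively.
yN <- normalizeFragmentLength(y, fragmentLengths=fl, targetFcns=NULL,
                              subsetToFit=NULL, onMissing="ignore")
str(yN)
```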
----------Arguments----------
Argument: y
A numeric vector of length K of signals to be normalized across E enzymes.
Argument: fragmentLengths
An integer KxE matrix of fragment lengths.
Argument: targetFcns
An optional list of E functions, one per enzyme. If NULL, the data are normalized to have constant fragment-length effects (all equal to zero on the log scale).
Argument: subsetToFit
The subset of data points used to fit the normalization function. If NULL, all data points are considered.
Argument: onMissing
Specifies how data points for which there is no fragment length are normalized. If "ignore", the values are not modified. If "median", the values are updated to have the same robust average as the other data points.
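The two onMissing options can be illustrated with a small base-R sketch. It follows the comment in Example 1, which states that the correction for units with unknown fragment lengths equals the median correction of all other units. The data, the variable names and the lowess/approx steps are assumptions made for this illustration; the actual handling is internal to the package.

```r
# Illustrative sketch only (base R); not the aroma.light implementation.
set.seed(1)
K <- 200
fl <- seq(from=100, to=1000, length.out=K)
fl[seq(from=1, to=K, by=20)] <- NA       # units with unknown fragment length
ok <- is.finite(fl)
y <- rep(12, K)
y[ok] <- y[ok] - fl[ok]^(1/4)
y <- y + rnorm(K, mean=0, sd=0.5)

# Correction for units with known fragment lengths
fit <- lowess(fl[ok], y[ok])
rho <- rep(NA_real_, K)
rho[ok] <- approx(fit$x, fit$y, xout=fl[ok])$y

# onMissing="ignore": units without a fragment length are left unmodified
rhoIgnore <- rho
rhoIgnore[!ok] <- 0

# onMissing="median": they get the median correction of the other units
rhoMedian <- rho
rhoMedian[!ok] <- median(rho[ok])

yIgnore <- y - rhoIgnore
yMedian <- y - rhoMedian
print(summary(yIgnore[!ok] - yMedian[!ok]))
```

With "ignore", the unmodified units stay at the raw signal level while all others are shifted, which is why "median" is used in both examples on this page.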
Argument: .isLogged
A logical.
Argument: ...
Additional arguments passed to lowess.
Argument: .returnFit
A logical.
----------Value----------
Returns a numeric vector of the normalized signals.
----------Multi-enzyme normalization----------
It is assumed that the fragment-length effects from multiple enzymes add (with equal weights) on the intensity scale. The fragment-length effects are fitted for each enzyme separately, based on units that are exclusively on that enzyme. If there are no or very few such units for an enzyme, the assumptions of the model are not met and the fit will fail with an error. Then, from the above single-enzyme fits, the average effect across enzymes is calculated for each unit that is on multiple enzymes.
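The equal-weight averaging on the intensity scale can be written out for a single unit that is on two enzymes. This is a sketch under assumptions: the two effect curves mu1 and mu2 are hypothetical stand-ins for the fitted single-enzyme effects, and the combination mirrors the way Example 2 simulates such units rather than quoting the package code.

```r
# Illustrative sketch only (base R).
# Hypothetical single-enzyme effects on the log2 scale
mu1 <- function(fl) 12 - fl^(1/4)   # enzyme 1 (assumed shape)
mu2 <- function(fl) 12 - fl^(1/3)   # enzyme 2 (assumed shape)

# A unit on both enzymes has one fragment length per enzyme
fl <- c(300, 700)
a <- mu1(fl[1])
b <- mu2(fl[2])

# Average with equal weights on the intensity scale, then return to log2
muBoth <- log2((2^a + 2^b) / 2)
print(c(enzyme1=a, enzyme2=b, both=muBoth))
```

Because the average is taken on the intensity scale, the combined effect is never smaller than the plain average of the two log2 effects.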
----------Target functions----------
It is possible to specify custom target function effects for each enzyme via argument targetFcns. This argument has to be a list containing one function per enzyme, ordered in the same way as the enzymes are in the columns of argument fragmentLengths. For instance, if one wishes to normalize the signals such that their mean signal as a function of fragment length is constantly equal to 2200 (on the intensity scale), then use targetFcns=function(fl, ...) log2(2200), which completely ignores the fragment-length argument 'fl' and always returns a constant. If two enzymes are used, then use targetFcns=rep(list(function(fl, ...) log2(2200)), 2).
Note, if targetFcns is NULL, this corresponds to targetFcns=rep(list(function(fl, ...) 0), ncol(fragmentLengths)).
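The list-of-functions form can be checked without running any normalization. The sketch below builds the two-enzyme constant target from the text and verifies that it lines up with the columns of a fragment-length matrix; the matrix values are hypothetical.

```r
# Constant target of 2200 on the intensity scale, for two enzymes
targetFcns <- rep(list(function(fl, ...) log2(2200)), 2)

# Hypothetical fragment lengths: K=4 units (rows), E=2 enzymes (columns)
fragmentLengths <- matrix(c(100, 400,  NA, 900,
                            250,  NA, 600, 800), ncol=2)

# One target function per enzyme, in column order
stopifnot(length(targetFcns) == ncol(fragmentLengths))

# Each function ignores 'fl' and returns the same constant
print(targetFcns[[1]](fragmentLengths[,1]))
print(2^targetFcns[[2]](fragmentLengths[,2]))   # back on the intensity scale
```

Note that the target functions work on the log2 scale, which is why the constant is given as log2(2200) rather than 2200.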
Alternatively, if one wants to apply only minimal corrections to the signals, one can normalize toward target functions that correspond to the fragment-length effect of the average array.
----------Author(s)----------
Henrik Bengtsson (http://www.braju.com/R/)
----------Examples----------
library(aroma.light)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 1: Single-enzyme fragment-length normalization of 9 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Number of samples
I <- 9;
# Number of loci
J <- 1000;
# Fragment lengths
fl <- seq(from=100, to=1000, length.out=J);
# Simulate data points with unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=50);
fl[hasUnknownFL] <- NA;
# Simulate data
y <- matrix(0, nrow=J, ncol=I);
maxY <- 12;
for (kk in 1:I) {
  k <- runif(n=1, min=3, max=5);
  mu <- function(fl) {
    mu <- rep(maxY, length(fl));
    ok <- !is.na(fl);
    mu[ok] <- mu[ok] - fl[ok]^(1/k);
    mu;
  }
  eps <- rnorm(J, mean=0, sd=1);
  y[,kk] <- mu(fl) + eps;
}
# Normalize data (to a zero baseline)
yN <- apply(y, MARGIN=2, FUN=function(y) {
  normalizeFragmentLength(y, fragmentLengths=fl, onMissing="median");
})
# The correction factors
rho <- y-yN;
print(summary(rho));
# The correction for units with unknown fragment lengths
# equals the median correction factor of all other units
print(summary(rho[hasUnknownFL,]));
# Plot raw data
layout(matrix(1:9, ncol=3));
xlim <- c(0,max(fl, na.rm=TRUE));
ylim <- c(0,max(y, na.rm=TRUE));
xlab <- "Fragment length";
ylab <- expression(log2(theta));
for (kk in 1:I) {
  plot(fl, y[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab);
  ok <- (is.finite(fl) & is.finite(y[,kk]));
  lines(lowess(fl[ok], y[ok,kk]), col="red", lwd=2);
}
# Plot normalized data
layout(matrix(1:9, ncol=3));
ylim <- c(-1,1)*max(y, na.rm=TRUE)/2;
for (kk in 1:I) {
  plot(fl, yN[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab);
  ok <- (is.finite(fl) & is.finite(y[,kk]));
  lines(lowess(fl[ok], yN[ok,kk]), col="blue", lwd=2);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 2: Two-enzyme fragment-length normalization of 5 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xbeef);
# Number of samples
I <- 5;
# Number of loci
J <- 3000;
# Fragment lengths (two enzymes)
fl <- matrix(0, nrow=J, ncol=2);
fl[,1] <- seq(from=100, to=1000, length.out=J);
fl[,2] <- seq(from=1000, to=100, length.out=J);
# Let 1/2 of the units be on both enzymes
fl[seq(from=1, to=J, by=4),1] <- NA;
fl[seq(from=2, to=J, by=4),2] <- NA;
# Let some have unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=15);
fl[hasUnknownFL,] <- NA;
# Sty/Nsp mixing proportions:
rho <- rep(1, I);
rho[1] <- 1/3;  # Less Sty in 1st sample
rho[3] <- 3/2;  # More Sty in 3rd sample
# Simulate data
z <- array(0, dim=c(J,2,I));
maxLog2Theta <- 12;
for (ii in 1:I) {
  # Common effect for both enzymes
  mu <- function(fl) {
    k <- runif(n=1, min=3, max=5);
    mu <- rep(maxLog2Theta, length(fl));
    ok <- is.finite(fl);
    mu[ok] <- mu[ok] - fl[ok]^(1/k);
    mu;
  }
  # Calculate the effect for each data point
  for (ee in 1:2) {
    z[,ee,ii] <- mu(fl[,ee]);
  }
  # Update the Sty/Nsp mixing proportions
  ee <- 2;
  z[,ee,ii] <- rho[ii]*z[,ee,ii];
  # Add random errors
  for (ee in 1:2) {
    eps <- rnorm(J, mean=0, sd=1/sqrt(2));
    z[,ee,ii] <- z[,ee,ii] + eps;
  }
}
hasFl <- is.finite(fl);
unitSets <- list(
  nsp  = which( hasFl[,1] & !hasFl[,2]),
  sty  = which(!hasFl[,1] &  hasFl[,2]),
  both = which( hasFl[,1] &  hasFl[,2]),
  none = which(!hasFl[,1] & !hasFl[,2])
)
# The observed data is a mix of two enzymes
theta <- matrix(NA, nrow=J, ncol=I);
# Single-enzyme units
for (ee in 1:2) {
  uu <- unitSets[[ee]];
  theta[uu,] <- 2^z[uu,ee,];
}
# Both-enzyme units (sum on intensity scale)
uu <- unitSets$both;
theta[uu,] <- (2^z[uu,1,]+2^z[uu,2,])/2;
# Missing units (sample from the others)
uu <- unitSets$none;
theta[uu,] <- apply(theta, MARGIN=2, sample, size=length(uu))
# Calculate target array
thetaT <- rowMeans(theta, na.rm=TRUE);
targetFcns <- list();
for (ee in 1:2) {
  uu <- unitSets[[ee]];
  fit <- lowess(fl[uu,ee], log2(thetaT[uu]));
  class(fit) <- "lowess";
  # Use local() so that each target function keeps its own copy of 'fit';
  # otherwise both closures would see whatever 'fit' holds when called
  targetFcns[[ee]] <- local({
    fit <- fit;
    function(fl, ...) {
      predict(fit, newdata=fl);
    }
  })
}
# Fit model only to a subset of the data
subsetToFit <- setdiff(1:J, seq(from=1, to=J, by=10))
# Normalize data (to a target baseline)
thetaN <- matrix(NA, nrow=J, ncol=I);
fits <- vector("list", I);
for (ii in 1:I) {
  lthetaNi <- normalizeFragmentLength(log2(theta[,ii]), targetFcns=targetFcns,
                                      fragmentLengths=fl, onMissing="median",
                                      subsetToFit=subsetToFit, .returnFit=TRUE);
  fits[[ii]] <- attr(lthetaNi, "modelFit");
  thetaN[,ii] <- 2^lthetaNi;
}
# Plot raw data
xlim <- c(0, max(fl, na.rm=TRUE));
ylim <- c(0, max(log2(theta), na.rm=TRUE));
Mlim <- c(-1,1)*4;
xlab <- "Fragment length";
ylab <- expression(log2(theta));
Mlab <- expression(M==log[2](theta/theta[R]));
layout(matrix(1:(3*I), ncol=I, byrow=TRUE));
for (ii in 1:I) {
  plot(NA, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab, main="raw");
  # Single-enzyme units
  for (ee in 1:2) {
    # The raw data
    uu <- unitSets[[ee]];
    points(fl[uu,ee], log2(theta[uu,ii]), col=ee+1);
  }
  # Both-enzyme units (use fragment-length for enzyme #1)
  uu <- unitSets$both;
  points(fl[uu,1], log2(theta[uu,ii]), col=3+1);
  for (ee in 1:2) {
    # The true effects
    uu <- unitSets[[ee]];
    lines(lowess(fl[uu,ee], log2(theta[uu,ii])), col="black", lwd=4, lty=3);
    # The estimated effects
    fit <- fits[[ii]][[ee]]$fit;
    lines(fit, col="orange", lwd=3);
    muT <- targetFcns[[ee]](fl[uu,ee]);
    lines(fl[uu,ee], muT, col="cyan", lwd=1);
  }
}
# Calculate log-ratios
thetaR <- rowMeans(thetaN, na.rm=TRUE);
M <- log2(thetaN/thetaR);
# Plot normalized data
for (ii in 1:I) {
  plot(NA, xlim=xlim, ylim=Mlim, xlab=xlab, ylab=Mlab, main="normalized");
  # Single-enzyme units
  for (ee in 1:2) {
    # The normalized data
    uu <- unitSets[[ee]];
    points(fl[uu,ee], M[uu,ii], col=ee+1);
  }
  # Both-enzyme units (use fragment-length for enzyme #1)
  uu <- unitSets$both;
  points(fl[uu,1], M[uu,ii], col=3+1);
}
ylim <- c(0,1.5);
for (ii in 1:I) {
  data <- list();
  for (ee in 1:2) {
    # The normalized data
    uu <- unitSets[[ee]];
    data[[ee]] <- M[uu,ii];
  }
  uu <- unitSets$both;
  if (length(uu) > 0) {
    data[[3]] <- M[uu,ii];
  }
  uu <- unitSets$none;
  if (length(uu) > 0) {
    data[[4]] <- M[uu,ii];
  }
  cols <- seq_along(data)+1;
  plotDensity(data, col=cols, xlim=Mlim, xlab=Mlab, main="normalized");
  abline(v=0, lty=2);
}