R: survfit.formula() help documentation (Chinese-English parallel text)

loveR · posted 2012-2-17 09:53:24

survfit.formula(survival)
survfit.formula() belongs to R package: survival

                                       Compute a Survival Curve for Censored Data

                                       Translator: LoveR, the Biostatistics Home (biostatistic.net) robot

----------Description----------

Computes an estimate of a survival curve for censored data using either the Kaplan-Meier or the Fleming-Harrington method. For competing risks data it computes the cumulative incidence curve.

----------Usage----------

## S3 method for class 'formula'
survfit(formula, data, weights, subset, na.action,
      etype, id, ...)

----------Arguments----------

Argument: formula
a formula object, which must have a Surv object as the response on the left of the ~ operator and, if desired, terms separated by + operators on the right. One of the terms may be a strata object. For a single survival curve the right-hand side should be ~ 1.

Argument: data
a data frame in which to interpret the variables named in the formula, subset and weights arguments.

Argument: weights
The weights must be nonnegative, and it is strongly recommended that they be strictly positive, since zero weights are ambiguous compared with using the subset argument.

Argument: subset
an expression selecting the subset of data rows that should be used in the fit.

Argument: na.action
a missing-data filter function, applied to the model frame after any subset argument has been used. Default is options()$na.action.

Argument: etype
a variable giving the type of event. The presence of this variable signals the program to compute the cumulative incidence estimate. For each event (status == 1), the etype variable indicates the type of event. For a censored observation the value of etype is ignored, but do not set it to NA, since that will cause na.action to delete the observation.

Argument: id
identifies individual subjects when a given person can have multiple lines of data. When used with the etype variable, this allows the computation of a cumulative prevalence estimate, i.e., the incidence over time.

Argument: ...
The following additional arguments are passed to internal functions called by survfit.

type: a character string specifying the type of survival curve. Possible values are "kaplan-meier", "fleming-harrington" or "fh2" if a formula is given. This is ignored for competing risks or when the Turnbull estimator is used.

error: a character string specifying how the standard error is computed. Possible values are "greenwood" for the Greenwood formula or "tsiatis" for the Tsiatis formula (only the first character is necessary).

conf.type: one of "none", "plain", "log" (the default), or "log-log". Only enough of the string to identify it uniquely is necessary. The first option causes confidence intervals not to be generated. The second gives the standard intervals curve +/- k*se(curve), where k is determined from conf.int. The log option calculates intervals based on the cumulative hazard or log(survival). The last option bases intervals on the log hazard or log(-log(survival)).

conf.lower: a character string specifying modified lower limits for the curve; the upper limit remains unchanged. Possible values are "usual" (unmodified), "peto", and "modified". The modified lower limit is based on an "effective n" argument. The confidence bands agree with the usual calculation at each death time, but, unlike the usual bands, the confidence interval becomes wider at each censored observation. The extra width is obtained by multiplying the usual variance by a factor m/n, where n is the number currently at risk and m is the number at risk at the last death time. (The bands thus agree with the unmodified bands at each death time.) This is especially useful for survival curves with a long flat tail. The Peto lower limit is based on the same "effective n" argument as the modified limit, but also replaces the usual Greenwood variance term with a simple approximation. It is known to be conservative.

start.time: a numeric value specifying a time at which to start calculating survival information. The resulting curve is the survival conditional on surviving to start.time.

conf.int: the level for a two-sided confidence interval on the survival curve(s). Default is 0.95.

se.fit: a logical value indicating whether standard errors should be computed. Default is TRUE.
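The conf.type, conf.int and conf.lower options can be sketched in base R. This is a minimal illustration under stated assumptions, not survfit's internal code: the helpers surv_ci() and modified_lower() are hypothetical, se_S is assumed to be the standard error of S itself, and the log and log-log scales are reached with the delta method. For conf.lower = "modified" the variance is multiplied by m/n and the lower limit is recomputed on the log scale (the default conf.type).

```r
# Hypothetical helpers illustrating conf.type / conf.int / conf.lower.
# Assumption: se_S is the standard error of S (not of log S).
surv_ci <- function(S, se_S, conf.int = 0.95,
                    conf.type = c("log", "plain", "log-log", "none")) {
  conf.type <- match.arg(conf.type)        # partial matching, e.g. "pl" -> "plain"
  if (conf.type == "none") return(NULL)    # no intervals generated
  z <- qnorm(1 - (1 - conf.int) / 2)       # two-sided normal quantile
  if (conf.type == "plain") {              # S +/- z * se(S)
    lower <- S - z * se_S
    upper <- S + z * se_S
  } else if (conf.type == "log") {         # symmetric on log(S)
    se_log <- se_S / S
    lower <- S * exp(-z * se_log)
    upper <- S * exp(z * se_log)
  } else {                                 # symmetric on log(-log(S))
    se_ll <- se_S / (S * abs(log(S)))
    lower <- S^exp(z * se_ll)
    upper <- S^exp(-z * se_ll)
  }
  list(lower = pmax(lower, 0), upper = pmin(upper, 1))  # clip to [0, 1]
}

# conf.lower = "modified": multiply the variance by m/n, where n is the number
# currently at risk and m the number at risk at the last death time.
# Assumption: log scale; before the first death, m is taken as n.risk[1].
modified_lower <- function(S, se_S, n.risk, is.death, conf.int = 0.95) {
  z <- qnorm(1 - (1 - conf.int) / 2)
  m <- numeric(length(n.risk))
  last <- n.risk[1]
  for (i in seq_along(n.risk)) {
    if (is.death[i]) last <- n.risk[i]
    m[i] <- last
  }
  se_mod <- se_S * sqrt(m / n.risk)        # variance * m/n  =>  se * sqrt(m/n)
  pmax(S * exp(-z * se_mod / S), 0)        # upper limit is left unchanged
}

# Toy input: deaths at the first two times, censoring afterwards
S        <- c(0.9, 0.8, 0.8, 0.8)
se_S     <- c(0.09, 0.12, 0.12, 0.12)
n.risk   <- c(10, 9, 7, 5)
is.death <- c(TRUE, TRUE, FALSE, FALSE)
ci_log <- surv_ci(S, se_S)                 # default conf.type = "log"
lo_mod <- modified_lower(S, se_S, n.risk, is.death)
```

At the two death times lo_mod equals the usual log-scale lower limit; at the censored times, where m/n > 1, it is lower, so the band widens as the conf.lower description says.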

----------Details----------

The estimators used are the Kalbfleisch-Prentice (Kalbfleisch and Prentice, 1980, p. 86) and the Tsiatis/Link/Breslow estimators, which reduce to the Kaplan-Meier and Fleming-Harrington estimates, respectively, when the weights are unity.
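The reduction to Kaplan-Meier and Fleming-Harrington can be sketched in base R. The helper weighted_km_fh() is hypothetical (it is not part of survival) and assumes the weighted estimators keep the product-limit and exp(-cumulative hazard) forms with weighted counts. It also shows that integer case weights give the same curve as replicating rows.

```r
# Hypothetical helper: weighted product-limit (Kaplan-Meier) and
# Fleming-Harrington estimates at each distinct event time.
# status: 1 = event, 0 = censored; weights default to 1 (unweighted case).
weighted_km_fh <- function(time, status, weights = rep(1, length(time))) {
  tt <- sort(unique(time[status == 1]))
  n <- sapply(tt, function(t) sum(weights[time >= t]))               # weighted number at risk
  m <- sapply(tt, function(t) sum(weights[time == t & status == 1])) # weighted deaths
  data.frame(time    = tt,
             n.risk  = n,
             n.event = m,
             km      = cumprod(1 - m / n),    # Kaplan-Meier
             fh      = exp(-cumsum(m / n)))   # exp(-Nelson-Aalen hazard)
}

time   <- c(2, 3, 3, 5, 6, 8)
status <- c(1, 1, 0, 1, 0, 1)
w      <- c(2, 1, 3, 1, 2, 1)
weighted   <- weighted_km_fh(time, status, w)
replicated <- weighted_km_fh(rep(time, w), rep(status, w))
print(weighted)
```

Because exp(-x) >= 1 - x, each Fleming-Harrington factor is at least the matching Kaplan-Meier factor, so the fh column never falls below the km column.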

The Greenwood formula for the variance is a sum of terms d/(n*(n-m)), where d is the number of deaths at a given time point, n is the sum of weights for all individuals still at risk at that time, and m is the sum of weights for the deaths at that time. The justification is based on a binomial argument when the weights are all equal to one; the extension to the weighted case is ad hoc. Tsiatis (1981) proposes a sum of terms d/(n*n), based on a counting-process argument that includes the weighted case.
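The two variance sums can be compared directly. In this minimal sketch the helper variance_sums() is hypothetical; each sum is accumulated over event times, m equals d when all weights are one, and the familiar Greenwood variance of S itself is S^2 times the Greenwood sum. Since n - m < n, every Greenwood term is at least as large as the matching Tsiatis term (and it is infinite when everyone at risk dies, n = m).

```r
# Hypothetical helper: cumulative Greenwood and Tsiatis variance sums.
# d: deaths at each event time; n: (weighted) number at risk;
# m: (weighted) deaths, equal to d when all weights are 1.
variance_sums <- function(d, n, m = d) {
  data.frame(greenwood = cumsum(d / (n * (n - m))),   # terms d / (n (n - m))
             tsiatis   = cumsum(d / (n * n)))         # terms d / (n n)
}

d <- c(1, 1, 2)
n <- c(10, 8, 5)
v <- variance_sums(d, n)
S <- cumprod(1 - d / n)               # unweighted Kaplan-Meier at the same times
var_S_greenwood <- S^2 * v$greenwood  # Greenwood variance of S itself
print(cbind(v, S, var_S_greenwood))
```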

The two variants of the F-H estimate differ in how ties are handled. If there were 3 deaths out of 10 at risk, the first ("fleming-harrington") increments the hazard by 3/10 and the second ("fh2") by 1/10 + 1/9 + 1/8. For the first method S(t) = exp(-H), where H is the Nelson-Aalen cumulative hazard estimate, whereas the fh2 method gives S(t) values closer to the Kaplan-Meier.
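The arithmetic of the 3-deaths-out-of-10 example can be checked in a few lines of base R; nothing here calls survfit, it only reproduces the two hazard increments described above.

```r
# One event time with d = 3 deaths among n = 10 at risk
n <- 10
d <- 3
H_fh  <- d / n                     # "fleming-harrington": 3/10
H_fh2 <- sum(1 / (n - 0:(d - 1)))  # "fh2": 1/10 + 1/9 + 1/8
S_km  <- 1 - d / n                 # Kaplan-Meier step: 0.7
S_fh  <- exp(-H_fh)                # about 0.741
S_fh2 <- exp(-H_fh2)               # about 0.715, closer to the KM value
print(round(c(km = S_km, fh = S_fh, fh2 = S_fh2), 3))
```

Treating tied deaths as if they occurred one at a time (fh2) gives a larger hazard increment, which pulls exp(-H) toward the Kaplan-Meier step.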

When the data set includes left-censored or interval-censored data (or both), the EM approach of Turnbull is used to compute the overall curve. When the baseline method is the Kaplan-Meier, this is known to converge to the maximum likelihood estimate.
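The EM (self-consistency) idea can be sketched on Turnbull's data from the Examples section, which mixes exact, right-censored and left-censored observations. This is a minimal base-R illustration, not survfit's implementation: the helper turnbull_em() is hypothetical, and the fixed support {1, 2, 3, 4, beyond 4} is an assumption that works here because every observation is tied to one of those times.

```r
# Turnbull's data from the Examples section, rebuilt in base R.
# status follows the Surv(type = "interval") coding: 1 = event at time,
# 0 = right-censored at time, 2 = left-censored at time; n is a case weight.
tdata <- data.frame(time   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
                    status = rep(c(1, 0, 2), 4),
                    n      = c(12, 3, 2, 6, 2, 4, 2, 0, 2, 3, 3, 5))
support <- c(1, 2, 3, 4, Inf)   # candidate event times; Inf = beyond time 4

# alpha[i, j] = 1 if support point j is compatible with observation i
alpha <- t(sapply(seq_len(nrow(tdata)), function(i) {
  t_i <- tdata$time[i]
  switch(as.character(tdata$status[i]),
         "1" = support == t_i,   # exact event time
         "0" = support >  t_i,   # event happened after t_i
         "2" = support <= t_i)   # event happened at or before t_i
})) * 1

# Self-consistency (EM) iteration for the probability mass p on the support
turnbull_em <- function(alpha, w, tol = 1e-10, maxit = 10000) {
  p <- rep(1 / ncol(alpha), ncol(alpha))       # start from uniform mass
  for (iter in seq_len(maxit)) {
    denom <- as.vector(alpha %*% p)            # probability of each observation
    post  <- sweep(alpha, 2, p, `*`) / denom   # E-step: expected mass per point
    p_new <- colSums(post * w) / sum(w)        # M-step: weighted average
    done  <- max(abs(p_new - p)) < tol
    p <- p_new
    if (done) break
  }
  p
}

p <- turnbull_em(alpha, tdata$n)
surv <- setNames(1 - cumsum(p)[1:4], support[1:4])  # S(t) at times 1..4
print(round(surv, 3))
```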

The cumulative incidence (CI) curve is an alternative to the Kaplan-Meier for competing risks data. For instance, in patients with MGUS, conversion to an overt plasma cell malignancy occurs at a nearly constant rate among those still alive. A Kaplan-Meier estimate, treating death due to other causes as censored, gives a 20-year cumulative rate of 33% for the 241 early patients of Kyle. This estimates the incidence of conversion if other causes of death were removed.

The CI estimate, on the other hand, estimates the total number of conversions that will actually occur. Because the population is older, this is much smaller than the KM: 22% at 20 years for Kyle's data. If there were no censoring, CI(t) could simply be computed as the total number of patients with progression by time t divided by the sample size n.
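The contrast between the two estimates can be sketched in base R. The helpers cuminc_sketch() and one_minus_km() are hypothetical; the first uses the standard nonparametric cumulative incidence estimator, which weights each cause-k hazard increment by the all-cause Kaplan-Meier just before that time. With no censoring its final value equals the count of cause-k events divided by n, as stated above, while 1 - KM with the other cause treated as censored is larger.

```r
# Hypothetical helpers. status: 1 = event, 0 = censored; cause gives the
# event type (its value is ignored for censored rows, as for etype).
cuminc_sketch <- function(time, status, cause, k) {
  tt <- sort(unique(time[status == 1]))
  ci <- numeric(length(tt))
  S_prev <- 1   # all-cause Kaplan-Meier just before the current time
  total  <- 0
  for (j in seq_along(tt)) {
    at_risk <- sum(time >= tt[j])
    d_all   <- sum(time == tt[j] & status == 1)
    d_k     <- sum(time == tt[j] & status == 1 & cause == k)
    total   <- total + S_prev * d_k / at_risk  # mass of cause-k events at tt[j]
    ci[j]   <- total
    S_prev  <- S_prev * (1 - d_all / at_risk)
  }
  data.frame(time = tt, ci = ci)
}

# 1 - Kaplan-Meier for cause k, treating other causes as censored
one_minus_km <- function(time, status, cause, k) {
  ev <- status == 1 & cause == k
  tt <- sort(unique(time[ev]))
  surv <- cumprod(sapply(tt, function(t) 1 - sum(time == t & ev) / sum(time >= t)))
  data.frame(time = tt, est = 1 - surv)
}

# Toy data with no censoring: 3 progressions and 5 deaths among n = 8
time   <- 1:8
status <- rep(1, 8)
cause  <- c("prog", "death", "prog", "death", "death", "prog", "death", "death")
ci <- cuminc_sketch(time, status, cause, "prog")
km <- one_minus_km(time, status, cause, "prog")
print(tail(ci, 1)); print(tail(km, 1))
```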

----------Value----------

an object of class "survfit". See survfit.object for details. Methods defined for survfit objects are print, plot, lines, and points.

----------References----------

… intervals for survival probabilities. Statistics in Medicine 6, 679-87.
… survival distribution in censored data. Comm. in Statistics 13, 2469-86.
… The Statistical Analysis of Failure Time Data. New York: Wiley.
… Monoclonal gammopathy of undetermined significance and solitary plasmacytoma. Implications for progression to overt multiple myeloma. Hematology/Oncology Clinics N. Amer. 11, 71-87.
… function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
… function with doubly censored data. J Am Stat Assoc, 69, 169-173.

----------See Also----------

survfit.coxph for survival curves from Cox models.

print, plot, lines, coxph, Surv, strata.

----------Examples----------

library(survival)  # survfit(), Surv(), coxph() and the aml, ovarian, mgus1 data

# Fit a Kaplan-Meier curve for each group and plot it
fit <- survfit(Surv(time, status) ~ x, data = aml)
plot(fit, lty = 2:3)
legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3)

# Fit a Cox proportional hazards model and plot the
# predicted survival for a 60-year-old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
plot(survfit(fit, newdata=data.frame(age=60)),
   xscale=365.25, xlab = "Years", ylab="Survival")

# Here is the data set from Turnbull.
#  There are no interval-censored subjects, only left-censored (status 2),
#  right-censored (status 0) and observed events (status 1).
#
#                          Time
#                       1   2   3   4
# Type of observation
#   death              12   6   2   3
#   losses              3   2   0   3
#   late entry          2   4   2   5
#
tdata <- data.frame(time  =c(1,1,1,2,2,2,3,3,3,4,4,4),
                  status=rep(c(1,0,2),4),
                  n    =c(12,3,2,6,2,4,2,0,2,3,3,5))
fit  <- survfit(Surv(time, time, status, type='interval') ~ 1,
            data=tdata, weights=n)

#
# Time to progression/death for patients with monoclonal gammopathy
#  Competing risk curves (cumulative incidence)
fit1 <- survfit(Surv(stop, event=='progression') ~1, data=mgus1,
                  subset=(start==0))
fit2 <- survfit(Surv(stop, status) ~1, data=mgus1,
                  subset=(start==0), etype=event) # competing risks
# CI curves are always plotted from 0 upwards, rather than from 1 down
plot(fit2, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
         col=2:3, xlab="Years post diagnosis of MGUS")
lines(fit1, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
         conf.int=FALSE)
text(10, .4, "Competing Risk: death", col=3)
text(16, .15,"Competing Risk: progression", col=2)
text(15, .30,"KM:prog")

When reposting, please credit the source: Biostatistics Home (http://www.biostatistic.net).

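Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the Biostatistics Home robot. It is intended only as a personal reference for learning R; Biostatistics Home reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable; compare the Chinese and English text carefully when using it, which can also help with learning R.
Note 3: If you find an inaccuracy, please reply at the end of this post, and we will revise it over time.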
