R language: model.matrix() function help documentation (Chinese-English parallel text)

loveR · posted 2012-2-16 18:04:01

model.matrix(stats)
model.matrix() belongs to R package: stats

                                    Construct Design Matrices

                                       Translator: LoveR, the translation robot of 生物统计家园网

----------Description----------

model.matrix creates a design (or model) matrix.

----------Usage----------

model.matrix(object, ...)

## Default S3 method:
model.matrix(object, data = environment(object),
         contrasts.arg = NULL, xlev = NULL, ...)

----------Arguments----------

Argument: object
an object of an appropriate class. For the default method, a model formula or a terms object.

Argument: data
a data frame created with model.frame. If another sort of object, model.frame is called first.

Argument: contrasts.arg
A list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.

Argument: xlev
to be used as argument of model.frame if data is such that model.frame is called.

Argument: ...
further arguments passed to or from other methods.

----------Details----------

model.matrix creates a design matrix from the description given in terms(object), using the data in data which must supply variables with the same names as would be created by a call to model.frame(object) or, more precisely, by evaluating attr(terms(object), "variables"). If data is a data frame, there may be other columns and the order of columns is not important. Any character variables are coerced to factors, with a warning. After coercion, all the variables used on the right-hand side of the formula must be logical, integer, numeric or factor.

If contrasts.arg is specified for a factor it overrides the default factor coding for that variable and any "contrasts" attribute set by C or contrasts.
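
As a minimal sketch of this override (the data frame dd and the object names m_attr and m_over are illustrative, not part of the help page), the code below first attaches sum-to-zero contrasts to a factor and then overrides them through contrasts.arg:

```r
# Illustrative data: one factor with 3 levels, 4 observations per level
dd <- data.frame(a = gl(3, 4))

# Attach a "contrasts" attribute to the factor (sum-to-zero coding)
contrasts(dd$a) <- contr.sum(3)

# Without contrasts.arg, the attribute set above is used
m_attr <- model.matrix(~ a, dd)

# With contrasts.arg, the attribute is overridden by treatment coding
m_over <- model.matrix(~ a, dd, contrasts.arg = list(a = "contr.treatment"))

colnames(m_attr)
colnames(m_over)
attr(m_over, "contrasts")
```

Comparing the two matrices should show -1 entries only in the sum-coded version, while the overridden version contains only 0/1 dummy columns.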

In an interaction term, the variable whose levels vary fastest is the first one to appear in the formula (and not in the term), so in ~ a + b + b:a the interaction will have a varying fastest.
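
The ordering rule can be checked on a small example. This is only a sketch: the data frame dd and the helper names are illustrative, and the formula is the one quoted in the paragraph above.

```r
# Illustrative balanced layout: a has 3 levels, b has 4 levels
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# The interaction is written as b:a, but a appears first in the formula
mm <- model.matrix(~ a + b + b:a, dd)

# Term labels as stored in the terms object
term_labels <- attr(terms(~ a + b + b:a), "term.labels")

# Columns that belong to the third term (the interaction)
int_cols <- colnames(mm)[attr(mm, "assign") == 3]

term_labels
int_cols
```

Reading int_cols from left to right, the level of a changes from one column to the next while the level of b stays fixed for two columns at a time.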

By convention, if the response variable also appears on the right-hand side of the formula it is dropped (with a warning), although interactions involving the term are retained.

----------Value----------

The design matrix for a regression-like model with the specified formula and data.

There is an attribute "assign", an integer vector with an entry for each column in the matrix giving the term in the formula which gave rise to the column. Value 0 corresponds to the intercept (if any), and positive values to terms in the order given by the term.labels attribute of the terms structure corresponding to object.
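
A short sketch of how the "assign" attribute can be read back against the term labels (dd, assign_idx and col_terms are illustrative names, not part of the help page):

```r
# Illustrative balanced layout: a has 3 levels, b has 4 levels
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))
mm <- model.matrix(~ a + b, dd)

# One integer per column: 0 = intercept, 1 = first term, 2 = second term
assign_idx <- attr(mm, "assign")

# Map each column back to the term that produced it
labels <- c("(Intercept)", attr(terms(~ a + b), "term.labels"))
col_terms <- setNames(labels[assign_idx + 1], colnames(mm))

assign_idx
col_terms
```

This kind of mapping is handy when the columns of a factor need to be picked out of the design matrix as a group.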

If there are any factors in terms in the model, there is an attribute "contrasts", a named list with an entry for each factor. This specifies the contrasts that would be used in terms in which the factor is coded by contrasts (in some terms dummy coding may be used), either as a character vector naming a function or as a numeric matrix.
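
The "contrasts" attribute can be inspected directly. The sketch below uses illustrative names (dd, ctr) and assumes the session's options("contrasts") has not been changed in a way that affects factor b:

```r
# Illustrative balanced layout: a has 3 levels, b has 4 levels
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# Request sum-to-zero coding for a only; b keeps the session default
mm <- model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum"))

# Named list with one entry per factor in the model
ctr <- attr(mm, "contrasts")
ctr
```

Here each entry is a character string naming a contrast function; an entry would instead be a numeric matrix if a matrix had been supplied for that factor.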

----------References----------

Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

----------See Also----------

model.frame, model.extract, terms

----------Examples----------

ff <- log(Volume) ~ log(Height) + log(Girth)
utils::str(m <- model.frame(ff, trees))
mat <- model.matrix(ff, m)

dd <- data.frame(a = gl(3,4), b = gl(4,1,12)) # balanced 2-way
options("contrasts")
model.matrix(~ a + b, dd)
model.matrix(~ a + b, dd, contrasts.arg = list(a="contr.sum"))
model.matrix(~ a + b, dd, contrasts.arg = list(a="contr.sum", b="contr.poly"))
m.orth <- model.matrix(~a+b, dd, contrasts.arg = list(a="contr.helmert"))
crossprod(m.orth) # m.orth is  ALMOST  orthogonal

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

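Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply in this thread and we will revise the text over time.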

Tiramisu023 · posted 2015-5-4 17:13:42

Last edited by Tiramisu023 on 2015-5-4 19:19

A question for the original poster: how are contr.helmert, contr.poly, contr.sum, and contr.treatment used?

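I came across an analysis of covariance example earlier which said that when the numbers of observations are unequal, or data are missing, Type III sums of squares should be used, i.e. use the type 3 (III) sums of squares.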

For unequal sample sizes (missing data, differing numbers of cases), do not use type 1 sums of squares.

Type 1 sums of squares assume that the difference in the number of subjects is meaningful and give more weight to the values from larger groups.

The type 3 sums of squares assume that the data were supposed to be complete, and that the difference in the number of subjects is not meaningful. They:
   ---Act like standard multiple regression. Each main effect and interaction is assessed after all other main effects, interactions and covariates are controlled.
   ---Treat all groups the same: a small group is weighted equally with a large group (sometimes called the unweighted approach).
   ---Are preferable in most cases since they correspond to the variation attributable to an effect after correcting for any other effects in the model. They are unaffected by the frequency of observations.

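# R code
library(car)  # provides Anova()
# Four groups of 20 observations each, with different group means plus N(0, 1) noise
sample.data <- data.frame(IV = factor(rep(1:4, each = 20)),
                          DV = rep(c(-3, 0, 1, 3), each = 20) + rnorm(80))
# Fit with polynomial contrasts for IV, then request Type III sums of squares
Anova(mod <- lm(DV ~ IV, data = sample.data, contrasts = list(IV = contr.poly)), type = "III")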

How is that contrasts = list(IV = contr.poly) argument supposed to be used?
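
One way to get a feel for the contr.* functions is to print the coding matrices they return, and then look at the design matrix that lm() builds when one of them is named in its contrasts argument (lm() hands this list on to model.matrix as contrasts.arg). The sketch below uses base R only; it assumes the same simulated layout as the code in this post, with set.seed(1) added purely for reproducibility, and the object names are illustrative.

```r
# Coding matrices for a 4-level factor: each has 4 rows and 3 columns
ct <- contr.treatment(4)  # dummy coding against the first level
cs <- contr.sum(4)        # sum-to-zero coding
ch <- contr.helmert(4)    # Helmert coding
cp <- contr.poly(4)       # orthogonal polynomials (.L, .Q, .C)

# Simulated data with the same layout as in the post (seed is an assumption)
set.seed(1)
sample.data <- data.frame(IV = factor(rep(1:4, each = 20)),
                          DV = rep(c(-3, 0, 1, 3), each = 20) + rnorm(80))

# Code the factor IV with contr.poly instead of the session default
mod <- lm(DV ~ IV, data = sample.data, contrasts = list(IV = contr.poly))

# The design matrix columns now follow the polynomial coding
mm_cols <- colnames(model.matrix(mod))
mm_cols
```

The names of the list given to contrasts must match factor names in the data (here IV), and each entry can be a contrast function, a character string naming one, or a numeric matrix.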
