R: help documentation for the merge() function

loveR · posted on 2012-2-17 10:01:36

merge(base)
merge() belongs to the R package: base

                                    Merge Two Data Frames

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Merge two data frames by common columns or row names, or do other versions of database join operations.

----------Usage----------

merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
   by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
   sort = TRUE, suffixes = c(".x",".y"),
   incomparables = NULL, ...)

----------Arguments----------

Argument: x, y
data frames, or objects to be coerced to one.

Argument: by, by.x, by.y
specifications of the common columns.  See "Details".

Argument: all
logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

Argument: all.x
logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y.  These rows will have NAs in those columns that are usually filled with values from y.  The default is FALSE, so that only rows with data from both x and y are included in the output.

Argument: all.y
logical; analogous to all.x above.

Argument: sort
logical.  Should the result be sorted on the by columns?

Argument: suffixes
a character vector of length 2 specifying the suffixes used to make the names of non-by columns unique.

Argument: incomparables
values which cannot be matched.  See match.

Argument: ...
arguments to be passed to or from methods.

----------Details----------

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y.  Columns can be specified by name, by number, or by a logical vector: the name "row.names" or the number 0 specifies the row names.  The rows in the two data frames that match on the specified columns are extracted and joined together.  If there is more than one match, all possible matches contribute one row each.  For the precise meaning of "match", see match.
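
To make the role of by.x and by.y concrete, here is a minimal sketch. The data frames staff and tasks and their columns are hypothetical and are not part of the help page. It merges on key columns that have different names and shows that a key with two matches contributes two rows.

```r
# Hypothetical data: 'staff' and 'tasks' are illustrative names only.
staff <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
tasks <- data.frame(owner = c(1, 1, 3, 4),
                    task = c("plot", "fit", "test", "ship"))

# The key columns have different names, so give one specification per frame.
res <- merge(staff, tasks, by.x = "id", by.y = "owner")
print(res)

# The same key columns given by position instead of by name.
res2 <- merge(staff, tasks, by.x = 1, by.y = 1)
print(res2)
```

Here id 1 matches two rows of tasks and so appears twice, while id 2 and owner 4 have no partner and are dropped under the default all = FALSE. The key column in the result takes its name from x.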

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
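
The Cartesian-product case can be checked with a small sketch. The data frames a and b are made up for illustration.

```r
# Hypothetical inputs with no key columns in common.
a <- data.frame(x1 = 1:3, x2 = c("a", "b", "c"))
b <- data.frame(y1 = c(10, 20))

# A length-zero 'by' pairs every row of a with every row of b.
r <- merge(a, b, by = NULL)
print(r)

# Compare with the formula from the text above.
print(dim(r))
print(c(nrow(a) * nrow(b), ncol(a) + ncol(b)))
```

Passing by = integer(0), as the Examples section does, should behave the same way as by = NULL.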

If all.x is true, all the non-matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.
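
A short sketch of the NA filling described above. The data frames left and right are hypothetical.

```r
# Hypothetical inputs: key "a" exists only in left, key "d" only in right.
left  <- data.frame(key = c("a", "b", "c"), lv = 1:3)
right <- data.frame(key = c("b", "c", "d"), rv = c(20, 30, 40))

# Keep every row of left; columns coming from right are NA where nothing matched.
kept <- merge(left, right, by = "key", all.x = TRUE)
print(kept)
```

The row for key "a" is kept with rv set to NA, and key "d" does not appear because all.y is still FALSE.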

If the remaining columns in the data frames have any common names, these have suffixes (".x" and ".y" by default) appended to make the names of the result unique.
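
The effect of suffixes can be seen in a sketch like the following. The quiz data frames q1 and q2 are invented for illustration.

```r
# Hypothetical inputs: both frames have a non-key column called 'score'.
q1 <- data.frame(id = 1:3, score = c(90, 80, 70))
q2 <- data.frame(id = 2:4, score = c(85, 75, 65))

# Default suffixes disambiguate the clashing column names.
both <- merge(q1, q2, by = "id")
print(names(both))   # "id" "score.x" "score.y"

# Custom suffixes, first for columns from x, then for columns from y.
named <- merge(q1, q2, by = "id", suffixes = c(".first", ".second"))
print(names(named))  # "id" "score.first" "score.second"
```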

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join.  DBMSes do not match NULL records, equivalent to incomparables = NA in R.
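
The four join types named above map onto merge() calls roughly as in this sketch. The data frames emp and proj are hypothetical, and the only shared column name is dept, so it becomes the key by default.

```r
# Hypothetical inputs: departments A..C have a head, B..D have a budget.
emp  <- data.frame(dept = c("A", "B", "C"), head = c("Ann", "Bob", "Cy"))
proj <- data.frame(dept = c("B", "C", "D"), budget = c(10, 20, 30))

inner <- merge(emp, proj)                # natural (inner) join
left  <- merge(emp, proj, all.x = TRUE)  # left outer join
right <- merge(emp, proj, all.y = TRUE)  # right outer join
full  <- merge(emp, proj, all = TRUE)    # full outer join

# Row counts per join type.
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))
```

With these inputs the inner join keeps only B and C, each one-sided outer join adds the unmatched department from its own side, and the full outer join keeps all four departments.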

----------Value----------

A data frame.  The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y.  If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has "automatic" row names.
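
A sketch of the two points about the shape of the result, using made-up data frames u, v, x2 and y2: merging on row names adds a Row.names column, and key columns come first regardless of where they sat in the inputs.

```r
# Hypothetical inputs that share row names but no column names.
u <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
v <- data.frame(b = c(10, 20), row.names = c("r3", "r1"))

# by = "row.names" (or by = 0) matches on the row names.
rn <- merge(u, v, by = "row.names")
print(rn)            # columns: Row.names, a, b
print(rownames(rn))  # automatic row names "1", "2"

# Hypothetical inputs where the key 'k' is the last column of each frame.
x2 <- data.frame(v = 1:2, k = c("a", "b"))
y2 <- data.frame(w = 3:4, k = c("b", "a"))
ord <- merge(x2, y2)
print(names(ord))    # key first, then x's columns, then y's: "k" "v" "w"
```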

----------See Also----------

data.frame, by, cbind

----------Examples----------

## use character columns of names to get sensible sort order
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
(m2 <- merge(books, authors, by.x = "name", by.y = "surname"))
stopifnot(as.character(m1[,1]) == as.character(m2[,1]),
      all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]),
      dim(merge(m1, m2, by = integer(0))) == c(36, 10))

## "R Core" is missing from authors and appears only here:
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1=c(NA,NA,3,4,5), k2=c(1,NA,NA,4,5), data=1:5)
y <- data.frame(k1=c(NA,2,NA,4,5), k2=c(NA,NA,3,4,5), data=1:5)
merge(x, y, by=c("k1","k2")) # NA's match
merge(x, y, by=c("k1","k2"), incomparables=NA) # recent R versions may stop here: 'incomparables' is supported only for a single by column
merge(x, y, by="k1") # NA's match, so 6 rows
merge(x, y, by="k2", incomparables=NA) # 2 rows

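When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: To make learning easier, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are unavoidable; where in doubt, check against the original English text.
Note 3: If you find an inaccuracy, please reply at the end of this thread and we will revise it over time.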
