R zipfR package: help documentation for the vec2xxx() functions

loveR · posted 2012-10-2 07:50:43

vec2xxx(zipfR)
vec2xxx() belongs to the R package: zipfR

Type-Token Statistics for Samples and Empirical Data (zipfR)

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Compute type-frequency list, frequency spectrum and vocabulary growth curve from a token vector representing a random sample or an observed sequence of tokens.
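
To make the three statistics concrete, here is a minimal sketch in base R on a toy token vector. It only illustrates the concepts; it is not zipfR's implementation, and the variable names (tokens, type_freq, spectrum, vocab_growth) are made up for this example.

```r
# Toy token vector: one element per token; the value names the token's type.
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "end")

# Type-frequency list: how often each type occurs in the sample.
type_freq <- table(tokens)

# Frequency spectrum: number of types that occur exactly m times.
spectrum <- table(type_freq)

# Vocabulary growth: number of distinct types among the first N tokens,
# for every N from 1 to length(tokens).
vocab_growth <- cumsum(!duplicated(tokens))

print(type_freq)
print(spectrum)
print(vocab_growth)
```

Here "the" occurs three times and the other five types once each, so the spectrum has five types at m=1 and one type at m=3. The zipfR functions return objects of class tfl, spc and vgc rather than plain tables and vectors.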

----------Usage----------

  vec2tfl(x)

  vec2spc(x)

  vec2vgc(x, steps=200, stepsize=NA, m.max=0)

----------Arguments----------

Argument: x
a vector of length N_0, representing a random sample or other observed data set of N_0 tokens.  For each token, the corresponding element of x specifies the type that the token belongs to.  Usually, x is a character vector, but it might also specify integer IDs in some cases.
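
As a small illustration of why the encoding of x does not matter for the statistics, the following base R sketch codes the same sample once as character strings and once as integer IDs. The ID assignment via match() is an assumption made for this example, not something zipfR requires.

```r
# The same six tokens, once as strings and once as integer type IDs.
tokens <- c("b", "a", "b", "c", "a", "b")
ids <- match(tokens, unique(tokens))  # b -> 1, a -> 2, c -> 3

# Frequency counts per type, sorted so that the labels do not matter.
counts_chr <- sort(as.vector(table(tokens)), decreasing = TRUE)
counts_int <- sort(as.vector(table(ids)), decreasing = TRUE)

print(counts_chr)
print(counts_int)
```

Both vectors contain the counts 3, 2 and 1, because only the partition of tokens into types matters, not how the types are labelled.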

Argument: steps
number of steps for which vocabulary growth data V(N) is calculated.  The values of N will be evenly spaced (up to rounding differences) from N=1 to N=N_0.
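
One plausible way to obtain such evenly spaced sample sizes is sketched below in base R. This is an assumption about how the spacing could be computed, not zipfR's actual code, whose rounding may differ; N0, steps and N_values are names chosen for this example.

```r
N0 <- 1000   # sample size (length of the token vector)
steps <- 5   # requested number of points on the growth curve

# Evenly spaced values from 1 to N0, rounded to whole token counts.
# unique() guards against duplicates when steps is close to N0.
N_values <- unique(round(seq(1, N0, length.out = steps)))

print(N_values)
```

The first value is 1 and the last is N0; the values in between are equidistant only up to rounding.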

Argument: stepsize
alternative way of specifying the steps of the vocabulary growth curve.  In this case, vocabulary growth data will be calculated every stepsize tokens.  The first step is chosen such that the last step corresponds to the full sample (N=N_0).  Only one of the parameters steps and stepsize may be specified.
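
The rule that the last step must coincide with the full sample can be sketched in base R by counting backwards from N_0. This is an illustration under that reading of the description, not zipfR's implementation; N0, stepsize and N_values are names chosen for this example.

```r
N0 <- 1050       # sample size
stepsize <- 200  # distance between consecutive sample sizes

# Count down from N0 in steps of stepsize, then reverse, so that the
# last value is exactly N0 and the first value is whatever remains.
N_values <- rev(seq(from = N0, to = 1, by = -stepsize))

print(N_values)  # 50 250 450 650 850 1050
```

With these numbers the first step falls at N=50 rather than at N=200, which is what allows the curve to end exactly at N=N_0.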

Argument: m.max
an integer in the range 1 ... 9, specifying how many spectrum elements V_m(N) to include in the vocabulary growth curve.  By default only vocabulary size V(N) is calculated, i.e. m.max=0.
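
To show what a spectrum element V_m(N) adds to a growth curve, the base R sketch below computes V(N) together with V_1(N), the number of types seen exactly once among the first N tokens. The helper spectrum_element() is hypothetical and written for this example; it is not part of zipfR.

```r
tokens <- c("a", "b", "a", "c", "b", "a", "d")

# Hypothetical helper: number of types occurring exactly m times
# among the first N tokens.
spectrum_element <- function(tokens, N, m) {
  sum(table(tokens[seq_len(N)]) == m)
}

N_values <- c(3, 5, 7)
V  <- sapply(N_values, function(N) length(unique(tokens[seq_len(N)])))
V1 <- sapply(N_values, function(N) spectrum_element(tokens, N, 1))

print(data.frame(N = N_values, V = V, V1 = V1))
```

For this sample V grows from 2 to 3 to 4 types, while V_1 is 1, 1 and 2. Requesting m.max=1 in vec2vgc corresponds to tracking this extra column alongside V(N).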

----------Details----------

There are two main applications for the vec2xxx functions:

a) They can be used to calculate type-token statistics and vocabulary growth curves for random samples generated from a LNRE model.

b) They provide an easy way to process a user's own data without having to rely on external scripts to compute frequency spectra and vocabulary growth curves.  All that is needed is a text file in one-token-per-line format (i.e. where each token is given on a separate line).  See "Examples" below.

Both applications work well for samples of up to approx. 1 million tokens.  For considerably larger data sets, specialized external software should be used, such as the Perl scripts provided on the zipfR homepage.
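
The one-token-per-line format from application b) can be tried out without any external data. The base R sketch below writes a tiny file to a temporary location and reads it back as a token vector; the file contents and the variable names are invented for this example.

```r
# Write a small one-token-per-line file to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("the", "cat", "saw", "the", "dog"), path)

# Each line becomes one element of the token vector.
tokens <- readLines(path)

n_tokens <- length(tokens)         # sample size N_0
n_types <- length(unique(tokens))  # vocabulary size V(N_0)
print(c(tokens = n_tokens, types = n_types))

unlink(path)  # remove the temporary file
```

A vector read this way has the form the vec2xxx functions expect for their x argument.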

----------Value----------

An object of class tfl, spc or vgc, representing the type frequency list, frequency spectrum or vocabulary growth curve of the token vector x, respectively.

----------See Also----------

tfl, spc and vgc for more information about type frequency lists, frequency spectra and vocabulary growth curves

rlnre for generating random samples (in the form of the required token vectors) from a LNRE model

readLines and scan for loading token vectors from disk files

----------Examples----------

## type-token statistics for random samples from a LNRE distribution

library(zipfR)  # provides lnre(), rlnre() and the vec2xxx() functions

model <- lnre("fzm", alpha=.5, A=1e-6, B=.05)
x <- rlnre(model, 100000)

vec2tfl(x)
vec2spc(x)  # same as tfl2spc(vec2tfl(x))
vec2vgc(x)

sample.spc <- vec2spc(x)
exp.spc <- lnre.spc(model, 100000)
## Not run:  plot(exp.spc, sample.spc)

sample.vgc <- vec2vgc(x, m.max=1, steps=500)
exp.vgc <- lnre.vgc(model, N=N(sample.vgc), m.max=1)
## Not run:  plot(exp.vgc, sample.vgc, add.m=1)

## load token vector from a file in one-token-per-line format

## Not run:  x <- readLines(filename)
## Not run:  x <- readLines(file.choose()) # with file selection dialog

## you can also perform whitespace tokenization and filter the data

## Not run:  brown <- scan("brown.pos", what=character(0), quote="")
## Not run:  nouns <- grep("/NNS?$", brown, value=TRUE)
## Not run:  plot(vec2spc(nouns))
## Not run:  plot(vec2vgc(nouns, m.max=1), add.m=1)
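
The whitespace tokenization and filtering step can be tried without the brown.pos file. The sketch below uses only base R and a made-up sentence in word/TAG format passed to scan() through its text argument; the sentence and the tags are invented for this example.

```r
# A made-up tagged sentence in word/TAG format.
text <- "The/DT dogs/NNS chased/VBD a/DT cat/NN and/CC a/DT dog/NN"

# Split on whitespace; quote = "" keeps quote characters as ordinary text.
tagged <- scan(text = text, what = character(0), quote = "", quiet = TRUE)

# Keep only tokens tagged NN or NNS (singular and plural common nouns).
nouns <- grep("/NNS?$", tagged, value = TRUE)

print(nouns)
```

The resulting vector contains dogs/NNS, cat/NN and dog/NN and can be handed to vec2spc() or vec2vgc() in the same way as the nouns vector in the example.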

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

