tfl(zipfR)
tfl() belongs to R package: zipfR
Type Frequency Lists (zipfR)
Translator: 生物统计家园网 robot LoveR
----------Description----------
In the zipfR package, tfl objects are used to represent a type frequency list, which specifies the observed frequency of each type in a corpus. For mathematical reasons, expected type frequencies are rarely considered.
With the tfl constructor function, an object can be initialized directly from the specified data vectors. It is more common to read a type frequency list from a disk file with read.tfl or, in some cases, to derive it from an observed frequency spectrum with spc2tfl.
tfl objects should always be treated as read-only.
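As a rough illustration of what a type frequency list holds, the following base R sketch counts types in a small token vector. The toy tokens vector and all variable names are hypothetical; the final step assumes the zipfR package is installed and uses only the f and type arguments listed under Usage.

```r
# Toy corpus (hypothetical data, for illustration only)
tokens <- c("the", "cat", "saw", "the", "dog", "and", "the", "cat")

# Observed frequency of each type, most frequent first
counts <- sort(table(tokens), decreasing = TRUE)
f     <- as.vector(counts)   # type frequencies f_k
types <- names(counts)       # type representations (word forms)

N <- sum(f)      # sample size: number of tokens
V <- length(f)   # vocabulary size: number of distinct types

# Optional: wrap the counts in a tfl object if zipfR is available
if (requireNamespace("zipfR", quietly = TRUE)) {
  toy.tfl <- zipfR::tfl(f, type = types)
  print(toy.tfl)
}
```

Type IDs are left to the default here, so the types are simply numbered in the order in which they appear in f.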
----------Usage----------
tfl(f, k=1:length(f), type=NULL, f.min=min(f), f.max=max(f),
incomplete=!(missing(f.min) && missing(f.max)), N=NA, V=NA,
delete.zeros=FALSE)
----------Arguments----------
Argument: k
integer vector of type IDs k (if omitted, natural numbers 1,2,… are assigned automatically)
Argument: f
vector of corresponding type frequencies f_k
Argument: type
optional character vector of type representations (e.g. word forms or lemmata), used for informational and printing purposes only
Argument: incomplete
indicates that the type frequency list is incomplete, i.e. only contains types in a certain frequency range (typically, the lowest-frequency types may be excluded). Incomplete type frequency lists are rarely useful.
Argument: N, V
sample size and vocabulary size corresponding to the type frequency list; these have to be specified explicitly for incomplete lists
Argument: f.min, f.max
frequency range represented in an incomplete type frequency list (see details below)
Argument: delete.zeros
if TRUE, delete types with f=0 from the type frequency list, after assigning type IDs. This operation does not make the resulting tfl object incomplete.
----------Details----------
If f.min and f.max are not specified, but the list is marked as incomplete (with incomplete=TRUE), they are automatically determined from the frequency vector f (making the assumption that all types in this frequency range are listed). Explicit specification of either f.min or f.max implies an incomplete list. In this case, all types outside the specified range will be deleted from the list. If incomplete=FALSE is explicitly given in this situation, N and V will be determined automatically from the input data (which is assumed to be complete), but the resulting type frequency list will still be incomplete.
If you just want to remove types with f=0 without marking the type frequency list as incomplete, use the option delete.zeros=TRUE.
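The effect of restricting the frequency range can be illustrated without zipfR. The base R sketch below uses made-up data and hypothetical variable names, and it only mimics the filtering described above rather than calling tfl itself. It suggests why N and V cannot be recovered from an incomplete list and therefore have to be supplied explicitly, and how dropping zero-frequency types differs from truncating the range.

```r
# Toy frequency vector (hypothetical); two types were never observed
f <- c(120, 45, 7, 3, 1, 1, 0, 0)

# Full-sample statistics
N.full <- sum(f)        # sample size
V.full <- sum(f > 0)    # vocabulary size (observed types only)

# Dropping zeros (cf. delete.zeros=TRUE) leaves N and V unchanged
f.nonzero <- f[f > 0]
stopifnot(sum(f.nonzero) == N.full, length(f.nonzero) == V.full)

# Restricting to a frequency range (cf. f.min=2) removes real observations
f.min <- 2
f.trunc <- f[f >= f.min]

# Sums over the truncated list underestimate the true N and V
c(N.full = N.full, N.trunc = sum(f.trunc))
c(V.full = V.full, V.trunc = length(f.trunc))
```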
A tfl object is a data frame with the following variables:
k: integer type ID k
f: corresponding type frequency f_k
type: optional character vector with type representations
The data frame always has to be sorted with respect to the k column (ascending order).
The following attributes are used to store additional information about the type frequency list:
N, V: sample size N and vocabulary size V corresponding to the type frequency list. For a complete list, these values could easily be determined from the f column.
incomplete: if TRUE, the type frequency list is incomplete, i.e. it lists only types in the frequency range given by f.min and f.max
f.min, f.max: range of type frequencies represented in the list (should be ignored unless the list is incomplete)
hasTypes: indicates whether or not the type column is present
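To make this layout concrete, the following base R sketch builds a plain data frame with the same columns and attaches N and V as attributes. It is only a mock-up of the structure described above, not a genuine tfl object (it lacks the tfl class and the remaining attributes), and the data are made up.

```r
# Mock-up of the documented layout (not a genuine tfl object)
mock <- data.frame(
  k    = c(3L, 1L, 2L),
  f    = c(1, 5, 2),
  type = c("dog", "the", "cat"),
  stringsAsFactors = FALSE
)

# The k column must be in ascending order
mock <- mock[order(mock$k), ]
rownames(mock) <- NULL

# Sample size and vocabulary size stored as attributes
attr(mock, "N") <- sum(mock$f)
attr(mock, "V") <- sum(mock$f > 0)

str(mock)
```

For real tfl objects, the generic methods N and V listed under See Also are the intended way to read these values.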
----------Value----------
An object of class tfl representing the specified type frequency list. This object should be treated as read-only (although such behaviour cannot be enforced in R).
----------See Also----------
read.tfl, write.tfl, plot.tfl, sample.tfl, spc2tfl, tfl2spc
Generic methods supported by tfl objects are print, summary, N, V and Vm.
Implementation details and non-standard arguments for these methods can be found on the manpages print.tfl, summary.tfl, N.tfl, V.tfl, etc.
----------Examples----------
library(zipfR)
## typically, you will read a tfl from a file
## (see examples in the read.tfl manpage)
## or you can load a ready-made tfl
data(Brown.tfl)
summary(Brown.tfl)
print(Brown.tfl)
## or create it from a spectrum (with different ids and
## no type labels)
data(Brown.spc)
Brown.tfl2 <- spc2tfl(Brown.spc)
## same frequency information as Brown.tfl
## but with different ids and no type labels
summary(Brown.tfl2)
print(Brown.tfl2)
## how to draw a Zipf rank/frequency plot
## by extracting frequencies from a tfl
plot(sort(Brown.tfl$f, decreasing=TRUE), log="y", xlab="rank", ylab="frequency")
## simulating a tfl
Zipfian.tfl <- tfl(1000/(1:1000))
plot(Zipfian.tfl$f, log="y")
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).