R YourCast package: help documentation for the yourprep() function

loveR · posted 2012-10-2 07:40:06

yourprep(YourCast)
R package containing yourprep(): YourCast

Data object creation wizard for YourCast

Translator: LoveR, the biostatistic.net robot

----------Description----------

Builds the data object for the yourcast function from files in the working directory or another specified directory.

----------Usage----------

yourprep(dpath=NULL,tag="csid",index.code="ggggaa",
         datalist=NULL,G.names=NULL,A.names=NULL,
         T.names=NULL,adjacency=NULL,year.var=FALSE,
         sample.frame=NULL,summary=FALSE,verbose=FALSE,
         #lagging utility
         lag=NULL,formula=NULL,vars.nolag=NULL)

----------Arguments----------

Argument: dpath
String. Name of the directory where data files are stored. If NULL, defaults to the working directory. Default: NULL

Argument: tag
String. Group of characters placed before the CSID code in filenames to indicate which files in dpath the function should load. The tag can also be used to differentiate between groups to be analyzed separately; for example, "m" for male deaths and "f" for female deaths. Default: "csid"

Argument: index.code
String indicating how the CSID index variable is coded in the input data. Between 0 and 4 of each of the following two characters are used, in this order: g for the geographic index (such as country) and a for a grouped continuous variable such as an age group. For example, "ggggaa" would have the function interpret "245045" by using "2450" as the country code and "45" as the age group. Default: "ggggaa"

Argument: datalist
A list of cross section dataframes already loaded into the workspace to be added to the dataobj. Names of list elements should be the numerical CSID code for each cross section, and dataframes should be formatted identically to files loaded from an external directory (see Details). Default: NULL

Argument: A.names, G.names, T.names
String. Filename of optional two-column data files that list all valid numerical codes (in the first column) and corresponding alphanumeric names (optionally in the second column) for the indices corresponding to geographic areas in G.names, age groups in A.names, and time periods in T.names. The function will search dpath for a file with the specified name; please include column labels. The optional alphanumeric identifiers are most commonly used only for geographic areas, since numerical values for age groups and time periods are usually meaningful on their own. However, if another grouped continuous variable is used in place of ages, for example, specifying these labels will be important for the output to be meaningful. NOTE: Auxiliary files will be loaded automatically by yourprep() if they are saved in the dpath and labeled with the tag specified by the user. See the "Details" section for more information. Default: NULL

Argument: adjacency
Data file with codes to construct the symmetric matrix (geographic region by geographic region) of proximity scores for geographic smoothing used by the "map" and "bayes" methods. The larger the relative score, the more proximate that pair of countries is in the prior; a zero element means the two geographic areas are unrelated (the diagonal is ignored). Each row of the proximity file (the file passed as adjacency) has three columns, consisting of geographic codes for two countries and a score indicating the proximity or similarity of the two geographic regions; please include column labels. For convenience, geographic regions that are unrelated (and would have zero entries in the symmetric matrix) may be omitted from the proximity file. In addition, the proximity file may include rows corresponding to geographic regions not included in the present analysis. Default: NULL

Argument: year.var
Boolean. Should be TRUE if year is coded as a separate variable rather than as the rowname in cross section data files. The function will look for a year variable to use as rownames and then drop it from the dataframe. The change will only be made to a dataframe if it does not already have rownames or if the existing rownames are merely a "1...N" index of row numbers, so it is possible to apply the correction even if some cross sections do not have a year variable and already have the correct rownames. Default: FALSE

Argument: sample.frame
Optional four element vector containing, in order, the start and end time periods to be used for the observed data and the start and end time periods to be forecast. Cross sections do not all have to begin at the start date, but each must contain all years after its first observed value. Variables to be forecast should be coded as NA in the out-of-sample period. Note that this makes it easy to reserve a range of values of the dependent variable for out-of-sample forecasting evaluation; the summary and plot functions in yourcast will make these comparisons automatically if the out-of-sample data are included. yourprep() uses this information only to verify that cross sections are correctly constructed, but it should also be included if one wants to use the lag utility. Default: NULL

Argument: summary
Boolean. If TRUE, the means of the available observations on each variable are displayed for the cross sections read by yourprep(). Default: FALSE

Argument: verbose
Boolean. If TRUE, the function prints the name of each cross section or auxiliary file as it is read into the dataobj. Default: FALSE

Argument: lag
Number of years by which covariate data need to be lagged from their current position in the cross section files. See "Details" for more information. Default: NULL

Argument: formula
Formula. The formula that one will use in the subsequent run of yourcast(). This helps the lagging utility distinguish between the response variable (which will not be shifted between cross sections) and the covariates of interest that should be lagged and included in the final cross sections of the dataobj. If the covariate "index" is included in the formula, the lagging utility will include a variable in the cross sections that starts from 1 and counts the number of time periods since the start of the cross section. If a lag is requested, the formula argument must be specified. Default: NULL

Argument: vars.nolag
Vector of strings. Vector of variables to be included in the dataobj but not lagged. These variables do not need to be included in the formula, and if found there they will be left unlagged when the other covariates are lagged.

----------Details----------

Creates the dataobj input for yourcast from files in the working directory or another specified directory. Checks that all cross sections in the data list are titled properly and that all years up to the last predicted year are included in the dataframes (if the sample.frame argument is specified). Please note, however, that all cross sections from the same geographic area must have the same observation and prediction years in the dataframe (even if NA) for the graphing software plot.yourcast to work.

The cross section files must be named according to the CSID identifiers for country code and age group, preceded by the specified tag (default: "csid"), so that yourprep() can distinguish them from other files in the dpath. For example, for the USA (country code 2450) time series of 45 year old individuals, the file name should be "csid245045.txt" if the tag is left as the default. Files must have an extension so that the program can recognize how the data are coded. Currently, fixed width text files ("*.txt"), comma-separated values ("*.csv"), and Stata v.5-10 ("*.dta") files are supported, and multiple file types may be used in the same run of the program. "*.Rdata" objects can be included with the datalist option after they are loaded into a list in the workspace. yourprep() includes diagnostics to ensure that objects are properly named and not included accidentally, but users should examine the specified dpath before running yourprep() to minimize errors.
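The naming rule above can be sketched in a few lines of base R. This is only an illustration of the convention, not yourprep()'s own code: the file list, the regular expression, and the helper split_csid() are assumptions introduced here.

```r
tag <- "csid"
index.code <- "ggggaa"

# Made-up directory listing, used only for illustration.
files <- c("csid245045.txt", "csid245050.csv", "csid.G.names.txt", "notes.txt")

# Keep files named <tag><CSID digits>.<supported extension>.
n <- nchar(index.code)
pattern <- paste0("^", tag, "([0-9]{", n, "})\\.(txt|csv|dta)$")
cs_files <- grep(pattern, files, value = TRUE)
csid <- sub(pattern, "\\1", cs_files)

# Hypothetical helper: split a CSID by position according to index.code.
split_csid <- function(csid, index.code) {
  codes <- strsplit(index.code, "")[[1]]
  chars <- strsplit(csid, "")[[1]]
  c(geo = paste(chars[codes == "g"], collapse = ""),
    age = paste(chars[codes == "a"], collapse = ""))
}

# One row per cross section, with columns "geo" and "age".
parts <- t(sapply(csid, split_csid, index.code = index.code))
parts
```

With the default index.code, "245045" splits into country code "2450" and age group "45", matching the example in the text.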

Each cross section file should contain labeled columns of time-series data for the dependent variable(s) (e.g., disease, pop) and the covariates that will be used in the forecast. The rownames for the dataframe should be the observation year (if the year is coded as a separate variable, set year.var=TRUE). The files must contain the full time series that will be specified in the sample.frame argument in yourcast after the first observed year. For instance, if sample.frame=c(1950,2000,2001,2030), then files would have observations that start between 1950 and 2000 and include all other years (even if the entries are NA) up to the last year of prediction, i.e., 2030.
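As a rough sketch of these layout rules, the following base R code builds a toy cross section, moves a year column into the rownames (the behavior described for year.var=TRUE), and checks coverage against sample.frame. The data values and the helpers year_to_rownames() and covers_frame() are hypothetical and are not part of YourCast.

```r
sample.frame <- c(1950, 2000, 2001, 2030)

# Toy cross section: first observed year is 1960, forecast years coded NA.
cs <- data.frame(year = 1960:2030,
                 deaths = c(seq_len(41), rep(NA, 30)))

# Hypothetical helper: use `year` as rownames only when the current
# rownames are just the default 1..N index, then drop the column.
year_to_rownames <- function(df) {
  default_rows <- identical(rownames(df), as.character(seq_len(nrow(df))))
  if ("year" %in% names(df) && default_rows) {
    rownames(df) <- df$year
    df$year <- NULL
  }
  df
}

# Hypothetical check: the series must start inside the observation window
# and contain every year up to the last forecast year.
covers_frame <- function(df, sample.frame) {
  yrs <- as.integer(rownames(df))
  first <- min(yrs)
  first >= sample.frame[1] && first <= sample.frame[2] &&
    all(seq(first, sample.frame[4]) %in% yrs)
}

cs <- year_to_rownames(cs)
covers_frame(cs, sample.frame)
```

Dropping any intermediate year from cs makes covers_frame() return FALSE, which is the kind of gap the diagnostics described above are meant to catch.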

Optional auxiliary files such as G.names should be named according to the filename specified in the respective arguments. If specified, these files must have extensions and be coded in one of the three supported file types. However, these files will be automatically loaded by yourprep() if they are saved in the dpath and labeled with the tag specified by the user. The default names for these files must be used (e.g., "G.names" and "adjacency"). For example, if the tag is left as the default and there is a file in the dpath labeled "csid.G.names.txt", yourprep() will load this automatically and save the input as the G.names element of the "dataobj" list. yourprep() arguments such as G.names take precedence over "TAG.*" files in the dpath.
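To make the auxiliary-file convention concrete, here is a small sketch that writes a two-column names file whose filename starts with the tag. The column labels codes and name follow the Value section; the second country, the temporary directory, and the choice of a .csv file are assumptions made for illustration.

```r
tag <- "csid"

# Two-column table: numerical codes and optional alphanumeric names.
# 2450 (USA) comes from the text above; the second row is made up.
gnames <- data.frame(codes = c(2450, 4080),
                     name = c("USA", "Hypothetical country"),
                     stringsAsFactors = FALSE)

# File name follows the <tag>.G.names.<extension> pattern described above.
path <- file.path(tempdir(), paste0(tag, ".G.names.csv"))
write.csv(gnames, path, row.names = FALSE)

# Read it back to confirm the column labels survive the round trip.
back <- read.csv(path, stringsAsFactors = FALSE)
back
```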

yourprep() also includes a lagging utility (activated once one specifies a lag length with the "lag" argument). This utility is useful when the data in each cross section are, for example, the response and covariates for 50 year olds in each year, but the desired content for each cross section is the response for 50 year olds and the covariates for 25 year olds 25 years prior to each year (implying a lag of 25 years). In order to have yourprep() perform this lagging automatically, include cross sections for each age group with data starting the same number of years before the first observation year as the requested lag period. Thus if lag=25 and the first observation year is 1950, then the cross sections should all start at 1925. Age groups younger than the length of the lag will not retain covariate data (except perhaps an "index" variable) in the output object. The covariates lagged are the predictor variables specified in the formula argument.

If data for a cohort 25 years younger (in this case) are not available for some cohort over age 25, yourprep() will look for the closest cohort available and issue a warning message.
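The lagging idea can be illustrated with two toy cross sections. This is a sketch of the concept only, not the implementation used by yourprep(); the variable names deaths and gdp and all values are made up.

```r
lag <- 25

# Toy covariate series for 25 year olds, starting 25 years before 1950.
cov25 <- data.frame(gdp = seq(100, by = 1, length.out = 36),
                    row.names = 1925:1960)

# Toy response series for 50 year olds over the observation years.
resp50 <- data.frame(deaths = 11:1, row.names = 1950:1960)

# For each response year, take the covariate observed `lag` years earlier
# in the cohort that is `lag` years younger.
src_years <- as.character(as.integer(rownames(resp50)) - lag)
lagged <- cbind(resp50, gdp = cov25[src_years, "gdp"])

head(lagged)
```

Here the 1950 row for 50 year olds carries the 1925 covariate value of the 25 year olds, which is why the cross sections need to start at 1925 when lag=25 and the first observation year is 1950.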

----------Value----------

Return value: dataobj
A list with several components:

data: A list with the cross-sectional data matrices as elements.

proximity: A symmetric matrix (geographic region by geographic region) of proximity scores for geographic smoothing used by the "map" and "bayes" methods. The larger each element of the matrix, the more proximate that pair of countries is in the prior; a zero element means the two geographic areas are unrelated (the diagonal is ignored). Each element of the symmetric matrix is created from one row of the proximity input to yourprep() (which is two country codes and a proximity score).

G.names, A.names, T.names: Optional two-column dataframes that list all valid numerical codes (in the first column, labeled codes) and corresponding alphanumeric names (optionally in the second column, labeled name) for the indices corresponding to the geographic areas in G.names, age groups in A.names, and time periods in T.names.

index.code: A string indicating how the index variable is coded in the input data.
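The way the proximity component relates to the three-column input can be sketched as follows. The helper make_proximity(), the column names, and the region codes other than 2450 are assumptions for illustration; how yourprep() builds the matrix internally is not shown in this document.

```r
# Toy three-column input: two region codes and a proximity score.
adj <- data.frame(code1 = c(2450, 2450, 4080),
                  code2 = c(4080, 3160, 3160),
                  score = c(2, 1, 1))

# Hypothetical helper: expand the rows into a symmetric matrix.
# Pairs that are not listed keep a score of zero (unrelated).
make_proximity <- function(adj) {
  codes <- sort(unique(c(adj[[1]], adj[[2]])))
  labels <- as.character(codes)
  prox <- matrix(0, length(codes), length(codes),
                 dimnames = list(labels, labels))
  i <- as.character(adj[[1]])
  j <- as.character(adj[[2]])
  prox[cbind(i, j)] <- adj[[3]]
  prox[cbind(j, i)] <- adj[[3]]  # mirror each pair to keep symmetry
  prox
}

prox <- make_proximity(adj)
prox
```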

----------Author(s)----------

Jon Bischof <jbischof@fas.harvard.edu>

----------See Also----------

yourcast function and documentation

----------Examples----------

## Not run:
# Working directory automatically set to directory with cross
# section and auxiliary files to begin. Files for this example
# in 'data' folder of YourCast library.

# Old working directory to be restored later
oldwd <- getwd()
# Now setting wd to 'data' folder in YourCast library
setwd(system.file("data",package="YourCast"))

# Simple run of the function, using option that turns year variable
# into label in each cs. Use sample.frame argument for all diagnostics
# to work

dta <- yourprep(G.names="cntry.codes.txt",adjacency="adjacency.txt",
                year.var=TRUE,verbose=TRUE,sample.frame=c(1950,2000,2001,2030))

# With summary output (means of variables in each cross section)

dta <- yourprep(G.names="cntry.codes.txt",adjacency="adjacency.txt",
                year.var=TRUE,summary=TRUE)

# Function can also add datafiles already loaded into R as objects in
# the workspace with "datalist" option if put into a list and properly
# labeled. All diagnostics still performed
# 'csid204545', etc., are dataframes in workspace

# Labels changed to nonsense ones so as not to confuse with other files

data(csid204545)
data(csid204550)
data(csid204555)

datalist <- list("123456"=csid204545,"234567"=csid204550,
                 "345678"=csid204555)

# Verbose option turned on and datalist argument added

dta <- yourprep(G.names="cntry.codes.txt",adjacency="adjacency.txt",
                year.var=TRUE,verbose=TRUE,datalist=datalist)

# Setting working directory back
setwd(oldwd)
rm(oldwd)

## End(Not run)

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

