R language RODM package: RODM-package() help documentation

loveR · posted 2012-9-27 22:50:16

RODM-package(RODM)
R package containing RODM-package(): RODM

 RODM: An Interface to Oracle Data Mining

 Translator: LoveR, the biostatistic.net robot

----------Description----------

Oracle Data Mining (ODM) is an option of Oracle's Relational Database Management System (RDBMS) Enterprise Edition (EE). It contains several data mining and data analysis algorithms for classification, prediction, regression, clustering, associations, feature selection, anomaly detection, feature extraction, and specialized analytics. It provides means for the creation, management and operational deployment of data mining models inside the database environment.

RODM is an interface that provides a powerful environment for prototyping data analysis and data mining methodologies. It facilitates the prototyping of vertical applications and makes ODM and the RDBMS environment easily accessible to statisticians and data analysts familiar with R but not experts in SQL. In this way it provides an ideal environment for demos and proof of concept studies. It also facilitates the benchmarking and testing of functionality including statistics and graphics of performance metrics and enables the scripting and control of production data mining methodologies from a high-level environment.

----------Details----------

RODM is a package that provides access to Oracle's in-database data mining functionality.

Requirements: RODM requires the use of an Oracle release 11g database. If you do not have an installed Oracle database and need to install one from scratch, we strongly recommend that you follow the guidelines in the Oracle Data Mining Administrator's Guide. RODM requires R release > 2.10.1.

Connecting to an Oracle database: RODM_open_dbms_connection, RODM_close_dbms_connection

The above routines are used to establish a connection to an Oracle 11g database using ODBC. RODM uses the RODBC package as a means to manage the database connection. A data source name must be provided, as well as a username and password. The user that is connecting needs sufficient privileges for performing mining operations in the database. We have tested RODM using the Oracle ODBC driver that comes with the Oracle RDBMS. We recommend the use of this ODBC driver instead of others.
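As a minimal sketch of the connection step, the helper below opens a connection, runs a caller-supplied function, and always closes the connection afterwards. The helper name with_odm_connection and the DSN, user name and password in the commented call are hypothetical placeholders. The argument names dsn, uid and pwd are assumed from the RODM reference manual, so please confirm them with ?RODM_open_dbms_connection before relying on this.

```r
# Hypothetical helper (not part of RODM): open, use, and close a connection.
with_odm_connection <- function(dsn, uid, pwd, action) {
  # RODM attaches RODBC, which manages the underlying ODBC channel.
  library(RODM)

  # Assumed argument names: dsn, uid, pwd (check ?RODM_open_dbms_connection).
  DB <- RODM_open_dbms_connection(dsn = dsn, uid = uid, pwd = pwd)

  # Release the connection even if 'action' raises an error.
  on.exit(RODM_close_dbms_connection(DB), add = TRUE)

  # 'action' is any function that takes the open connection.
  action(DB)
}

# Example call, left commented out because it needs a configured Oracle
# ODBC data source; "orcl11g", "dmuser" and the password are placeholders.
# with_odm_connection("orcl11g", "dmuser", "dmuser_password",
#                     function(DB) RODM_list_dbms_models(DB))
```

Wrapping the close call in on.exit is a design choice of this sketch, intended to avoid leaving ODBC channels open when a mining call fails part-way.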

Pushing data to the database: RODM_create_dbms_table, RODM_drop_dbms_table

Once a valid database connection has been established, in-database mining can begin. If the data to be mined exists within R (e.g., in a data frame), it first needs to be pushed to the database and placed in a table. The above routines leverage the RODBC package to push data to a database table, which can then be accessed for mining by ODM.
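The push step might look like the sketch below, which prepares a small data frame from the built-in iris dataset and defines helpers that push it to the database and drop it again. The names odm_demo_iris, push_demo_table and drop_demo_table are hypothetical. The sketch assumes that RODM_create_dbms_table takes the name of the data frame as a string and uses the same name for the database table, which is how the RODM reference manual describes it as far as I recall; please verify with ?RODM_create_dbms_table.

```r
# Hypothetical demo data: iris with an explicit case identifier column.
# Kept at top level because the data frame is assumed to be looked up by name.
odm_demo_iris <- data.frame(case_id = seq_len(nrow(iris)), iris)

# Replace dots with underscores and upper-case the names so the columns
# are plain Oracle identifiers.
names(odm_demo_iris) <- toupper(gsub(".", "_", names(odm_demo_iris), fixed = TRUE))

# Store the class label as plain text rather than as an R factor.
odm_demo_iris$SPECIES <- as.character(odm_demo_iris$SPECIES)

# Push the data frame to a database table of the same name.
# DB is a connection returned by RODM_open_dbms_connection.
push_demo_table <- function(DB) {
  library(RODM)
  RODM_create_dbms_table(DB, "odm_demo_iris")
  invisible("odm_demo_iris")
}

# Remove the table again once it is no longer needed.
drop_demo_table <- function(DB) {
  library(RODM)
  RODM_drop_dbms_table(DB, "odm_demo_iris")
}
```

Neither helper is called here, since both need a live database connection.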

Auxiliary functions (for internal use): RODM_store_settings, RODM_create_model

The above routines are used under the covers when building ODM models. They do not need to be invoked directly. They are present merely to improve maintainability and modularity.

Building ODM models: RODM_create_ai_model, RODM_create_assoc_model, RODM_create_dt_model, RODM_create_glm_model, RODM_create_kmeans_model, RODM_create_nb_model, RODM_create_oc_model, RODM_create_svm_model

The above routines are used to build ODM models in the database. They share many of the same arguments. All of these routines require a database connection (as returned by RODM_open_dbms_connection) and a table or view in the database (either pre-existing or created via RODM_create_dbms_table) that provides the training data.

All routines accept a case identifier column name. This is the name of a column that uniquely identifies each training record. Most routines do not need a case identifier, but some may provide extra information if one is present (e.g., cluster assignments). All supervised algorithms require a target column name. A model name can be specified; otherwise an algorithm-specific default name is used. When created, the model exists in Oracle as a database schema object.

Most algorithms accept a parameter that directs ODM to enable automatic data preparation (default TRUE). This feature requests that ODM prepare the data as befits each algorithm's needs (e.g., outlier treatment, binning, normalization, missing value imputation). Many algorithms also accept a number of expert settings. These settings differ from algorithm to algorithm, and ODM is designed to identify values for them without user input, so in most situations the user does not need to specify them.

When the models are created in the database, information regarding the models can be retrieved and returned to the R environment. The retrieve_outputs_to_R parameter tells RODM whether this information should be pulled back into R for further analysis. Because these models are database schema objects, they can be left in the database for future use and applied to new data as desired. The default behavior is to leave the models in the database, but they can be cleaned up automatically by changing the leave_model_in_dbms parameter. If a model with the same name already exists in the database schema when another is being created, the previous model is automatically dropped.

Finally, the RODM package is envisioned as a quick proof-of-concept mechanism, with the potential of deploying the resulting methodology wholly within Oracle. As such, it is necessary to capture the SQL that would be used in the database. The sql.log.file parameter can be used to have RODM produce a file with the relevant SQL statements that comprise the work being performed.
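To make the shared arguments concrete, here is a hedged sketch of a decision tree build. The wrapper name build_species_tree, the table name, the column names CASE_ID and SPECIES, the model name DT_DEMO_MODEL and the log file name are all hypothetical. The parameters retrieve_outputs_to_R, leave_model_in_dbms and sql.log.file come from the text above. The remaining argument names are assumed from the RODM reference manual and should be checked against ?RODM_create_dt_model.

```r
# Hypothetical wrapper (not part of RODM): build a decision tree model.
# DB is a connection returned by RODM_open_dbms_connection, and table_name
# names a table or view that already exists in the database.
build_species_tree <- function(DB, table_name = "odm_demo_iris") {
  library(RODM)

  RODM_create_dt_model(
    database             = DB,
    data_table_name      = table_name,
    case_id_column_name  = "CASE_ID",        # optional unique row identifier
    target_column_name   = "SPECIES",        # required for supervised models
    model_name           = "DT_DEMO_MODEL",  # schema object created in Oracle
    auto_data_prep       = TRUE,             # let ODM prepare the data
    retrieve_outputs_to_R = TRUE,            # pull model details back into R
    leave_model_in_dbms  = TRUE,             # keep the model for later scoring
    sql.log.file         = "dt_demo_model.sql"  # capture the generated SQL
  )
}

# Example call, commented out because it needs a live connection:
# dt <- build_species_tree(DB)
# str(dt, max.level = 1)   # inspect whatever RODM returned
```

No expert settings are passed in this sketch, in line with the statement that ODM chooses suitable values for them without user input.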

Further operations involving ODM models: RODM_list_dbms_models, RODM_apply_model, RODM_drop_model

Finally, there are a few routines involving ODM models once they are built. The list of accessible ODM models can be retrieved, and individual models can be dropped. Models (other than those used for Attribute Importance and Associations) can be applied to new data in the database. Regression models will produce the expected value given the input variables. Classification models will produce the probability distribution across target classes for each case, as well as a column indicating the winning class. Clustering models will produce the probability distribution across clusters for each case, as well as a column indicating the most likely cluster assignment. In all cases, additional columns from the test/apply dataset can be included in the output via the supplemental_cols parameter. It is necessary to provide some information here if there is a desire to link the results back to the original data. Either a case identifier should be provided, or the list of columns which will yield sufficient information for future analysis.
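The post-build operations could be combined as in the sketch below, which lists the available models, scores a table, and then drops the model. The wrapper name score_and_clean_up and its arguments are hypothetical. The supplemental_cols parameter is named in the text above; the other argument names are assumed from the RODM reference manual, so please confirm them with ?RODM_apply_model and ?RODM_drop_model.

```r
# Hypothetical wrapper (not part of RODM): list, apply, and drop a model.
# DB is a connection returned by RODM_open_dbms_connection.
score_and_clean_up <- function(DB, model_name, apply_table, id_cols) {
  library(RODM)

  # Show which ODM models are accessible to the connected user.
  print(RODM_list_dbms_models(DB))

  # Score the rows of 'apply_table'. Passing the case identifier (or other
  # columns) as supplemental_cols lets the results be linked back to the
  # original data.
  scored <- RODM_apply_model(
    database          = DB,
    data_table_name   = apply_table,
    model_name        = model_name,
    supplemental_cols = id_cols
  )

  # Drop the model once it is no longer needed.
  RODM_drop_model(DB, model_name)

  scored
}

# Example call, commented out because it needs a live connection:
# res <- score_and_clean_up(DB, "DT_DEMO_MODEL", "odm_demo_iris", "CASE_ID")
# str(res, max.level = 1)
```

Dropping the model inside the same helper is only for illustration; in practice a model would usually stay in the database while it is still being applied to new data.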

----------Author(s)----------

Pablo Tamayo <pablo.tamayo@oracle.com>

Ari Mozes <ari.mozes@oracle.com>

Maintainer: Pablo Tamayo <pablo.tamayo@oracle.com>

----------References----------

Oracle Data Mining Concepts 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/toc.htm
Oracle Data Mining Application Developer's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28131/toc.htm
Oracle Data Mining Administrator's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28130/toc.htm
Oracle Database PL/SQL Packages and Types Reference 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_datmin.htm#ARPLS192
Oracle Database SQL Language Reference (Data Mining functions) 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#SQLRF20030
Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification (2nd Edition). John Wiley & Sons 2001.
Wikipedia entry for Oracle Data Mining. http://en.wikipedia.org/wiki/Oracle_Data_Mining
P. Tamayo, C. Berger, M. M. Campos, J. S. Yarmus, B. L. Milenova, A. Mozes, M. Taft, M. Hornick, R. Krishnan, S. Thomas, M. Kelly, D. Mukhin, R. Haberstroh, S. Stephens and J. Myczkowski. Oracle Data Mining - Data Mining in the Database Environment. In Part VII of Data Mining and Knowledge Discovery Handbook, Maimon, O.; Rokach, L. (Eds.) 2005, pp. 1315-1329, ISBN-10: 0-387-24435-2.
Oracle Data Mining: Mining Gold from Your Warehouse, (Oracle In-Focus series), by Dr. Carolyn Hamm.
When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

