R language, RODM package: help documentation for the RODM_create_dt_model() function

loveR · Posted 2012-9-27 22:51:06

RODM_create_dt_model(RODM)
RODM_create_dt_model() belongs to R package: RODM

                                    Create a Decision Tree (DT) model

                                       Translator: LoveR, the biostatistic.net robot

----------Description----------

This function creates a Decision Tree (DT) model.

----------Usage----------

RODM_create_dt_model(database,
                  data_table_name,
                  case_id_column_name = NULL,
                  target_column_name,
                  model_name = "DT_MODEL",
                  auto_data_prep = TRUE,
                  cost_matrix = NULL,
                  gini_impurity_metric = TRUE,
                  max_depth = NULL,
                  minrec_split = NULL,
                  minpct_split = NULL,
                  minrec_node = NULL,
                  minpct_node = NULL,
                  retrieve_outputs_to_R = TRUE,
                  leave_model_in_dbms = TRUE,
                  sql.log.file = NULL)

----------Arguments----------

Argument: database
Database ODBC channel identifier returned from a call to RODM_open_dbms_connection.

Argument: data_table_name
Database table/view containing the training dataset.

Argument: case_id_column_name
Column in data_table_name that uniquely identifies each row (case).

Argument: target_column_name
Target column name in data_table_name.

Argument: model_name
ODM model name.

Argument: auto_data_prep
Whether or not ODM should invoke automatic data preparation for the build.

Argument: cost_matrix
User-specified cost matrix for the target classes.

Argument: gini_impurity_metric
Tree impurity metric: TRUE (default) uses the Gini metric ("IMPURITY_GINI"); FALSE uses entropy ("IMPURITY_ENTROPY").

Argument: max_depth
Specifies the maximum depth of the tree, from root to leaf inclusive. The default is 7.

Argument: minrec_split
Specifies the minimum number of cases required in a node for a further split to be possible. The default is 20.

Argument: minpct_split
Specifies the minimum number of cases required in a node for a further split to be possible, expressed as a percentage of all the rows in the training data. The default is 1 (1 per cent).

Argument: minrec_node
Specifies the minimum number of cases required in a child node. The default is 10.

Argument: minpct_node
Specifies the minimum number of cases required in a child node, expressed as a percentage of the rows in the training data. The default is 0.05 (0.05 per cent).

Argument: retrieve_outputs_to_R
Flag controlling whether the output results are moved to the R environment.

Argument: leave_model_in_dbms
Flag controlling whether the model is left in the RDBMS (TRUE) or deleted (FALSE).

Argument: sql.log.file
File to which the log of all the SQL calls made by this function is appended.
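
To get a feel for the percentage-based limits described above, the following sketch converts the documented defaults of minpct_split and minpct_node into row counts and prints them next to the count-based defaults. The training table size (n_train) and the helper name pct_to_rows are assumptions made for illustration; how ODM combines the count-based and percentage-based limits internally is not described here, so the sketch only places them side by side.

```r
# Hypothetical helper (not part of RODM): convert a percentage of the
# training rows into a number of rows.
pct_to_rows <- function(pct, n_rows) {
  pct / 100 * n_rows
}

n_train <- 10000  # assumed training table size, for illustration only

thresholds <- data.frame(
  setting  = c("split", "node"),
  min_rows = c(20, 10),    # documented defaults of minrec_split, minrec_node
  pct      = c(1, 0.05),   # documented defaults of minpct_split, minpct_node
  stringsAsFactors = FALSE
)

# Row counts implied by the percentage settings for this table size
thresholds$pct_as_rows <- pct_to_rows(thresholds$pct, n_train)
print(thresholds)
```

For small training tables the percentage-based values fall well below the count-based ones, which is worth keeping in mind when tuning either pair.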

----------Details----------

The Decision Tree algorithm produces accurate and interpretable models with relatively little user intervention and can be used for both binary and multiclass classification problems. The algorithm is fast, both at build time and apply time. The build process for Decision Tree is parallelized. Decision tree scoring is especially fast. The tree structure, created in the model build, is used for a series of simple tests. Each test is based on a single predictor. It is a membership test: either IN or NOT IN a list of values (categorical predictor); or LESS THAN or EQUAL TO some value (numeric predictor). The algorithm supports two homogeneity metrics, gini and entropy, for calculating the splits.
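
As a rough illustration of the two homogeneity metrics named above, the sketch below computes the textbook Gini impurity and entropy for the class labels in a node. The function names and the node contents are made up for illustration, and ODM's internal computation may differ in detail.

```r
# Gini impurity of a node: 1 - sum(p_k^2) over the class proportions p_k.
gini_impurity <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  1 - sum(p^2)
}

# Entropy of a node in bits: -sum(p_k * log2(p_k)), skipping empty classes.
entropy_impurity <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Made-up node contents, for illustration only
mixed_node <- c("Yes", "Yes", "No", "No")
pure_node  <- c("Yes", "Yes", "Yes", "Yes")

gini_impurity(mixed_node)     # 0.5
entropy_impurity(mixed_node)  # 1
gini_impurity(pure_node)      # 0
entropy_impurity(pure_node)   # 0
```

Both metrics are zero for a node holding a single class and largest when the classes are evenly mixed, which is why a split that lowers them is considered an improvement.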

For more details on the algorithm implementation, parameter settings and characteristics of the ODM function itself, consult the following Oracle documents: ODM Concepts, ODM Developer's Guide, Oracle SQL Packages: Data Mining, and Oracle Database SQL Language Reference (Data Mining functions), listed in the references below.

----------Value----------

If retrieve_outputs_to_R is TRUE, returns a list with the following elements:

model.model_settings
Table of settings used to build the model.

model.model_attributes
Table of attributes used to build the model.

dt.distributions
Target class distributions at each tree node.

dt.nodes
Node summary information.

----------Author(s)----------

Pablo Tamayo <pablo.tamayo@oracle.com>

Ari Mozes <ari.mozes@oracle.com>

----------References----------

Oracle Data Mining Concepts 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/toc.htm
Oracle Data Mining Application Developer's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28131/toc.htm
Oracle Data Mining Administrator's Guide 11g Release 1 (11.1)  http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28130/toc.htm
Oracle Database PL/SQL Packages and Types Reference 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_datmin.htm#ARPLS192
Oracle Database SQL Language Reference (Data Mining functions) 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#SQLRF20030

----------See Also----------

RODM_apply_model

----------Examples----------

## Not run:
DB <- RODM_open_dbms_connection(dsn="orcl11g", uid="rodm", pwd="rodm")

# Predicting survival in the sinking of the Titanic based on passenger's sex, age, class, etc.
data(titanic3, package="PASWR")                                             # Load survival data from Titanic
ds <- titanic3[,c("pclass", "survived", "sex", "age", "fare", "embarked")]  # Select subset of attributes
ds[,"survived"] <- ifelse(ds[,"survived"] == 1, "Yes", "No")                # Rename target values
n.rows <- nrow(ds)                                                          # Number of rows
random_sample <- sample(1:n.rows, ceiling(n.rows/2))    # Split dataset randomly in train/test subsets
titanic_train <- ds[random_sample,]                     # Training set
titanic_test <- ds[setdiff(1:n.rows, random_sample),]   # Test set
RODM_create_dbms_table(DB, "titanic_train")   # Push the training table to the database
RODM_create_dbms_table(DB, "titanic_test")    # Push the testing table to the database

dt <- RODM_create_dt_model(database = DB,     # Create ODM DT classification model
                           data_table_name = "titanic_train",
                           target_column_name = "survived",
                           model_name = "DT_MODEL")

dt2 <- RODM_apply_model(database = DB,        # Predict test data
                        data_table_name = "titanic_test",
                        model_name = "DT_MODEL",
                        supplemental_cols = "survived")

print(dt2$model.apply.results[1:10,])                     # Print example of prediction results
actual <- dt2$model.apply.results[, "SURVIVED"]
predicted <- dt2$model.apply.results[, "PREDICTION"]
# as.real() is defunct in current R; as.numeric() is the replacement
probs <- as.numeric(as.character(dt2$model.apply.results[, "'Yes'"]))
table(actual, predicted, dnn = c("Actual", "Predicted"))  # Confusion matrix
library(verification)
perf.auc <- roc.area(ifelse(actual == "Yes", 1, 0), probs)   # Compute ROC and plot
auc.roc <- signif(perf.auc$A, digits=3)
auc.roc.p <- signif(perf.auc$p.value, digits=3)
roc.plot(ifelse(actual == "Yes", 1, 0), probs, binormal=TRUE, plot="both", xlab="False Positive Rate",
         ylab="True Positive Rate", main="Titanic survival ODM DT model ROC Curve")
text(0.7, 0.4, labels=paste("AUC ROC:", auc.roc))
text(0.7, 0.3, labels=paste("p-value:", auc.roc.p))

dt       # Look at the model details

RODM_drop_model(DB, "DT_MODEL")             # Drop the model
RODM_drop_dbms_table(DB, "titanic_train")   # Drop the database table
RODM_drop_dbms_table(DB, "titanic_test")    # Drop the database table

RODM_close_dbms_connection(DB)

## End(Not run)
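
The confusion matrix step in the example above can be tried without a database connection. The following is a minimal sketch on made-up label vectors (toy_actual and toy_predicted are hypothetical and are not output from any ODM model) that builds the same kind of table and derives accuracy, precision and recall from it.

```r
# Made-up labels, for illustration only (not output from any model)
toy_actual    <- c("Yes", "No", "No",  "Yes", "No", "Yes")
toy_predicted <- c("Yes", "No", "Yes", "Yes", "No", "No")

# Fix the level order so the table is always 2 x 2, even if a class is
# never predicted.
lv <- c("No", "Yes")
cm <- table(factor(toy_actual, levels = lv),
            factor(toy_predicted, levels = lv),
            dnn = c("Actual", "Predicted"))
print(cm)

accuracy  <- sum(diag(cm)) / sum(cm)               # share of correct predictions
precision <- cm["Yes", "Yes"] / sum(cm[, "Yes"])   # correct among predicted "Yes"
recall    <- cm["Yes", "Yes"] / sum(cm["Yes", ])   # correct among actual "Yes"
```

With real results, the two vectors would be replaced by the SURVIVED and PREDICTION columns of dt2$model.apply.results.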

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

