R package RODM: help documentation for RODM_create_nb_model()

loveR · posted on 2012-9-27 22:51:47

RODM_create_nb_model(RODM)
RODM_create_nb_model() belongs to the R package RODM.

                                    Create a Naive Bayes model

                                       Translator: LoveR, the biostatistic.net bot

----------Description----------

This function creates an Oracle Data Mining Naive Bayes model.

----------Usage----------

RODM_create_nb_model(database,
                  data_table_name,
                  case_id_column_name = NULL,
                  target_column_name,
                  model_name = "NB_MODEL",
                  auto_data_prep = TRUE,
                  class_priors = NULL,
                  retrieve_outputs_to_R = TRUE,
                  leave_model_in_dbms = TRUE,
                  sql.log.file = NULL)

----------Arguments----------

Argument: database
Database ODBC channel identifier returned by a call to RODM_open_dbms_connection.

Argument: data_table_name
Database table or view containing the training dataset.

Argument: case_id_column_name
Name of the column in data_table_name that holds a unique case identifier for each row.

Argument: target_column_name
Name of the target column in data_table_name.

Argument: model_name
Name of the ODM model.

Argument: auto_data_prep
Whether ODM should invoke automatic data preparation for the build.

Argument: class_priors
User-specified prior probabilities for the target classes.

Argument: retrieve_outputs_to_R
Flag controlling whether the output results are retrieved into the R environment.

Argument: leave_model_in_dbms
Flag controlling whether the model is left in the RDBMS or deleted.

Argument: sql.log.file
File to which a log of all SQL calls made by this function is appended.
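The class_priors argument presumably takes the place of the class frequencies observed in data_table_name, which is what the Examples section suggests when it recommends priors for specially sampled data. The sketch below shows, for a textbook Naive Bayes classifier, how swapping the priors changes the posterior for a single case. It assumes ODM applies the priors in this standard multiplicative way, which this help page does not spell out; the helper reweight_posterior and the likelihood values are hypothetical.

```r
# Hypothetical helper (not part of RODM): combine class priors with the
# class-conditional likelihood P(x | class) via Bayes' theorem,
#   P(class | x) = P(class) * P(x | class) / sum over classes of the same product
reweight_posterior <- function(likelihood, prior) {
  # Both arguments are numeric vectors named by target class
  stopifnot(setequal(names(likelihood), names(prior)))
  unnormalized <- prior[names(likelihood)] * likelihood
  unnormalized / sum(unnormalized)
}

lik <- c(Yes = 0.30, No = 0.10)                  # made-up P(x | class) for one case
reweight_posterior(lik, c(Yes = 0.5, No = 0.5))  # Yes 0.75, No 0.25
reweight_posterior(lik, c(Yes = 0.1, No = 0.9))  # Yes 0.25, No 0.75
```

With the same 0.1/0.9 priors used in the Examples section, evidence that favours "Yes" three to one is no longer enough to make "Yes" the predicted class.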

----------Details----------

Naive Bayes (NB) classification makes predictions using Bayes' theorem, assuming that each attribute is conditionally independent of the others given a particular value of the target (Duda, Hart and Stork 2000). NB provides a flexible, general-purpose classifier with fast model building and scoring, and it can be used for both binary and multi-class classification problems.
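To make the independence assumption concrete, here is a minimal hand-rolled Naive Bayes for categorical attributes in base R. It is an illustration of the textbook rule only, not Oracle's ODM implementation: the function names nb_fit and nb_predict_prob, the add-one (Laplace) smoothing, and the toy rows are all assumptions made for this sketch.

```r
# Illustrative only: NOT Oracle's ODM implementation. It applies Bayes' theorem
# with the conditional-independence assumption:
#   P(c | x1, ..., xk)  is proportional to  P(c) * prod_j P(xj | c)

nb_fit <- function(data, target, laplace = 1) {
  y <- factor(data[[target]])
  # Class priors estimated from the training frequencies
  priors <- setNames(as.numeric(table(y)) / length(y), levels(y))
  attrs <- setdiff(names(data), target)
  # One matrix of P(value | class) per attribute, with add-one (Laplace) smoothing
  conditionals <- lapply(attrs, function(a) {
    counts <- unclass(table(factor(data[[a]]), y)) + laplace
    sweep(counts, 2, colSums(counts), "/")
  })
  names(conditionals) <- attrs
  list(priors = priors, conditionals = conditionals)
}

nb_predict_prob <- function(model, newdata) {
  classes <- names(model$priors)
  # Sum log-probabilities to avoid underflow when many attributes are multiplied
  scores <- do.call(cbind, setNames(lapply(classes, function(cl) {
    logp <- rep(log(model$priors[[cl]]), nrow(newdata))
    for (a in names(model$conditionals)) {
      tab <- model$conditionals[[a]]
      # Assumes every value in newdata was seen in training (else a subscript error)
      logp <- logp + log(as.numeric(tab[as.character(newdata[[a]]), cl]))
    }
    logp
  }), classes))
  # Normalize each row into posterior probabilities (softmax of the log-scores)
  probs <- exp(scores - apply(scores, 1, max))
  probs / rowSums(probs)
}

# Made-up toy rows, loosely shaped like the Titanic example
toy <- data.frame(
  sex      = c("male", "male", "female", "female", "male", "female"),
  pclass   = c("3rd", "1st", "1st", "3rd", "3rd", "2nd"),
  survived = c("No", "Yes", "Yes", "Yes", "No", "No"))
fit <- nb_fit(toy, "survived")
nb_predict_prob(fit, data.frame(sex = c("female", "male"), pclass = c("1st", "3rd")))
# row 1: P(Yes) = 9/11 (about 0.818); row 2: P(No) = 9/13 (about 0.692)
```

Each attribute contributes one independent factor per class, which is why building the model amounts to counting and scoring is a sum of table lookups; ODM's own handling of binning, missing values and smoothing may differ.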

For more details on the algorithm implementation, parameter settings, and characteristics of the ODM function itself, consult the following Oracle documents: ODM Concepts, ODM Application Developer's Guide, Oracle SQL Packages: Data Mining, and Oracle Database SQL Language Reference (Data Mining functions), listed in the references below.

----------Value----------

If retrieve_outputs_to_R is TRUE, returns a list with the following elements:

model.model_settings     Table of settings used to build the model.
model.model_attributes   Table of attributes used to build the model.
nb.conditionals          Table of conditional probabilities.

----------Author(s)----------

Pablo Tamayo <pablo.tamayo@oracle.com>

Ari Mozes <ari.mozes@oracle.com>

----------References----------

Oracle Data Mining Concepts, 11g Release 1 (11.1). http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/toc.htm
Oracle Data Mining Application Developer's Guide, 11g Release 1 (11.1). http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28131/toc.htm
Oracle Data Mining Administrator's Guide, 11g Release 1 (11.1). http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28130/toc.htm
Oracle Database PL/SQL Packages and Types Reference, 11g Release 1 (11.1). http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_datmin.htm#ARPLS192
Oracle Database SQL Language Reference (Data Mining functions), 11g Release 1 (11.1). http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#SQLRF20030

----------See Also----------

RODM_apply_model

----------Examples----------

# Predicting survival in the sinking of the Titanic based on passengers' sex, age, class, etc.
## Not run:
DB <- RODM_open_dbms_connection(dsn="orcl11g", uid= "rodm", pwd = "rodm")

data(titanic3, package="PASWR")                                            # Load Titanic survival data
ds <- titanic3[,c("pclass", "survived", "sex", "age", "fare", "embarked")]  # Select a subset of attributes
ds[,"survived"] <- ifelse(ds[,"survived"] == 1, "Yes", "No")               # Recode target values as Yes/No
n.rows <- nrow(ds)                                                         # Number of rows
set.seed(seed=6218945)
random_sample <- sample(1:n.rows, ceiling(n.rows/2))    # Split dataset randomly into train/test subsets
titanic_train <- ds[random_sample,]                      # Training set
titanic_test <-  ds[setdiff(1:n.rows, random_sample),]   # Test set

RODM_create_dbms_table(DB, "titanic_train")  # Push the training table to the database
RODM_create_dbms_table(DB, "titanic_test")   # Push the testing table to the database

# If the target distribution in the training data does not reflect the actual
# distribution (e.g., due to specialized sampling), specify priors for the model
priors <- data.frame(
         target_value = c("Yes", "No"),
         prior_probability = c(0.1, 0.9))

# Create an ODM Naive Bayes model
nb <- RODM_create_nb_model(
  database = DB,                      # Database ODBC channel identifier
  model_name = "titanic_nb_model",    # ODM model name
  data_table_name = "titanic_train",  # (in quotes) Data frame or database table containing the input dataset
  class_priors = priors,              # User-specified priors
  target_column_name = "survived")    # Target column name in data_table_name

# Predict test data using the Naive Bayes model
nb2 <- RODM_apply_model(
  database = DB,                      # Database ODBC channel identifier
  data_table_name = "titanic_test",   # Database table containing the input dataset
  model_name = "titanic_nb_model",    # ODM model name
  supplemental_cols = "survived")     # Carry the target column to the output for analysis

# Compute contingency matrix, performance statistics and ROC curve
print(nb2$model.apply.results[1:10,])                                  # Print example of prediction results
actual <- nb2$model.apply.results[, "SURVIVED"]
predicted <- nb2$model.apply.results[, "PREDICTION"]
probs <- as.numeric(as.character(nb2$model.apply.results[, "'Yes'"]))  # Probability of the "Yes" class
table(actual, predicted, dnn = c("Actual", "Predicted"))                # Confusion matrix

library(verification)
perf.auc <- roc.area(ifelse(actual == "Yes", 1, 0), probs)             # Compute area under the ROC curve
auc.roc <- signif(perf.auc$A, digits=3)
auc.roc.p <- signif(perf.auc$p.value, digits=3)
roc.plot(ifelse(actual == "Yes", 1, 0), probs, binormal=TRUE, plot="both", xlab="False Positive Rate",
         ylab="True Positive Rate", main="Titanic survival ODM NB model ROC Curve")
text(0.7, 0.4, labels=paste("AUC ROC:", auc.roc))
text(0.7, 0.3, labels=paste("p-value:", auc.roc.p))

nb       # Look at the model details

RODM_drop_model(DB, "titanic_nb_model")    # Drop the model
RODM_drop_dbms_table(DB, "titanic_train")  # Drop the training table in the database
RODM_drop_dbms_table(DB, "titanic_test")   # Drop the testing table in the database

RODM_close_dbms_connection(DB)

## End(Not run)
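The example above needs an Oracle connection and is marked Not run, but its evaluation step can be tried offline on plain R vectors. In this sketch the vectors are made-up stand-ins for the SURVIVED, PREDICTION and 'Yes' columns of nb2$model.apply.results, and auc_rank is a hypothetical helper that computes the area under the ROC curve with the rank (Mann-Whitney) formula instead of calling verification::roc.area. It should estimate the same quantity reported as perf.auc$A above, though tie handling in the verification package is not checked here.

```r
# Hypothetical stand-ins for nb2$model.apply.results columns (made-up values)
actual    <- c("Yes", "No", "Yes", "No", "No", "Yes", "No", "No")
predicted <- c("Yes", "No", "No",  "No", "Yes", "Yes", "No", "No")
probs     <- c(0.91, 0.20, 0.45, 0.10, 0.62, 0.77, 0.05, 0.30)

# Confusion matrix, built the same way as in the example
table(actual, predicted, dnn = c("Actual", "Predicted"))

# AUC as the Mann-Whitney statistic: the probability that a random "Yes" case
# gets a higher score than a random "No" case (tied scores count one half)
auc_rank <- function(labels01, scores) {
  r  <- rank(scores)            # average ranks handle ties
  n1 <- sum(labels01 == 1)      # number of positive cases
  n0 <- sum(labels01 == 0)      # number of negative cases
  (sum(r[labels01 == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_rank(ifelse(actual == "Yes", 1, 0), probs)   # 14/15, about 0.933
```

Of the 15 (Yes, No) pairs, only one (0.45 versus 0.62) is ranked the wrong way round, which is where 14/15 comes from.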

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

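Notes:
Note 1: For readers' convenience, this document was translated by LoveR, the biostatistic.net bot. It is intended only as a personal reference for learning R; biostatistic.net retains the copyright.
Note 2: Because the translation was produced automatically by a bot, some inaccuracies are unavoidable; compare the Chinese and English text carefully when using it.
Note 3: If you find an inaccuracy, please reply to this post and it will be corrected over time.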
