R language RODM package: RODM_apply_model() function help documentation

loveR · posted 2012-9-27 22:50:26

RODM_apply_model(RODM)
RODM_apply_model() belongs to R package: RODM

Apply an Oracle Data Mining model

----------Description----------

This function applies a previously created ODM model to score new data.

----------Usage----------

RODM_apply_model(database,
               data_table_name,
               model_name,
               supplemental_cols,
               sql.log.file = NULL)

----------Arguments----------

Argument: database
Database ODBC channel identifier returned from a call to RODM_open_dbms_connection.

Argument: data_table_name
Database table/view containing the data to be scored.

Argument: model_name
Name of the ODM model to apply.

Argument: supplemental_cols
Columns to carry over from the input data into the output result.

Argument: sql.log.file
File to which the log of all SQL calls made by this function is appended.

----------Details----------

This function applies a previously created ODM model to score new data. The supplemental_cols parameter should be chosen so that each score can be traced back to its original case. The simplest way to do this is to include a unique case identifier in the list, which makes it possible to look up the original row for any score. If only part of the original data is needed (for example, only the actual target value when computing a measure of accuracy), then only those columns need to be listed as supplemental columns.
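The role of a case identifier can be sketched without a database. The snippet below is a minimal illustration in base R, assuming a hypothetical ROW_ID column and mock data frames that stand in for the input table and for the scored result; none of the values come from a real RODM_apply_model call.

```r
# Mock stand-in for the original cases (hypothetical data).
original <- data.frame(
  ROW_ID = 1:4,
  X1 = c(0.5, 1.5, 2.5, 3.5),
  Y1 = c(10, 20, 30, 40)
)

# Mock stand-in for a scored result whose only supplemental column is ROW_ID.
# The rows are deliberately in a different order than the input.
scores <- data.frame(
  ROW_ID = c(3L, 1L, 4L, 2L),
  PREDICTION = c(29.0, 11.0, 41.5, 19.5)
)

# Join the scores back to the original cases by the case identifier.
joined <- merge(original, scores, by = "ROW_ID")
joined$RESIDUAL <- joined$Y1 - joined$PREDICTION
print(joined)
```

Because the join is keyed on the identifier, the row order of the scored result does not matter.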

----------Value----------

A list with the following component:

model.apply.results    A data frame containing:

  For classification:
    class 1 probability          numeric/double
    ...
    class N probability          numeric/double
    supplemental column 1
    ...
    supplemental column M
    prediction

  For regression:
    supplemental column 1
    ...
    supplemental column M
    prediction                   numeric/double

  For anomaly detection (e.g. one-class SVM):
    class 1 probability          numeric/integer (class 1 is the typical class)
    class 0 probability          numeric/integer (class 0 is the outlier class)
    supplemental column 1
    ...
    supplemental column M
    prediction                   integer: 0 or 1

  For clustering:
    leaf cluster 1 probability   numeric/double
    ...
    leaf cluster N probability   numeric/double
    supplemental column 1
    ...
    supplemental column M
    cluster_id
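To make the classification layout concrete, the sketch below builds a mock data frame with the same kinds of columns (one probability column per class, a supplemental column holding the actual target, and a prediction column) and derives a confusion matrix and accuracy from it. The column names and values are assumptions chosen to mirror the Titanic example in the Examples section; they are not output from a real model.

```r
# Mock classification result (hypothetical values, not real model output).
results <- data.frame(
  `'No'`     = c(0.9, 0.2, 0.6, 0.3, 0.8),   # class probability columns
  `'Yes'`    = c(0.1, 0.8, 0.4, 0.7, 0.2),
  SURVIVED   = c("No", "Yes", "Yes", "Yes", "No"),   # supplemental column
  PREDICTION = c("No", "Yes", "No", "Yes", "No"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Class probabilities in each row should sum to 1.
prob_sums <- rowSums(results[, c("'No'", "'Yes'")])

# Confusion matrix and overall accuracy from the supplemental target column.
confusion <- table(Actual = results$SURVIVED, Predicted = results$PREDICTION)
accuracy <- mean(results$SURVIVED == results$PREDICTION)

print(confusion)
print(accuracy)
```

Only the actual target is needed here, which is why carrying over a single supplemental column is enough for an accuracy measure.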

----------Author(s)----------

Pablo Tamayo <pablo.tamayo@oracle.com>

Ari Mozes <ari.mozes@oracle.com>

----------References----------

Oracle Data Mining Concepts 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/toc.htm
Oracle Data Mining Application Developer's Guide 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28131/toc.htm
Oracle Data Mining Administrator's Guide 11g Release 1 (11.1)  http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28130/toc.htm
Oracle Database PL/SQL Packages and Types Reference 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_datmin.htm#ARPLS192
Oracle Database SQL Language Reference (Data Mining functions) 11g Release 1 (11.1) http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm#SQLRF20030

----------See Also----------

RODM_create_svm_model, RODM_create_kmeans_model, RODM_create_oc_model, RODM_create_nb_model, RODM_create_glm_model

----------Examples----------

## Not run:
DB <- RODM_open_dbms_connection(dsn="orcl11g", uid="rodm", pwd="rodm")

### Classification

# Predicting survival in the sinking of the Titanic based on passenger's sex, age, class, etc.

data(titanic3, package="PASWR")                                             # Load survival data from Titanic
ds <- titanic3[,c("pclass", "survived", "sex", "age", "fare", "embarked")]  # Select subset of attributes
ds[,"survived"] <- ifelse(ds[,"survived"] == 1, "Yes", "No")                # Rename target values
n.rows <- nrow(ds)                                                          # Number of rows
random_sample <- sample(1:n.rows, ceiling(n.rows/2))      # Split dataset randomly into train/test subsets
titanic_train <- ds[random_sample,]                       # Training set
titanic_test <- ds[setdiff(1:n.rows, random_sample),]     # Test set
RODM_create_dbms_table(DB, "titanic_train")               # Push the training table to the database
RODM_create_dbms_table(DB, "titanic_test")                # Push the testing table to the database
svm <- RODM_create_svm_model(database = DB,               # Create ODM SVM classification model
                           data_table_name = "titanic_train",
                           target_column_name = "survived",
                           model_name = "SVM_MODEL",
                           mining_function = "classification")

# Apply the SVM classification model to test data.
svm2 <- RODM_apply_model(database = DB,                   # Predict test data
                     data_table_name = "titanic_test",
                     model_name = "SVM_MODEL",
                     supplemental_cols = "survived")

print(svm2$model.apply.results[1:10,])                               # Print example of prediction results
actual <- svm2$model.apply.results[, "SURVIVED"]
predicted <- svm2$model.apply.results[, "PREDICTION"]
probs <- as.numeric(as.character(svm2$model.apply.results[, "'Yes'"]))  # Probability of class 'Yes'
table(actual, predicted, dnn = c("Actual", "Predicted"))             # Confusion matrix
library(verification)
perf.auc <- roc.area(ifelse(actual == "Yes", 1, 0), probs)           # Compute ROC and plot
auc.roc <- signif(perf.auc$A, digits=3)
auc.roc.p <- signif(perf.auc$p.value, digits=3)
roc.plot(ifelse(actual == "Yes", 1, 0), probs, binormal=TRUE, plot="both", xlab="False Positive Rate",
      ylab="True Positive Rate", main="Titanic survival ODM SVM model ROC Curve")
text(0.7, 0.4, labels=paste("AUC ROC:", auc.roc))
text(0.7, 0.3, labels=paste("p-value:", auc.roc.p))

RODM_drop_model(DB, "SVM_MODEL")          # Drop the model
RODM_drop_dbms_table(DB, "titanic_train") # Drop the training table in the database
RODM_drop_dbms_table(DB, "titanic_test")  # Drop the testing table in the database

## End(Not run)

### Regression

# Approximating a one-dimensional non-linear function

## Not run:
X1 <- 10 * runif(500) - 5
Y1 <- X1*cos(X1) + 2*runif(500)
ds <- data.frame(cbind(X1, Y1))
RODM_create_dbms_table(DB, "ds")                          # Push the table to the database
svm <- RODM_create_svm_model(database = DB,               # Create ODM SVM regression model
                           data_table_name = "ds",
                           target_column_name = "Y1",
                           model_name = "SVM_MODEL",
                           mining_function = "regression")

# Apply the SVM regression model to the training data.
svm2 <- RODM_apply_model(database = DB,                   # Predict training data
                     data_table_name = "ds",
                     model_name = "SVM_MODEL",
                     supplemental_cols = "X1")

plot(X1, Y1, pch=20, col="blue")
points(x=svm2$model.apply.results[, "X1"], y=svm2$model.apply.results[, "PREDICTION"], pch=20, col="red")
legend(-4, -1.5, legend = c("actual", "SVM regression"), pch = c(20, 20), col = c("blue", "red"),
            pt.bg = c("blue", "red"), cex = 1.20, pt.cex=1.5, bty="n")

RODM_drop_model(DB, "SVM_MODEL")          # Drop the model
RODM_drop_dbms_table(DB, "ds")            # Drop the database table

## End(Not run)

### Anomaly detection

# Finding outliers in a two-dimensional distribution of points

## Not run:
X1 <- c(rnorm(200, mean = 2, sd = 1), rnorm(300, mean = 8, sd = 2))
Y1 <- c(rnorm(200, mean = 2, sd = 1.5), rnorm(300, mean = 8, sd = 1.5))
ds <- data.frame(cbind(X1, Y1))
RODM_create_dbms_table(DB, "ds")                          # Push the table to the database
svm <- RODM_create_svm_model(database = DB,               # Create ODM SVM anomaly detection model
                           data_table_name = "ds",
                           target_column_name = NULL,
                           model_name = "SVM_MODEL",
                           mining_function = "anomaly_detection")

# Apply the SVM anomaly detection model to the data.
svm2 <- RODM_apply_model(database = DB,                   # Predict training data
                     data_table_name = "ds",
                     model_name = "SVM_MODEL",
                     supplemental_cols = c("X1","Y1"))

plot(X1, Y1, pch=20, col="white")
col <- ifelse(svm2$model.apply.results[, "PREDICTION"] == 1, "green", "red")   # 1 = typical, 0 = outlier
for (i in 1:500) points(x=svm2$model.apply.results[i, "X1"],
                     y=svm2$model.apply.results[i, "Y1"],
                     col = col[i], pch=20)
legend(8, 2, legend = c("typical", "anomaly"), pch = c(20, 20), col = c("green", "red"),
            pt.bg = c("green", "red"), cex = 1.20, pt.cex=1.5, bty="n")

RODM_drop_model(DB, "SVM_MODEL")          # Drop the model
RODM_drop_dbms_table(DB, "ds")            # Drop the database table

## End(Not run)

### Clustering

# Clustering a 2D multi-Gaussian distribution of points into clusters

## Not run:
set.seed(seed=6218945)
X1 <- c(rnorm(100, mean = 2, sd = 1), rnorm(100, mean = 8, sd = 2), rnorm(100, mean = 5, sd = 0.6),
      rnorm(100, mean = 4, sd = 1), rnorm(100, mean = 10, sd = 1)) # Create and merge 5 Gaussian distributions
Y1 <- c(rnorm(100, mean = 1, sd = 2), rnorm(100, mean = 4, sd = 1.5), rnorm(100, mean = 6, sd = 0.5),
      rnorm(100, mean = 3, sd = 0.2), rnorm(100, mean = 2, sd = 1))
ds <- data.frame(cbind(X1, Y1))
n.rows <- nrow(ds)                                                         # Number of rows
row.id <- matrix(seq(1, n.rows), nrow=n.rows, ncol=1, dimnames= list(NULL, c("ROW_ID"))) # Row id
ds <- cbind(row.id, ds)                                                    # Add row id to dataset
RODM_create_dbms_table(DB, "ds")                                           # Push the table to the database
km <- RODM_create_kmeans_model(
    database = DB,                  # database ODBC channel identifier
    data_table_name = "ds",         # database table containing the input dataset
    case_id_column_name = "ROW_ID", # case id to enable assignments during build
    model_name = "KM_MODEL",        # name referenced by the apply and drop calls below
    num_clusters = 5)

# Apply the K-Means clustering model to the data.
km2 <- RODM_apply_model(
    database = DB,                  # database ODBC channel identifier
    data_table_name = "ds",         # database table containing the data to score
    model_name = "KM_MODEL",
    supplemental_cols = c("X1","Y1"))

x1a <- km2$model.apply.results[, "X1"]
y1a <- km2$model.apply.results[, "Y1"]
clu <- km2$model.apply.results[, "CLUSTER_ID"]
c.numbers <- unique(as.numeric(clu))
c.assign <- match(clu, c.numbers)
color.map <- c("blue", "green", "red", "orange", "purple")
color <- color.map[c.assign]
nf <- layout(matrix(c(1, 2), 1, 2, byrow=TRUE), widths = c(1, 1), heights = 1, respect = FALSE)
plot(x1a, y1a, pch=20, col=1, xlab="X1", ylab="Y1", main="Original Data Points")
plot(x1a, y1a, pch=20, type = "n", xlab="X1", ylab="Y1", main="After kmeans clustering")
for (i in 1:n.rows) {
    points(x1a[i], y1a[i], col= color[i], pch=20)
}
legend(5, -0.5, legend=c("Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5"), pch = rep(20, 5),
   col = color.map, pt.bg = color.map, cex = 0.8, pt.cex=1, bty="n")

RODM_drop_model(DB, "KM_MODEL")       # Drop the model
RODM_drop_dbms_table(DB, "ds")        # Drop the database table

RODM_close_dbms_connection(DB)

## End(Not run)
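The classification example relies on the verification package for the ROC area. When that package is not available, an AUC can be cross-checked in base R with the rank-based (Mann-Whitney) formulation. The helper below, auc_rank, is a hypothetical name introduced here for illustration, and the labels and scores are made-up values rather than model output; it is a sketch of the general technique, not a reimplementation of roc.area.

```r
# Rank-based AUC: the Mann-Whitney U statistic scaled to [0, 1].
# labels: 0/1 vector of actual classes; scores: predicted probability of class 1.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)   # tied scores receive average ranks
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical data: three positives, two negatives.
labels <- c(0, 1, 1, 1, 0)
scores <- c(0.1, 0.8, 0.15, 0.7, 0.2)
auc <- auc_rank(labels, scores)
print(auc)
```

With real results, the inputs would be built as in the classification example: ifelse(actual == "Yes", 1, 0) for the labels and the 'Yes' probability column for the scores.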

