Analysis of the UCI Machine Learning Repository Letter Recognition Data 3 - Classifying A, B, and C with ranger in R

This post continues from the previous one:

www.crosshyou.info

Last time, I classified A, B, and C with a glmnet model using family = "multinomial".

The resulting accuracy was 99.1%.

This time, I fit a random forest model with the ranger package and make predictions on the same data.

First, load the package.

Build the model.

Predict on the test data.
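The build and predict steps above depend on `df` and `idx` from the earlier posts. For readers who want to try the same calls without that setup, here is a minimal sketch on R's built-in iris data. It is not the Letter Recognition analysis: the dataset, the 100/50 split, the seed, and the names `train_idx`, `iris_model`, and `iris_probs` are all assumptions made for illustration.

```r
# Sketch on the built-in iris data (NOT the Letter Recognition data).
# The split, the seed, and all object names are illustrative assumptions.
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  train_idx <- sample(nrow(iris), 100)     # 100 training rows, 50 test rows

  iris_model <- ranger::ranger(
    formula = Species ~ .,                 # response ~ all other columns
    data = iris[train_idx, ],              # training rows only
    num.trees = 500,                       # number of trees
    mtry = floor(sqrt(ncol(iris) - 1)),    # 4 predictors -> 2 tried per split
    probability = TRUE,                    # return class probabilities
    seed = 1
  )

  # With probability = TRUE, $predictions is a matrix with one row per
  # test observation and one column per class.
  iris_probs <- predict(iris_model, data = iris[-train_idx, ])$predictions
  print(head(iris_probs))
}
```

The `requireNamespace()` guard only skips the example when ranger is not installed; it is not part of the original workflow.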

Convert the predicted probabilities into the class labels A, B, and C.
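To make the conversion concrete, here is a small base-R sketch on a made-up probability matrix. The values and the names `toy_probs` and `toy_class` are hypothetical; only the `which.max` approach mirrors the post's code.

```r
# Toy probability matrix: rows are observations, columns are classes.
# The values are made up for illustration.
toy_probs <- matrix(
  c(0.90, 0.05, 0.05,
    0.10, 0.70, 0.20,
    0.20, 0.30, 0.50),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# For each row, find the column index with the largest probability,
# then look up the corresponding class name.
toy_class <- colnames(toy_probs)[apply(toy_probs, 1, which.max)]

# Use a factor with a fixed level order so that a later table() keeps
# a row for every class, even one that is never predicted.
toy_class <- factor(toy_class, levels = c("A", "B", "C"))
print(toy_class)
```

Note that `which.max` returns the first maximum, so a tie between two classes goes to whichever column comes first.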

Create a confusion matrix and calculate the accuracy.
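As a sketch of how the accuracy follows from the confusion matrix, the example below uses six made-up labels (not the post's results). The names `toy_actual`, `toy_predicted`, `toy_cm`, and `toy_accuracy` are hypothetical.

```r
# Toy labels, made up for illustration (not the post's results).
lv <- c("A", "B", "C")
toy_actual    <- factor(c("A", "A", "B", "B", "C", "C"), levels = lv)
toy_predicted <- factor(c("A", "A", "B", "C", "C", "C"), levels = lv)

# Rows are predicted classes, columns are actual classes.
toy_cm <- table(Predicted = toy_predicted, Actual = toy_actual)
print(toy_cm)

# Correct predictions lie on the diagonal, so accuracy is the
# diagonal sum divided by the total count (here 5 of 6).
toy_accuracy <- sum(diag(toy_cm)) / sum(toy_cm)

# The same number without building the table.
toy_accuracy_direct <- mean(toy_predicted == toy_actual)
print(c(toy_accuracy, toy_accuracy_direct))
```

Summing the diagonal avoids typing the counts by hand, which helps when the split or the model changes.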

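Only one test observation was misclassified! That is 687 of 688 correct, an accuracy of about 99.9%, up from 99.1% with glmnet. Random forests are powerful.

That is all for this time.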

To read this series from the beginning, see:

www.crosshyou.info

The code for this post is below.

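#
# Load ranger
library(ranger)
#
# Build the model
# (df and idx, the training row indices, carry over from the earlier posts)
ranger_model <- ranger(
  formula = V1 ~ ., # response ~ predictors
  data = df[idx, ], # training rows only
  num.trees = 500, # number of trees
  mtry = floor(sqrt(ncol(df) - 1)), # number of predictors tried at each split
  importance = "impurity", # compute variable importance
  probability = TRUE # output a probability for each class
)
#
# Predict on the test data (probabilities)
pred_probs <- predict(ranger_model, data = df[-idx, ])$predictions
head(pred_probs)
#
# Pick the class with the highest probability
pred_class <- colnames(pred_probs)[apply(pred_probs, 1, which.max)]
pred_class <- factor(pred_class, levels = levels(df$V1))
head(pred_class)
#
# Confusion Matrix
table(Predicted = pred_class, Actual = df[-idx, ]$V1)
# Accuracy: diagonal counts of the matrix divided by the number of test rows
(251 + 229 + 207) / nrow(df[-idx, ])
#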

(The image at the top of this post was generated with Bing Image Creator. The prompt was: Blue, Purple and Yellow colored Aconite flowers, blossoming on the wild grass field under the blue sky, photo.)