Generated by Bing Image Creator: Picture with lots of red maple leaves
This post is a continuation of the previous post.
In the previous post, I made a dummy variable 'neg' that indicates whether a LOCATION has a negative value or not.
In this post, let's do classification.
First, I make a scatter plot of TIME and Value, colored by 'neg'.
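A scatter plot like this can be sketched in base R as follows. The df_main here is made-up stand-in data (the real one comes from the previous posts), and the colors are just my assumption.

```r
# Made-up stand-in for df_main; the real data is built in the earlier posts.
set.seed(1)
n <- 200
df_main <- data.frame(
  TIME  = sample(1990:2020, n, replace = TRUE),
  Value = rnorm(n, mean = 2, sd = 3),
  neg   = rbinom(n, size = 1, prob = 0.4)
)

# Color each point by 'neg': blue for neg = 0, red for neg = 1.
point_col <- ifelse(df_main$neg == 1, "red", "blue")
plot(df_main$TIME, df_main$Value, col = point_col, pch = 19,
     xlab = "TIME", ylab = "Value")
legend("topright", legend = c("neg = 0", "neg = 1"),
       col = c("blue", "red"), pch = 19)
```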
I divided df_main into two data frames: one for training and the other for testing.
Let's see the summary statistics of the two data frames.
I see that the mean of 'neg' is 0.3889 in both, and that TIME and Value have similar statistics. So I think df_main is well divided into two data frames.
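The split and the check can be sketched like this. The data is made up again, and the 70/30 random split is only an assumption, not necessarily how df_main was actually divided.

```r
# Made-up stand-in for df_main.
set.seed(42)
n <- 200
df_main <- data.frame(
  TIME  = sample(1990:2020, n, replace = TRUE),
  Value = rnorm(n, mean = 2, sd = 3),
  neg   = rbinom(n, size = 1, prob = 0.4)
)

# Assumed 70/30 random split into training and testing rows.
train_idx   <- sample(seq_len(n), size = floor(0.7 * n))
df_training <- df_main[train_idx, ]
df_testing  <- df_main[-train_idx, ]

# Summary statistics of both data frames.
print(summary(df_training))
print(summary(df_testing))

# Share of neg = 1 in each data frame; similar values suggest a balanced split.
print(c(training = mean(df_training$neg), testing = mean(df_testing$neg)))
```

If the two 'neg' means differ a lot, trying another seed or a stratified split would be one option.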
All right, let's do classification. According to the scatter plot above, a linear model does not seem to be a good fit, so I use a tree model with the 'tree' package.
I referred to R Tree Package | How does the Tree Package work? (educba.com)
First, I load the 'tree' package.
Then, I use the tree() function.
Then, I plot the result.
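These three steps might look like the sketch below. It assumes the 'tree' package is already installed, uses made-up training data, and assumes the formula neg ~ TIME + Value with 'neg' converted to a factor so that tree() fits a classification tree.

```r
library(tree)  # assumes install.packages("tree") has been run

# Made-up training data; 'neg' loosely depends on Value so the tree can split.
set.seed(1)
n <- 300
df_training <- data.frame(
  TIME  = sample(1990:2020, n, replace = TRUE),
  Value = rnorm(n, mean = 2, sd = 3)
)
df_training$neg <- factor(as.integer(df_training$Value + rnorm(n) < 1),
                          levels = c(0, 1))

# Fit a classification tree (the response is a factor).
fit <- tree(neg ~ TIME + Value, data = df_training)
print(summary(fit))

# Draw the tree and label its splits and leaves.
plot(fit)
text(fit, pretty = 0)
```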
Let's predict with the df_testing data.
So, let's see how good or bad the prediction is.
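Prediction and the contingency table can be sketched as follows. The data and the helper make_data() are hypothetical, and the formula is the same assumed one (neg ~ TIME + Value), so the counts will not match the real ones.

```r
library(tree)  # assumes the 'tree' package is installed

# Hypothetical helper that generates made-up data with a factor 'neg'.
make_data <- function(n) {
  d <- data.frame(
    TIME  = sample(1990:2020, n, replace = TRUE),
    Value = rnorm(n, mean = 2, sd = 3)
  )
  d$neg <- factor(as.integer(d$Value + rnorm(n) < 1), levels = c(0, 1))
  d
}

set.seed(1)
df_training <- make_data(300)
df_testing  <- make_data(100)

fit <- tree(neg ~ TIME + Value, data = df_training)

# type = "class" returns predicted class labels instead of probabilities.
pred <- predict(fit, newdata = df_testing, type = "class")

# Rows are the actual 'neg', columns are the predicted 'neg'.
conf_mat <- table(actual = df_testing$neg, predicted = pred)
print(conf_mat)
```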
The contingency table above shows that the tree model predicts 108 + 77 = 185 observations correctly and 28 + 57 = 85 wrongly.
So the accuracy is 185 / (185 + 85) = 68.5%.
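The arithmetic can be checked with a small sketch. The four counts are the ones quoted above, but which cell each count sits in is my assumption, chosen so that the actual share of neg = 1 comes out as 0.3889.

```r
# Counts from the text; the cell layout (rows = actual, columns = predicted)
# is an assumption consistent with a 'neg' mean of 0.3889.
conf_mat <- matrix(c(108, 28, 57, 77), nrow = 2,
                   dimnames = list(actual = c("0", "1"),
                                   predicted = c("0", "1")))
print(conf_mat)

correct  <- sum(diag(conf_mat))          # 108 + 77
wrong    <- sum(conf_mat) - correct      # 28 + 57
accuracy <- correct / sum(conf_mat)

print(c(correct = correct, wrong = wrong))
print(round(100 * accuracy, 1))          # accuracy in percent

# Share of actual neg = 1 implied by this layout.
print(round(sum(conf_mat["1", ]) / sum(conf_mat), 4))
```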
If I predict neg = 0 for all observations in df_testing, the accuracy is 1 - 0.3889 = 61.1%.
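This baseline is simply the share of the majority class. A minimal sketch, assuming df_testing has 270 rows of which 105 have neg = 1 (counts inferred from the totals and the 0.3889 mean, not taken from the data itself):

```r
# Inferred counts: 165 rows with neg = 0 and 105 rows with neg = 1.
neg_actual <- c(rep(0, 165), rep(1, 105))

# Always predicting neg = 0 is right whenever the actual value is 0.
baseline_acc <- mean(neg_actual == 0)
print(round(100 * baseline_acc, 1))

# Same number via the majority-class share.
print(max(prop.table(table(neg_actual))))
```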
So, the tree model is better than always predicting neg = 0.
That's it. Thank you!