# OECD Discriminatory family code data analysis 8 - Comparing some classification methods

Photo by Marek Piwnicki on Unsplash

This post is a continuation of the previous post.

In this post, I will try several classification methods.

First, I make a binary variable named high, which is 1 when lpc_gdp is higher than the median and 0 when it is lower than the median.
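The median split can be sketched like this. The data frame `df` below is simulated, hypothetical stand-in data (the real OECD data set is not reproduced here), and the sample size of 100 is arbitrary. Only the column names lpc_gdp and high come from this post.

```r
# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
df <- data.frame(lpc_gdp = rnorm(100, mean = 10.5, sd = 0.4))

# high = 1 when lpc_gdp is above the median, 0 otherwise
df$high <- ifelse(df$lpc_gdp > median(df$lpc_gdp), 1, 0)

# A median split should give two groups of (nearly) equal size
table(df$high)
```

With an even number of distinct values, exactly half of the rows end up in each class, so a model that always guesses one class would have an accuracy of about 0.5.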

Before building the models, I load the caret package.

Then, I fit an LPM (Linear Probability Model), which is a linear regression model estimated by OLS with the binary variable as the response. "atwm" (Attitudes Towards Working Mother) is the only significant variable.

Then, I make predictions and calculate how many of them are correct. The accuracy is 0.7941, which means 79.4% of the observations are predicted correctly.
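Here is a minimal sketch of the LPM step and the accuracy calculation. The data are simulated, the exact set of predictors is my assumption, and I assume predictions are made on the same data used for fitting and classified as 1 when the fitted value exceeds 0.5.

```r
# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
n <- 100
df <- data.frame(atwm = rnorm(n), l_inf = rnorm(n), l_unem = rnorm(n))
# Simulated outcome driven by atwm (an assumption made only for this demo)
df$high <- as.integer(df$atwm + rnorm(n) > 0)

# Linear probability model: OLS on the 0/1 outcome
lpm <- lm(high ~ atwm + l_inf + l_unem, data = df)
summary(lpm)

# Classify as 1 when the fitted value exceeds 0.5 (assumed threshold)
pred_lpm <- ifelse(predict(lpm, newdata = df) > 0.5, 1, 0)

# Accuracy = share of observations whose predicted class matches the actual one
acc_lpm <- mean(pred_lpm == df$high)
acc_lpm
```

The caret package also has confusionMatrix(), which reports accuracy together with other statistics when it is given predicted and actual classes as factors.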

Next, I use an SVM (Support Vector Machine) with the svm() function in the e1071 package. What is the accuracy of the SVM model? It is 0.8529, so the SVM is better than the LPM.
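The SVM step might look roughly like this. Again, the data are simulated and the predictor set is an assumption. I leave svm() at its default settings because this post does not say which kernel or cost was used.

```r
library(e1071)

# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
n <- 100
df <- data.frame(atwm = rnorm(n), l_inf = rnorm(n), l_unem = rnorm(n))
# The response must be a factor so that svm() does classification
df$high <- factor(as.integer(df$atwm + rnorm(n) > 0))

# Default settings (radial kernel); tuning choices are not taken from the post
svm_fit <- svm(high ~ atwm + l_inf + l_unem, data = df)

# In-sample predictions and accuracy
pred_svm <- predict(svm_fit, newdata = df)
acc_svm <- mean(pred_svm == df$high)
acc_svm
```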

Next, I use a GAM (Generalized Additive Model) with the gam() function in the mgcv package. In the GAM, s(atwm) and s(l_inf) are the significant terms.

Let's plot the GAM model. The plot shows that s(em) does not need s().

So, let's make another GAM model, this time changing the s(l_unem) term, and plot it as GAM2.

Let's make predictions with the GAM models and get their accuracy. The accuracy of the first GAM model is 0.9118, and the accuracy of the second is 0.8824. So, the first GAM model is better.
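A rough sketch of the two GAM fits follows. The data are simulated, the binomial family is my assumption, and entering l_unem linearly in the second model is only one possible reading of the change described above.

```r
library(mgcv)

# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
n <- 100
df <- data.frame(atwm = rnorm(n), l_inf = rnorm(n), l_unem = rnorm(n))
df$high <- as.integer(df$atwm + rnorm(n) > 0)

# First model: a smooth term for every predictor (binomial family is assumed)
gam1 <- gam(high ~ s(atwm) + s(l_inf) + s(l_unem),
            family = binomial, data = df)
summary(gam1)          # significance of each smooth term
plot(gam1, pages = 1)  # a nearly straight curve suggests s() is not needed

# Second model: l_unem entered linearly (an assumed variation)
gam2 <- gam(high ~ s(atwm) + s(l_inf) + l_unem,
            family = binomial, data = df)

# In-sample accuracy with a 0.5 cut-off on the predicted probability
acc_gam <- function(fit) {
  p <- predict(fit, newdata = df, type = "response")
  mean(ifelse(p > 0.5, 1, 0) == df$high)
}
acc <- c(gam1 = acc_gam(gam1), gam2 = acc_gam(gam2))
acc
```

In the summary() output, an edf value close to 1 for a smooth term is another hint that a plain linear term would do.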

Next, I use a tree model with the tree() function in the tree package. In this model, "atwm" is the most important variable and l_inf is the next. The others are not important.

Let's get the accuracy of the tree model. It is 0.8235.
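The tree step can be sketched as below, using simulated data and an assumed predictor set. The response is a factor so that tree() builds a classification tree.

```r
library(tree)

# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
n <- 100
df <- data.frame(atwm = rnorm(n), l_inf = rnorm(n), l_unem = rnorm(n))
df$high <- factor(as.integer(df$atwm + rnorm(n) > 0))

# Classification tree with the default control settings
tree_fit <- tree(high ~ atwm + l_inf + l_unem, data = df)
summary(tree_fit)  # lists the variables actually used in the splits

# In-sample class predictions and accuracy
pred_tree <- predict(tree_fit, newdata = df, type = "class")
acc_tree <- mean(pred_tree == df$high)
acc_tree
```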

Next, I use a k-NN model with the knn3() function in the caret package.

Let's get the accuracy of the k-NN model. It is 0.7353.
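A minimal k-NN sketch with knn3() is shown below. The data are simulated, and k = 5 (the function's default) is my assumption because the value of k is not given in this post.

```r
library(caret)

# Hypothetical stand-in data: NOT the real OECD data set
set.seed(1)
n <- 100
df <- data.frame(atwm = rnorm(n), l_inf = rnorm(n), l_unem = rnorm(n))
df$high <- factor(as.integer(df$atwm + rnorm(n) > 0))

# k-nearest neighbours classifier; k = 5 is an assumed value
knn_fit <- knn3(high ~ atwm + l_inf + l_unem, data = df, k = 5)

# In-sample class predictions and accuracy
pred_knn <- predict(knn_fit, newdata = df, type = "class")
acc_knn <- mean(pred_knn == df$high)
acc_knn
```

Because k-NN is distance based, predictors on very different scales would normally be standardized first. The simulated predictors here already share the same scale.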

Finally, let's compare the accuracy of these methods. The GAM model has the highest accuracy.
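The comparison can be sketched by collecting the accuracies reported in this post into a named vector. The labels GAM1 and GAM2 are my own shorthand for the first and second GAM models.

```r
# Accuracies as reported in this post
acc <- c(LPM = 0.7941, SVM = 0.8529, GAM1 = 0.9118,
         GAM2 = 0.8824, Tree = 0.8235, kNN = 0.7353)

# Rank the methods from best to worst
sort(acc, decreasing = TRUE)

# Name of the best method
best <- names(which.max(acc))
best

# Simple horizontal bar chart of the comparison
barplot(sort(acc), horiz = TRUE, las = 1, xlab = "Accuracy")
```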
That's it. Thank you!