OECD Researchers data analysis 6 - Multiple Linear Regression in R

Photo by Evi T. on Unsplash


This post continues from the post above.
In the previous post, I did simple linear regression, which means there is only one explanatory variable. In this post I will do multiple linear regression, which means there are multiple explanatory variables.

First, I create a time variable, which is a factor-type variable.

Let's look at a table of time.
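As a rough sketch of these two steps: the original data frame is not shown here, so the df below is a made-up stand-in, and the year column and the 2016 baseline level are my assumptions (only the 2017 level appears in the regression table later).

```r
# Hypothetical stand-in data; the real OECD data frame is not shown in this post.
df <- data.frame(year = rep(c(2016, 2017), times = c(4, 6)))

# Convert the numeric year into a factor so lm() treats it as a category.
df$time <- factor(df$year)

# Count how many observations fall into each level of time.
table(df$time)
```

With a factor, R estimates a separate effect for each level relative to the first one instead of a single slope for the year.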

Then, I add time as an explanatory variable.
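A minimal sketch of such a model, assuming an interaction between l_tot_1000employed and time (the regression table has an l_tot_1000employed:time term). The data are simulated, and the names df and model2 are my placeholders.

```r
# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(1)
n <- 40
df <- data.frame(
  time = factor(rep(c("2016", "2017"), each = n / 2)),
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed + rnorm(n, sd = 0.3)

# `*` expands to both main effects plus their interaction term.
model2 <- lm(l_usd_cap ~ l_tot_1000employed * time, data = df)

# Names of the estimated coefficients.
names(coef(model2))
```

The interaction lets the slope of l_tot_1000employed differ between the two levels of time.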

Let's look at the regression table with the moderndive package's get_regression_table() function.
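The call looks roughly like this. The data are simulated and the object names df, model2 and reg_table are placeholders, so the numbers will not match the real table.

```r
library(moderndive)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(1)
n <- 40
df <- data.frame(
  time = factor(rep(c("2016", "2017"), each = n / 2)),
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed + rnorm(n, sd = 0.3)

model2 <- lm(l_usd_cap ~ l_tot_1000employed * time, data = df)

# One row per term, with estimate, standard error, p-value and confidence interval.
reg_table <- get_regression_table(model2)
reg_table[, c("term", "estimate", "p_value")]

# Terms whose p-value is above the usual 0.05 threshold.
reg_table$term[reg_table$p_value > 0.05]
```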

The p_value of both time: 2017 and l_tot_1000employed:time is greater than 0.05, so I conclude that those terms are not statistically significant at the 5% level.

Let's visualize this regression.
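One way to draw this kind of plot with ggplot2 is sketched below: one color and one fitted line per level of time. The data are simulated and the aesthetics are my choice, not necessarily the ones used for the original figure.

```r
library(ggplot2)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(1)
n <- 40
df <- data.frame(
  time = factor(rep(c("2016", "2017"), each = n / 2)),
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed + rnorm(n, sd = 0.3)

# Mapping color to time makes geom_smooth() fit a separate line per level,
# which matches a model with an interaction between the two variables.
p <- ggplot(df, aes(x = l_tot_1000employed, y = l_usd_cap, color = time)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```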

Next, I use l_women_pc_researcher instead of time.
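A sketch of this model, again with simulated data and with placeholder names df and model3. Here both explanatory variables are numeric, so the interaction term is the product of the two.

```r
# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(2)
n <- 60
df <- data.frame(
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4),
  l_women_pc_researcher = rnorm(n, mean = 3.5, sd = 0.2)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed -
  0.5 * df$l_women_pc_researcher + rnorm(n, sd = 0.3)

# Two numeric explanatory variables plus their interaction.
model3 <- lm(l_usd_cap ~ l_tot_1000employed * l_women_pc_researcher, data = df)

# Base R view of the estimates and p-values.
coef(summary(model3))
```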

Let's see the regression table.

I see that l_tot_1000employed, l_women_pc_researcher and l_tot_1000employed:l_women_pc_researcher all have p-values below 0.05.
So, those three terms are statistically significant.

Let's visualize the actual l_usd_cap and the estimated l_usd_cap.

First, I show the regression data points with moderndive's get_regression_points() function.
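A sketch of that call on simulated data (df, model3 and reg_points are placeholder names). The function returns the observed outcome, the fitted value in a column with the _hat suffix, and the residual.

```r
library(moderndive)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(2)
n <- 60
df <- data.frame(
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4),
  l_women_pc_researcher = rnorm(n, mean = 3.5, sd = 0.2)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed -
  0.5 * df$l_women_pc_researcher + rnorm(n, sd = 0.3)

model3 <- lm(l_usd_cap ~ l_tot_1000employed * l_women_pc_researcher, data = df)

# One row per observation: actual value, fitted value and residual.
reg_points <- get_regression_points(model3)
head(reg_points)
```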

I used ggplot() + geom_point() to draw a scatter plot.

If the actual l_usd_cap and the estimated l_usd_cap were exactly the same, all the scatter points would be on the line.
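A sketch of such a plot on simulated data. To keep it short I take the fitted values from base R's fitted() rather than from get_regression_points(), and I draw the reference line with geom_abline(), which is my assumption about how the line was made.

```r
library(ggplot2)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(2)
n <- 60
df <- data.frame(
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4),
  l_women_pc_researcher = rnorm(n, mean = 3.5, sd = 0.2)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed -
  0.5 * df$l_women_pc_researcher + rnorm(n, sd = 0.3)

model3 <- lm(l_usd_cap ~ l_tot_1000employed * l_women_pc_researcher, data = df)

# Pair each actual value with its fitted (estimated) value.
pts <- data.frame(l_usd_cap = df$l_usd_cap, l_usd_cap_hat = fitted(model3))

# The red line is y = x: a perfect fit would put every point on it.
p <- ggplot(pts, aes(x = l_usd_cap, y = l_usd_cap_hat)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "red")
p
```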

Let's see the histogram of the residuals.
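A sketch of the histogram on simulated data. The residuals come from base R's residuals() here, and the number of bins is an arbitrary choice of mine.

```r
library(ggplot2)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(2)
n <- 60
df <- data.frame(
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4),
  l_women_pc_researcher = rnorm(n, mean = 3.5, sd = 0.2)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed -
  0.5 * df$l_women_pc_researcher + rnorm(n, sd = 0.3)

model3 <- lm(l_usd_cap ~ l_tot_1000employed * l_women_pc_researcher, data = df)

# Residual = actual value minus fitted value.
res <- data.frame(residual = residuals(model3))

p <- ggplot(res, aes(x = residual)) +
  geom_histogram(bins = 20, color = "white")
p
```

For a well-behaved model I would hope to see a roughly symmetric shape centered on zero.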

Now, I have three linear regression models. Let's compare them with moderndive's get_regression_summaries() function.
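A sketch of the comparison on simulated data. The formulas for model1 and model2 are my reconstruction from the descriptions in this series, and stacking the summaries into one data frame is just one convenient way to read them side by side.

```r
library(moderndive)

# Simulated stand-in data (assumption): values are made up for illustration.
set.seed(3)
n <- 60
df <- data.frame(
  time = factor(rep(c("2016", "2017"), each = n / 2)),
  l_tot_1000employed = rnorm(n, mean = 2, sd = 0.4),
  l_women_pc_researcher = rnorm(n, mean = 3.5, sd = 0.2)
)
df$l_usd_cap <- 1 + 1.2 * df$l_tot_1000employed -
  0.5 * df$l_women_pc_researcher + rnorm(n, sd = 0.3)

models <- list(
  model1 = lm(l_usd_cap ~ l_tot_1000employed, data = df),
  model2 = lm(l_usd_cap ~ l_tot_1000employed * time, data = df),
  model3 = lm(l_usd_cap ~ l_tot_1000employed * l_women_pc_researcher, data = df)
)

# Collect rmse and adjusted R-squared from each model into one table.
comparison <- do.call(rbind, lapply(names(models), function(nm) {
  s <- get_regression_summaries(models[[nm]])
  data.frame(model = nm, rmse = s$rmse, adj_r_squared = s$adj_r_squared)
}))
comparison
```

A smaller rmse and a larger adj_r_squared both point to a better fit.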

model3, which has two explanatory variables, l_tot_1000employed and l_women_pc_researcher, has the smallest rmse and the largest adj_r_squared, so it fits the data best of the three.

That's it. Thank you!

To read from the first post,