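A blog about doing things with R and reading books

I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. I occasionally post reading notes as well.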

OECD Adult education level data analysis 2 - calculating summary statistics and making histograms using R

Photo by S. Tsuchiya on Unsplash

www.crosshyou.info

This post is a continuation of the post linked above.

In that post, I made a data frame to work with.
Let's check each variable name and its description.

BUPSRY: Below upper secondary, in percentage
TRY: Tertiary, in percentage 
UPPSRY: Upper secondary, in percentage
TRY_MEN: Tertiary men, in percentage
TRY_WOMEN: Tertiary women, in percentage
UPPSRY_MEN: Upper secondary men, in percentage
UPPSRY_WOMEN: Upper secondary women, in percentage
BUPSRY + TRY + UPPSRY = 100
MLN_USD: GDP, in million USD
USD_CAP: per capita GDP, in USD

Since I would like to see the relationship between tertiary education level and GDP, I will focus on TRY, TRY_MEN, TRY_WOMEN, and USD_CAP.

Let's look at the summary statistics of those four variables.

The means of the three TRY variables are around 27%, and the mean of USD_CAP is 31,647 USD.
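The original code for this step is not reproduced here, so the following is only a minimal sketch of how such a summary could be produced. The data frame name `df` and its values are assumptions: the values are randomly generated stand-ins, not the OECD data, so the printed numbers will not match the ones quoted above.

```r
# Hypothetical stand-in for the data frame built in the previous post.
# The values are randomly generated, NOT the actual OECD data.
set.seed(1)
n <- 50
df <- data.frame(
  TRY       = runif(n, 10, 50),
  TRY_MEN   = runif(n, 10, 50),
  TRY_WOMEN = runif(n, 10, 50),
  USD_CAP   = rlnorm(n, meanlog = 10.3, sdlog = 0.5)
)

# Keep only the four variables of interest and summarize them
vars <- c("TRY", "TRY_MEN", "TRY_WOMEN", "USD_CAP")
smry <- summary(df[, vars])
print(smry)
```

For each column, summary() reports the minimum, first quartile, median, mean, third quartile, and maximum.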

I would also like to see the standard deviation and the CV (coefficient of variation, the standard deviation divided by the mean).

First, I write a custom function to calculate the mean, standard deviation, and CV, and then I apply it to each variable with the apply() function. USD_CAP has the largest variation.
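A custom function of this kind might look like the sketch below. The function name `mean_sd_cv` and the data frame `df` are hypothetical, and the data are randomly generated stand-ins rather than the OECD values.

```r
# Hypothetical stand-in data (random values, NOT the actual OECD data)
set.seed(1)
n <- 50
df <- data.frame(
  TRY       = runif(n, 10, 50),
  TRY_MEN   = runif(n, 10, 50),
  TRY_WOMEN = runif(n, 10, 50),
  USD_CAP   = rlnorm(n, meanlog = 10.3, sdlog = 0.5)
)
vars <- c("TRY", "TRY_MEN", "TRY_WOMEN", "USD_CAP")

# Custom function (hypothetical name): mean, standard deviation, and CV = sd / mean
mean_sd_cv <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(mean = m, sd = s, cv = s / m)
}

# MARGIN = 2 applies the function to each column,
# giving one column of statistics per variable
stats <- apply(df[, vars], 2, mean_sd_cv)
print(round(stats, 3))
```

Because the CV is unit-free, it allows variables measured in percent (the TRY variables) and in USD (USD_CAP) to be compared on the same footing.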

Let's look at histograms of those four variables.

I load the gridExtra package before making the histograms.

Then, I use ggplot() + geom_histogram() to make the histograms and grid.arrange() to display all four at once.

The three TRY variables have similar shapes, and USD_CAP is strongly right-skewed.
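The plotting step could be sketched as follows, assuming the ggplot2 and gridExtra packages are installed. The helper `make_hist`, the data frame `df`, and the choice of `bins = 15` are assumptions made for illustration, and the data are random stand-ins, so the shapes will not match the original figures.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in data (random values, NOT the actual OECD data)
set.seed(1)
n <- 50
df <- data.frame(
  TRY       = runif(n, 10, 50),
  TRY_MEN   = runif(n, 10, 50),
  TRY_WOMEN = runif(n, 10, 50),
  USD_CAP   = rlnorm(n, meanlog = 10.3, sdlog = 0.5)
)
vars <- c("TRY", "TRY_MEN", "TRY_WOMEN", "USD_CAP")

# Helper (hypothetical name) that draws one histogram for a named column
make_hist <- function(data, var) {
  ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 15) +
    labs(x = var)
}

# One plot per variable
plots <- lapply(vars, function(v) make_hist(df, v))

# Show the four histograms in a 2 x 2 grid
combined <- grid.arrange(grobs = plots, ncol = 2)
```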

Let's look at the histogram of log(USD_CAP).

The distribution of log(USD_CAP) is closer to a normal distribution, so I make a new variable, l_usd_cap, that holds log(USD_CAP).

The summary statistics of l_usd_cap show that its CV is very small, 0.056.
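As a rough sketch of this step, the new variable and its statistics could be computed as below. The variable name l_usd_cap comes from the post, while the data frame `df` and its random values are stand-ins, so the resulting CV will differ from 0.056.

```r
# Hypothetical stand-in data (random values, NOT the actual OECD data)
set.seed(1)
df <- data.frame(USD_CAP = rlnorm(50, meanlog = 10.3, sdlog = 0.5))

# New variable: natural log of per capita GDP
df$l_usd_cap <- log(df$USD_CAP)

# Mean, standard deviation, and CV of the log-transformed variable
m_log  <- mean(df$l_usd_cap)
s_log  <- sd(df$l_usd_cap)
cv_log <- s_log / m_log
print(round(c(mean = m_log, sd = s_log, cv = cv_log), 3))

# For comparison: CV of the untransformed variable
cv_raw <- sd(df$USD_CAP) / mean(df$USD_CAP)
print(round(cv_raw, 3))
```

One caveat: the CV of a log-transformed variable depends on the unit of the original variable (rescaling USD_CAP shifts the mean of the log but leaves its standard deviation unchanged), so it is not directly comparable with the CVs of the untransformed variables.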

That's it. Thank you!

To read this series from the first post, see:

www.crosshyou.info