www.crosshyou.info

政府統計の総合窓口のデータや、OECDやUCIやのデータを使って、Rの練習をしています。ときどき、読書記録も載せています。

OECD Gross pension replacement rates data analysis 2 - making data frame for analysis using select(), rename(), mutate() function and making a histogram with ggplot() + geom?histogram() function with R

UnsplashAyesha Firdausが撮影した写真 

www.crosshyou.info

This post is following of the above post.

In the previous post, I load CSV file data into R with read_csv() function. In this post I make a data frame for analysis use.

First, I select meaningful variables only using select() function.

Next, I rename variable names to more understandable with rename() function.

Thirs, I change data type to factor for character variables with as.factor() function and mutate() function.

Great! Let's use summary() function to see summary statistics.

I see each location has 6 observations, MEN and WOMEN have the same number of observations, year starts from 2014 to 2020, replacement_rate range is 10.50 to 96.50.

What I would like to analyze are two things.

1. replacement_rate are different for MEN and WOMEN

2. replacement_rate are different for 2014 and 2020.

Let's see histogram of all replacement_rate.

Since the average value of relacement_rate is 51.77(%), the histogram is centerd around 50.

Let's divied MEN and WOMEN

In geom_histogram() function, I use binwidth = 10 to make binwidth is 10, boundary = 0 to place a histogram with 0. In addition, I use facet_wrap() function to make histograms for MEN and for WOMEN.

The two histogram look very similar, so I feel there is not much different between MEN and WOMEN.

Next, let's see 2014 and 2020.

Uhm... I cannot tell there are much difference.

I need statistical inference to tell whether there is differnece in MEN and WOMEN, in 2014 and 2020.

That's it. Thank you!

Next post is

www.crosshyou.info

 

To read form the first post,

www.crosshyou.info