www.crosshyou.info

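I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. I also post reading notes from time to time.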

OECD Material Productivity Data Analysis 2: Making Graphs with R ggplot2


f:id:cross_hyou:20220213075008j:plain

Photo by Mateusz Klein on Unsplash

www.crosshyou.info

This post continues from the post linked above.
Let's make some graphs to get the big picture of the data.

First, I make a histogram for each variable.

Let's start with NONNRGMAT.

f:id:cross_hyou:20220213075345p:plain

f:id:cross_hyou:20220213075356p:plain
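The histogram code in the post is only shown as a screenshot. As a rough sketch of this kind of plot, the ggplot2 code below draws a histogram of one column chosen by name. The data frame `df`, the helper `plot_hist()` and the simulated values are hypothetical stand-ins, not the OECD data or the author's own code.

```r
library(ggplot2)

# Hypothetical stand-in for the OECD data: simulated values only
set.seed(1)
df <- data.frame(NONNRGMAT = rlnorm(200, meanlog = 0, sdlog = 0.6))

# Histogram of one numeric column, selected by name via the .data pronoun
plot_hist <- function(data, var, bins = 30) {
  ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = bins, colour = "white") +
    labs(title = paste("Histogram of", var), x = var, y = "count")
}

p <- plot_hist(df, "NONNRGMAT")
print(p)
```

On a data frame that also has a TOTMAT column, `plot_hist(df, "TOTMAT")` would draw the matching TOTMAT histogram.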

TOTMAT

f:id:cross_hyou:20220213075522p:plain

f:id:cross_hyou:20220213075531p:plain

Before making the gdp histograms, I load the gridExtra package so that I can arrange two histograms together.

f:id:cross_hyou:20220213080051p:plain

f:id:cross_hyou:20220213080637p:plain

f:id:cross_hyou:20220213080649p:plain

The lower histogram uses a log scale, which suggests that log(gdp) is the better choice for the later analysis.
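One way to stack a raw and a log-scaled histogram on top of each other is `gridExtra::grid.arrange()` with `ncol = 1`. This is a sketch on simulated, right-skewed stand-in values for gdp; using `scale_x_log10()` for the log scale is an assumption, and the post may have produced its lower panel differently.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical, right-skewed stand-in for gdp (simulated, not OECD data)
set.seed(2)
df <- data.frame(gdp = rlnorm(300, meanlog = 12, sdlog = 1.5))

# Upper panel: gdp on its original scale
p_raw <- ggplot(df, aes(x = gdp)) +
  geom_histogram(bins = 30) +
  labs(title = "gdp")

# Lower panel: the same variable on a log10 x axis
p_log <- ggplot(df, aes(x = gdp)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  labs(title = "gdp (log scale)")

# Stack the plots vertically; grid.arrange() draws them and
# invisibly returns the combined gtable
g <- grid.arrange(p_raw, p_log, ncol = 1)
```

A log10 axis and the natural log computed by `log()` differ only by a constant factor, so the histogram's shape is the same either way.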

Accordingly, I create log(gdp) as a new variable, l_gdp.

f:id:cross_hyou:20220213080933p:plain
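Creating l_gdp is a one-line column assignment. This base R sketch uses a hypothetical three-row `df`; the positivity check is an extra safeguard not shown in the post, because `log()` returns `-Inf` for zero and `NaN` for negative values.

```r
# Hypothetical example rows; in the post the column comes from the OECD data
df <- data.frame(gdp = c(1e4, 2.5e5, 3e6))

# log() is undefined for zero or negative values, so stop early on them
stopifnot(all(df$gdp > 0, na.rm = TRUE))

# Natural log of gdp stored as a new column named l_gdp
df$l_gdp <- log(df$gdp)
print(df)
```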

capi histogram

f:id:cross_hyou:20220213082203p:plain

f:id:cross_hyou:20220213082214p:plain

The square root of capi gives the most symmetric distribution, so I create sqrt(capi) as a new variable, r_capi.

f:id:cross_hyou:20220213082556p:plain
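The choice of the square root can also be checked numerically. The sketch below computes the sample skewness (third central moment divided by the cubed standard deviation) of the raw, log and square-root versions of a variable and picks the one closest to zero. The helper names and the toy capi values are hypothetical; the post itself judges symmetry from the histograms.

```r
# Sample skewness: third central moment over the cubed (population) SD
skewness <- function(x) {
  x <- x[is.finite(x)]  # drop NA, NaN and -Inf (e.g. from log(0))
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Skewness of each candidate transform of x
transform_skewness <- function(x) {
  candidates <- list(raw = x, log = log(x), sqrt = sqrt(x))
  vapply(candidates, skewness, numeric(1))
}

# Hypothetical capi values, chosen so that sqrt() makes them evenly spaced
df <- data.frame(capi = (1:100)^2)
skews <- transform_skewness(df$capi)
print(round(skews, 3))

# The transform whose skewness is closest to zero
best <- names(which.min(abs(skews)))

# As in the post, keep the square-rooted variable as r_capi
df$r_capi <- sqrt(df$capi)
```

A skewness near zero is only one summary of symmetry, so it complements rather than replaces looking at the histograms.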

Next, let's look at the ranking by LOCATION for each variable.

First, I check which TIME values have the most observations.

f:id:cross_hyou:20220213083027p:plain

2010, 2011, 2012, 2013 and 2014 each have 57 observations, the most of any year.

So I use only the data from 2010 to 2014.
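Counting observations per TIME and keeping the years tied for the maximum can be done with `table()` and `subset()` in base R. In this sketch only the 57 observations per year for 2010 to 2014 come from the post; the counts for the other years and the data frame itself are made up for illustration.

```r
# Hypothetical panel: only TIME matters here; counts outside 2010-2014 are invented
df <- data.frame(TIME = rep(2008:2016, times = c(50, 52, 57, 57, 57, 57, 57, 55, 40)))

# Number of observations per year
counts <- table(df$TIME)
print(counts)

# Years tied for the largest count
best_years <- as.integer(names(counts)[counts == max(counts)])
print(best_years)

# Keep only those years for the ranking step
df_sub <- subset(df, TIME %in% best_years)
```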

NONNRGMAT ranking by LOCATION.

f:id:cross_hyou:20220213083711p:plain

f:id:cross_hyou:20220213083654p:plain

NLD is 1st, GBR is 2nd, and LUX is 3rd.
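A ranking chart like this can be built by averaging each LOCATION over 2010 to 2014 and drawing sorted horizontal bars. This is a sketch under two assumptions: that the ranking uses the five-year mean (the post does not say which summary it uses) and that the filtered data sit in a data frame called `df_sub`. The location codes and values are placeholders, not OECD figures.

```r
library(ggplot2)

# Hypothetical 2010-2014 subset with made-up location codes and values
set.seed(3)
df_sub <- data.frame(
  LOCATION  = rep(c("AAA", "BBB", "CCC", "DDD"), each = 5),
  TIME      = rep(2010:2014, times = 4),
  NONNRGMAT = rlnorm(20)
)

# Mean of one variable per LOCATION, sorted from highest to lowest
rank_by_location <- function(data, var) {
  agg <- aggregate(data[[var]], by = list(LOCATION = data$LOCATION),
                   FUN = mean, na.rm = TRUE)
  names(agg)[2] <- var
  agg[order(agg[[var]], decreasing = TRUE), ]
}

ranking <- rank_by_location(df_sub, "NONNRGMAT")
print(ranking)

# reorder() sorts the bars by value; coord_flip() puts the highest at the top
p <- ggplot(ranking, aes(x = reorder(LOCATION, NONNRGMAT), y = NONNRGMAT)) +
  geom_col() +
  coord_flip() +
  labs(x = "LOCATION", y = "NONNRGMAT (2010-2014 mean)")
print(p)
```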

TOTMAT ranking by LOCATION

f:id:cross_hyou:20220213084256p:plain

f:id:cross_hyou:20220213084309p:plain

CHE is 1st, LUX is 2nd, and NLD is 3rd.

log(gdp) ranking

f:id:cross_hyou:20220213084817p:plain

f:id:cross_hyou:20220213084830p:plain

USA is 1st, CHN is 2nd, and IND is 3rd.

sqrt(capi) ranking

f:id:cross_hyou:20220213085318p:plain

f:id:cross_hyou:20220213085329p:plain

LUX is 1st, SGP is 2nd, and NOR is 3rd.
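The four top-three lists can also be produced in one pass over the variables. In this base R sketch, `top_locations()` is a hypothetical helper that again assumes ranking by the 2010 to 2014 mean, and the data are simulated placeholders, so the printed codes do not reproduce the results above.

```r
# Hypothetical 2010-2014 subset with made-up location codes and values
set.seed(4)
locs <- c("AAA", "BBB", "CCC", "DDD", "EEE")
df_sub <- data.frame(
  LOCATION  = rep(locs, each = 5),
  TIME      = rep(2010:2014, times = length(locs)),
  NONNRGMAT = rlnorm(25),
  TOTMAT    = rlnorm(25),
  l_gdp     = rnorm(25, mean = 12),
  r_capi    = sqrt(rlnorm(25, meanlog = 10))
)

# Top-k locations by the mean of one variable over the selected years
top_locations <- function(data, var, k = 3) {
  means <- tapply(data[[var]], data$LOCATION, mean, na.rm = TRUE)
  names(means)[order(means, decreasing = TRUE)][seq_len(k)]
}

# One column per variable, rows are 1st, 2nd and 3rd place
vars <- c("NONNRGMAT", "TOTMAT", "l_gdp", "r_capi")
top3 <- sapply(vars, top_locations, data = df_sub)
print(top3)
```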

Lastly, let's make scatter plots.

I use the chart.Correlation() function from the PerformanceAnalytics package.

f:id:cross_hyou:20220213090302p:plain

f:id:cross_hyou:20220213090320p:plain

NONNRGMAT and TOTMAT are very highly correlated.
r_capi is relatively strongly correlated with both NONNRGMAT and TOTMAT.

l_gdp is only weakly correlated with NONNRGMAT and TOTMAT.
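The correlation coefficients shown in the chart can also be checked as a plain matrix with `cor()`. In this sketch the column names match the post, but the data are simulated placeholders. The `chart.Correlation()` call only runs when PerformanceAnalytics is installed, and `histogram = TRUE` with Pearson correlation is an assumption about the settings used.

```r
# Hypothetical data with the four analysis columns (simulated values only)
set.seed(5)
n <- 100
df_sub <- data.frame(
  NONNRGMAT = rlnorm(n),
  TOTMAT    = rlnorm(n),
  l_gdp     = rnorm(n, mean = 12),
  r_capi    = sqrt(rlnorm(n, meanlog = 10))
)
vars <- c("NONNRGMAT", "TOTMAT", "l_gdp", "r_capi")

# Pearson correlation matrix, using only rows without missing values
cor_mat <- cor(df_sub[, vars], use = "complete.obs", method = "pearson")
print(round(cor_mat, 2))

# Scatter-plot matrix with histograms and coefficients, if the package is available
if (requireNamespace("PerformanceAnalytics", quietly = TRUE)) {
  PerformanceAnalytics::chart.Correlation(df_sub[, vars], histogram = TRUE,
                                          method = "pearson")
}
```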

That's it.
Thank you!

The next post is here:

www.crosshyou.info

To see the first post:

www.crosshyou.info