crosshyou

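I started this blog intending mainly to analyze cross tables (contingency tables), but I have not done much cross-table analysis. It has become a blog for practicing the R language.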

OECD Purchasing power parities (PPP) data analysis 3 - relationship between GDP data and PPP. Some countries have a positive correlation and some have a negative one.

f:id:cross_hyou:20211226082738j:plain

Photo by Quino Al on Unsplash 

www.crosshyou.info

This post is a continuation of the post above.
I have a GDP data file like the one below, which I downloaded from the OECD web site.

f:id:cross_hyou:20211226083619p:plain

I am going to merge this data with the previous PPP data.

First, I load this CSV file into R using the read_csv() function.

f:id:cross_hyou:20211226083759p:plain

Then, I use the inner_join() function to merge the previous data frame, df, with this GDP data frame, oecd_gdp.

f:id:cross_hyou:20211226084037p:plain
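
Since the code above is only shown as an image, here is a minimal sketch of the read and merge steps. The key columns `country` and `year`, the temporary file, and all the numbers are assumptions made up for illustration; the real file and column names may differ.

```r
library(readr)
library(dplyr)

# Toy stand-in for the previous PPP data frame (made-up numbers).
# The key columns `country` and `year` are assumptions.
df <- tibble(
  country = c("JPN", "JPN", "FRA"),
  year    = c(2018, 2019, 2018),
  ppp     = c(104, 103, 0.77)
)

# Write a small CSV to a temporary file so that read_csv() has something to read.
gdp_file <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,gdp,capi",
  "JPN,2018,5000,40000",
  "JPN,2019,5100,40500",
  "DEU,2018,4000,48000"
), gdp_file)

# read_csv() returns a tibble; show_col_types = FALSE silences the column report.
oecd_gdp <- read_csv(gdp_file, show_col_types = FALSE)

# inner_join() keeps only the country-year pairs that appear in both tables.
df <- inner_join(df, oecd_gdp, by = c("country", "year"))
print(df)
```

In this toy example the FRA row (no GDP data) and the DEU row (no PPP data) both drop out, which is how inner_join() behaves for unmatched keys.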

gdp is real GDP and capi is per capita GDP.

Let's look at a graph of ppp and gdp.

f:id:cross_hyou:20211226085140p:plain

f:id:cross_hyou:20211226085211p:plain

Since GDP and PPP both span a wide range of values, the graphs above are hard to read.

I change ppp to log(ppp) and gdp to log(gdp).

f:id:cross_hyou:20211226085747p:plain

f:id:cross_hyou:20211226085800p:plain
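
To show why the log scale helps, here is a small sketch with ggplot2. The use of ggplot2, the choice of axes, and the toy values are my assumptions; the original plotting code is only in the images.

```r
library(ggplot2)

# Made-up values spanning several orders of magnitude,
# as a stand-in for the real ppp and gdp columns.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  gdp     = c(1e4, 2e4, 1e6, 1.5e6, 5e2, 8e2),
  ppp     = c(0.8, 0.9, 120, 150, 1500, 1400)
)

# On the raw scale the largest values squash the other points together.
p_raw <- ggplot(toy, aes(x = gdp, y = ppp, colour = country)) +
  geom_point()

# Taking logs spreads the points out so that every country is visible.
p_log <- ggplot(toy, aes(x = log(gdp), y = log(ppp), colour = country)) +
  geom_point()

print(p_log)
```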

I see that many countries have a positive correlation between log(gdp) and log(ppp).

Now, let's look at ppp and capi (per capita GDP).

f:id:cross_hyou:20211226090435p:plain

f:id:cross_hyou:20211226090447p:plain

Again, let's convert ppp to log(ppp) and capi to log(capi).

f:id:cross_hyou:20211226090741p:plain

f:id:cross_hyou:20211226090754p:plain

At this point, I realize it is better to work with log(ppp), log(gdp) and log(capi) than with ppp, gdp and capi when analyzing them, so I am going to make new variables.

f:id:cross_hyou:20211226091013p:plain

l_ppp is log(ppp), l_gdp is log(gdp) and l_capi is log(capi).
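
A minimal sketch of creating these variables with mutate(), assuming dplyr is used as in the join step. The two-row data frame is made up for illustration.

```r
library(dplyr)

# Toy stand-in for df with made-up numbers.
df <- tibble(
  ppp  = c(1, 100),
  gdp  = c(1000, 2000),
  capi = c(30000, 40000)
)

# log() is the natural logarithm; each new column sits next to the original.
df <- df %>%
  mutate(
    l_ppp  = log(ppp),
    l_gdp  = log(gdp),
    l_capi = log(capi)
  )
print(df)
```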

Let's look at the correlation between l_ppp and l_gdp by year.

f:id:cross_hyou:20211226091416p:plain

f:id:cross_hyou:20211226091427p:plain
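
The graphs show the relationship visually. One way to put a number on it is to compute the correlation coefficient within each year. This is a sketch with made-up data, not the code from the images; the column `year` is an assumed name.

```r
library(dplyr)

# Toy data: two years, three hypothetical countries each (made-up numbers).
toy <- tibble(
  year  = c(2000, 2000, 2000, 2001, 2001, 2001),
  l_ppp = c(0.1, 1.0, 2.0, 0.2, 1.1, 2.1),
  l_gdp = c(5.0, 6.0, 7.5, 7.0, 6.0, 5.0)
)

# One Pearson correlation per year, plus the number of observations behind it.
cor_by_year <- toy %>%
  group_by(year) %>%
  summarise(r = cor(l_ppp, l_gdp), n = n())
print(cor_by_year)
```

In the toy data the first year has a positive correlation and the second a negative one, so the sign of r can be compared across years at a glance.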

We see that 1960 ~ 1969 and 2020 have different characteristics from the other years. The correlation between log(gdp) and log(ppp) seems to be negative.

Let's look at log(ppp) and log(capi) by year.

f:id:cross_hyou:20211226091816p:plain

f:id:cross_hyou:20211226091827p:plain

We see that 1960 ~ 1969 and 2020 have slightly different characteristics. The correlation between log(ppp) and log(capi) seems to be positive.

I make a subset of df, df_subset, that excludes 1960 ~ 1969 and 2020.

f:id:cross_hyou:20211226092457p:plain
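
A minimal sketch of this filtering step with dplyr, assuming the year column is called `year`. The data is made up for illustration.

```r
library(dplyr)

# Toy stand-in for df (made-up numbers).
df <- tibble(
  year  = c(1960, 1969, 1970, 2019, 2020),
  l_ppp = c(0.5, 0.6, 0.7, 0.8, 0.9)
)

# Drop 1960 to 1969 and also 2020; every other year is kept.
df_subset <- df %>%
  filter(!(year %in% 1960:1969), year != 2020)
print(df_subset)
```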

Using this df_subset, let's look at the correlation between l_ppp and l_gdp by country.

f:id:cross_hyou:20211226094911p:plain

f:id:cross_hyou:20211226094542p:plain

I see that USA is excluded because USA has no variation in l_ppp. Some countries have a positive correlation and some have a negative correlation.
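
Here is a sketch of why a country with constant l_ppp drops out. A correlation needs variation in both variables, so cor() returns NA when one of them is constant. The country codes AAA and BBB and all the numbers are hypothetical.

```r
library(dplyr)

# Toy data: USA has a constant l_ppp, AAA and BBB are hypothetical countries.
toy <- tibble(
  country = rep(c("USA", "AAA", "BBB"), each = 3),
  l_ppp   = c(0, 0, 0,  1.0, 1.2, 1.5,  4.0, 3.8, 3.5),
  l_gdp   = c(9.0, 9.1, 9.2,  6.0, 6.3, 6.9,  7.0, 7.2, 7.6)
)

# cor() warns and returns NA when the standard deviation is zero,
# so the warning is suppressed and the NA is kept in the result.
cor_by_country <- toy %>%
  group_by(country) %>%
  summarise(r = suppressWarnings(cor(l_ppp, l_gdp)))
print(cor_by_country)

# Removing the constant country before further by-country analysis.
toy_no_usa <- toy %>% filter(country != "USA")
```

In the toy data AAA comes out positive, BBB negative, and USA is NA.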

I will delete USA from df_subset and look at l_ppp and l_capi by country.

f:id:cross_hyou:20211226095256p:plain

f:id:cross_hyou:20211226095307p:plain

As with l_ppp and l_gdp, some countries have a positive correlation between l_ppp and l_capi and some have a negative correlation.

That's it. Thank you!

The next post is

 

www.crosshyou.info

 

To read the 1st post

 

www.crosshyou.info