crosshyou

主にクロス表(分割表)分析をしようかなと思いはじめましたが、あまりクロス表の分析はできず。R言語の練習ブログになっています。

OECD Material productivity data analysis 1 - Using R to load CSV file data and reshape dataframe format with pivot_wider.

f:id:cross_hyou:20220212080206j:plain

Photo by Ivana Cajina on Unsplash 

In this blog, I will analyze OECD Material productivity data.

f:id:cross_hyou:20220212082117p:plain

First, I downloaded data from the OECD webiste: Materials - Material productivity - OECD Data

OECD (2022), Material productivity (indicator). doi: 10.1787/dae52b45-en (Accessed on 12 February 2022)

The downloaded CSV file looks like below.

f:id:cross_hyou:20220212082423p:plain

I use R to analyze the data, so I firstly load tidyverse package and use read_csv() function to load the CSV file.

f:id:cross_hyou:20220212082850p:plain

f:id:cross_hyou:20220212083041p:plain

Let's check each variables. For character variables, how many vales are there, for numeric variables, summary statistics.

LOCATION

f:id:cross_hyou:20220212083425p:plain

There are many locations.

INDICATOR

f:id:cross_hyou:20220212083613p:plain

For INDICATOR, there is only one value, MATPROD, so we can delete INDICATOR.

f:id:cross_hyou:20220212083751p:plain

SUBJECT

f:id:cross_hyou:20220212084400p:plain

There are two kinds of vakue, TOTMAT and NONNRGMAT. So I keep SUBJECT.

MEASURE

f:id:cross_hyou:20220212084628p:plain

There is only one value in MEASURE, it is USD_KG. I will remove MEASURE.

f:id:cross_hyou:20220212084836p:plain

FREQUENCY

f:id:cross_hyou:20220212084954p:plain

There is only one value in FRQUENCY, it is A. I will remove FREQUENCY.

f:id:cross_hyou:20220212085145p:plain

 

TIME

f:id:cross_hyou:20220212085310p:plain

For TIME, it starts in 1980 and ends in 2019. There is not NA value.

 

Value

f:id:cross_hyou:20220212085502p:plain

There is not NA value in Value.
Now, my new dataframe, df has LOCATION, SUBJECT, TIME and Value.

f:id:cross_hyou:20220212085652p:plain

I know SUBJECT has two kinds of value, one is TOTMAT and the oter is NONNRGMAT.
So, I want to reshape my dataframe df like below

LOCATION TIME TOTMAT NONNRGMAT
AUS            1990  XXX       XXX
AUS            1991  XXX       XXX
AUS            1992  XXX       XXX

 

So, I will use pivot_wider() function.

f:id:cross_hyou:20220212091544p:plain

All right.
Let's check summary statistics.

f:id:cross_hyou:20220212091742p:plain

There are several NA's in NONNRGMAT and TOTMAT. So, I will delete NA rows

f:id:cross_hyou:20220212092012p:plain

By the way, I have GDP and per capita GDP data file like below.

f:id:cross_hyou:20220212092358p:plain

So, I will conbine this GDP data and Material productibity data.

f:id:cross_hyou:20220212092846p:plain

let's see summary stats.

f:id:cross_hyou:20220212092949p:plain

That's it.
I will investigater further in next post.

Thank you!

The next post is

 

www.crosshyou.info