OECD Nutrient balance data analysis 1 - load data into R


In this post, I will load the OECD nutrient balance data into R.
I downloaded the data from the OECD website as a CSV file.

Let's analyze this data in R!

First, I load the "tidyverse" package, which is a great package for R users.

I use the read_csv() function to load the CSV file into R.
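Here is a minimal sketch of the loading step. The actual file is not included in this post, so the few rows below are made up for illustration (the SUBJECT values "NITROGEN" and "PHOSPHORUS" are assumptions), and `nutrient` is just a name picked for the example.

```r
library(tidyverse)

# A few made-up rows standing in for the downloaded OECD file.
csv_text <- "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value
AUS,NUTRBALANCE,NITROGEN,KG_HA,A,1985,17.2
AUS,NUTRBALANCE,PHOSPHORUS,TONNE,A,1986,-3.5
JPN,NUTRBALANCE,NITROGEN,KG_HA,A,2020,160.1"

# I() tells read_csv() to treat the string as literal data.
# With the real download, pass the file path instead of I(csv_text).
nutrient <- read_csv(I(csv_text), show_col_types = FALSE)

# Show the column types that read_csv() guessed.
glimpse(nutrient)
```

With the real download, only the argument to read_csv() changes; the rest stays the same.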

The columns from LOCATION to FREQUENCY are character; TIME and Value are numeric.

Let's check how many NAs there are in each column.
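One way to count the NAs per column is sketched below. The toy data frame is made up, and it contains one NA on purpose so that the count has something to find; the real data may look different.

```r
library(tidyverse)

# Made-up stand-in for the loaded data, with one deliberate NA in Value.
nutrient <- tribble(
  ~LOCATION, ~SUBJECT,     ~TIME, ~Value,
  "AUS",     "NITROGEN",   1985,  17.2,
  "AUS",     "PHOSPHORUS", 1986,  NA,
  "JPN",     "NITROGEN",   2020,  160.1
)

# For every column, count how many entries are NA.
na_counts <- nutrient %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

In base R, `colSums(is.na(nutrient))` gives the same counts as a named vector.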

Good news! There are no NAs in the data.

Let's check LOCATION data.
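A sketch of this kind of check with count(), using a made-up toy data frame in place of the real one:

```r
library(tidyverse)

# Made-up stand-in for the loaded data.
nutrient <- tribble(
  ~LOCATION, ~TIME, ~Value,
  "AUS",     1985,  17.2,
  "AUS",     1986,  16.8,
  "JPN",     1985,  170.3
)

# Number of observations per LOCATION, largest first.
location_counts <- nutrient %>%
  count(LOCATION, sort = TRUE)

location_counts
```

The same call with a different column name (INDICATOR, SUBJECT, MEASURE, or FREQUENCY) works for the other character columns.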

Almost every LOCATION has more than 100 observations.

Let's check INDICATOR.

For INDICATOR, there is only one value, "NUTRBALANCE".

How about SUBJECT?


Let's go to MEASURE

There are two values for MEASURE: KG_HA and TONNE.

Let's check FREQUENCY.

There is only A for FREQUENCY.

Among the character variables, INDICATOR and FREQUENCY each have only one value, so I can delete those two columns later.
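As a quick cross-check, the number of distinct values in every character column can be computed in one step. This is only a sketch on made-up rows, not the real data:

```r
library(tidyverse)

# Made-up stand-in for the loaded data.
nutrient <- tribble(
  ~LOCATION, ~INDICATOR,    ~SUBJECT,     ~MEASURE, ~FREQUENCY, ~TIME, ~Value,
  "AUS",     "NUTRBALANCE", "NITROGEN",   "KG_HA",  "A",        1985,  17.2,
  "AUS",     "NUTRBALANCE", "PHOSPHORUS", "TONNE",  "A",        1986,  -3.5,
  "JPN",     "NUTRBALANCE", "NITROGEN",   "KG_HA",  "A",        2020,  160.1
)

# Count distinct values in each character column.
# A count of 1 means the column is constant and carries no information.
distinct_counts <- nutrient %>%
  summarise(across(where(is.character), n_distinct))

distinct_counts
```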

Let's check TIME and Value, which are numeric.
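For numeric columns, summary() is a simple way to see the range and quartiles. A sketch on made-up rows:

```r
library(tidyverse)

# Made-up stand-in for the loaded data.
nutrient <- tribble(
  ~LOCATION, ~TIME, ~Value,
  "AUS",     1985,  17.2,
  "AUS",     2020,  -3.5,
  "JPN",     2000,  160.1
)

# Min, quartiles, mean and max for the two numeric columns.
nutrient %>%
  select(TIME, Value) %>%
  summary()

# Oldest and newest year, and the number of negative values.
time_range <- range(nutrient$TIME)
n_negative <- sum(nutrient$Value < 0)
```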

For TIME, the oldest year is 1985 and the newest year is 2020.

For Value, there are negative values. According to the OECD description, a negative value means declining soil fertility.

Next, let's make a data frame for analysis.

I removed INDICATOR and FREQUENCY, converted SUBJECT and MEASURE to factor type, and lastly converted the column names to lowercase.
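These three steps could be sketched as follows. The rows are made up, and `nutrient_df` is just a name picked for the example:

```r
library(tidyverse)

# Made-up stand-in for the loaded data.
nutrient <- tribble(
  ~LOCATION, ~INDICATOR,    ~SUBJECT,     ~MEASURE, ~FREQUENCY, ~TIME, ~Value,
  "AUS",     "NUTRBALANCE", "NITROGEN",   "KG_HA",  "A",        1985,  17.2,
  "AUS",     "NUTRBALANCE", "PHOSPHORUS", "TONNE",  "A",        1986,  -3.5,
  "JPN",     "NUTRBALANCE", "NITROGEN",   "KG_HA",  "A",        2020,  160.1
)

nutrient_df <- nutrient %>%
  select(-INDICATOR, -FREQUENCY) %>%               # drop the constant columns
  mutate(across(c(SUBJECT, MEASURE), as.factor)) %>% # character -> factor
  rename_with(tolower)                             # lowercase column names

glimpse(nutrient_df)
```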

Let's look at a subject × measure table.
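One way to build such a table is sketched below on made-up rows; the lowercase column names assume the renaming step described earlier:

```r
library(tidyverse)

# Made-up stand-in for the cleaned data frame.
nutrient_df <- tribble(
  ~subject,     ~measure, ~value,
  "NITROGEN",   "KG_HA",  17.2,
  "NITROGEN",   "KG_HA",  160.1,
  "NITROGEN",   "TONNE",  350000,
  "PHOSPHORUS", "KG_HA",  -3.5,
  "PHOSPHORUS", "TONNE",  12000
) %>%
  mutate(across(c(subject, measure), as.factor))

# Long format: one row per subject/measure combination.
combo_counts <- nutrient_df %>%
  count(subject, measure)

combo_counts

# Wide format: a classic two-way contingency table.
table(nutrient_df$subject, nutrient_df$measure)
```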

Each combination seems to have almost the same number of observations.

That's it. Thank you!
