OECD Total official and private flows data analysis 1 - Load CSV file into R and some data wrangling.

Photo by Cosmic Timetraveler on Unsplash

Hello. I will analyse the OECD Total official and private flows data using R.

I downloaded the data as a CSV file from the OECD website.


I got a CSV file like the one below.

First, I load the tidyverse package.

Let's use the read_csv() function to load the CSV file.
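A minimal sketch of these two steps follows. The downloaded file isn't reproduced here, so the sketch writes a tiny hypothetical CSV with the same 7 column names to a temporary file. The row values, including codes such as `IND` and `MLN_USD`, are made up and are not OECD figures. It loads readr directly, which is one of the packages `library(tidyverse)` attaches. With the real data you would pass the path of the downloaded file to `read_csv()` instead.

```r
library(readr)  # read_csv() comes from readr, which library(tidyverse) also attaches

# Hypothetical stand-in for the downloaded file: same 7 column names,
# made-up values (NOT real OECD figures)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value",
  "SWE,IND,TOT,MLN_USD,A,2019,100.5",
  "SWE,IND,TOT,MLN_USD,A,2020,250",
  "NLD,IND,TOT,MLN_USD,A,2020,-30"
), csv_path)

# With the real data, replace csv_path with the path of the downloaded CSV
df_main <- read_csv(csv_path, show_col_types = FALSE)
print(df_main)
```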

Let's use the glimpse() function to see which variables the data frame contains.
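Here is a small sketch of `glimpse()` on a hypothetical toy tibble with the same column layout as the CSV (the values are invented). `glimpse()` prints one line per column, showing its name, type and first few values. This makes it a quick way to check which columns are character and which are numeric.

```r
library(dplyr)   # dplyr re-exports glimpse()
library(tibble)

# Hypothetical toy data with the CSV's 7 columns (values are made up);
# length-1 columns are recycled to the number of rows
df_main <- tibble(
  LOCATION  = c("SWE", "SWE", "NLD"),
  INDICATOR = "IND",
  SUBJECT   = "TOT",
  MEASURE   = "MLN_USD",
  FREQUENCY = "A",
  TIME      = c(2019, 2020, 2020),
  Value     = c(100.5, 250, -30)
)

# One line per column: name, type (<chr>, <dbl>, ...) and first values
glimpse(df_main)
```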

The data frame has 1583 rows (observations) and 7 columns (variables).

Five columns are character and two are numeric.
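The same row, column and type counts can also be checked in code. The sketch below runs on a hypothetical toy tibble shaped like the CSV, so `dim()` reports its own size rather than 1583 × 7. The class tally shows the same 5-character/2-numeric split.

```r
library(tibble)

# Hypothetical toy data with the CSV's 7 columns (values are made up)
df_main <- tibble(
  LOCATION  = c("SWE", "SWE", "NLD"),
  INDICATOR = "IND",
  SUBJECT   = "TOT",
  MEASURE   = "MLN_USD",
  FREQUENCY = "A",
  TIME      = c(2019, 2020, 2020),
  Value     = c(100.5, 250, -30)
)

# dim() returns c(number of rows, number of columns)
dim(df_main)

# Tally how many columns have each class
col_types <- table(vapply(df_main, function(col) class(col)[1], character(1)))
print(col_types)
```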

Let's look at the character variables.

INDICATOR, SUBJECT, MEASURE and FREQUENCY each have only one value, so I can remove them.
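One way to confirm that a character column holds a single value is to count its distinct values with `n_distinct()`. You can then drop the constant columns with `select()`. This is a sketch on hypothetical toy data, not the post's original code.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the CSV's 7 columns (values are made up)
df_main <- tibble(
  LOCATION  = c("SWE", "SWE", "NLD"),
  INDICATOR = "IND",
  SUBJECT   = "TOT",
  MEASURE   = "MLN_USD",
  FREQUENCY = "A",
  TIME      = c(2019, 2020, 2020),
  Value     = c(100.5, 250, -30)
)

# Number of distinct values in every character column
df_main %>% summarise(across(where(is.character), n_distinct))

# Drop the four character columns that hold a single constant value
df_main <- df_main %>% select(-c(INDICATOR, SUBJECT, MEASURE, FREQUENCY))
print(names(df_main))
```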

Let's look at TIME and Value.
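A minimal sketch of getting the minimum and maximum of both numeric columns with `summarise()` is shown below. It uses hypothetical toy values, so the numbers differ from the real data's 1960 to 2022 and -16782 to 620075. The `na.rm = TRUE` arguments are an extra precaution in case the real file has missing values.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data after dropping the constant columns (values are made up)
df_main <- tibble(
  LOCATION = c("SWE", "SWE", "NLD"),
  TIME     = c(2019, 2020, 2020),
  Value    = c(100.5, 250, -30)
)

# Minimum and maximum of TIME and Value; na.rm = TRUE skips any missing values
ranges <- df_main %>%
  summarise(
    time_min  = min(TIME, na.rm = TRUE),  time_max  = max(TIME, na.rm = TRUE),
    value_min = min(Value, na.rm = TRUE), value_max = max(Value, na.rm = TRUE)
  )
print(ranges)
```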

TIME runs from 1960 to 2022, and Value ranges from -16782 to 620075.
Let's look at a histogram of Value.
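A histogram like this can be drawn with ggplot2's `geom_histogram()`. The sketch below uses hypothetical right-skewed toy values drawn from a log-normal distribution plus one negative value, not the OECD data. The bin count of 30 is an arbitrary choice.

```r
library(ggplot2)
library(tibble)

# Hypothetical right-skewed toy values (log-normal draws plus one negative value)
set.seed(1)
df_main <- tibble(Value = c(rlnorm(200, meanlog = 5, sdlog = 1.5), -50))

# Histogram of Value with 30 bins
p <- ggplot(df_main, aes(x = Value)) +
  geom_histogram(bins = 30)
print(p)

# A mean well above the median is a quick numeric sign of right skew
c(mean = mean(df_main$Value), median = median(df_main$Value))
```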

Value is strongly right-skewed.

Let's see which LOCATIONs have the most observations.

SWE, NLD, AUT have the most observations.
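Counting rows per LOCATION is a one-liner with `count()`. Passing `sort = TRUE` puts the LOCATIONs with the most observations first. This is a sketch on hypothetical toy rows, so the counts are invented.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows (counts and values are made up)
df_main <- tibble(
  LOCATION = c("SWE", "SWE", "SWE", "NLD", "NLD", "AUT"),
  TIME     = c(2018, 2019, 2020, 2019, 2020, 2020),
  Value    = c(10, 20, 30, 40, 50, 60)
)

# Number of observations per LOCATION, largest first
location_counts <- df_main %>% count(LOCATION, sort = TRUE)
print(location_counts)
```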

Many LOCATIONs have more than 60 observations. I would like to filter the df_main data frame to keep only those LOCATIONs.
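A minimal sketch of this filter, assuming "more than 60 observations" means more than 60 rows per LOCATION: group by LOCATION and keep groups where `n() > 60`. The toy panel uses hypothetical codes AAA, BBB and CCC with 63, 61 and 10 years respectively, and placeholder values.

```r
library(dplyr)
library(tibble)

# Hypothetical toy panel: AAA has 63 years, BBB has 61, CCC has only 10
df_main <- bind_rows(
  tibble(LOCATION = "AAA", TIME = 1960:2022),
  tibble(LOCATION = "BBB", TIME = 1961:2021),
  tibble(LOCATION = "CCC", TIME = 2013:2022)
) %>%
  mutate(Value = seq_along(TIME))  # placeholder values

min_obs <- 60  # threshold from the post

# Keep only LOCATIONs with more than min_obs rows
df_main <- df_main %>%
  group_by(LOCATION) %>%
  filter(n() > min_obs) %>%
  ungroup()

distinct(df_main, LOCATION)
```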

I also would like to keep only the TIME values (years) that every one of these LOCATIONs has.
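To find the years that not every LOCATION covers, you can count rows per TIME. Any year with fewer rows than there are LOCATIONs is incomplete. The sketch below uses hypothetical toy data in which only AAA has 1960 and 2022.

```r
library(dplyr)
library(tibble)

# Hypothetical toy panel: only AAA has observations for 1960 and 2022
df_main <- bind_rows(
  tibble(LOCATION = "AAA", TIME = 1960:2022),
  tibble(LOCATION = "BBB", TIME = 1961:2021),
  tibble(LOCATION = "CCC", TIME = 1961:2021)
) %>%
  mutate(Value = seq_along(TIME))  # placeholder values

# Observations per year, fewest first
time_counts <- df_main %>% count(TIME) %>% arrange(n)
head(time_counts)

# Years that are missing for at least one LOCATION
n_loc <- n_distinct(df_main$LOCATION)
incomplete_years <- time_counts %>% filter(n < n_loc) %>% pull(TIME)
print(incomplete_years)
```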

2022 has only 4 observations and 1960 has only 16, so I remove both years.
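Dropping the two incomplete years is a single `filter()` with `%in%`. The sketch runs on the same kind of hypothetical toy panel as above and checks that every LOCATION ends up with the same number of years.

```r
library(dplyr)
library(tibble)

# Hypothetical toy panel: only AAA has observations for 1960 and 2022
df_main <- bind_rows(
  tibble(LOCATION = "AAA", TIME = 1960:2022),
  tibble(LOCATION = "BBB", TIME = 1961:2021),
  tibble(LOCATION = "CCC", TIME = 1961:2021)
) %>%
  mutate(Value = seq_along(TIME))  # placeholder values

# Remove the two years that are not covered by every LOCATION
df_main <- df_main %>% filter(!TIME %in% c(1960, 2022))

range(df_main$TIME)
count(df_main, LOCATION)  # every LOCATION now has the same number of years
```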

All right, df_main is now ready for analysis.

Let's draw a line chart.
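A sketch of such a chart with ggplot2 draws one line per LOCATION, with TIME on the x-axis and Value on the y-axis. The two LOCATIONs (AAA, BBB) and their steadily rising values are hypothetical toy data. The axis labels are an assumption, not taken from the post.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical cleaned panel: two LOCATIONs, 61 years each, made-up values
df_main <- bind_rows(
  tibble(LOCATION = "AAA", TIME = 1961:2021, Value = seq(100, 700, by = 10)),
  tibble(LOCATION = "BBB", TIME = 1961:2021, Value = seq(50, 350, by = 5))
)

# One coloured line per LOCATION
p <- ggplot(df_main, aes(x = TIME, y = Value, colour = LOCATION)) +
  geom_line() +
  labs(x = "Year (TIME)", y = "Value")
print(p)
```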

That's it. Thank you!

Next post is