A blog about doing things in R and reading books

I practice R using data from Japan's Portal Site of Official Statistics (e-Stat), the OECD, UCI and other sources. From time to time, I also post reading notes.

World Bank's Lead time to import data analysis 1 - load the CSV files and make a tidy dataframe.

Generated by Bing Image Creator: Closeup flowering blue, yellow, pink and red roses. Background is natural high mountains dark night sky and a nebula, photo

In this series of posts, I will analyze the Lead time to import data from World Bank Data using R.

Lead time to import, median case (days) | Data (worldbank.org)

I downloaded two CSV files.

One is the data file and the other is the metadata file.

The data file looks like the screenshot below.

The metadata file looks like the screenshot below.

First, I load the tidyverse package.

Then, I use the read_csv() function to load the CSV file.
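A minimal sketch of these two steps, under stated assumptions: the file written below is a hypothetical stand-in with made-up countries and values, and its column names only imitate the wide, one-column-per-year layout of a World Bank download. If the real file has preamble lines above the header row, the skip argument of read_csv() can be used to skip them.

```r
library(tidyverse)

# Hypothetical stand-in for the downloaded data file (made-up rows).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Country Name,Country Code,Indicator Name,Indicator Code,2007,2010",
    "Country A,AAA,Lead time to import,placeholder,3,2",
    "Country B,BBB,Lead time to import,placeholder,,5"
  ),
  csv_path
)

# Empty fields are read as NA; the year columns keep names such as `2007`.
df_raw <- read_csv(csv_path)
df_raw
```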

Since this dataframe is not tidy, I will convert it to a tidy dataframe with the pivot_longer() function.
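As an illustration of the reshaping, here is a small sketch on a made-up wide table. The column names and values are assumptions, and lttl is used as the value column name because that is the name that appears later in this post.

```r
library(tidyverse)

# Made-up wide table: one column per year.
wide <- tibble(
  `Country Name` = c("Country A", "Country B"),
  `Country Code` = c("AAA", "BBB"),
  `2007` = c(3, NA),
  `2010` = c(2, 5)
)

# Gather every four-digit year column into a year/lttl pair of columns.
long <- wide %>%
  pivot_longer(
    cols = matches("^[0-9]{4}$"),
    names_to = "year",
    values_to = "lttl"
  )
long
```

Because the year values come from column names, year is a character column at this point.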

To make the later workflow easier, I will rename the variables and remove the unnecessary ones.
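A sketch of the renaming and column removal, assuming the long table still carries the World Bank style column names. Which columns were actually dropped is not stated in this post, so the selection below is only one plausible choice.

```r
library(tidyverse)

# Made-up long table with the assumed original column names.
long <- tibble(
  `Country Name` = c("Country A", "Country B"),
  `Country Code` = c("AAA", "BBB"),
  `Indicator Name` = "Lead time to import",
  year = c("2007", "2010"),
  lttl = c(3, 5)
)

# Give the country code a short name, then keep only the needed columns.
tidy <- long %>%
  rename(code = `Country Code`) %>%
  select(code, year, lttl)
tidy
```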

Then, I will remove the rows that contain NA.
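One way to do this is tidyr's drop_na(); base R's na.omit() would also work. The post does not say which function was used, so the following is only a sketch on made-up values.

```r
library(tidyverse)

# Made-up table with one missing measurement.
tidy <- tibble(
  code = c("AAA", "AAA", "BBB", "BBB"),
  year = c("2007", "2010", "2007", "2010"),
  lttl = c(3, 2, NA, 5)
)

# Keep only the rows where every column is non-missing.
complete <- tidy %>% drop_na()
complete
```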

Next, I load the metadata.

I rename the variables and remove the unnecessary ones.

Then, I omit the rows that contain NA.
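The metadata steps can be sketched in the same way. The table below is a made-up stand-in; the original column names (Country Code, Region, IncomeGroup, SpecialNotes) are assumptions about the metadata file, while code, region and group are the names used later in this post.

```r
library(tidyverse)

# Made-up stand-in for the metadata file.
meta_raw <- tibble(
  `Country Code` = c("AAA", "BBB", "AGG"),
  Region = c("Region X", "Region Y", NA),
  IncomeGroup = c("High income", "Low income", NA),
  SpecialNotes = c(NA, "some note", "an aggregate, not a country")
)

# Select and rename in one step, then drop rows that still contain NA.
meta <- meta_raw %>%
  select(code = `Country Code`, region = Region, group = IncomeGroup) %>%
  drop_na()
meta
```

Dropping the unneeded columns before drop_na() matters here: if a mostly empty column such as SpecialNotes were still present, rows with a valid region and group would be removed too.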

So far, I have two tidy dataframes. I will merge them with the inner_join() function.
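A small sketch of the merge on made-up tables, assuming code is the shared key. inner_join() keeps only the codes that appear in both tables, so a code without metadata drops out.

```r
library(tidyverse)

# Made-up data table; "AGG" has no matching metadata row.
data_tidy <- tibble(
  code = c("AAA", "BBB", "AGG"),
  year = c("2007", "2010", "2010"),
  lttl = c(3, 5, 4)
)

# Made-up metadata table.
meta <- tibble(
  code = c("AAA", "BBB"),
  region = c("Region X", "Region Y"),
  group = c("High income", "Low income")
)

# Keep only rows whose code exists in both tables.
df <- inner_join(data_tidy, meta, by = "code")
df
```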

Looking at the screenshot above, I found that year is <chr>, so I will change it to numeric. I will also change code, region and group to factors.
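The type changes could look like the sketch below, using mutate() with across(). The table is made up, and this is only one way to write the conversion.

```r
library(tidyverse)

# Made-up merged table in which every column except lttl is character.
df <- tibble(
  code = c("AAA", "BBB"),
  year = c("2007", "2010"),
  lttl = c(3, 5),
  region = c("Region X", "Region Y"),
  group = c("High income", "Low income")
)

# Convert year to numeric and the three categorical columns to factors.
df <- df %>%
  mutate(
    year = as.numeric(year),
    across(c(code, region, group), as.factor)
  )
glimpse(df)
```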

All right. 

Next, let's use the skimr::skim() function to get summary statistics.
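A sketch of the call on a made-up dataframe, assuming the skimr package is installed. skim() returns one summary row per variable, including the number of missing values; for factor columns it also reports the number of unique levels.

```r
library(tidyverse)

# Made-up dataframe with the same column types as the merged data.
df <- tibble(
  code = factor(c("AAA", "BBB", "BBB")),
  region = factor(c("Region X", "Region Y", "Region Y")),
  group = factor(c("High income", "Low income", "Low income")),
  year = c(2007, 2007, 2010),
  lttl = c(3, 5, 4)
)

# One summary row per variable.
summary_tbl <- skimr::skim(df)
summary_tbl
```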

I see that there are no NAs. There are 153 codes, 7 regions and 4 groups. The minimum year is 2007 and the maximum year is 2018. The minimum lttl (Lead time to import) is 1 and the maximum lttl is 81.

That's it. Thank you!

The next post is:

www.crosshyou.info