OECD Trust in government data analysis 1 - import CSV file into R and make a tidy data frame.

Generated by Bing Image Creator: Photograph of Japanese Shrine and Blue Sky and White Cloud and Beautiful Flowers

In this post I am going to analyze the OECD Trust in government data with R.

First, I downloaded the CSV file from the OECD website, linked below.
General government - Trust in government - OECD Data

I load the tidyverse package before importing the CSV file into R.

Then, I use the read_csv() function.
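Here is a minimal sketch of this import step. To keep it runnable without the download, it reads a tiny two-row sample from a string. The column names follow the ones mentioned in this post, but the location codes, the other codes and the values are made up for illustration, and the file name in the comment is hypothetical.

```r
library(tidyverse)

# Hypothetical two-row sample in the layout described in this post.
# The codes and values are placeholders, not real OECD figures.
csv_text <- "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value
AAA,IND,SUB,MEAS,A,2006,50.0
BBB,IND,SUB,MEAS,A,2006,40.0"

# I() tells read_csv() to treat the string as literal data, not a path.
# With the real download, pass the file path instead,
# e.g. read_csv("trust_in_government.csv") (hypothetical file name).
df_raw <- read_csv(I(csv_text), show_col_types = FALSE)

df_raw
```

With the real file, the same call returns one row per LOCATION and TIME combination.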

Let's see df_raw.

This data frame, df_raw, has 638 observations and 7 variables.

Let's use the skimr::skim() function.

There are 5 character variables and 2 numeric variables.
Among the 5 character variables, only LOCATION takes more than one value: its n_unique is 41.
So it is all right to remove INDICATOR, SUBJECT, MEASURE and FREQUENCY.

I use the mutate() function to convert LOCATION to a factor and the select() function to keep only LOCATION, TIME and Value.
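The mutate() and select() step can be sketched like this. The small df_raw built here is a toy stand-in with made-up codes and values, so the example runs on its own.

```r
library(tidyverse)

# Toy stand-in for df_raw; codes and values are made up, not OECD data.
df_raw <- tibble(
  LOCATION  = c("AAA", "AAA", "BBB"),
  INDICATOR = "IND",
  SUBJECT   = "SUB",
  MEASURE   = "MEAS",
  FREQUENCY = "A",
  TIME      = c(2006, 2007, 2006),
  Value     = c(50, 52, 40)
)

# Convert LOCATION to a factor, then keep only the three informative columns.
df <- df_raw %>%
  mutate(LOCATION = as.factor(LOCATION)) %>%
  select(LOCATION, TIME, Value)

df
```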

Since I would like to analyze the relationship between Trust in government and per capita GDP, I also download the per capita GDP CSV file.

GDP and spending - Gross domestic product (GDP) - OECD Data

I use read_csv() again.

Let's see what df_gdp looks like with the glimpse() function.

Let's use the skimr::skim() function.

The MEASURE variable has 2 unique values. Let's look at them.

They are MLN_USD and USD_CAP. MLN_USD is the GDP value in million US dollars and USD_CAP is per capita GDP, so I only use USD_CAP.
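A minimal sketch of checking MEASURE and keeping only the per capita rows follows. The toy df_gdp uses made-up values, and renaming Value to pc_gdp at this point is my assumption about where that column name comes from.

```r
library(tidyverse)

# Toy stand-in for df_gdp; values are made up, not OECD data.
df_gdp <- tibble(
  LOCATION = c("AAA", "AAA", "BBB", "BBB"),
  MEASURE  = c("MLN_USD", "USD_CAP", "MLN_USD", "USD_CAP"),
  TIME     = 2006,
  Value    = c(1000, 30000, 2000, 40000)
)

# List the distinct MEASURE codes.
df_gdp %>% distinct(MEASURE)

# Keep the per capita rows and rename Value to pc_gdp so it will not
# clash with the trust Value column after a join.
df_gdp_pc <- df_gdp %>%
  filter(MEASURE == "USD_CAP") %>%
  select(LOCATION, TIME, pc_gdp = Value)

df_gdp_pc
```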

Let's merge df and df_gdp_pc with the inner_join() function.
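The join can be sketched as below, assuming the two tables are matched on LOCATION and TIME. Both toy tables use made-up values, and LOCATION is kept as character in both so the key types match.

```r
library(tidyverse)

# Toy trust table; values are made up, not OECD data.
df <- tibble(
  LOCATION = c("AAA", "BBB", "CCC"),
  TIME     = c(2006, 2006, 2006),
  Value    = c(50, 40, 30)
)

# Toy per capita GDP table with no record for CCC.
df_gdp_pc <- tibble(
  LOCATION = c("AAA", "BBB"),
  TIME     = c(2006, 2006),
  pc_gdp   = c(30000, 40000)
)

# inner_join() keeps only LOCATION/TIME pairs present in both tables,
# so the CCC row (no GDP record) is dropped.
df <- inner_join(df, df_gdp_pc, by = c("LOCATION", "TIME"))

df
```

Because it is an inner join, any country or year missing from either file disappears from the result, which is worth checking against the row counts.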

Let's use the summary() function to see a summary of df.

I see each LOCATION has 17 records, which means there are 17 years of data. TIME runs from 2006 to 2022. Value, which is Trust in government, ranges from 6.877 to 84.998, with a mean of 43.983. pc_gdp, which is per capita GDP, ranges from 9433 to 119364, with a mean of 37819.

It is good to rename the variables: Value becomes trust, and LOCATION and TIME are renamed as well.

All right. Now I have a tidy data frame to analyze.
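A minimal sketch of the renaming with rename() follows. Only the new name trust comes from this post. The lowercase names location and time are my assumption for illustration, and the toy values are made up.

```r
library(tidyverse)

# Toy stand-in for the joined data frame; values are made up.
df <- tibble(
  LOCATION = c("AAA", "BBB"),
  TIME     = c(2006, 2006),
  Value    = c(50, 40),
  pc_gdp   = c(30000, 40000)
)

# rename() takes new_name = old_name pairs.
# location and time are assumed names, not taken from the post.
df <- df %>%
  rename(trust = Value, location = LOCATION, time = TIME)

names(df)
```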

That's it. Thank you!

Next post is