crosshyou

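I started this blog intending mainly to analyze cross tables (contingency tables), but I have not done much cross-table analysis. It has become a practice blog for the R language.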

OECD NEET Data Analysis 1 - read CSV file using R read_csv function.

f:id:cross_hyou:20210925170306j:plain

Photo by 懒 羊羊 on Unsplash 

f:id:cross_hyou:20210925170104p:plain

In this blog series, I will investigate NEET data.

First, I get the data from the OECD website.

The CSV file looks like this:

f:id:cross_hyou:20210925170623p:plain

Let's load this data into R.
First, I load the tidyverse package, then use the read_csv function.

f:id:cross_hyou:20210925170910p:plain

f:id:cross_hyou:20210925171048p:plain
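For readers who cannot see the screenshots, here is a minimal sketch of the loading step. The file name of the downloaded CSV is not given in the post, so this sketch writes a tiny stand-in file to a temporary path. The rows, the placeholder country codes (AAA, BBB) and the SUBJECT labels are made up for illustration. Only the column names come from the post.

```r
library(tidyverse)

# Hypothetical stand-in for the downloaded OECD file: a few made-up rows
# using the column names described in the post.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value",
  "AAA,NEET,15_19_MEN,PC_AGE,A,1997,10.5",
  "AAA,NEET,15_19_WOMEN,PC_AGE,A,1997,15.2",
  "BBB,NEET,15_19_MEN,PC_AGE,A,2020,9.8"
), csv_path)

# read_csv() returns a tibble and guesses each column's type
df <- read_csv(csv_path)

# Show column names, types and the first values
glimpse(df)
```

With the real file, `csv_path` would simply be the path to the CSV downloaded from the OECD site.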

All right, let's check each variable.

f:id:cross_hyou:20210925171326p:plain

LOCATION is the country code. Some countries have over 200 observations and some have fewer than 100.
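One way to get the number of observations per country is dplyr's `count()`. This is a sketch on made-up rows with placeholder country codes, not necessarily the call used in the screenshot.

```r
library(dplyr)

# Made-up rows standing in for the OECD data (placeholder country codes)
df <- tibble(
  LOCATION = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"),
  Value    = c(10.5, 15.2, 12.1, 9.8, 10.1, 14.3)
)

# Number of rows per country code, largest first
loc_counts <- df %>% count(LOCATION, sort = TRUE)
print(loc_counts)
```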

f:id:cross_hyou:20210925171551p:plain

INDICATOR has only one value, NEET, so we can delete this variable from the data frame.
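A sketch of this check-and-drop step, assuming dplyr's `n_distinct()` and `select()`. The data are made up and the screenshot may use different functions.

```r
library(dplyr)

# Made-up rows (placeholder country codes)
df <- tibble(
  LOCATION  = c("AAA", "AAA", "BBB"),
  INDICATOR = c("NEET", "NEET", "NEET"),
  Value     = c(10.5, 15.2, 9.8)
)

# A column with a single distinct value carries no information
n_unique <- n_distinct(df$INDICATOR)
print(n_unique)

# Drop the column
df <- df %>% select(-INDICATOR)
names(df)
```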

f:id:cross_hyou:20210925171733p:plain

f:id:cross_hyou:20210925171908p:plain

SUBJECT has many values, so I convert it from character to factor.

f:id:cross_hyou:20210925172103p:plain

Oh, I forgot to convert LOCATION to a factor as well.
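Both conversions can be done in one step. The sketch below assumes dplyr's `mutate()` with `across()`. The SUBJECT labels and country codes are made-up placeholders.

```r
library(dplyr)

# Made-up rows (placeholder codes and labels)
df <- tibble(
  LOCATION = c("AAA", "AAA", "BBB"),
  SUBJECT  = c("15_19_MEN", "15_19_WOMEN", "15_19_MEN"),
  Value    = c(10.5, 15.2, 9.8)
)

# Convert both character columns to factors at once
df <- df %>% mutate(across(c(LOCATION, SUBJECT), as.factor))

# For a factor, summary() shows the count of each level
summary(df$SUBJECT)
summary(df$LOCATION)
```

Converting to factor is what makes `summary()` report per-level counts instead of just "character".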

f:id:cross_hyou:20210925172237p:plain

f:id:cross_hyou:20210925172400p:plain

MEASURE has only one value, PC_AGE, so we can delete MEASURE.

f:id:cross_hyou:20210925172558p:plain

f:id:cross_hyou:20210925172719p:plain

FREQUENCY has only one value, A, so we can remove it too.
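As an alternative to dropping single-valued columns one by one, they can all be removed in a single call. This is a different technique from the step-by-step approach in the post. It is sketched here on made-up data and assumes dplyr 1.0.0 or later for `where()`.

```r
library(dplyr)

# Made-up rows (placeholder country codes)
df <- tibble(
  LOCATION  = c("AAA", "AAA", "BBB"),
  MEASURE   = rep("PC_AGE", 3),
  FREQUENCY = rep("A", 3),
  TIME      = c(1997, 1998, 2020),
  Value     = c(10.5, 15.2, 9.8)
)

# Keep only the columns that have more than one distinct value
df <- df %>% select(where(~ n_distinct(.x) > 1))
names(df)
```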

f:id:cross_hyou:20210925172851p:plain

f:id:cross_hyou:20210925173028p:plain

TIME is numeric. The minimum value is 1997 and the maximum is 2020, so we have a little over 20 years of history.

f:id:cross_hyou:20210925173212p:plain

Value is the NEET percentage. Surprisingly, the maximum value is 66%. There is a country where 66% of young people are NEET!

Let's use the summary function on the data frame, df.
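To see which observation a maximum belongs to, one option is dplyr's `slice_max()`. The sketch below uses made-up numbers and placeholder country codes, so it does not reproduce the 66% figure or identify the real country.

```r
library(dplyr)

# Made-up rows (placeholder country codes, invented values)
df <- tibble(
  LOCATION = c("AAA", "BBB", "CCC"),
  TIME     = c(1997, 2005, 2020),
  Value    = c(10.5, 40.0, 9.8)
)

# Five-number summary plus the mean for each numeric column
summary(df$TIME)
summary(df$Value)

# The row holding the largest Value
top <- df %>% slice_max(Value, n = 1)
print(top)
```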

f:id:cross_hyou:20210925173527p:plain

All right, we deleted the unnecessary variables and converted the character variables to factors.

That's it. Thank you!

Next post is

 

www.crosshyou.info