crosshyou

I started this blog mainly to do cross-tabulation (contingency table) analysis, but I haven't managed much of that. It has turned into a blog for practicing R.

OECD International Student Mobility Data Analysis 1 - load CSV file data into R with the read_csv() function

f:id:cross_hyou:20210717103235j:plain

Photo by Kenrick Baksh on Unsplash

f:id:cross_hyou:20210717102617p:plain

In this post I will analyze OECD International Student Mobility data using R.

You can download the data from the OECD Data page:

Students - International student mobility - OECD Data

Here is what the CSV file looks like:

f:id:cross_hyou:20210717103544p:plain

Let's load this data into R.
First, load the tidyverse package.

f:id:cross_hyou:20210717103809p:plain

Then, use the read_csv() function to read the CSV file into a data frame called df.

f:id:cross_hyou:20210717104038p:plain
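Since the code above is a screenshot, here is a minimal text sketch of the same idea. It assumes readr 2.0 or later (for I() literal data and show_col_types), and the CSV text only imitates the OECD Data export layout: the country codes AAA and BBB and all the numbers are made-up placeholders, not real OECD values. library(tidyverse) also attaches readr, so loading readr alone is enough here.

```r
# Sketch only: the CSV text imitates the OECD Data export layout.
# Country codes (AAA, BBB) and all numbers are made-up placeholders.
library(readr)

csv_text <- paste(
  "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes",
  "AAA,STUMOBILITY,TRY_INFLOW,PC_STUD_ENRL,A,2017,5.1,",
  "AAA,STUMOBILITY,TRY_INFLOW,PC_STUD_ENRL,A,2018,,M",
  "BBB,STUMOBILITY,TRY_INFLOW,PC_STUD_ENRL,A,2018,12.3,",
  sep = "\n"
)

# I() marks the string as literal CSV data (readr >= 2.0);
# with the real download you would pass the file path instead.
df <- read_csv(I(csv_text), show_col_types = FALSE)

# Empty fields are read as NA, so the missing Value becomes NA
print(df)
```

With the downloaded file, the call would simply be read_csv() with the path to your CSV.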

Let's look at each variable. I use the table() function for character variables and the summary() function for numeric variables.

f:id:cross_hyou:20210717104557p:plain

Many countries have 10 observations.
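Here is a minimal sketch of the table()/summary() pattern described above, applied to every column at once with lapply(). The toy data frame is made up (hypothetical codes AAA and BBB) and only mirrors the kinds of columns in the OECD file; it may differ from the code in the screenshots.

```r
library(dplyr)

# Made-up toy data with the same kinds of columns as the OECD file
df <- tibble(
  LOCATION  = c("AAA", "AAA", "BBB"),
  INDICATOR = c("STUMOBILITY", "STUMOBILITY", "STUMOBILITY"),
  TIME      = c(2017, 2018, 2018),
  Value     = c(5.1, NA, 12.3)
)

# table() for character columns (count per category),
# summary() for numeric columns (min, quartiles, mean, max, NA's)
overview <- lapply(df, function(x) if (is.character(x)) table(x) else summary(x))
print(overview)
```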

f:id:cross_hyou:20210717104752p:plain

INDICATOR has only one value, "STUMOBILITY", so we can delete it from df.

f:id:cross_hyou:20210717104935p:plain

f:id:cross_hyou:20210717105044p:plain
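A minimal sketch of dropping a constant column with dplyr's select(), on made-up toy rows. The stopifnot() guard is my own addition, not necessarily part of the original code: it stops the script if the column unexpectedly holds more than one value, so nothing informative is deleted by accident.

```r
library(dplyr)

# Made-up toy rows; only the column structure mirrors the OECD file
df <- tibble(
  LOCATION  = c("AAA", "BBB"),
  INDICATOR = c("STUMOBILITY", "STUMOBILITY"),
  Value     = c(5.1, 12.3)
)

# Guard: stop if INDICATOR unexpectedly holds more than one value
stopifnot(n_distinct(df$INDICATOR) == 1)

# Drop the constant column with a minus sign in select()
df <- df %>% select(-INDICATOR)
```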

SUBJECT also has only one value, "TRY_INFLOW", so we can delete it too.

f:id:cross_hyou:20210717105211p:plain

f:id:cross_hyou:20210717184830p:plain

MEASURE has only one value, "PC_STUD_ENRL", so we can delete it too.

f:id:cross_hyou:20210717185018p:plain

f:id:cross_hyou:20210717185125p:plain

FREQUENCY has only one value, "A" (annual), so we can delete it too.

f:id:cross_hyou:20210717185249p:plain

f:id:cross_hyou:20210717185351p:plain

TIME ranges from 2005 (the oldest year) to 2018 (the newest year), and it has no NAs.
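A quick way to check the oldest year, the newest year and missing values of a numeric column is range() plus anyNA(). The years below are made-up stand-ins for the TIME column.

```r
# Made-up example years standing in for the TIME column
years <- c(2005, 2010, 2012, 2018)

time_range <- range(years)  # c(oldest, newest); returns NA if any year is NA
has_na     <- anyNA(years)  # FALSE when no year is missing
print(time_range)
print(has_na)
```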

f:id:cross_hyou:20210717185550p:plain

Value holds the international student mobility figure itself, so it is the most important variable in the data frame.

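The lowest value is 0.074 and the highest is 47.735. Since MEASURE is PC_STUD_ENRL, the values are percentages of tertiary enrolment, so in at least one year there is a country where almost half of the tertiary students came from abroad. Value has 90 NAs.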
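To see which country is behind the maximum, one option is to filter on max() with na.rm = TRUE and to count the NAs with is.na(). This is a sketch on made-up toy data (hypothetical codes AAA, BBB, CCC); it is not the result for the real OECD file.

```r
library(dplyr)

# Made-up toy data; the real file has many more rows
df <- tibble(
  LOCATION = c("AAA", "BBB", "CCC"),
  TIME     = c(2018, 2018, 2018),
  Value    = c(0.5, 47.7, NA)
)

n_missing <- sum(is.na(df$Value))  # number of NAs in Value

# Keep the row(s) holding the largest non-missing Value;
# filter() drops rows where the comparison is NA
top_row <- df %>% filter(Value == max(Value, na.rm = TRUE))
print(top_row)
```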

f:id:cross_hyou:20210717190105p:plain

Flag Codes has only one value, "M", so we can delete it too.

f:id:cross_hyou:20210717190234p:plain
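One detail worth knowing: table() leaves out NA by default, so a column that shows only "M" may still contain many NAs. In OECD Data exports "M" usually flags a missing value, so I assume here that it sits on the rows where Value is NA. Passing useNA = "ifany" makes the NA count visible. The vector below is a made-up stand-in for the column.

```r
# Made-up stand-in for the `Flag Codes` column: "M" where Value is missing
flag_codes <- c(NA, "M", NA, "M", NA)

tab_default <- table(flag_codes)                   # NA rows are silently left out
tab_with_na <- table(flag_codes, useNA = "ifany")  # adds a count for <NA>
print(tab_default)
print(tab_with_na)
```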

We have now deleted all the variables that had only one distinct value.
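Instead of deleting the constant columns one by one, they could also be dropped in a single step with select(where(...)) and n_distinct(). This is an alternative sketch, not the method used in this post, on made-up toy data. Note na.rm = TRUE: without it NA counts as a distinct value, and `Flag Codes` ("M" plus NA) would be kept.

```r
library(dplyr)

# Made-up toy data mirroring some of the OECD columns
df <- tibble(
  LOCATION     = c("AAA", "AAA", "BBB"),
  FREQUENCY    = c("A", "A", "A"),
  TIME         = c(2017, 2018, 2018),
  Value        = c(5.1, NA, 12.3),
  `Flag Codes` = c(NA, "M", NA)
)

# Keep only columns with more than one distinct non-missing value
df <- df %>% select(where(~ n_distinct(.x, na.rm = TRUE) > 1))
print(names(df))
```

Be careful with this shortcut: it would also drop any column that happens to be constant in a filtered subset, so check the dropped names before relying on it.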

Let's look at the summary of the whole data frame.

f:id:cross_hyou:20210717190429p:plain

Value has 90 NAs, so let's delete those rows.

f:id:cross_hyou:20210717190558p:plain
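A minimal sketch of removing the rows with a missing Value using dplyr's filter(), on made-up toy rows. tidyr's drop_na(Value) would be another option.

```r
library(dplyr)

# Made-up toy rows with one missing Value
df <- tibble(
  LOCATION = c("AAA", "AAA", "BBB"),
  TIME     = c(2017, 2018, 2018),
  Value    = c(5.1, NA, 12.3)
)

# Keep only the rows where Value is present
df <- df %>% filter(!is.na(Value))
```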

Next, I rename the variables.

f:id:cross_hyou:20210717190742p:plain
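Renaming with dplyr's rename() uses the form new_name = old_name. The new names country, year and mobility below are hypothetical examples; the names chosen in the screenshot may be different.

```r
library(dplyr)

# Made-up toy data with the original OECD column names
df <- tibble(
  LOCATION = c("AAA", "BBB"),
  TIME     = c(2018, 2018),
  Value    = c(5.1, 12.3)
)

# rename(new_name = old_name); the new names here are hypothetical
df <- df %>% rename(country = LOCATION, year = TIME, mobility = Value)
```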

Now, let's look at the summary again.

f:id:cross_hyou:20210717190901p:plain

All right, the data is now ready for analysis.
That's it for today. Thank you!

The next post:

www.crosshyou.info