www.crosshyou.info

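I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. I also post reading notes from time to time.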

OECD International Student Mobility Data Analysis 1 - load CSV file data into R with read_csv() function

f:id:cross_hyou:20210717103235j:plain

Photo by Kenrick Baksh on Unsplash

f:id:cross_hyou:20210717102617p:plain

In this post I will analyze OECD International Student Mobility data using R.

You can download the data from:

Students - International student mobility - OECD Data

The CSV file looks like this:

f:id:cross_hyou:20210717103544p:plain

Let's load this data into R.
To begin with, load the tidyverse package.

f:id:cross_hyou:20210717103809p:plain

Then, use the read_csv() function to load the CSV file.

f:id:cross_hyou:20210717104038p:plain
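In text form, the loading step might look like the minimal sketch below. The file path and the two data rows are made-up stand-ins so the example runs without the real download. The column names are assumed to match the OECD export; in particular, LOCATION is an assumption, since the post does not name that column.

```r
library(tidyverse)

# Hypothetical stand-in for the downloaded file: two made-up rows written to a
# temporary CSV. Replace csv_path with the path to the real OECD download.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value,Flag Codes",
  "AAA,STUMOBILITY,TRY_INFLOW,PC_STUD_ENRL,A,2017,1.5,",
  "AAA,STUMOBILITY,TRY_INFLOW,PC_STUD_ENRL,A,2018,,M"
), csv_path)

# read_csv() guesses column types and treats empty fields as NA by default.
df <- read_csv(csv_path)

# Quick look at column names, types, and the first values.
glimpse(df)
```

Because "Flag Codes" contains a space, it has to be written with backticks (`Flag Codes`) when it is referenced later.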

Let's look at each variable. I use the table() function for character variables and the summary() function for numeric variables.

f:id:cross_hyou:20210717104557p:plain
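As a rough illustration of the two functions, here is a small sketch on a toy data frame. The rows are invented and LOCATION is an assumed column name; only the pattern of calls is the point.

```r
library(tidyverse)

# Toy data frame with made-up values, standing in for the real df.
df <- tibble(
  LOCATION  = c("AAA", "AAA", "BBB"),
  INDICATOR = "STUMOBILITY",
  TIME      = c(2017, 2018, 2018),
  Value     = c(1.5, NA, 3.0)
)

# table() counts how often each value of a character variable occurs.
loc_counts <- table(df$LOCATION)
print(loc_counts)

# summary() reports min, quartiles, mean, max (and NA count) for numerics.
time_summary <- summary(df$TIME)
print(time_summary)
```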

Many countries have 10 observations.

f:id:cross_hyou:20210717104752p:plain

For INDICATOR, there is only one value, "STUMOBILITY", so we can drop this column from df.

f:id:cross_hyou:20210717104935p:plain

f:id:cross_hyou:20210717105044p:plain
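Instead of checking columns one by one with table(), the number of distinct values per column can also be counted in one pass. This is an alternative sketch, not the approach used in the post, and the toy data below is made up.

```r
library(tidyverse)

# Toy data frame with made-up values, standing in for the real df.
df <- tibble(
  LOCATION     = c("AAA", "AAA", "BBB"),
  INDICATOR    = "STUMOBILITY",
  SUBJECT      = "TRY_INFLOW",
  TIME         = c(2017, 2018, 2018),
  Value        = c(1.5, NA, 3.0),
  `Flag Codes` = c(NA, "M", NA)
)

# Count distinct non-missing values in every column.
distinct_counts <- sapply(df, function(x) n_distinct(x, na.rm = TRUE))
print(distinct_counts)

# Columns with exactly one distinct value carry no information.
constant_cols <- names(distinct_counts)[distinct_counts == 1]
print(constant_cols)
```

With na.rm = TRUE, a column that holds one code plus NAs (such as the flag column here) is also reported as constant, so it is worth checking whether the NAs themselves are meaningful before dropping it.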

For SUBJECT, there is only one value, "TRY_INFLOW", so we can drop it too.

f:id:cross_hyou:20210717105211p:plain

 

f:id:cross_hyou:20210717184830p:plain

For MEASURE, there is only one value, "PC_STUD_ENRL", so we can drop it too.

f:id:cross_hyou:20210717185018p:plain

f:id:cross_hyou:20210717185125p:plain

For FREQUENCY, there is only one value, "A", so we can drop it too.

f:id:cross_hyou:20210717185249p:plain

f:id:cross_hyou:20210717185351p:plain

For TIME, the earliest year is 2005, the latest year is 2018, and there are no NAs.

f:id:cross_hyou:20210717185550p:plain

Value holds the international student mobility figure, so it is the most important variable in the data frame.

The lowest value is 0.074 and the highest value is 47.735. This means there is a country where almost half of the tertiary students come from abroad. There are 90 NAs.

f:id:cross_hyou:20210717190105p:plain
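To see which observation produces the maximum, and to count the missing values directly, something like the following sketch could be used. The data is a made-up toy example and LOCATION is an assumed column name.

```r
library(tidyverse)

# Toy data frame with made-up values, standing in for the real df.
df <- tibble(
  LOCATION = c("AAA", "AAA", "BBB"),
  TIME     = c(2017, 2018, 2018),
  Value    = c(1.5, NA, 3.0)
)

# Minimum and maximum of Value, ignoring missing values.
value_range <- range(df$Value, na.rm = TRUE)
print(value_range)

# Number of missing values in Value.
n_missing <- sum(is.na(df$Value))
print(n_missing)

# Row(s) holding the maximum; filter() drops rows where the test is NA.
top_row <- df %>% filter(Value == max(Value, na.rm = TRUE))
print(top_row)
```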

For Flag Codes, there is only one value, "M", so we can drop it too.

f:id:cross_hyou:20210717190234p:plain

Now we delete the variables that contain only one kind of value.

Then let's look at a summary of the whole data frame.

f:id:cross_hyou:20210717190429p:plain
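Dropping the constant columns can be sketched with dplyr's select(), as below. The toy values are made up, and LOCATION is an assumed name; the other column names are the ones mentioned in this post.

```r
library(tidyverse)

# Toy data frame with the column names discussed above (values are made up).
df <- tibble(
  LOCATION     = c("AAA", "BBB"),
  INDICATOR    = "STUMOBILITY",
  SUBJECT      = "TRY_INFLOW",
  MEASURE      = "PC_STUD_ENRL",
  FREQUENCY    = "A",
  TIME         = c(2017, 2018),
  Value        = c(1.5, NA),
  `Flag Codes` = c(NA, "M")
)

# A minus sign in select() removes a column; backticks are needed for
# names that contain a space.
df <- df %>%
  select(-INDICATOR, -SUBJECT, -MEASURE, -FREQUENCY, -`Flag Codes`)

summary(df)
```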

Value has 90 NAs, so let's remove those rows.

f:id:cross_hyou:20210717190558p:plain
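One common way to remove rows with a missing Value is filter() combined with is.na(). The sketch below uses made-up toy data and is not necessarily the exact call used in the post.

```r
library(tidyverse)

# Toy data frame with made-up values, standing in for the real df.
df <- tibble(
  LOCATION = c("AAA", "AAA", "BBB"),
  TIME     = c(2017, 2018, 2018),
  Value    = c(1.5, NA, 3.0)
)

# Keep only the rows where Value is not missing.
df <- df %>% filter(!is.na(Value))

print(df)
```

tidyr's drop_na(Value) would give the same result here.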

Next, I rename the variables.

f:id:cross_hyou:20210717190742p:plain
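Renaming can be sketched with dplyr's rename(), which takes new_name = old_name pairs. The new names below (country, year, mobility) are hypothetical placeholders, not necessarily the names chosen in the post, and the data is made up.

```r
library(tidyverse)

# Toy data frame with made-up values, standing in for the cleaned df.
df <- tibble(
  LOCATION = c("AAA", "BBB"),
  TIME     = c(2017, 2018),
  Value    = c(1.5, 3.0)
)

# rename() uses new_name = old_name; these new names are hypothetical.
df <- df %>%
  rename(country = LOCATION, year = TIME, mobility = Value)

summary(df)
```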

Now, let's look at the summary again.

f:id:cross_hyou:20210717190901p:plain

All right, the data is now ready for analysis.
That's it for today. Thank you!
