www.crosshyou.info

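I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. I also post reading notes from time to time.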

OECD Road accidents data analysis 1 - import a CSV file into R and tidy the data frame.

 

 

f:id:cross_hyou:20210731080555j:plain

 Photo by Stephen Leonardi on Unsplash  

f:id:cross_hyou:20210731080332p:plain


In this post, I will analyze OECD Road accidents data.

The CSV file I downloaded from the OECD website (Transport - Road accidents - OECD Data) looks like the one below.

f:id:cross_hyou:20210731081209p:plain

Let's analyze this data using R.
First, I load the tidyverse package.

f:id:cross_hyou:20210731081441p:plain

Then, let's use the read_csv() function to import the data into R.

f:id:cross_hyou:20210731081634p:plain
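
As a rough sketch of this import step, the snippet below reads a few made-up rows with read_csv(). The column names follow the variables discussed in this post, but the rows, the numbers, and the MEASURE code `M1` are placeholders I invented for illustration, not real OECD figures. With the real download you would pass the path of the CSV file instead of inline text.

```r
library(tidyverse)

# Made-up rows in the column layout described in this post.
# The numbers and the MEASURE code "M1" are placeholders, not real statistics.
csv_text <- "LOCATION,INDICATOR,SUBJECT,MEASURE,FREQUENCY,TIME,Value
JPN,ROADACCID,DEATH,M1,A,1970,10
JPN,ROADACCID,INJURE,M1,A,1970,200
USA,ROADACCID,DEATH,M1,A,2019,5"

# I() marks the string as literal data rather than a file path.
# col_types = cols() lets readr guess the types without printing a message.
df <- read_csv(I(csv_text), col_types = cols())
df
```

Printing `df` right after the import is a quick way to confirm the column names and types before tidying anything.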

Let's look at each variable.
LOCATION

f:id:cross_hyou:20210731081831p:plain

LOCATION takes many different values, and they are ISO country codes. So I rename the variable to iso.

f:id:cross_hyou:20210731082326p:plain
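
One possible way to write this check and rename, shown on a tiny stand-in data frame (the rows and numbers are placeholders I made up):

```r
library(tidyverse)

# Stand-in data frame; the values are placeholders
df <- tibble(
  LOCATION = c("JPN", "JPN", "USA"),
  Value = c(1, 2, 3)
)

# Count how many rows each LOCATION code has
df %>% count(LOCATION)

# rename() takes new_name = old_name
df <- df %>% rename(iso = LOCATION)
names(df)
```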

INDICATOR

f:id:cross_hyou:20210731082435p:plain

INDICATOR has just one value, ROADACCID. So I will remove INDICATOR from the data frame.

f:id:cross_hyou:20210731082617p:plain
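
A minimal sketch of dropping a constant column, again on placeholder rows. The same pattern would apply to any other column that holds a single value.

```r
library(tidyverse)

# Stand-in data frame; the numbers are placeholders
df <- tibble(
  iso = c("JPN", "USA"),
  INDICATOR = c("ROADACCID", "ROADACCID"),
  Value = c(1, 2)
)

# Confirm that the column holds only one distinct value
n_distinct(df$INDICATOR)

# select() with a minus sign drops the column
df <- df %>% select(-INDICATOR)
names(df)
```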

SUBJECT

f:id:cross_hyou:20210731082909p:plain

SUBJECT takes three values: DEATH, ACCIDENTCASUAL and INJURE. So I will convert SUBJECT to a factor.

f:id:cross_hyou:20210731083222p:plain
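
The conversion could look roughly like this. The three SUBJECT values are the ones named above; the rows themselves are made up.

```r
library(tidyverse)

# Stand-in data frame with the three SUBJECT values from this post
df <- tibble(
  SUBJECT = c("DEATH", "ACCIDENTCASUAL", "INJURE", "DEATH")
)

# factor() turns the character column into a factor;
# the levels are sorted alphabetically by default
df <- df %>% mutate(SUBJECT = factor(SUBJECT))

levels(df$SUBJECT)
summary(df$SUBJECT)  # for a factor, summary() shows the count per level
```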

MEASURE

f:id:cross_hyou:20210731083355p:plain

MEASURE takes three values. So I will convert it to a factor and rename its levels.

f:id:cross_hyou:20210731083815p:plain
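
Converting and relabeling in one step can be sketched with the `levels` and `labels` arguments of factor(). The text does not spell out the actual MEASURE codes or the new names, so `CODE_A`, `CODE_B`, `CODE_C` and `label_a`, `label_b`, `label_c` below are purely hypothetical placeholders.

```r
library(tidyverse)

# Hypothetical codes standing in for the three MEASURE values
df <- tibble(
  MEASURE = c("CODE_A", "CODE_B", "CODE_C", "CODE_A")
)

# levels = the existing codes, labels = the new names (same order)
df <- df %>%
  mutate(MEASURE = factor(
    MEASURE,
    levels = c("CODE_A", "CODE_B", "CODE_C"),
    labels = c("label_a", "label_b", "label_c")
  ))

levels(df$MEASURE)
```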

f:id:cross_hyou:20210731083930p:plain

FREQUENCY has only one value, A. So I will remove it from the data frame, df.

f:id:cross_hyou:20210731084202p:plain

TIME

f:id:cross_hyou:20210731084319p:plain

TIME is a numeric variable with no NA values. Its minimum is 1970 and its maximum is 2019. TIME is the year.
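
These checks on a numeric column can be sketched as follows. The three years are placeholder rows chosen to match the minimum and maximum mentioned above, and the same calls would work for any other numeric column.

```r
library(tidyverse)

# Stand-in data frame; only the first and last year come from the text
df <- tibble(TIME = c(1970, 1995, 2019))

summary(df$TIME)      # min, quartiles, mean, max
sum(is.na(df$TIME))   # number of missing values
range(df$TIME)        # minimum and maximum together
```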

Value

f:id:cross_hyou:20210731084958p:plain

Value is a numeric variable and has no NA values.

I will convert the data frame's column names to lowercase.

f:id:cross_hyou:20210731085214p:plain
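
One way to lowercase every column name at once is rename_with(), which applies a function to the names. This is a sketch on a one-row placeholder data frame, and the MEASURE value `label_a` is hypothetical.

```r
library(tidyverse)

# One-row stand-in with the columns that remain at this point;
# the values are placeholders
df <- tibble(
  iso = "JPN",
  SUBJECT = "DEATH",
  MEASURE = "label_a",
  TIME = 1970,
  Value = 1
)

# Apply tolower() to every column name
df <- df %>% rename_with(tolower)
names(df)
```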

All right, let's use the summary() function to see a summary of the data frame.

f:id:cross_hyou:20210731085358p:plain

The subject level ACCIDENTCASUAL is too long. Let's shorten it.

f:id:cross_hyou:20210731085713p:plain
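
Shortening a single factor level can be sketched with fct_recode() from forcats, which is loaded with the tidyverse. The short name `ACCIDENT` is my own assumption; the text does not say which name was actually chosen.

```r
library(tidyverse)

# Stand-in factor with the three subject levels
df <- tibble(
  subject = factor(c("ACCIDENTCASUAL", "DEATH", "INJURE"))
)

# fct_recode() takes new_level = "old_level";
# "ACCIDENT" is a hypothetical short name
df <- df %>%
  mutate(subject = fct_recode(subject, ACCIDENT = "ACCIDENTCASUAL"))

levels(df$subject)
```

Levels that are not mentioned in fct_recode() are left unchanged.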

Let's see summary(df) again.

f:id:cross_hyou:20210731085818p:plain

Great!

That's it. Thank you!

The next post is..

 

www.crosshyou.info