UnsplashのErik Knoefが撮影した写真
In this post, I will analyze OECD Tourism flows data analysis.
I download CSV file from the OECD website.
I also downladed GDP per capita data from OECD web site.
I use R for data analysis.
First, I load tidyverse package.
I use read_csv() fundtion to load CSV file data into R.
I also load GDP per capita CSV file.
Then, I merge 'df' and 'gdp' with inner_join() function and make a new data frame.
Let's use summary() function to get 'df_gdp' summary statistics.
From 'LOCATION' to 'FREQUENCY', there is not so much information, I just know there are 1379 observations.
For TIME, I know it strats from 2008 to 2921.
Value comes from 'df' and 'per_capita' comes from 'gdp'.
Let's check LOCATION with table() function.
LOCATIONs which have most observations are AUS, GRC, ISR, JPN and LTU.
Let's see INDICATOR
INDICATOR has only one value, TOUR_FLOW, so I can remove this variable.
Let's check SUBJECT
INTER_ARR is arrival number of tourist, ACC_NIGHTS is accomodation nights and INTER_DEP is departure number of tourist. I keep SUBJECT.
Let's check MEASURE.
MEASURE has only one value, NBR. so I remove MEASURE from df_gdp.
Let's check FRQUENCY
FREQUENCY has only one value, A. So I can remove FREQUENCY.
Then, I mutate LOCATION and SUBJECT to factor class.
I also changed variavle names.
All right, let's use summary() function again.
I think Value can be re-scale. I divide Value by 10000.
Let's see summary.
That's it for this post. I will continure analysis in the next post. Thank you!
Next post is