www.crosshyou.info

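I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. I occasionally post reading notes as well.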

Data_Analysis

OECD Gender wage pay gap data analysis 7 - Line Graph and Boxplot with R - gender pay gap is decreasing

Photo by Alejandro Contreras on Unsplash www.crosshyou.info This post continues from the post above. Let's check which LOCATION has the most observations. GBR has the most, with 66 observations. USA has the second most, FIN …

OECD Gender wage pay gap data analysis 6 - Multiple Linear Regression with R

Photo by Matteo Vella on Unsplash www.crosshyou.info This post continues from the post above. In that post, I did simple linear regression analysis. This time, I will do multiple linear regression (MLR). I make a new variable, fac…

OECD Gender wage pay gap data analysis 5 - Simple Linear Regression with R and calculating confidence interval.

Photo by Laura Smetsers on Unsplash www.crosshyou.info This post continues from the post above. I will do regression analysis. First, I will do simple linear regression analysis. I use SELFEMPLOYED as the dependent variable and EMPLOYE…

OECD Gender wage pay gap data analysis 4 - Estonia is the outlier from the point of gender wage pay gap view.

Photo by Renato Pozzi on Unsplash www.crosshyou.info This post continues from the post above. I make a small data frame that contains only the 2010 and 2014 observations. First, I use the filter() function to get only the 2010 and 2014 data, then I …

OECD Gender wage pay gap data analysis 3 - Bootstrapping Confidence Interval and Traditional Method p-value with R

Photo by Maria Tejada on Unsplash www.crosshyou.info This post continues from the post above. I will calculate a confidence interval. First, I make a bootstrap confidence interval. I use the R infer package. I use specify(), generate() and c…

OECD Gender wage gap data analysis 2 - Statistical Inference with Infer package, mean difference, calculation p-value with simulation based method

Photo by James Wainscoat on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I saw that the mean of EMPLOYEE is 17.9 and the mean of SELFEMPLOYED is 30.6. Let's see whether the difference is statistically si…

OECD Gender wage gap data analysis 1 - Load CSV file data into R

Photo by Kumiko SHIMIZU on Unsplash Hello. In this post, I will analyze the OECD gender wage gap data with R. First, I download a CSV file like the one below from the OECD web site, https://data.oecd.org/earnwage/gender-wage-gap.htm I use R to analyze …

OECD Nuclear power plants data analysis 5 - Hypothesis test for One proportion using R infer package.

Photo by Craig Manners on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will do a hypothesis test for one proportion: the proportion of nuclear power plants in Japan. In the previous post, I found Japan n…

OECD Nuclear power plants data analysis 4 - Getting confidence interval for one proportion using R infer package

Photo by Maarten van den Heuvel on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will get a confidence interval for one proportion. In this case, number of nuclear power plants in Japan / number of …

OECD Nuclear power plants data analysis 3 - Hypothesis testing using R with infer package

Photo by Yan Agrit on Unsplash www.crosshyou.info This post continues from the post above. In this post, I do hypothesis testing using R with the infer package. I refer to B Inference Examples | Statistical Inference via Data Science (mode…

OECD Nuclear power plants data analysis 2 - Getting Confidence Interval using R with infer package

Photo by Eean Chen on Unsplash www.crosshyou.info This post continues from the post above. I will calculate a confidence interval in this post. There are two ways to calculate a confidence interval: one is the bootstrap method and the other is…

OECD Nuclear power plants data analysis 1 - Loading CSV data with R - USA has the most nuclear power plants.

Photo by Lukáš Lehotský on Unsplash In this post, I will play around with the OECD Nuclear power plants data in R. The OECD Nuclear power plants data is defined as the number of nuclear units in operation as of 1 January 2019. It is measured a…

OECD social spending data analysis 5 - Bootstrapping with R infer package

Photo by Sonika Agarwal on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will do bootstrapping with the R infer package. Suppose df2$priv_pc_gdp is the population. So the true mean of priv_pc_gdp is The true…

OECD Social spending data analysis 4 - Calculating Confidence Interval using R

Photo by Arda Demirkaynak on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I made some visualizations with the R ggplot2 package. In this post, I will calculate confidence intervals. Fi…

OECD Social spending data analysis 3 - Data Visualization with 5 Named Graphs (5NG) using R

Photo by Alicia Steels on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I made a dataframe for data analysis, named 'df2'. Now, let's start the data analysis with data visualization. I will make 5 Name…

OECD Social spending data analysis 2 - Using filter(), select(), inner_join(), rename() function with R to make a dataframe to analyze.

Photo by Milos Prelevic on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I loaded the OECD Social spending data into R. I also load country ISO code and continent name data from a CSV file like the one below. I …

OECD Social spending data analysis 1 - Load CSV file data using R, read_csv() function.

Photo by Alexander Schimmeck on Unsplash In this post, I will analyze the OECD Social spending data using R. OECD (2022), Social spending (indicator). doi: 10.1787/7497563b-en (Accessed on 26 November 2022) This indicator is measured as a pe…

OECD Researchers data analysis 6 - Multiple Linear Regression in R

Photo by Evi T. on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I did simple linear regression, which means there is only one explanatory variable. In this post, I will do multiple linear regres…

OECD Researchers data analysis 5 - Simple Linear Regression with one numerical variable in R, ModernDive way

Photo by Madara Parma on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will do linear regression analysis. To do this, I make a small (subset) data frame. Let's check which TIME has the most observat…

OECD Researchers data analysis 4 - Sorting dataframe by column in R

Photo by Karsten Würth on Unsplash www.crosshyou.info This post continues from the post above. In this post, let's sort the dataframe by variables. The smallest TOT_1000EMPLOYED observation is CHL 2009. The largest TOT_1000EMPLOYED observa…

OECD Researchers data analysis 3 - 5 Named Graphs in R

Photo by Phong Nguyen on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will create the 5 named graphs in R. I refer to Chapter 2 Data Visualization | Statistical Inference via Data Science (moderndive.c…

OECD Researchers data analysis 2 - Converting long format dataframe to wide format dataframe and merge two dataframes with R

Photo by Sakura on Unsplash www.crosshyou.info This post continues from the post above. Let's explore the gdp dataframe. The gdp dataframe has more LOCATION values than the researcher dataframe. The gdp dataframe's INDICATOR has only one value, GDP. So I can re…

OECD Researchers data analysis 1 - Load CSV file into R with read_csv() function.

Photo by Marek Piwnicki on Unsplash In this post, I will analyze the OECD Researchers data. Researchers are professionals engaged in the conception or creation of new knowledge, products, processes, methods and systems, as well as in the man…

OECD Non-Financial Corporations Debt to Surplus Ratio Analysis 6 - Hierarchical Clustering using R

Photo by Wolfgang Hasselmann on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will do hierarchical clustering using R. It is very easy with R. First, I make a matrix for hierarchical clustering. …

OECD Non-Financial Corporations Debt to Surplus Ratio Analysis 4 - t-test, Wilcoxon rank sum test and correlation test using R

Photo by martin bennie on Unsplash www.crosshyou.info This post continues from the post above. Let's calculate the difference between Y2016 and Y2015. Let's see a histogram of d2016. Then, let's calculate the difference between Y2017 and Y2016. Let's…

OECD Non-Financial Corporations Debt to Surplus Ratio Analysis 3 - Calculating Confidence Interval in R, Parametric and Monte Carlo.

Photo by Heather Wilde on Unsplash www.crosshyou.info This post continues from the post above. In this post, I will show some statistics of our data. Before the investigation, I convert the data frame to wide format with the pivot_wider() function. W…

OECD Non-Financial Corporations Debt to Surplus Ratio Analysis 2 - making various type plots with ggplot() + geom_~~~ using R.

Photo by J Cruikshank on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I loaded the CSV file data into R. Now, let's make some basic graphs using the ggplot2 package. Scatter plot ggplot() + geom_poi…

OECD Non-Financial Corporations Debt to Surplus Ratio Analysis 1 - Load CSV file data using R

Photo by Jeremy Thomas on Unsplash In this post, I will use R to analyze the OECD Non-Financial Corporations Debt to Surplus Ratio. This ratio is debt outstanding / annual flow of gross operating surplus. So, the higher the ratio, t…

OECD Nutrient balance data analysis 8 - F-Test and Heteroskedasticity-Robust Inference in R

Photo by S. Tsuchiya on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I did multiple regression, s_ni_kg ~ s_po_kg + s_ni_to. Let's add 'time' variables. All time variables are not statistically signi…

OECD Nutrient balance data analysis 7 - Simple Regression and Multiple Regression using R

Photo by Harry Gillen on Unsplash www.crosshyou.info This post continues from the post above. In the previous post, I made scaled variables in df4. Let's see the correlation matrix of those variables. The most highly correlated variable pair …