www.crosshyou.info

I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. Occasionally I also post reading notes.

World Bank Population living in slums data analysis 4 - Simulation based statistical inference with R infer package

Generated by Bing Image Creator: Wide large view of rice fields in the summer, blue sky and sunflower

www.crosshyou.info

This post follows on from the post linked above. In this post I do simulation-based statistical inference with the R infer package.

First, I load the infer package.

I referred to the infer website, Tidy Statistical Inference • infer.

First, I will check whether change is related to Region.

I calculate the observed F statistic.

Then, I generate the null distribution using randomization simulation.
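To show what the randomization step does, here is a minimal base R sketch. It is not the code used in this post: the data frame df is made-up stand-in data (only the column names change and Region come from the post), and f_stat() is a hypothetical helper. Shuffling change breaks any link to Region, so the F statistics of the shuffled data show what we would expect to see when there is no relationship.

```r
set.seed(1)

# Made-up stand-in data (assumption): change is unrelated to Region by construction
df <- data.frame(
  change = rnorm(60),
  Region = factor(rep(c("A", "B", "C"), each = 20))
)

# Hypothetical helper: F statistic of a one-way ANOVA of y on the groups g
f_stat <- function(y, g) {
  anova(lm(y ~ g))[1, "F value"]
}

# Shuffle change 500 times and recompute the F statistic each time
null_f <- replicate(500, f_stat(sample(df$change), df$Region))

# F statistic of the data as observed, without shuffling
obs_f <- f_stat(df$change, df$Region)

# Share of shuffled F statistics at least as large as the observed one
mean(null_f >= obs_f)
```

The last line is the simulation-based p-value: the share of shuffled data sets that give an F statistic as large as the observed one.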

Let's visualize the null distribution and the observed F statistic.

There does not seem to be a statistically significant relationship between change and Region.

Let's calculate the p-value.

The p-value is 0.291, which is greater than 0.05. So, there is no statistically significant relationship between change and Region.
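The steps so far (observed F statistic, null distribution, plot, p-value) could look like the sketch below with the infer package. The data frame df is made-up stand-in data, not the World Bank data of this post, and the number of replicates (1000) is my assumption, so the numbers will not match the ones above.

```r
library(infer)

set.seed(123)

# Made-up stand-in data (assumption): change does not depend on Region
df <- data.frame(
  change = rnorm(90),
  Region = factor(rep(c("A", "B", "C"), each = 30))
)

# Observed F statistic
obs_f <- df |>
  specify(change ~ Region) |>
  calculate(stat = "F")

# Null distribution: permute the data under independence, 1000 times
null_dist <- df |>
  specify(change ~ Region) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "F")

# Plot the null distribution and mark the observed F statistic
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_f, direction = "greater")
print(null_plot)

# p-value: share of simulated F statistics at least as large as the observed one
p_val <- null_dist |>
  get_p_value(obs_stat = obs_f, direction = "greater")
p_val
```

For the F statistic I use direction = "greater", because only large F values count as evidence against the null hypothesis.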

To confirm that, I also do a formula-based ANOVA with the aov() function and the summary() function.

With the formula-based ANOVA, the p-value is 0.293. So, the conclusion does not change.
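A minimal sketch of the formula-based check, again on made-up stand-in data (df is an assumption, not the data of this post):

```r
set.seed(123)

# Made-up stand-in data (assumption): change does not depend on Region
df <- data.frame(
  change = rnorm(90),
  Region = factor(rep(c("A", "B", "C"), each = 30))
)

# One-way ANOVA of change on Region
fit <- aov(change ~ Region, data = df)
summary(fit)

# Pull the p-value of the Region row out of the summary table
p_theory <- summary(fit)[[1]][["Pr(>F)"]][1]
p_theory
```

If the simulation-based p-value and this one are close, as they are in this post, both approaches tell the same story.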

 

Next, let's check whether change and IncomeGroup are related.

I calculate the observed F statistic.

Next, I generate the null distribution.

I visualize the null distribution and the observed F statistic.

There seems to be a statistically significant relationship.

Let's calculate the p-value.

The p-value is 0.006, which is smaller than 0.05. So, there is a statistically significant relationship between change and IncomeGroup.

How about the formula-based ANOVA?

The p-value from the formula-based ANOVA is 0.00381, so it also indicates a statistically significant relationship.

 

Then, let's check change and sulums_2000.

First, I calculate the observed slope.

Then, I generate the null distribution.

I visualize the null distribution and the observed slope.

There seems to be a statistically significant relationship between change and sulums_2000.

Let's calculate the p-value.

The p-value is reported as 0, which means that none of the simulated slopes was as extreme as the observed slope. So, there is clearly a statistically significant relationship.
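The slope version of the workflow could look like the sketch below. The data are made up (I assume a negative relationship and 1000 replicates just for illustration; neither comes from this post), and only the column names change and sulums_2000 are taken from the text above.

```r
library(infer)

set.seed(123)

# Made-up stand-in data (assumption): change falls as sulums_2000 rises
n <- 80
x <- runif(n, min = 5, max = 90)
df <- data.frame(
  sulums_2000 = x,
  change = -0.3 * x + rnorm(n, sd = 8)
)

# Observed slope of change on sulums_2000
obs_slope <- df |>
  specify(change ~ sulums_2000) |>
  calculate(stat = "slope")

# Null distribution of the slope under independence
null_dist <- df |>
  specify(change ~ sulums_2000) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "slope")

# Two-sided p-value, because a slope can differ from 0 in either direction
p_val <- null_dist |>
  get_p_value(obs_stat = obs_slope, direction = "two-sided")
p_val
```

When the relationship is this strong, infer reports a p-value of 0 together with a warning. It is an approximation based on the number of replicates, so the true p-value is very small rather than exactly 0.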

Let's do formula-based regression inference with the lm() function and the summary() function.

With the formula-based regression inference, the p-value is 1.91e-08. So, it is statistically significant.
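A minimal sketch of this check, using the same kind of made-up stand-in data (the negative relationship is my assumption, not a result of this post):

```r
set.seed(123)

# Made-up stand-in data (assumption): change falls as sulums_2000 rises
n <- 80
x <- runif(n, min = 5, max = 90)
df <- data.frame(
  sulums_2000 = x,
  change = -0.3 * x + rnorm(n, sd = 8)
)

# Simple linear regression of change on sulums_2000
fit <- lm(change ~ sulums_2000, data = df)
summary(fit)

# p-value of the slope from the coefficient table
slope_p <- coef(summary(fit))["sulums_2000", "Pr(>|t|)"]
slope_p
```

Unlike the simulation, the t-test in summary() can report a very small p-value such as 1.91e-08 instead of 0.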

In this post I conclude that there is a statistically significant relationship between change and IncomeGroup and between change and sulums_2000, but no statistically significant relationship between change and Region.

That's it! Thank you!

Next post is

www.crosshyou.info

 

To read from the first post,

www.crosshyou.info