OECD Crop production data analysis 9 - ANOVA (ANalysis Of VAriance) with "infer" package using R

Photo by Anisur Rahman on Unsplash


This post is a continuation of the previous post.

In this post I will run an ANOVA with the "infer" package using R.

I would like to see whether crop production differs by crop: maize, rice, soybean and wheat.

I check their summary statistics.

The averages are 6.0 for maize, 3.4 for rice, 1.7 for soybean and 3.3 for wheat, so they seem to differ.
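
The code for this step is not shown here, so this is only a minimal sketch. It assumes a hypothetical wide data frame `crops_wide` with one column per crop, filled with synthetic values rather than the real OECD data.

```r
# Synthetic stand-in data (hypothetical values), one column per crop.
set.seed(1)
crops_wide <- data.frame(
  maize   = rnorm(34, mean = 6, sd = 2),
  rice    = rnorm(34, mean = 3, sd = 2),
  soybean = rnorm(34, mean = 2, sd = 2),
  wheat   = rnorm(34, mean = 3, sd = 2)
)

# Five-number summary plus mean for every crop column.
summary(crops_wide)

# Only the averages, rounded to one decimal place.
crop_means <- colMeans(crops_wide)
round(crop_means, 1)
```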

Before doing the ANOVA, let's make a long-format data frame using the pivot_longer() function.
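
A minimal sketch of the reshaping step could look like this. The names `crops_wide`, `crops_long`, `crop` and `value` are my own placeholders, and the data are synthetic.

```r
library(tidyr)

# Synthetic stand-in data (hypothetical values), one column per crop.
set.seed(1)
crops_wide <- data.frame(
  maize   = rnorm(34, mean = 6, sd = 2),
  rice    = rnorm(34, mean = 3, sd = 2),
  soybean = rnorm(34, mean = 2, sd = 2),
  wheat   = rnorm(34, mean = 3, sd = 2)
)

# Reshape to one row per observation: a crop label and its value.
crops_long <- pivot_longer(
  crops_wide,
  cols      = c(maize, rice, soybean, wheat),
  names_to  = "crop",
  values_to = "value"
)

head(crops_long)
```

The long format puts the response (`value`) and the grouping variable (`crop`) in two columns, which is the shape a formula such as `value ~ crop` expects.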

Let's see boxplots.
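
The plot itself is not reproduced here. As a sketch, assuming a long data frame with hypothetical columns `crop` and `value` and synthetic data, boxplots by crop can be drawn with ggplot2:

```r
library(ggplot2)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

# One box per crop, so the group medians and spreads can be compared.
p <- ggplot(crops_long, aes(x = crop, y = value)) +
  geom_boxplot()
p
```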

There appears to be a difference among the crops.

I will use the "infer" package workflow for the ANOVA, which means a computer-simulation-based analysis rather than one based on a theoretical formula.
I will follow the same steps as the infer article "Tidy ANOVA (Analysis of Variance) with infer".

I load the "infer" package.


I calculate the observed F statistic.

The F statistic is 14.1.
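
As a sketch of this step, the infer pipeline for the observed F statistic is specify(), hypothesize() and calculate(). The data frame `crops_long` and its columns are hypothetical and synthetic, so the value printed will not be 14.1.

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

# Response ~ explanatory; the null is that value is independent of crop.
observed_f <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

observed_f
```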

Next, I generate the null distribution of F statistics.
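
A minimal sketch of the simulation step: generate() shuffles the crop labels (a permutation), and calculate() computes an F statistic for each shuffled data set. The data are synthetic, and the number of replicates (1000) is my assumption.

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

# Each replicate permutes the crop labels, which breaks any real
# association between crop and value, then recomputes F.
null_dist <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

head(null_dist)
```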

Let's visualize the null distribution and the observed F statistic.

I see the observed F statistic is far away from the null distribution.
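
The figure is not reproduced here. A sketch of how such a plot can be made with visualize() and shade_p_value(), again on synthetic data with hypothetical names:

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

observed_f <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

null_dist <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Histogram of simulated F statistics with the observed value marked;
# only large F values count as evidence against the null.
p_sim <- visualize(null_dist) +
  shade_p_value(obs_stat = observed_f, direction = "greater")
p_sim
```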

We can make the theoretical null distribution using the assume() function.

The theoretical distribution is an F distribution with 3 and 132 degrees of freedom.

Let's visualize the theoretical F distribution and the observed F statistic.

I see the observed F statistic is far away here too.
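
A sketch of the theoretical version, assuming a recent infer version that provides assume(). The degrees of freedom are k - 1 and N - k for k groups and N observations. In the synthetic data below I chose 4 crops and 136 rows only so that they come out as 3 and 132; the real data may be laid out differently.

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

# Degrees of freedom by hand: groups - 1 and observations - groups.
df1 <- nlevels(crops_long$crop) - 1
df2 <- nrow(crops_long) - nlevels(crops_long$crop)

observed_f <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

# Theoretical F distribution; assume() works out the df from the data.
null_theory <- crops_long %>%
  specify(value ~ crop) %>%
  assume(distribution = "F")
null_theory

p_theory_plot <- visualize(null_theory) +
  shade_p_value(obs_stat = observed_f, direction = "greater")
p_theory_plot
```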

If I set method = "both" in the visualize() function, I see both distributions, the simulation-based one and the theoretical one.

Then, I calculate the p-value from the simulated null distribution.
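
As a sketch on synthetic data with hypothetical names, the overlay is drawn from the simulated null distribution. infer may print a warning reminding you to check the conditions for the theoretical method.

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

observed_f <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

null_dist <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Simulated histogram with the theoretical F density drawn on top.
p_both <- visualize(null_dist, method = "both") +
  shade_p_value(obs_stat = observed_f, direction = "greater")
p_both
```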

The p-value is almost 0.
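
A sketch of the p-value step with get_p_value(), on synthetic data with hypothetical names. Note that a simulation-based p-value reported as 0 really means "smaller than 1 / reps", and infer warns about this.

```r
library(infer)

# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

observed_f <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

null_dist <- crops_long %>%
  specify(value ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Share of simulated F statistics at least as large as the observed one.
p_value <- null_dist %>%
  get_p_value(obs_stat = observed_f, direction = "greater")
p_value
```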

We can use the pf() function to get the p-value from the theoretical F distribution.
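
With the numbers reported in this post (F = 14.1 with 3 and 132 degrees of freedom), the calculation can be sketched as below. The variable names are my own.

```r
# Observed F statistic and degrees of freedom, as reported in this post.
f_obs <- 14.1
df1   <- 3
df2   <- 132

# Upper-tail probability: P(F >= f_obs) under the null hypothesis.
p_theory <- pf(f_obs, df1 = df1, df2 = df2, lower.tail = FALSE)
p_theory
```

Setting lower.tail = FALSE gives the area to the right of the observed statistic, which is the p-value for an F test.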

We can use the aov() function for the theoretical (formula-based) ANOVA.

The p-values are almost 0, so I can reject the null hypothesis that crop production is the same among maize, rice, soybean and wheat.
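
A minimal sketch with base R's aov(), on synthetic data with hypothetical names. The F value and p-value in the real analysis will differ from what this prints.

```r
# Synthetic stand-in data (hypothetical values), 34 rows per crop.
set.seed(1)
crops_long <- data.frame(
  crop  = factor(rep(c("maize", "rice", "soybean", "wheat"), each = 34)),
  value = rnorm(136, mean = rep(c(6, 3, 2, 3), each = 34), sd = 2)
)

# Classical one-way ANOVA: does mean value differ by crop?
fit <- aov(value ~ crop, data = crops_long)

# ANOVA table with Df, Sum Sq, Mean Sq, F value and Pr(>F).
anova_table <- summary(fit)[[1]]
anova_table
```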

That's it. Thank you!

To read from the 1st post,