# OECD Crop production data analysis 9 - ANOVA (ANalysis Of VAriance) with "infer" package using R

This post is a follow-up to the previous post in this series.

In this post, I will run an ANOVA with the "infer" package in R.

I would like to see whether crop production differs among four crops: maize, rice, soybean and wheat.

First, I check their summary statistics. The averages are 6.0 for maize, 3.4 for rice, 1.7 for soybean and 3.3 for wheat, so they seem to differ.
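The summary step might look like this in R. This is a minimal sketch on simulated data, not the OECD figures: the wide data frame `crop_wide` (one hypothetical column per crop) holds 34 uniform draws per crop within ±50% of the averages quoted above. The 34 rows per crop is an assumption, chosen so that 4 × 34 = 136 observations match the 3 and 132 degrees of freedom reported later in the post.

```r
# Hypothetical stand-in data, NOT the OECD figures: 34 values per crop
# (assumed), drawn uniformly within +/-50% of the averages in the post.
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)

# Min, quartiles, median, mean and max for every crop column
summary(crop_wide)

# Column means, rounded like the averages quoted in the text
crop_means <- colMeans(crop_wide)
round(crop_means, 1)
```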

Before running the ANOVA, let's reshape the data into a long data frame with the pivot_longer() function and look at boxplots. There seem to be clear differences among crops.
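A sketch of the reshaping and boxplot step, assuming the wide data has one column per crop. The data are simulated (34 hypothetical values per crop around the averages above), and the names `crop_wide`, `crop_long` and `yield` are illustrative, not taken from the original post.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)

# Wide -> long: one row per (crop, yield) pair
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

# One box per crop
boxplot_crops <- ggplot(crop_long, aes(x = crop, y = yield)) +
  geom_boxplot() +
  labs(x = "Crop", y = "Yield (simulated)")
print(boxplot_crops)
```

The long format is what the formula interface `yield ~ crop` in infer and aov() expects.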

I will use the "infer" package workflow for the ANOVA, which means a computer-simulation-based analysis rather than one based on theoretical formulas.
I will follow the same steps as the infer article "Tidy ANOVA (Analysis of Variance) with infer". First, I calculate the observed F statistic, which is 14.1.
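A minimal sketch of computing the observed F statistic with infer's `specify()` / `hypothesize()` / `calculate()` verbs, modeled on the infer ANOVA article. The data are simulated stand-ins (hypothetical names `crop_long` and `yield`), so the F value will not reproduce the post's 14.1.

```r
library(tidyr)
library(infer)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

# Observed F statistic for yield explained by crop
observed_f_statistic <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")
observed_f_statistic
```

The `stat` column holds the same F value that a classical one-way ANOVA on the same data would report.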

Next, I generate the null distribution of the F statistic and visualize it together with the observed F statistic. The observed F statistic lies far to the right of the entire null distribution.
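A sketch of the simulation-based null distribution, assuming permutation (`type = "permute"`) as in the infer ANOVA article: under the null hypothesis of independence, the crop labels are shuffled 1000 times and F is recomputed each time. The data and object names are hypothetical stand-ins.

```r
library(tidyr)
library(infer)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

observed_f_statistic <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

# Permutation null: shuffle crop labels 1000 times, recompute F each time
null_dist <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Histogram of the null F values, observed F marked with the upper tail shaded
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = observed_f_statistic, direction = "greater")
print(null_plot)
```

`direction = "greater"` is used because only large F values count as evidence against equal means.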

We can also build the theoretical null distribution with the assume() function. It is an F distribution with 3 and 132 degrees of freedom: 4 crops minus 1, and the number of observations (136) minus 4.
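A sketch of the theoretical null with `assume(distribution = "F")`, on the same hypothetical simulated data, plus the degrees of freedom worked out by hand (k - 1 and N - k for k groups and N observations) as a cross-check.

```r
library(tidyr)
library(infer)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

# Theoretical F null distribution; degrees of freedom are derived from the data
null_dist_theory <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  assume(distribution = "F")
null_dist_theory

# Degrees of freedom by hand: k groups, N observations
k <- nlevels(crop_long$crop)
N <- nrow(crop_long)
c(df1 = k - 1, df2 = N - k)
```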

Let's visualize the theoretical F distribution together with the observed F statistic. Here too, the observed F statistic lies far out in the right tail.
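Plotting the theoretical distribution uses the same `visualize()` + `shade_p_value()` pattern, passing the output of `assume()` instead of a simulated null. A sketch on hypothetical simulated data:

```r
library(tidyr)
library(infer)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

observed_f_statistic <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

null_dist_theory <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  assume(distribution = "F")

# Density of the theoretical F distribution with the observed F marked
theory_plot <- visualize(null_dist_theory) +
  shade_p_value(obs_stat = observed_f_statistic, direction = "greater")
print(theory_plot)
```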

Setting method = "both" in the visualize() function overlays both distributions, the simulation-based and the theoretical one. Then I calculate the p-value from the simulation-based null distribution. The p-value is almost 0.
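A sketch of the overlay plot and the simulation-based p-value via `get_p_value()`, again on hypothetical simulated data. The p-value is the share of permuted F values at least as large as the observed one, so with 1000 reps it cannot resolve values below about 0.001. infer may warn about this when the result is exactly 0, and `method = "both"` may print a reminder to check the theoretical method's conditions.

```r
library(tidyr)
library(infer)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

observed_f_statistic <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

null_dist <- crop_long %>%
  specify(yield ~ crop) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Simulated histogram and theoretical F density in one plot
both_plot <- visualize(null_dist, method = "both") +
  shade_p_value(obs_stat = observed_f_statistic, direction = "greater")
print(both_plot)

# Proportion of permuted F values >= observed F
p_value <- get_p_value(null_dist, obs_stat = observed_f_statistic,
                       direction = "greater")
p_value
```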

We can use the pf() function to get the p-value from the theoretical F distribution, and the aov() function for a theoretical (formula-based) ANOVA. Both p-values are almost 0, so I reject the null hypothesis that crop productivity is the same for maize, rice, soybean and wheat; in other words, at least one crop's mean differs from the others.
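A base-R sketch of the formula-based route on hypothetical simulated data: `aov()` fits the one-way ANOVA, and `pf(..., lower.tail = FALSE)` gives the upper-tail probability of the F distribution, which should match the `Pr(>F)` column of the ANOVA table. With infer's observed statistic, the equivalent call would be `pf(observed_f_statistic$stat, 3, 132, lower.tail = FALSE)`.

```r
library(tidyr)

# Hypothetical stand-in data, NOT the OECD figures (34 values per crop assumed)
set.seed(2021)
n <- 34
crop_wide <- data.frame(
  maize   = runif(n, 0.5 * 6.0, 1.5 * 6.0),
  rice    = runif(n, 0.5 * 3.4, 1.5 * 3.4),
  soybean = runif(n, 0.5 * 1.7, 1.5 * 1.7),
  wheat   = runif(n, 0.5 * 3.3, 1.5 * 3.3)
)
crop_long <- pivot_longer(crop_wide, cols = c(maize, rice, soybean, wheat),
                          names_to = "crop", values_to = "yield")
crop_long$crop <- factor(crop_long$crop)

# Classical one-way ANOVA
fit <- aov(yield ~ crop, data = crop_long)
aov_table <- summary(fit)[[1]]
print(aov_table)

f_value <- aov_table[["F value"]][1]
df1 <- aov_table[["Df"]][1]   # k - 1
df2 <- aov_table[["Df"]][2]   # N - k

# Theoretical p-value: P(F >= f_value) under F(df1, df2)
p_theory <- pf(f_value, df1, df2, lower.tail = FALSE)
p_theory
```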

That's it. Thank you!
