Generated by Bing Image Creator: Light blue forest landscape, yellow and green flowers, photo
www.crosshyou.info
This post is a follow-up to the post above.
In this post, I will do statistical inference with the infer package.
I refer to the "Tidy Statistical Inference • infer" documentation.
In the previous post, I made a graph of bmi by gender.
The graph shows that bmi differs by gender, but the two distributions overlap a lot.
So, let's use statistical inference to check whether the difference is real.
First, I load the infer package.
Then, I follow the 2-Sample t-Test section of the "Tidy t-Tests with infer • infer" article.
Let's begin by calculating the observed statistic.
The difference in mean bmi (Male minus Female) is 4.16. It is easy to confirm by computing the mean bmi for each gender and subtracting.
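To make "diff in means" concrete, here is a minimal base-R sketch of the same statistic. It does not use gym_raw: `toy` is a hypothetical, simulated data frame, so the numbers will not match 4.16. The subtraction order Male - Female mirrors `order = c("Male", "Female")` in `calculate()`.

```r
# Hypothetical toy data (simulated stand-in for gym_raw, not the real data)
set.seed(1)
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, mean = 27, sd = 4), rnorm(50, mean = 23, sd = 4))
)

# Mean bmi for each gender (base-R counterpart of group_by() + summarize())
group_means <- tapply(toy$bmi, toy$gender, mean)
group_means

# Difference in means in the order Male - Female
obs_diff <- unname(group_means["Male"] - group_means["Female"])
obs_diff
```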
Next, I will generate the null distribution.
Then, I visualize the null distribution together with the observed statistic.
You'll see that our observed statistic lies far outside the null distribution.
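The idea behind `hypothesize(null = "independence")` with `generate(type = "permute")` can be sketched in base R: if gender has nothing to do with bmi, the gender labels are interchangeable, so shuffling them and recomputing the statistic many times shows what differences chance alone produces. This is a sketch on hypothetical simulated data (`toy`), and `diff_in_means()` and `null_diffs` are hypothetical names, not part of infer.

```r
# Hypothetical toy data (simulated stand-in for gym_raw, not the real data)
set.seed(1)
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, mean = 27, sd = 4), rnorm(50, mean = 23, sd = 4))
)

# Hypothetical helper: difference in mean bmi, Male - Female
diff_in_means <- function(bmi, gender) {
  mean(bmi[gender == "Male"]) - mean(bmi[gender == "Female"])
}
obs_diff <- diff_in_means(toy$bmi, toy$gender)

# Under independence, shuffle the gender labels (sample() without replacement)
# and recompute the statistic 1000 times
set.seed(123)
null_diffs <- replicate(1000, diff_in_means(toy$bmi, sample(toy$gender)))

# Where the null distribution puts most of its mass, versus the observed value
quantile(null_diffs, probs = c(0.005, 0.5, 0.995))
obs_diff
```

The null distribution is centered near 0, which is why an observed difference far from 0 stands out in the plot.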
Let's calculate the p-value.
The p-value is very small, so I can say that mean bmi differs by gender.
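A two-sided p-value from a permutation null distribution is the share of permuted statistics at least as extreme as the observed one, counted in both tails. The sketch below uses one common convention, doubling the smaller tail proportion and capping at 1; I believe `get_p_value(direction = "two-sided")` works in a similar way, but treat that as an assumption. The data are hypothetical and simulated.

```r
# Hypothetical toy data (simulated stand-in for gym_raw, not the real data)
set.seed(1)
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, mean = 27, sd = 4), rnorm(50, mean = 23, sd = 4))
)
diff_in_means <- function(bmi, gender) {
  mean(bmi[gender == "Male"]) - mean(bmi[gender == "Female"])
}
obs_diff <- diff_in_means(toy$bmi, toy$gender)

# Permutation null distribution (shuffled gender labels, 1000 reps)
set.seed(123)
null_diffs <- replicate(1000, diff_in_means(toy$bmi, sample(toy$gender)))

# Two-sided p-value: double the smaller tail proportion, capped at 1
left_tail  <- mean(null_diffs <= obs_diff)
right_tail <- mean(null_diffs >= obs_diff)
p_value <- min(1, 2 * min(left_tail, right_tail))
p_value
```

With 1000 reps, a p-value of 0 only means that none of the permuted statistics was as extreme as the observed one, so it is safer to report it as roughly p < 0.001.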
Next, let's calculate a confidence interval. First, I generate a bootstrap distribution.
Then, I calculate a 99% confidence interval from it.
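A 99% percentile bootstrap interval takes the 0.5% and 99.5% quantiles of the bootstrap statistics. Here is a base-R sketch on hypothetical simulated data. As far as I know, infer's `type = "bootstrap"` resamples whole rows with replacement (so the number of males and females varies between replicates), and this sketch does the same; that detail is an assumption about infer's internals.

```r
# Hypothetical toy data (simulated stand-in for gym_raw, not the real data)
set.seed(1)
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, mean = 27, sd = 4), rnorm(50, mean = 23, sd = 4))
)
diff_in_means <- function(bmi, gender) {
  mean(bmi[gender == "Male"]) - mean(bmi[gender == "Female"])
}
obs_diff <- diff_in_means(toy$bmi, toy$gender)

# Bootstrap: resample whole rows with replacement, recompute the statistic
set.seed(123)
boot_diffs <- replicate(1000, {
  idx <- sample(nrow(toy), replace = TRUE)
  diff_in_means(toy$bmi[idx], toy$gender[idx])
})

# 99% percentile interval: the 0.5% and 99.5% quantiles
ci_99 <- quantile(boot_diffs, probs = c(0.005, 0.995))
ci_99
```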
Finally, I visualize the bootstrap distribution with the confidence interval.
After this statistical inference, I am 99% confident that mean bmi for males is higher than for females by roughly 3.10 to 5.10, and the very small p-value gives strong evidence that mean bmi differs by gender.
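Since the infer article being followed is about t-tests, a classical Welch two-sample t-test makes a reasonable cross-check for the simulation-based results: its 99% interval and p-value should tell a similar story. This sketch runs `t.test()` from base R's stats package on hypothetical simulated data, not on gym_raw, and sets the factor levels so that the interval is for Male - Female.

```r
# Hypothetical toy data (simulated stand-in for gym_raw, not the real data)
set.seed(1)
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  bmi    = c(rnorm(50, mean = 27, sd = 4), rnorm(50, mean = 23, sd = 4))
)
# Put Male first so the estimate and interval are for Male - Female
toy$gender <- factor(toy$gender, levels = c("Male", "Female"))

# Welch two-sample t-test (var.equal = FALSE is the default) at the 99% level
res <- t.test(bmi ~ gender, data = toy, conf.level = 0.99)
res$conf.int
res$p.value

obs_diff <- mean(toy$bmi[toy$gender == "Male"]) - mean(toy$bmi[toy$gender == "Female"])
```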
That's it. Thank you!
Next post is
To read from the first post,
Here is the code I used in this post.
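#
# load packages: infer for inference, dplyr for group_by() and summarize()
# (gym_raw is assumed to be already loaded as in the previous post)
library(infer)
library(dplyr)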
#
# Calculate the observed statistic: difference in mean bmi, Male - Female
observed_statistic <- gym_raw |>
  specify(bmi ~ gender) |>
  calculate(stat = "diff in means", order = c("Male", "Female"))
observed_statistic
#
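# confirm the difference: mean bmi for each gender
gym_raw |>
  group_by(gender) |>
  summarize(mean_bmi = mean(bmi))
# the printed means are rounded (26.9 - 22.7 is about 4.2),
# so compute Male - Female directly to reproduce 4.16
mean(gym_raw$bmi[gym_raw$gender == "Male"]) -
  mean(gym_raw$bmi[gym_raw$gender == "Female"])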
#
# Next, generate the null distribution with randomization:
# shuffle the gender labels 1000 times and recompute the difference in means
set.seed(123)
null_dist_2_sample <- gym_raw |>
  specify(bmi ~ gender) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("Male", "Female"))
#
# Visualize the null distribution and shade where the observed statistic falls
null_dist_2_sample |>
  visualize() +
  shade_p_value(observed_statistic,
                direction = "two-sided")
#
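# Calculate the two-sided p-value
# (with reps = 1000, a p-value of 0 means no permuted statistic was as
# extreme as the observed one, i.e. roughly p < 0.001)
p_value_2_sample <- null_dist_2_sample |>
  get_p_value(obs_stat = observed_statistic,
              direction = "two-sided")
p_value_2_sample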
#
# generate the bootstrap distribution (resample with replacement 1000 times)
set.seed(123)
boot_dist <- gym_raw |>
  specify(bmi ~ gender) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("Male", "Female"))
#
# calculate the 99% confidence interval with the percentile method
percentile_ci <- get_confidence_interval(boot_dist,
                                         level = 0.99,
                                         type = "percentile")
percentile_ci
#
# visualize the bootstrap distribution with the confidence interval
boot_dist |>
  visualize() +
  shade_confidence_interval(endpoints = percentile_ci)
#