A blog about doing things with R, and about reading

I practice R using data from e-Stat (the Portal Site of Official Statistics of Japan), the OECD, UCI, and other sources. Sometimes I also post reading notes.

Kaggle's Gym Members Exercise Dataset Analysis with R 2 - Visualizing Gym Data

Generated by Bing Image Creator: wine red colored kochia field under blue sky and white cloud, fine photo

www.crosshyou.info

This post follows the post above, in which I imported Kaggle's Gym Members Exercise Dataset into R.

In this post, let's do some EDA (Exploratory Data Analysis).

We have 973 observations and 15 variables: age, gender, weight, height, maxbpm, avgbpm, restbpm, hours, calories, type, fatpct, water, days, level and bmi.

Let's visualize each variable, starting with age.

age looks roughly uniformly distributed.

There are more male observations than female ones, but the difference is not large.

weight looks roughly normal, with a long right tail.
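A "long right tail" can be put into a number with the sample skewness, which is positive for right-skewed data and near zero for symmetric data. Base R has no built-in skewness function, so this is a minimal sketch of a hypothetical helper, `sample_skewness()`, using the plain moment formula (packages such as e1071 offer variants with small-sample corrections). I have not run it on the gym data, so the last line is left as a suggestion.

```r
# Hypothetical helper: moment-based sample skewness, i.e. the mean cubed
# deviation divided by the cubed (population) standard deviation.
# Positive = longer right tail, negative = longer left tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]   # drop missing values
  d <- x - mean(x)    # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  m3 <- mean(d^3)     # third central moment
  m3 / m2^(3 / 2)
}

# Toy vectors (made up, not from the dataset):
sample_skewness(c(1, 2, 3, 4, 5))    # symmetric, returns 0
sample_skewness(c(1, 1, 2, 2, 10))   # long right tail, positive

# On the real data (gym_raw was created in the previous post):
# sample_skewness(gym_raw$weight)
```

The same helper applies to any numeric column, so it can be used to check the other variables that look right-skewed.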

height looks roughly normally distributed.

maxbpm looks roughly uniformly distributed.

avgbpm also looks roughly uniformly distributed.

restbpm looks roughly uniformly distributed as well.

hours seems to fall into three groups: under 1.0, between 1.0 and 1.5, and over 1.5.
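To count how many sessions land in each of those three groups, `cut()` can bin the values. This is a sketch with a hypothetical helper, `hours_group()`; the cut points 1.0 and 1.5 are read off the histogram, and whether a value of exactly 1.0 or 1.5 belongs to the lower or upper group is my own assumption (here, the upper one).

```r
# Hypothetical helper: bin session durations (hours) into three groups.
# right = FALSE gives the intervals [-Inf, 1.0), [1.0, 1.5), [1.5, Inf),
# so exactly 1.0 falls in the middle group and exactly 1.5 in the top one.
hours_group <- function(hours) {
  cut(hours,
      breaks = c(-Inf, 1.0, 1.5, Inf),
      labels = c("under 1.0", "1.0 to 1.5", "1.5 and over"),
      right = FALSE)
}

# Toy values (made up, not from the dataset): two per group.
table(hours_group(c(0.6, 0.9, 1.0, 1.2, 1.5, 1.9)))

# On the real data:
# table(hours_group(gym_raw$hours))
```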

calories looks roughly normally distributed.

There are four workout types, each with roughly the same number of observations.
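"Roughly the same number of observations" can be checked formally with a chi-squared goodness-of-fit test. When `chisq.test()` is given a plain vector of counts, its default null hypothesis is that every category is equally likely. The counts and the labels A to D below are made up for illustration; I have not run the test on the gym data.

```r
# Made-up counts for four workout types (placeholder labels, not real data).
counts <- c(A = 250, B = 240, C = 245, D = 238)

# With a vector of counts, chisq.test() compares them against equal
# probabilities, p = rep(1/4, 4) here, with 3 degrees of freedom.
result <- chisq.test(counts)
result$statistic   # chi-squared statistic
result$p.value     # large p-value: no evidence against equal shares

# On the real data (gym_raw was created in the previous post):
# chisq.test(table(gym_raw$type))
```

A large p-value does not prove the shares are equal; it only says the observed differences are consistent with chance.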

water looks roughly normally distributed.

3 days and 4 days make up the majority.

Level 2 has the most observations.

bmi looks roughly normal, with a long right tail.
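Since BMI is defined as weight divided by height squared, the bmi column should be consistent with weight and height. Assuming weight is in kilograms and height in metres (the height values around 1.8 suggest metres), a quick check is to recompute it and flag rows that disagree. The helper name `bmi_mismatch()` and the tolerance of 0.1 are my own choices, since I don't know how the dataset rounds bmi.

```r
# Hypothetical helper: return the row numbers where the stored bmi differs
# from weight / height^2 by more than `tol`. Assumes kg and metres.
bmi_mismatch <- function(df, tol = 0.1) {
  recomputed <- df$weight / df$height^2
  which(abs(df$bmi - recomputed) > tol)
}

# Toy data frame (made-up values, not from the dataset):
toy <- data.frame(weight = c(80, 60, 90),
                  height = c(1.80, 1.65, 1.75),
                  bmi    = c(24.69, 22.04, 35.00))
bmi_mismatch(toy)   # 3, because 90 / 1.75^2 is about 29.39, not 35

# On the real data:
# bmi_mismatch(gym_raw)
```

Because bmi grows with weight, a right tail in weight would also be expected to show up in bmi.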

All right, now I have a sense of each variable. I don't see any peculiar values in the dataset.
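Eyeballing histograms is one way to spot odd values; a small table of missing-value counts and ranges makes the same check explicit. This is a sketch with a hypothetical helper, `quick_check()`, using only base R; it looks at numeric columns only, so categorical columns such as gender or type would still need a `table()` check.

```r
# Hypothetical helper: for each numeric column, report the number of
# missing values and the min/max, so implausible values stand out.
quick_check <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # numeric columns only
  data.frame(
    variable  = names(num),
    n_missing = vapply(num, function(x) sum(is.na(x)), integer(1)),
    min       = vapply(num, function(x) min(x, na.rm = TRUE), numeric(1)),
    max       = vapply(num, function(x) max(x, na.rm = TRUE), numeric(1)),
    row.names = NULL
  )
}

# Toy data frame (made-up values, not from the dataset):
toy <- data.frame(age    = c(25L, 40L, NA),
                  weight = c(70, 85.5, 60),
                  gender = c("Male", "Female", "Male"))
quick_check(toy)

# On the real data:
# quick_check(gym_raw)
```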

That's it. Thank you!

The next post is:

www.crosshyou.info


To read the series from the first post:

www.crosshyou.info

Here is the code for this post:

# gym_raw was created in the previous post
library(ggplot2)
#
# age histogram
ggplot(gym_raw, aes(x = age)) +
  geom_histogram(color = "white", binwidth = 5, boundary = 20)
#
# gender bar chart
ggplot(gym_raw, aes(x = gender)) +
  geom_bar(aes(fill = gender))
#
# weight histogram
ggplot(gym_raw, aes(x = weight)) +
  geom_histogram(color = "white")
#
# height histogram
ggplot(gym_raw, aes(x = height)) +
  geom_histogram(color = "white", binwidth = 0.025, boundary = 1.8)
#
# maxbpm histogram
ggplot(gym_raw, aes(x = maxbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 200)
#
# avgbpm histogram
ggplot(gym_raw, aes(x = avgbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 160)
#
# restbpm histogram
ggplot(gym_raw, aes(x = restbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 120)
#
# hours histogram
ggplot(gym_raw, aes(x = hours)) +
  geom_histogram(color = "white", binwidth = 0.1, boundary = 0.5)
#
# calories histogram
ggplot(gym_raw, aes(x = calories)) +
  geom_histogram(color = "white", binwidth = 50, boundary = 1000)
#
# type bar chart
ggplot(gym_raw, aes(x = type)) +
  geom_bar(aes(fill = type))
#
# fatpct histogram
ggplot(gym_raw, aes(x = fatpct)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 25)
#
# water histogram
ggplot(gym_raw, aes(x = water)) +
  geom_histogram(color = "white", binwidth = 0.125, boundary = 2.5)
#
# days bar chart
ggplot(gym_raw, aes(x = days)) +
  geom_bar()
#
# level bar chart
ggplot(gym_raw, aes(x = level)) +
  geom_bar(aes(fill = level))
#
# bmi histogram
ggplot(gym_raw, aes(x = bmi)) +
  geom_histogram(color = "white", binwidth = 2, boundary = 30)
#
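The code above draws one plot per variable with hand-tuned binwidths. As an alternative sketch (not the method used in this post), all numeric columns can be reshaped to long format with `tidyr::pivot_longer()` and drawn in a single faceted plot. The helper name `plot_numeric_histograms()` and the shared `bins = 20` are my own assumptions; a shared bin count is coarser than the per-variable binwidths above.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper: one faceted histogram of every numeric column.
# scales = "free" lets each panel use its own x and y ranges.
plot_numeric_histograms <- function(df, bins = 20) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  long <- pivot_longer(df, cols = tidyselect::all_of(num_cols),
                       names_to = "variable", values_to = "value")
  ggplot(long, aes(x = value)) +
    geom_histogram(bins = bins, color = "white") +
    facet_wrap(~ variable, scales = "free")
}

# On the real data:
# plot_numeric_histograms(gym_raw)
```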