Generated by Bing Image Creator: wine red colored kochia field under blue sky and white cloud, fine photo
This post follows on from the previous post, in which I imported Kaggle's Gym Members Exercise Dataset into R.
In this post, let's do some EDA (Exploratory Data Analysis).
We have 973 observations and 15 variables: age, gender, weight, height, maxbpm, avgbpm, restbpm, hours, calories, type, fatpct, water, days, level and bmi.
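A quick way to confirm the size and the variable names is with base R. This is only a minimal sketch: gym_demo is a hypothetical stand-in with made-up values and a few of the column names, so replace it with gym_raw to inspect the real data.

```r
# Hypothetical stand-in (made-up values); replace gym_demo with gym_raw.
gym_demo <- data.frame(
  age    = c(25, 41, 33, 58),
  gender = c("Male", "Female", "Female", "Male"),
  weight = c(70.2, 55.1, 88.4, 63.0),
  days   = c(3, 4, 4, 2)
)

dim(gym_demo)    # number of rows (observations) and columns (variables)
names(gym_demo)  # variable names
str(gym_demo)    # type and first values of each variable
```

Run on gym_raw, dim() should report the 973 observations and 15 variables mentioned above.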
Let's visualize each variable.
I'll start with "age".
age looks roughly uniformly distributed.
Male has more observations, but the difference is not large.
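To put numbers on a bar chart like this, counts and proportions can be tabulated with table() and prop.table(). The vector below is hypothetical and only illustrates the calls; with the real data it would be gym_raw$gender.

```r
# Hypothetical values; with the real data use gym_raw$gender instead.
gender_demo <- c("Male", "Female", "Male", "Male", "Female")

counts <- table(gender_demo)   # observations per category
counts
round(prop.table(counts), 2)   # share of each category
```

The same two calls should also work for the other categorical variables such as type, days and level.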
weight looks roughly normally distributed, with a long right tail.
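A long right tail can also be checked numerically with the sample skewness, which is positive when the right tail is longer. The skewness() helper and the simulated weight_demo vector below are my own illustration, not part of the original analysis. With the real data, pass gym_raw$weight.

```r
# Sample skewness: third central moment divided by the 1.5 power of the
# second central moment. Hypothetical helper, written in base R only.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
# Simulated right-skewed values, NOT the real data.
weight_demo <- rlnorm(1000, meanlog = log(70), sdlog = 0.3)

skewness(weight_demo)       # positive for a long right tail
skewness(c(1, 2, 3, 4, 5))  # zero for a perfectly symmetric sample
```

A value near zero would suggest a symmetric shape, while a clearly positive value supports what the histogram shows.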
height looks roughly normally distributed.
maxbpm looks roughly uniformly distributed.
avgbpm looks roughly uniformly distributed.
restbpm also looks roughly uniformly distributed.
hours seems to fall into three groups: less than 1.0, between 1.0 and 1.5, and greater than 1.5.
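The three groups can be counted by binning with cut(). This is a sketch on hypothetical values. Which group the boundary values 1.0 and 1.5 belong to is my assumption: with right = FALSE, each boundary value goes into the upper group.

```r
# Hypothetical values; with the real data use gym_raw$hours instead.
hours_demo <- c(0.6, 0.9, 1.0, 1.2, 1.5, 1.7, 1.9)

# right = FALSE makes the intervals [-Inf, 1.0), [1.0, 1.5), [1.5, Inf).
hours_group <- cut(
  hours_demo,
  breaks = c(-Inf, 1.0, 1.5, Inf),
  labels = c("< 1.0", "1.0 to < 1.5", ">= 1.5"),
  right  = FALSE
)

table(hours_group)  # observations per group
```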
calories looks roughly normally distributed.
There are four types, each with about the same number of observations.
water looks roughly normally distributed.
3 days and 4 days make up the majority.
level 2 has the most observations.
bmi looks roughly normally distributed, with a long right tail.
All right, now I have some sense of the variables. I did not see any peculiar values in our dataset.
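Besides eyeballing the plots, a quick numeric check for odd values is to look at the ranges and count missing values per column. The data frame below is a hypothetical stand-in with one missing value added on purpose, only to show what the check reports. Replace check_demo with gym_raw for the real data.

```r
# Hypothetical stand-in with one deliberate NA; replace with gym_raw.
check_demo <- data.frame(
  age = c(25, 41, NA, 58),
  bmi = c(22.1, 27.5, 31.0, 19.8)
)

summary(check_demo)                    # min, max and quartiles per column
na_counts <- colSums(is.na(check_demo))
na_counts                              # missing values per column
```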
That's it. Thank you!
Next post is
To read from the first post,
The code for this post is below.
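#
# ggplot2 provides ggplot(); gym_raw is the data frame imported in the previous post
library(ggplot2)
#
# age histogram
ggplot(gym_raw, aes(x = age)) +
  geom_histogram(color = "white", binwidth = 5, boundary = 20)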
#
# gender bar graph
ggplot(gym_raw, aes(x = gender)) +
  geom_bar(aes(fill = gender))
#
# weight histogram
# (no binwidth given, so ggplot2 falls back to its default of 30 bins)
ggplot(gym_raw, aes(x = weight)) +
  geom_histogram(color = "white")
#
# height histogram
ggplot(gym_raw, aes(x = height)) +
  geom_histogram(color = "white", binwidth = 0.025, boundary = 1.8)
#
# maxbpm histogram
ggplot(gym_raw, aes(x = maxbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 200)
#
# avgbpm histogram
ggplot(gym_raw, aes(x = avgbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 160)
#
# restbpm histogram
ggplot(gym_raw, aes(x = restbpm)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 120)
#
# hours histogram
ggplot(gym_raw, aes(x = hours)) +
  geom_histogram(color = "white", binwidth = 0.1, boundary = 0.5)
#
# calories histogram
ggplot(gym_raw, aes(x = calories)) +
  geom_histogram(color = "white", binwidth = 50, boundary = 1000)
#
# type bar chart
ggplot(gym_raw, aes(x = type)) +
  geom_bar(aes(fill = type))
#
# fatpct histogram
ggplot(gym_raw, aes(x = fatpct)) +
  geom_histogram(color = "white", binwidth = 1, boundary = 25)
#
# water histogram
ggplot(gym_raw, aes(x = water)) +
  geom_histogram(color = "white", binwidth = 0.125, boundary = 2.5)
#
# days bar chart
ggplot(gym_raw, aes(x = days)) +
  geom_bar()
#
# level bar chart
ggplot(gym_raw, aes(x = level)) +
  geom_bar(aes(fill = level))
#
# bmi histogram
ggplot(gym_raw, aes(x = bmi)) +
  geom_histogram(color = "white", binwidth = 2, boundary = 30)
#