Take-home Exercise 2

This is the second take-home exercise in a series of take-home exercises for the Visual Analytics module.In this exercise, we will learn from peer and remake graphs in a more readable way of city’s demographic data.

Published

May 18, 2022

DOI

1. Overview

The take-home exercise provides students the opportunity to revise and practice the R packages and programming skills we learnt in-class at home. This time,the exercise requires students to be innovative and creative by applying appropriate R packages to design enlightening and yet functional data visualization for analytics purposes. Students are encouraged to create multiple data visualization and compare our pros and cons before finalizing the best design.

2. Getting Started

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('tidyverse','rvest','reshape2','ggtern','ggrepel','ggpubr','stringr','lubridate','ggthemes','ggplot2','webr','ggdist','hrbrthemes')
for(p in packages){
  if(!require(p, character.only =T)){
    install.packages(p)
  }
  library(p, character.only =T)
}  

3. Importing Data

The code chunk below import Participants.csv from the data folder into R by using read_csv() of readr and save it as a tibble data frame called participants.

participants <- read_csv("data/Participants.csv")
glimpse(participants)
Rows: 1,011
Columns: 7
$ participantId  <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ householdSize  <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ haveKids       <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
$ age            <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2…
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",…
$ interestGroup  <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", …
$ joviality      <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380…

4. Makover Design

4.1 Original version

The original version shows percentage of household size by bar chart.To zoom in, the main idea of this graph is to show up the different percentages of having kids and not having kids among the participants. And the fractions of not having kids is around 30%.

4.2 Formatting the Chart to improve visualisation

To solve the problem of original design, using webr package to build a combination of pie chart and donut plot.

Convert householdSive from numerical to character for further visualization, and make is more readable.

participants$householdSize[which(participants$householdSize == 1)]<- "Size 1"
participants$householdSize[which(participants$householdSize == 2)]<- "Size 2"
participants$householdSize[which(participants$householdSize == 3)]<- "Size 3"
participants$participantId <- as.numeric(participants$participantId)
hh_kid <- participants %>% group_by(householdSize, haveKids) %>% count(haveKids)
print(hh_kid)
# A tibble: 3 × 3
# Groups:   householdSize, haveKids [3]
  householdSize haveKids     n
  <chr>         <lgl>    <int>
1 Size 1        FALSE      337
2 Size 2        FALSE      373
3 Size 3        TRUE       301

PieDonut chart using webr package

PieDonut(hh_kid, aes(haveKids, householdSize, count=n),r0 = 0.7, r1=1.5, r2 = 1.9,explode = 2,
         titlesize = 4,
         title = "No. of People Who Have/Not Have Kids
         in Different Household Size")

4.3 Original Household Distribution

The main idea of original bar chart is to show basic information of household size by count and highlight the proportion of size 2 is the largest one. The observed fuzziness are as below:

4.4 Formatting the Chart to improve visualisation

Descending bar chart using webr package and geom_bar as below:

ggplot(data = participants,
       aes(x=reorder(householdSize, householdSize, function(x)-length(x))))  +
  geom_bar(fill = "steelblue4") +
  ylim(0,400)+
  geom_text(stat="count", 
      aes(label=paste0(..count.., "  (", 
      round(..count../sum(..count..)*100,
            1), "%)")),
      vjust=-1) +
  xlab("Household Size") +
  ylab("No. of\nParticipants") +
  ggtitle("Household size distribution")+
  theme(panel.background= element_blank(), axis.line= element_line(color= 'grey'),
        axis.title.y= element_text(angle=0))

4.5 Original Joviality distribution by different age range

The original design is trying to show up and compare joviality distributions by different age ranges.

4.6 Formatting the Chart to improve visualisation

Distribution by age range using geom_violin and geom_boxplot as below:

ggplot(data=participants,
       aes(y = joviality*100, x= Age_range)) +
  ggtitle("Joviality Distribution by Age range ",subtitle = "Elder people is much unhappy") +
  geom_violin(fill="skyblue",alpha=0.6) +
  geom_boxplot(notch=TRUE, alpha=0.4) +
  xlab("Age Range") +
  ylab("Jovility Score") +
  stat_summary(geom = "point",fun="mean",colour ="black",size=2.7) + 
  theme(panel.background= element_blank(), axis.line= element_line(color= 'grey')) +
  geom_hline(aes(yintercept=mean(joviality*100)),color="red",linetype="dashed",size = 1) +
  geom_text(aes(x=4.3, label="Average", y=55), colour="red",text=element_text(size=5))

4.7 Original Joviality distribution by educaion level

4.8 Formatting the Chart to improve visualisation

Joviality distribution by education using stat_halfeye and geom_boxplot as below:

participants$educationLevel2 <- factor(participants$educationLevel,levels = c("Low", "HighSchoolOrCollege","Bachelors","Graduate" ))
ggplot(participants, 
       aes(x= educationLevel2, y= joviality*100)) +
  stat_halfeye(adjust = .35,
               width = .8,
               color = 'dodgerblue3',
               justification = -.15,
               position = position_nudge(x = .12)) +
  geom_boxplot(width = .12, outlier.color = NA ) +
  geom_hline(aes(yintercept = mean(participants$joviality*100)),
             linetype= 'dashed',color= 'red',size= 0.8) +
  geom_text(aes(x=4.9, label="Average", y=50), 
            colour="red",text=element_text(size=2)) +
  coord_flip() +
   labs(y = 'Joviality Score',x= 'Education Leve',
       title = 'Joviality Distribution in by Education Level') +
  theme(panel.background= element_blank(), axis.line= element_line(color= 'grey'),
        axis.ticks.y = element_blank(),
        panel.grid.major = element_line(size= 0.2, color = "grey"))

5. Learning Points

This makeover take-home exercise provides an excellent opportunity to criticize submission in terms of clarity and aesthetics meanwhile it’s a good chance to learn from peer work. My key takeaways are:

6.Reference

https://isss608-hhhandy.netlify.app/th_ex/th_ex1.html