Dylan's DataViz Blog: DataViz Makeover 2

Dylan Wong

DataViz Makeover 2

A critique and proposed makeover of the data visualisation created by a research team to understand the willingness of the public on COVID-19 vaccination. The data used by the research team was obtained from Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted at Github (https://github.com/YouGov-Data/covid-19-tracker).

Author

Affiliation

Dylan Wong

School of Computing and Information Systems, Singapore Management University

Published

Feb. 14, 2021

Citation

Wong, 2021

Overview

A. Critiques

Clarity

The title of the 100% stacked bar chart on the left is misleading as the visualisation does not show which country is more pro-vaccine clearly. Viewers may not be able to understand intuitively what the surveyees were agreeing or disagreeing to.
While it is pleasant that the right bar chart has been sorted in descending order to allow better reading experience, the order of the countries on the y-axis of both bar charts are not synchronised - the chart on the left is sorted alphabetically. viewers may be confused as to why the 2 charts are placed side by side, and they may second-guess whether the author is trying to make some sort of comparison.
The right bar chart is essentially just the extraction of the percentage of “strongly agree”, which has already been shown in the 100% stacked bar chart on the left. To achieve the same intent, the left bar chart could have been sorted in descending order by the “Strongly agree”, and with the “Strongly agree” bars aligned to the left - against the y-axis. Viewers may be confused by the redundant chart created.
The rationale as to why the focus is only on those who strongly agreed is unclear. “Agree” is also a positive response to the willingness to get vaccinated.
The confidence interval, which is the margin of error that tells reader how much the survey results would be reflective of the view of the overall population, is missing. Viewers should be kept informed.
The legend shows only “strongly agree” and “strongly disagree”. It is up to viewers’ own interpretation of what the other 3 colours are referring to.
The Likert scale used is rather unconventional. The ratings/ score is usually assigned in a progressive manner, i.e. higher rating denotes better or more positive response than the preceding value. The case here is opposite where “1” is assigned to “Strongly Agree” and “5” being assigned to “Strongly Disagree”.

Aesthetics

The word “vaccine” on the title of the left bar chart is misspelt as “vacinne” - not what viewers would expect from the English standard of research scientists from Imperial College London.
It is not clear what the legend title “Vac 1” is referring to.
The numbers on the legend is redundant as what really matters is to explain what each bar colour represents.
The colour scheme used in the left bar chart helps viewers to easily identify the different segments in the stacked bar chart. However, the colour choice could be further improved. For example, red typically denotes negative sentiment while green for positive sentiment. Therefore, the use of red for “not sure” and green for “strongly disagree” is not ideal.
While it is a commendable effort to zoom in on the “strongly agree” with the bar chart on the right to allow the viewers to understand the percentage of the survey population who strongly agree to vaccination, which explains the un-synchronised x-axis scale, it is unfortunate that the x-axis maximum bound ended abruptly at “60%”. As a result, viewers may not be able identify the percentage value for United Kingdom.
Names of countries are proper nouns and so the first letter of the names of countries should always be capitalised.

B. Proposed Improvement and Data Visualisation

Clarity

The chart title should be changed to better reflect the survey question asked. Since there are other questions asked to
If 2 charts of similar y-axis are to be placed side by side, the order of the countries should be synchronised.
Since the bar chart on the right is essentially redundant, replace it with an error bar on dot plot chart to help viewers visualise the uncertainty.
Include the missing legend names for rating 2 to 4, i.e. 2 = Agree, 3 = Neutral, 4 = Disagree

Aesthetics

Visualise the Likert scale survey using diverging stacked bar chart. It helps viewers to better visualise the sentiments or desires of the surveyees with a common vertical baseline.
Recode the names of countries to capitalise the first letter.
Improve the colour scheme to group similar responses together, e.g. different shade of the same colour for “Strongly Agree” and “Agree” to denote similar sentiment.
Remove the legend title as the legend itself is self-explanatory.

Proposed Data Visualisation

C. Data Visualisation in Tableau

The interactive dashboard created could be found here.

D. Preparation of Data Visualisation

1. Data Preparation

The data use for this visualisation was downloaded from Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted on Github.
From the upload history of the repository, it was observed that the files were last updated at different period ranging from few hours ago to 5 months ago. Since the research is to focus on survey conducted in January 2021, files updated after and before January 2021 are excluded. The data of Australia, Canada, Denmark, France, Germany, Italy, Japan, Netherlands, Norway, Singapore, South Korea, Spain, Sweden, and United Kingdom were used for the visualisation.
Using Tableau Prep Builder, the australia.csv file was first dragged directly from the folder into the Tableau Prep Builder window.
To merge the data files of the other countries, make sure that the files are save in the same location as australia.csv. Thereafter, in Tableau Prep Builder, select the Multiple Files tab > Wildcard union > Apply
From the right panel, it was observed that the data has many variables. Referring to the codebook.xlsx file, identify the variables to be retained and leave them checked. One could either first uncheck all fields then manually check the fields needed, or leave all fields checked and manually uncheck the fields not needed. For this visualisation exercise, the fields selected are: endtime, household_children, age, gender, household_size, employment_status, vac_1, vac2_1, vac2_2, vac2_3, vac2_6, vac_3, employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7, File Paths.

Referring to codebook.xlsx, the original field names are as follow:

Variables	Original Field Names
endtime	When did you complete this question (epoch seconds)?
household_children	Number of children under 18 in household
age	Age
gender	Gender
household_size	Number of people in household
employment_status	Employment Status
vac_1	If a Covid-19 vaccine were made available to me this week, I would definitely get it
vac2_1	I am worried about getting COVID19
vac2_2	I am worried about potential side effects of a COVID19 vaccine
vac2_3	I believe government health authorities in my country will provide me with an effective COVID19 vaccine
vac2_6	If I do not get a COVID19 vaccine when it is available, I will regret it
vac_3	If a Covid-19 vaccine becomes available to me a year from now, I definitely intend to get it
employment_status_1	Full time employment
employment_status_2	Part time employment
employment_status_3	Full time student
employment_status_4	Retired
employment_status_5	Unemployed
employment_status_6	Not working
employment_status_7	Other
File Paths	File Name (Country)

Next, click on the “+” sign beside connected data and add a Clean Step to apply the variables selection.
Under Clean 1 step, frequency distributions of each variables could be observed. To ensure that the eventual visualisation is limited to data collected in January 2021 only, the endtime was filtered by clicking on the 3 dots button “…” > Filter > Range of Dates

The minimum date was set at 01/01/2021. Since from the frequency distribution it was observed that there are no data collect beyond January 2021, a maximum date need not be inputted.
It was also observed that the datas sets from several countries have separate fields for respondents to respond to the question on their employment status, i.e. employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7. A calculated field was created to harmonise these fields into one. After clicking on Create Calculated Field…, input the formula shown below to consolidate the different variables for employment status.
File Paths was renamed to Country
Finally, export the data by creating an Output step. Click on the the “+” button beside Clean 1 and select Output.

In the Output, select the preferred savings option and click on Run Flow.

2. Loading of Data

Open up Tableau, click on Text file and locate the Cleaned.csv file exported from Tableau Prep Builder
Minor clean up here before the visualisation

Hide away unnecessary columns, i.e. employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7

Right-click on the Country column, and select Aliases...

Edit the country names - removing “.cvs” and capitalising first letter

3. Creating Diverging Stacked Bar Chart

To create the diverging stacked bar chart, the vaccine related survey questions, i.e. vac_1, vac2_1, vac2_2, vac2_3, vac2_6, vac_3, are pivoted. Highlight the columns, right-click, and select Pivot.
Next, double-clock on the column names and change it to “Survey Questions” and “Score” as shown.
Right-click on the Survey Questions column and select Aliases... Update the Value of the field names to the actual questions.
Similarly for Score, update the aliases.
Create a calculated field to assign an integer value to each score and name is “Response”.

Create 7 other calculated fields using the formula shown:

Fields	Formula
Number of Records	1
Total Count	Total(SUM([Number of Records]))
Count Negative	IF [Response]<3 THEN 1 ELSEIF [Response]=3 THEN 0.5 ELSE 0 END
Total Count Negative	TOTAL(SUM([Count Negative]))
Percentage	SUM([Number of Records])/[Total Count]
Gantt Start	-[Total Count Negative]/[Total Count]
Gantt Percent	PREVIOUS_VALUE([Gantt Start])+ZN(LOOKUP([Percentage],-1))

Drag Gantt Percent from the left panel into Columns Country into Rows, and Response into Detail
From the dropdown menu of Gantt Percent, select Compute Using > Response
Next, change the chart type to Gantt Chart, drag from the left panel Response into Color, and Percentage into Size
Double-click on the color palette on the top right corner, and adjust the color appropriately.
Drag Survey Questions into Filter, select All, and follow by Apply.
Right-click on Survey Questions under Filter, and select Show Filter.
On the right panel, click on the the dropdown arrow in the Survey Questions filter and select Single Value (dropdown).
Next, to create the typical age groups used, right-click on Age on the left panel. Then, select the first age then hold down Shift key to select the last age of the age group to be created. Click on Group and input the name of the age group.
Similar groups are also created for Household Children and Household Size. Drag the newly created Household Children (group), Household Size (group), Age (group), along with Employment Status into the Filters. Right-click each of these variables in the Filters and select Show Filter
Since Survey Questions filter options could not be displayed in full if display on the right due to the long text, it was dragged down to the under the graph. Using the similar method described above, the filter selection types of Household Children (group), Household Size (group), Age (group), Employment Status, and were changed to Single Value (dropdown).
Last but not least, on the dropdown of Country field in Rows, and select Sort. Sort By Field in Ascending order, and select % Agree under Field Name.

4. Creating Error Bar on Dot Plot Chart

The first step is to create calculated field as follows. Here both “Agree” and “Strongly Agree”, as well as 50% of “Neutral” are treated equally as “Agree”.

Field	Formula
% Agree%	SUM(IF [Response]=5 OR [Response] = 4 THEN 1 ELSEIF [Response] = 3 THEN 0.5 ELSE 0 END)/ SUM([Number of Records])
Prop_SE	SQRT(([% Agree]*(1-[% Agree]))/SUM([Number of Records]))
Z_95%	1.959964
Z_99%	2.575829
Prop_Margin of Error 95%	[Z_95%]*[Prop_SE]
Prop_Lower Limit 95%	[% Agree]-[Prop_Margin of Error 95%]
Prop_Upper Limit 95%	[% Agree]+[Prop_Margin of Error 95%]
Prop_Margin of Error 99%	[Z_99%]*[Prop_SE]
Prop_Lower Limit 99%	[% Agree]-[Prop_Margin of Error 99%]
Prop_Upper Limit 99%	[% Agree]+[Prop_Margin of Error 99%]

To create the chart, % Agree was first dragged into Columns from the left panel and Country into Rows
Next, create a new parameter with the inputs shown below
Measure Values was then dragged into Columns. Click on the dropdown of Measure Values and select Dual Axis.
Hold down the Ctrl key on the keyboard and select all the fields to be removed from the Measure Values pane. Then drag the highlighted fields out of the pane.
Right-click on the x-axis and select Synchronise Axis
Under the Measure Values setting, change the visualisation type to Line. Drag Measure Names into Path, and another Measure Names from the left panel to Colour
Right-click on the Confidence Interval parameter created and select show Parameter.
Change the colour of the Lower Limit and Upper Limit to grey, and swap the position of Measure Values and % Agree in the Columns.
Lastly, on the dropdown of Country field in Rows, and select Sort. Sort By Field in Ascending order, and select % Agree under Field Name.

5. Creating the Dashboard

In a new dashboard, drag the “Diverging” worksheet into the center. Next, drag the “CI” worksheet onto the right side of the diverging stacked bar chart.
Arrange the Filters and Legend neatly as shown. Since there is sufficient space, the selection type for Employment Status filter was changed to Multiple Values (list).
Since the options of Survey Questions filters are truncated if placed on the side along with the other filters due to the long text, it was dragged to the bottom of the 2 graph.
Insert Text on top of the 2 charts to create a chart title
Finally, right-click on the Household Children (group), Household Size (group), Age (group), Employment Status, and Survey Questions filters > Apply to Worksheets > All Using Related Data Sources to synchronise the filters for both charts. This has to be done individually and manually for all filters mentioned.

E. Observations

From the comparison of the 2 survey responses above, i.e. “If a Covid-19 vaccine were made available to me this week, I would definitely get it” and “I am worried about getting COVID19”, it could be observed that a number of countries are in dilemma. For example, while Japan worries the most about contracting COVID-19, they are also worried about getting vaccinated in both near and far future. In fact, they are extremely worried of the potential side effect of the vaccine that and the trust in government as shown. This is rather surprising as Japan was never known to be more “kiasi” or worrisome than Singapore, but from the data it is clear that Japan is the worst. Not only are they afraid of COVID-19, they are also afraid of being affected by the potential side effects to the point that they will not consider getting vaccinated 1 year later. Another country with such observation is France. It is a relief that while Singaporeans are afraid of COVID-19 and the vaccine, at least they have a government that they trust (backed by data).

It could be observed that United Kingdom, Denmark, and Israel are very enthusiastic about getting vaccinated. This is due to the fact that they have government who they trust

Globally, there are less margin error in the responses of those who are full time employed and retired for all questions. The margin of error observed on those who are not working, unemployed, and part time employed are wide. This shows the respondents who are working or have worked are mostly consistent in their responses and their responses are much more representative of the real population.

Distill is a publication format for scientific and technical writing, native to the web.

Learn more about using Distill at https://rstudio.github.io/distill.

Citation

For attribution, please cite this work as

Wong (2021, Feb. 15). Dylan's DataViz Blog: DataViz Makeover 2. Retrieved from https://dylanwong.netlify.app/posts/2021-02-15-dataviz-makeover-2/

BibTeX citation

@misc{wong2021dataviz,
  author = {Wong, Dylan},
  title = {Dylan's DataViz Blog: DataViz Makeover 2},
  url = {https://dylanwong.netlify.app/posts/2021-02-15-dataviz-makeover-2/},
  year = {2021}
}

DataViz Makeover 2

Author

Affiliation

Published

Citation

Overview

A. Critiques

Clarity

Aesthetics

B. Proposed Improvement and Data Visualisation

Clarity

Aesthetics

Proposed Data Visualisation

C. Data Visualisation in Tableau

D. Preparation of Data Visualisation

1. Data Preparation

2. Loading of Data

3. Creating Diverging Stacked Bar Chart

4. Creating Error Bar on Dot Plot Chart

5. Creating the Dashboard

E. Observations

Footnotes

Citation