A critique and proposed makeover of the data visualisation created by a research team to understand the willingness of the public on COVID-19 vaccination. The data used by the research team was obtained from Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted at Github (https://github.com/YouGov-Data/covid-19-tracker).


The interactive dashboard created could be found here.

The data use for this visualisation was downloaded from Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted on Github.

From the upload history of the repository, it was observed that the files were last updated at different period ranging from few hours ago to 5 months ago. Since the research is to focus on survey conducted in January 2021, files updated after and before January 2021 are excluded. The data of Australia, Canada, Denmark, France, Germany, Italy, Japan, Netherlands, Norway, Singapore, South Korea, Spain, Sweden, and United Kingdom were used for the visualisation.
Using Tableau Prep Builder, the australia.csv file was first dragged directly from the folder into the Tableau Prep Builder window.

To merge the data files of the other countries, make sure that the files are save in the same location as australia.csv. Thereafter, in Tableau Prep Builder, select the Multiple Files tab > Wildcard union > Apply

From the right panel, it was observed that the data has many variables. Referring to the codebook.xlsx file, identify the variables to be retained and leave them checked. One could either first uncheck all fields then manually check the fields needed, or leave all fields checked and manually uncheck the fields not needed. For this visualisation exercise, the fields selected are: endtime, household_children, age, gender, household_size, employment_status, vac_1, vac2_1, vac2_2, vac2_3, vac2_6, vac_3, employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7, File Paths.

Referring to codebook.xlsx, the original field names are as follow:
| Variables | Original Field Names |
|---|---|
| endtime | When did you complete this question (epoch seconds)? |
| household_children | Number of children under 18 in household |
| age | Age |
| gender | Gender |
| household_size | Number of people in household |
| employment_status | Employment Status |
| vac_1 | If a Covid-19 vaccine were made available to me this week, I would definitely get it |
| vac2_1 | I am worried about getting COVID19 |
| vac2_2 | I am worried about potential side effects of a COVID19 vaccine |
| vac2_3 | I believe government health authorities in my country will provide me with an effective COVID19 vaccine |
| vac2_6 | If I do not get a COVID19 vaccine when it is available, I will regret it |
| vac_3 | If a Covid-19 vaccine becomes available to me a year from now, I definitely intend to get it |
| employment_status_1 | Full time employment |
| employment_status_2 | Part time employment |
| employment_status_3 | Full time student |
| employment_status_4 | Retired |
| employment_status_5 | Unemployed |
| employment_status_6 | Not working |
| employment_status_7 | Other |
| File Paths | File Name (Country) |
Next, click on the “+” sign beside connected data and add a Clean Step to apply the variables selection.

Under Clean 1 step, frequency distributions of each variables could be observed. To ensure that the eventual visualisation is limited to data collected in January 2021 only, the endtime was filtered by clicking on the 3 dots button “…” > Filter > Range of Dates

The minimum date was set at 01/01/2021. Since from the frequency distribution it was observed that there are no data collect beyond January 2021, a maximum date need not be inputted.

It was also observed that the datas sets from several countries have separate fields for respondents to respond to the question on their employment status, i.e. employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7. A calculated field was created to harmonise these fields into one. After clicking on Create Calculated Field…, input the formula shown below to consolidate the different variables for employment status.

File Paths was renamed to Country

Finally, export the data by creating an Output step. Click on the the “+” button beside Clean 1 and select Output.

In the Output, select the preferred savings option and click on Run Flow.

Open up Tableau, click on Text file and locate the Cleaned.csv file exported from Tableau Prep Builder

Minor clean up here before the visualisation
Hide away unnecessary columns, i.e. employment_status_1, employment_status_2, employment_status_3, employment_status_4, employment_status_5, employment_status_6, employment_status_7

Right-click on the Country column, and select Aliases...

Edit the country names - removing “.cvs” and capitalising first letter

To create the diverging stacked bar chart, the vaccine related survey questions, i.e. vac_1, vac2_1, vac2_2, vac2_3, vac2_6, vac_3, are pivoted. Highlight the columns, right-click, and select Pivot.

Next, double-clock on the column names and change it to “Survey Questions” and “Score” as shown.

Right-click on the Survey Questions column and select Aliases... Update the Value of the field names to the actual questions.

Similarly for Score, update the aliases.

Create a calculated field to assign an integer value to each score and name is “Response”.

Create 7 other calculated fields using the formula shown:
| Fields | Formula |
|---|---|
| Number of Records | 1 |
| Total Count | Total(SUM([Number of Records])) |
| Count Negative | IF [Response]<3 THEN 1 ELSEIF [Response]=3 THEN 0.5 ELSE 0 END |
| Total Count Negative | TOTAL(SUM([Count Negative])) |
| Percentage | SUM([Number of Records])/[Total Count] |
| Gantt Start | -[Total Count Negative]/[Total Count] |
| Gantt Percent | PREVIOUS_VALUE([Gantt Start])+ZN(LOOKUP([Percentage],-1)) |
Drag Gantt Percent from the left panel into Columns Country into Rows, and Response into Detail

From the dropdown menu of Gantt Percent, select Compute Using > Response

Next, change the chart type to Gantt Chart, drag from the left panel Response into Color, and Percentage into Size

Double-click on the color palette on the top right corner, and adjust the color appropriately.


Drag Survey Questions into Filter, select All, and follow by Apply.

Right-click on Survey Questions under Filter, and select Show Filter.

On the right panel, click on the the dropdown arrow in the Survey Questions filter and select Single Value (dropdown).

Next, to create the typical age groups used, right-click on Age on the left panel. Then, select the first age then hold down Shift key to select the last age of the age group to be created. Click on Group and input the name of the age group.


Similar groups are also created for Household Children and Household Size. Drag the newly created Household Children (group), Household Size (group), Age (group), along with Employment Status into the Filters. Right-click each of these variables in the Filters and select Show Filter

Since Survey Questions filter options could not be displayed in full if display on the right due to the long text, it was dragged down to the under the graph. Using the similar method described above, the filter selection types of Household Children (group), Household Size (group), Age (group), Employment Status, and were changed to Single Value (dropdown).
Last but not least, on the dropdown of Country field in Rows, and select Sort. Sort By Field in Ascending order, and select % Agree under Field Name.

The first step is to create calculated field as follows. Here both “Agree” and “Strongly Agree”, as well as 50% of “Neutral” are treated equally as “Agree”.
| Field | Formula |
|---|---|
| % Agree% | SUM(IF [Response]=5 OR [Response] = 4 THEN 1 ELSEIF [Response] = 3 THEN 0.5 ELSE 0 END)/ SUM([Number of Records]) |
| Prop_SE | SQRT(([% Agree]*(1-[% Agree]))/SUM([Number of Records])) |
| Z_95% | 1.959964 |
| Z_99% | 2.575829 |
| Prop_Margin of Error 95% | [Z_95%]*[Prop_SE] |
| Prop_Lower Limit 95% | [% Agree]-[Prop_Margin of Error 95%] |
| Prop_Upper Limit 95% | [% Agree]+[Prop_Margin of Error 95%] |
| Prop_Margin of Error 99% | [Z_99%]*[Prop_SE] |
| Prop_Lower Limit 99% | [% Agree]-[Prop_Margin of Error 99%] |
| Prop_Upper Limit 99% | [% Agree]+[Prop_Margin of Error 99%] |
To create the chart, % Agree was first dragged into Columns from the left panel and Country into Rows
Next, create a new parameter with the inputs shown below

Measure Values was then dragged into Columns. Click on the dropdown of Measure Values and select Dual Axis.

Hold down the Ctrl key on the keyboard and select all the fields to be removed from the Measure Values pane. Then drag the highlighted fields out of the pane.

Right-click on the x-axis and select Synchronise Axis

Under the Measure Values setting, change the visualisation type to Line. Drag Measure Names into Path, and another Measure Names from the left panel to Colour

Right-click on the Confidence Interval parameter created and select show Parameter.

Change the colour of the Lower Limit and Upper Limit to grey, and swap the position of Measure Values and % Agree in the Columns.

Lastly, on the dropdown of Country field in Rows, and select Sort. Sort By Field in Ascending order, and select % Agree under Field Name.

In a new dashboard, drag the “Diverging” worksheet into the center. Next, drag the “CI” worksheet onto the right side of the diverging stacked bar chart.

Arrange the Filters and Legend neatly as shown. Since there is sufficient space, the selection type for Employment Status filter was changed to Multiple Values (list).

Since the options of Survey Questions filters are truncated if placed on the side along with the other filters due to the long text, it was dragged to the bottom of the 2 graph.

Insert Text on top of the 2 charts to create a chart title

Finally, right-click on the Household Children (group), Household Size (group), Age (group), Employment Status, and Survey Questions filters > Apply to Worksheets > All Using Related Data Sources to synchronise the filters for both charts. This has to be done individually and manually for all filters mentioned.




Distill is a publication format for scientific and technical writing, native to the web.
Learn more about using Distill at https://rstudio.github.io/distill.
For attribution, please cite this work as
Wong (2021, Feb. 15). Dylan's DataViz Blog: DataViz Makeover 2. Retrieved from https://dylanwong.netlify.app/posts/2021-02-15-dataviz-makeover-2/
BibTeX citation
@misc{wong2021dataviz,
author = {Wong, Dylan},
title = {Dylan's DataViz Blog: DataViz Makeover 2},
url = {https://dylanwong.netlify.app/posts/2021-02-15-dataviz-makeover-2/},
year = {2021}
}