Fallacies in the Data Visualization Process
Data Visualization workshop
Antwerp Spring Academy, 11 June 2015
Marijn Koolen
University of Amsterdam
Overview
- Understanding Process
- where can errors/problems occur?
- Understanding Experimentation
- how to detect errors/problems?
Understanding Process
Fallacy of Irrelevance?
Step 0: Asking Questions
“The most important part of understanding data is identifying the question that you want to answer. Rather than thinking about the data that was collected, think about how it will be used and work backward to what was collected.” (Fry, 2007)
- What story should it tell?
Procedural Model
- Procedure for Visualization
- systematic way of thinking through process
- Helps one think about data, method and goal
- Fallacies can occur in each step
Step 1: Acquiring
- Data criticism: what’s in, what’s missing
- Data set may be contingent, incomplete, biased, ...
Step 2: Parsing
- Preparation criticism: what is the impact of, e.g., systematic errors?
- Parsing procedure determines units of analysis
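To illustrate how the parsing procedure determines the units of analysis, here is a minimal sketch. The sample sentence and the two tokenisation rules are invented for illustration; they are not taken from the workshop material.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "The well-known map isn't to scale. The map is useful."

# Choice A: split on whitespace, so punctuation stays attached to words.
whitespace_tokens = text.lower().split()

# Choice B: keep only runs of letters, so hyphens and apostrophes split words.
letter_tokens = re.findall(r"[a-z]+", text.lower())


def count(tokens, word):
    """Count exact matches of `word` in a token list."""
    return sum(1 for t in tokens if t == word)


print(len(whitespace_tokens), len(letter_tokens))
print(count(whitespace_tokens, "scale"), count(letter_tokens, "scale"))
```

The same text yields a different number of units, and a different count for "scale", depending on a parsing choice that is easy to make without noticing.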
Step 3: Filtering
- Remove what is not needed
- Remove until the key message jumps out
- but leave enough to understand/contextualise
- Visualization should be made as simple as possible, but no simpler
- London Tube Map
- Distortion can be useful!
Step 4: Mining
- Aggregate/summarise:
- show patterns of interest (mean, distribution, ...)
- Don’t over-simplify
- e.g. show only mean while distribution is important
- The first observed pattern is not necessarily the most interesting
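A small sketch of the over-simplification risk: two groups with the same mean but very different distributions. The numbers are invented for illustration only.

```python
from statistics import mean, stdev

# Hypothetical measurements, chosen so that both groups share the same mean.
group_a = [48, 49, 50, 51, 52]
group_b = [0, 0, 50, 100, 100]


def summarise(values):
    """Return the mean together with simple measures of spread."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),
    }


summary_a = summarise(group_a)
summary_b = summarise(group_b)
print(summary_a)
print(summary_b)
```

Plotting only the two means would suggest the groups are alike; the minimum, maximum and standard deviation show that they are not.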
Step 5: Representing
- Size of effect in data vs. size of effect in visualization
- Going against convention: Gun deaths in Florida
- Explain representation: provide legend, caption (explain units, scales, etc)
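One way to quantify the gap between the size of an effect in the data and its size in the visualization is Tufte's "lie factor". The slides do not name it; it is brought in here as an illustration. The sketch below assumes a bar chart with two bars and a value axis that may start above zero. The function names and the example values are hypothetical.

```python
def relative_change(first, second):
    """Relative size of the change from `first` to `second`."""
    return abs(second - first) / abs(first)


def lie_factor(first, second, axis_start):
    """Ratio of the effect as drawn to the effect in the data.

    Assumes bars are drawn from `axis_start` up to each value, so the
    drawn bar heights are (value - axis_start).
    """
    data_effect = relative_change(first, second)
    drawn_effect = relative_change(first - axis_start, second - axis_start)
    return drawn_effect / data_effect


# Hypothetical values: a 10% increase, drawn with and without a truncated axis.
honest = lie_factor(100, 110, axis_start=0)
truncated = lie_factor(100, 110, axis_start=90)
print(honest, truncated)
```

With the axis starting at zero the ratio is 1; starting the axis at 90 makes the second bar twice as tall as the first, so a 10% change in the data is drawn as a 100% change.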
Step 6: Refining
- Calling attention to particular data through color, hierarchy, zoom, ...
- Focus should not hide relevant context
Step 7: Interacting
- Adding interaction allows exploring data
- New York Times Guantanamo Docket
- interactivity allows exploration of data and relations
Non-Linear Process
Understanding Experimentation
- Joanna Guldi could have discovered the problem with strolling by experimenting
- try out different parameters, filters, representations
- Each step (but especially mining) involves many choices
- try multiple options, explore stability and validity of pattern
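Exploring the stability of a pattern can be as simple as repeating the analysis under several settings and checking whether the conclusion survives. The sketch below is a hedged illustration: the observations, the filter cutoffs and the pattern being tested ("group b has a higher mean than group a") are all invented.

```python
from statistics import mean

# Hypothetical observations as (group, value) pairs.
observations = [("a", 2), ("a", 3), ("a", 40), ("b", 5), ("b", 6), ("b", 7)]


def pattern_holds(rows, max_value):
    """True if group b's mean exceeds group a's after dropping values above max_value."""
    kept = [(group, value) for group, value in rows if value <= max_value]
    a = [value for group, value in kept if group == "a"]
    b = [value for group, value in kept if group == "b"]
    return bool(a) and bool(b) and mean(b) > mean(a)


# Repeat the same comparison under several filter settings.
results = {cutoff: pattern_holds(observations, cutoff) for cutoff in (10, 50, 100)}
print(results)
```

Here the pattern holds only under the strictest filter, which suggests it depends on how one extreme value is treated rather than being a stable property of the data.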
Criticizing Visualizations
- A beautiful fallacy in criticizing:
- If you criticize a visualization, at least look at it!
Wrap Up
- Understanding the process helps avoid and locate mistakes
- 7-step process
- but think about the why
- Experiment and consider alternatives at each step