
Data Visualization

Lecture slides 2020-2021

Georges Hattab

April 04, 2020

Transcript

  1. Data Visualization. Dr Georges Hattab, Junior group leader and Head of the Bioinformatics division, Department of Mathematics & Computer Science
  2. Why data visualization? • Unprecedented amount of data: 2.5 quintillion bytes/day • Efficient exploration • Effective communication • Integral aspects of scientific communication • Challenge: benefiting from the data without being overwhelmed • 65% of the population are visual learners • 90% of the information transmitted to the brain is visual • 70% of sensory receptors are in the eyes • 50% of the brain is dedicated to visual processing
  3. Problem • The value and utility of this popular form of communication remain unclear, although there is a growing appetite for the visual display of information • Clear objectives are needed to drive design decisions • Assess utility and practicality of visualizations • What do researchers want to and need to see in the data? • Which computational approaches and visual encodings will best bring out the trends? • Continual refinement of design decisions to meet research objectives
  4. Task specificity. Miele et al. 2019. Nine quick tips for analyzing network data. https://doi.org/10.1371/journal.pcbi.1007434
  5. Difficulty of validation • Solution: Use methods from different fields at each level. Nested Model of Visualization Design and Validation. Tamara Munzner. IEEE TVCG 2009.
  6. Textbook references • Visualization Analysis and Design. Tamara Munzner • Points of View article series. Nature Methods. • Data visualization handbook. Juuso Koponen and Jonatan Hildén
  7. Pointers before design • The story: Which story do you want to tell and how do you want to tell it? • The overview figure: How do you clarify concepts and quickly understand the overall idea? Original image from REUTERS/Simon Scarr, Marco Hernandez.
  8. The story • Separation: A hero ventures forth from the world of common day into a region of supernatural wonder. • Initiation: Fabulous forces are there encountered and a decisive victory is won. • Return: The hero comes back from this mysterious adventure with the power to bestow boons on his fellow man. Joseph Campbell, 1949. Info we trust, RJ Andrews.
  9. The overview figure • Portrays discrete yet connected steps or states • Accounts for all graphical elements to follow change • Relationships and hierarchy • Compact and economical for public understanding
  10. The overview figure. Wong, B. Nat Methods 8, 365 (2011) doi:10.1038/nmeth0511-365. Lieberman-Aiden et al. Science 326, 289–293 (2009).
  11. … or quite a bit serious • 1 million seconds equal about 11 and 1/2 days • 1 billion seconds equal about 31 and 3/4 years • 1 trillion seconds equal about 31,710 years
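The three conversions on this slide can be checked with a few lines of arithmetic. This is only a sketch: the 365-day year is an assumption, chosen because it reproduces the slide's figure of 31,710 years, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: a plain 365-day year (no leap days)


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a duration in seconds to 365-day years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


million_days = seconds_to_days(10**6)
billion_years = seconds_to_years(10**9)
trillion_years = seconds_to_years(10**12)

print(f"1 million seconds  = {million_days:.1f} days")
print(f"1 billion seconds  = {billion_years:.2f} years")
print(f"1 trillion seconds = {trillion_years:,.0f} years")
```

The point of the slide survives the rounding: each factor of a thousand moves the quantity into a different human time scale (days, a generation, recorded history).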
  12. Data visualization • Enables people to explore and explain data through human visual abilities to recognize patterns • Data visualization transforms information into a visual form • This process requires skills from engineering, statistics, graphic design, and other disciplines. IBM Design
  13. The grid system • organizes content • improves the design process • organizes typography • makes collaboration easy • helps create balanced compositions • is very flexible

    Poster shown on the slide: "The 892 unique ways to partition a 3 × 4 grid" (Thomas Gaskin).

    This poster illustrates a change in design practice. Computation-based design (that is, the use of algorithms to compute options) is becoming more practical and more common. Design tools are becoming more computation-based; designers are working more closely with programmers; and designers are taking up programming. Above, you see the 892 unique ways to partition a 3 × 4 grid into unit rectangles. For many years, designers have used grids to unify diverse sets of content in books, magazines, screens, and other environments. The 3 × 4 grid is a common example. Yet even in this simple case, generating all the options has, until now, been almost impossible. Patch Kessler designed algorithms to generate all the possible variations, identify unique ones, and sort them, not only for 3 × 4 grids but also for any n × m grid. He instantiated the algorithms in a MATLAB program, which output PDFs, which Thomas Gaskin imported into Adobe Illustrator to design the poster.

    Rules for generating variations. The rule system that generated the variations in the poster was suggested by Bill Drenttel and Jessica Helfand, who noted its relationship to the tatami mat system used in Japanese buildings for 1300 years or more. In 2006, Drenttel and Helfand obtained U.S. Patent 7124360 on this grid system: "Method and system for computer screen layout based on recombinant geometric modular structure". The tatami system uses 1 × 2 rectangles. Within a 3 × 4 grid, 1 × 2 rectangles can be arranged in 5 ways. They appear at the end of section 6. Unit rectangles (1 × 1, 1 × 2, 1 × 3, 1 × 4; 2 × 2, 2 × 3, 2 × 4; 3 × 3, 3 × 4) can be arranged in a 3 × 4 grid in 3,164 ways. Many are almost the same: mirrored or rotated versions of the same configuration. The poster includes only unique variations, one version from each mirror or rotation group. Colors indicate the type and number of related non-unique variations. The variations shown in black have 3 related versions; blue, green, and orange have 1 related version; and magenta variations are unique, because mirroring and rotating yields the original, thus no other versions.

    Rules for sorting. The poster groups variations according to the number of non-overlapping rectangles. The large figures indicate the beginning of each group. The sequence begins in the upper left and proceeds from left to right and top to bottom. Each group is further divided into sub-groups sharing the same set of elements. The sub-groups are arranged according to the size of their largest element, from largest to smallest. Squares precede rectangles of the same area; horizontals precede verticals of the same dimensions. Within sub-groups, variations are arranged according to the position of the largest element, proceeding from left to right and top to bottom. Variations themselves are oriented so that the largest rectangle is in the top left. Black dots separate groups by size. Gray dots separate groups by orientation.

    Where to learn more. Grids have been described in design literature for at least 50 years. French architect Le Corbusier describes grid systems in his 1946 book, Le Modulor. Swiss graphic designer Karl Gerstner describes a number of grid systems or "programmes" in his 1964 book, Designing Programmes. The classic work on grids for graphic designers is Josef Müller-Brockmann's 1981 book, Grid Systems. Patch Kessler explores the mathematical underpinnings of grid generation in his paper "Arranging Rectangles": www.mechanicaldust.com/Documents/Partitions_05.pdf. Thomas Gaskin has created an interactive tool for viewing variations and generating HTML: www.3x4grid.com

    Symmetry legend. 26 × Magenta: all three symmetries combined; unchanged by horizontal reflection, vertical reflection, or 180º rotation. 26 × Green: rotational symmetry; changed by horizontal and vertical reflection. 61 × Blue: top-bottom symmetry; changed by horizontal reflection and 180º rotation. 76 × Orange: left-right symmetry; changed by vertical reflection and 180º rotation. 703 × Black: asymmetric; changed by horizontal reflection, vertical reflection, and 180º rotation.

    Unique variations per number of rectangles (the large figures on the poster): 1 rectangle: 1; 2: 3; 3: 10; 4: 33; 5: 90; 6: 175; 7: 232; 8: 201; 9: 105; 10: 35; 11: 6; 12: 1.

    Design: Thomas Gaskin. Creative Direction: Hugh Dubberly. Algorithms: Patrick Kessler. Patent: William Drenttel + Jessica Helfand. Copyright © 2011 Dubberly Design Office, 2501 Harrison Street, #7, San Francisco, CA 94110, 415 648 9799
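The counts quoted on the poster (3,164 partitions, 892 of them unique under mirroring and rotation) can be reproduced with a short brute-force enumeration. The sketch below is not Patch Kessler's algorithm, which was written in MATLAB and is not shown in the slides; it is a plain backtracking search, and all function names are illustrative.

```python
from itertools import product

ROWS, COLS = 3, 4


def partitions(rows, cols):
    """Yield every partition of a rows x cols grid into rectangles.

    Each partition is a frozenset of (row, col, height, width) tuples.
    The first empty cell in row-major order must be the top-left corner
    of its rectangle, so every partition is generated exactly once.
    """
    filled = [[False] * cols for _ in range(rows)]
    placed = []

    def first_empty():
        for r, c in product(range(rows), range(cols)):
            if not filled[r][c]:
                return r, c
        return None

    def recurse():
        cell = first_empty()
        if cell is None:
            yield frozenset(placed)
            return
        r, c = cell
        for h in range(1, rows - r + 1):
            for w in range(1, cols - c + 1):
                cells = [(r + dr, c + dc) for dr in range(h) for dc in range(w)]
                if any(filled[y][x] for y, x in cells):
                    continue  # rectangle would overlap an earlier one
                for y, x in cells:
                    filled[y][x] = True
                placed.append((r, c, h, w))
                yield from recurse()
                placed.pop()
                for y, x in cells:
                    filled[y][x] = False

    yield from recurse()


def canonical(partition, rows, cols):
    """Return one representative of a partition's mirror/rotation group."""

    def mirror_lr(rect):
        r, c, h, w = rect
        return (r, cols - c - w, h, w)

    def mirror_tb(rect):
        r, c, h, w = rect
        return (rows - r - h, c, h, w)

    variants = [
        sorted(partition),
        sorted(mirror_lr(rect) for rect in partition),
        sorted(mirror_tb(rect) for rect in partition),
        sorted(mirror_lr(mirror_tb(rect)) for rect in partition),  # 180 degree rotation
    ]
    return tuple(min(variants))


all_partitions = list(partitions(ROWS, COLS))
total = len(all_partitions)
unique = len({canonical(p, ROWS, COLS) for p in all_partitions})
print(total, unique)
```

The legend on the poster gives an independent check of the same numbers: 26 + 2 × (26 + 61 + 76) + 4 × 703 = 3,164, and 26 + 26 + 61 + 76 + 703 = 892.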
  14. The grid system. Kharchenko, P., Alekseyenko, A., Schwartz, Y. et al. Nature 471, 480–485 (2011) doi:10.1038/nature09725
  15. … or the journey of our eyes. Wong. Nat Methods 8, 783 (2011) doi:10.1038/nmeth.1711
  16. … or the journey of our eyes. Wong. Nat Methods 8, 783 (2011) doi:10.1038/nmeth.1711
  17. Salience to relevance • Property that depends on the relationship of one object to other objects on a display
  18. Salience • Salience is the physical property that sets an object apart from its surroundings • It should align with relevance in visuals used for presentations • Information encoding needs to be efficient because the audience is expected to simultaneously listen and read. Wong. Nat Methods 8, 889 (2011) doi:10.1038/nmeth.1762
  19. Tips • create salience by using color, shape, size, position • information that is presented physically larger is easier to see • elements at a diagonal stand out when all others are oriented vertically and horizontally • on a black and white backdrop of elements, colored information attracts attention • unintentionally assigned salience can seriously harm the communication of a clear message
  20. Whitespace: gaps between text blocks and images • The term stems from the printing practice in which white paper is generally used. • Margins and gaps that separate blocks of text make it easier to access written material because they provide a visual structure. • Well-planned negative space balances the positive (nonwhite) space and is key to aesthetics. Image: Mori Kansai (1814-1894), Rabbits, 1881
  21. Congested environments. Siegenthaler et al. (2019) PLoS Biol 17(8): e3000400. https://doi.org/10.1371/journal.pbio.3000400
  22. Congested environments. Chromatin annotation of the Drosophila melanogaster genome. Kharchenko, P., Alekseyenko, A., Schwartz, Y. et al. Nature 471, 480–485 (2011) doi:10.1038/nature09725
  23. Example solution. Chromatin annotation of the Drosophila melanogaster genome. Kharchenko, P., Alekseyenko, A., Schwartz, Y. et al. Nature 471, 480–485 (2011) doi:10.1038/nature09725
  24. Visual structure (figure with panels A to D). Wong. Nat Methods 7, 863 (2010) doi:10.1038/nmeth1110-863
  25. Similarity, proximity, grouping, etc. (figure with letter examples). Wong. Nat Methods 7, 863 (2010) doi:10.1038/nmeth1110-863
  26. Unified compositions • Graphics and text used as vertices and edges of geometric shapes • Geometric and curvilinear shapes used as flexible guides to align content. Wong. Nat Methods 7, 941 (2010) doi:10.1038/nmeth1210-941
  27. Patterns • Our eyes see patterns everywhere • We can tell when things look the same or different • When things are combined, we see something new
  28. Summary. We can see … • patterns with things that have the same shape or color • that things belong together when they are close to one another
  29. Summary. We can see … • that things belong together when they move the same way • that things belong together when they are close to one another
  30. Summary. We can see … • and enjoy patterns that are neat and even • shapes even when part of the shape is missing
  31. Summary. We can see … • shapes where things aren't! • when our eyes and brain work together, there is really no limit to what we can make!
  32. Conclusion • From bits to larger units • Structure gives meaning • Perceptual organization based on principles • Principles: similarity, proximity, connection, enclosure • Structure helps us draw correlations between visual elements. Hattab et al. Info+ conference. (2016)
  33. Figure creation • apply principles of effective written communication • leverage our training and experience with words • make the process structured and reproducible • assess and optimize each part of a figure • "Do not take shortcuts at the expense of clarity" (Strunk and White's dictum). Krzywinski. Nat Methods 10, 371 (2013) doi:10.1038/nmeth.2444
  34. Problematic constructs • "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo". Pinker, S. The Language Instinct. (1994) W. Morrow, New York. Krzywinski. Nat Methods 10, 371 (2013) doi:10.1038/nmeth.2444
  35. Important tips • avoid overwriting because "rich ornate prose is hard to digest, generally unwholesome, and sometimes nauseating" • the visual equivalent is "chartjunk", as coined by Tufte • visually garnished elements shout at the reader • if you would not write it this way, do not draw it this way • don't overwhelm the reader • simple shapes provide an elegant representation • use color sparingly. Krzywinski. Nat Methods 10, 371 (2013) doi:10.1038/nmeth.2444
  36. Why stories? • capacity to delight and surprise • spark creativity by making meaningful connections between data and the ideas, interests and lives of the reader • stories may contain vexing questions, conflict, dead ends, insights and occasional thrilling leaps • when you see these indicators, the story is well told
  37. The story • Separation: A hero ventures forth from the world of common day into a region of supernatural wonder. • Initiation: Fabulous forces are there encountered and a decisive victory is won. • Return: The hero comes back from this mysterious adventure with the power to bestow boons on his fellow man. Joseph Campbell, 1949. Info we trust, RJ Andrews.
  38. Maintaining focus • Leave out detail that does not advance the plot • Distinguish necessary detail from superfluous detail • Do not show everything • Provide context and support for your story but stay on track • Make use of clever visual elements to help your readers • Consider: What would the headline of your story be?
  39. • The first two panels of the figure provide the background necessary for this plot twist to be appreciated • The vertical scale is chosen to accentuate the similarity of the death rates for males due to cancer in aggregate and to lung cancer in panels 2 and 3 • In short, it's a good story!
  40. Some more tips • Be sure to: use multiple panels for the flow; use colloquial language when addressing a large audience; not use the complete data; rely on a visual guideline to maintain focus; use coherent visual elements; avoid color unless it is necessary • Visually dull or accentuate: axes and grids to maintain focus on data trends; qualitative and quantitative aspects, but always be accurate; the context (e.g., panel 4 compares adult vs youth rates); style to meet the journal or publisher style requirements
  41. Design process is next • A good figure, like good writing, doesn't simply happen; it is crafted. • "Revise and rewrite" becomes "revise and redraw". • Search for and find the design • Refine • Iterate • Enjoy life!
  42. Why the word Design? • Design is a requirement, not a cosmetic addition • Design is all around us • Industrial design is for objects you use • Graphic design is for designs you read • Well designed objects and figures provide visible clues to their underlying function • Is interaction important? • How easy to use is the provided functionality? Wong, B. The design process. Nat Methods 8, 987 (2011). https://doi.org/10.1038/nmeth.1783
  43. Example overview figure • Represents a catalog of gene expression data from human cells treated with chemical and genetic reagents • Accentuate the steps with a mountain between 'sample preparation' and 'data analysis' (placed at 8:13) • Differentiate steps with color and find the physical location in the institute where the work is carried out • High contrast headings for 4 major features of the poster
  44. Motivation • Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively • Visualization or VIS is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods [Marie Neurath. Too Small To See. 1956]
  45. Nested model of visualization • domain situation - who are the target users? • abstraction: translate from specifics of domain to vocabulary of vis - what is shown? data abstraction - why is the user looking at it? task abstraction • idiom - how is it shown? + visual encoding idiom: how to draw + interaction idiom: how to manipulate • algorithm for efficient computation • Nesting, from outermost to innermost: domain, abstraction, idiom, algorithm. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009)
  46. Problem of validation. Why is validation difficult? • different ways to get it wrong at each level (4 levels) • Domain situation: you misunderstood their needs • Data/task abstraction: you're showing them the wrong thing • Visual encoding/interaction idiom: the way you show it doesn't work • Algorithm: your code is too slow. A Nested Model of Visualization Design and Validation. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009)
  47. Evaluation • Methods from many fields, qualitative & quantitative - controlled experiments in lab, field studies of deployed systems • Fields: computer science, design, cognitive psychology, anthropology/ethnography • Levels: domain situation, data/task abstraction, visual encoding/interaction idiom, algorithm • Domain situation: observe target users using existing tools • Visual encoding/interaction idiom: justify design with respect to alternatives • Algorithm: measure system time/memory; analyze computational complexity • Checks after implementation: analyze results qualitatively; measure human time with lab experiment (lab study); observe target users after deployment; measure adoption. Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009)
  48. Why represent all the data? • Summaries lose information, details matter – confirm expected and find unexpected patterns – assess validity of statistical model • Anscombe's Quartet: four X/Y datasets with identical statistics: x mean 9, x variance 10, y mean 7.5, y variance 3.75, x/y correlation 0.816
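The identical statistics quoted for Anscombe's Quartet can be recomputed directly. The data values below are not on the slide; they are the commonly published values from Anscombe (1973), typed in here as an assumption. The slide's variances (10 and 3.75) match the population variance, so the sketch uses `statistics.pvariance` rather than the sample variance.

```python
from math import sqrt
from statistics import mean, pvariance

# Assumption: commonly published Anscombe (1973) values, not taken from the slides.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


summaries = {
    name: {
        "x_mean": mean(xs),
        "x_var": pvariance(xs),
        "y_mean": mean(ys),
        "y_var": pvariance(ys),
        "corr": pearson(xs, ys),
    }
    for name, (xs, ys) in QUARTET.items()
}

for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four rows of the printout agree to two or three decimals, yet a scatter plot of each dataset looks entirely different, which is the argument for showing the data rather than only its summary.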
  49. Why analyze? • imposes a structure on huge design space – scaffold to help you think systematically about choices – analyzing existing as stepping stone to designing new • Figure: SpaceTree and TreeJuxtaposer compared. What? tree. Why? actions: present, locate, identify; targets: path between two nodes. How? encode, navigate, select, filter, aggregate, arrange. Tamara Munzner. Visualization Analysis and Design
  50. Examples of How? SpaceTree. SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation. Grosjean, Plaisant, and Bederson. Proc. InfoVis 2002, p 57–64. Tamara Munzner. Visualization Analysis and Design
  51. Examples of How? TreeJuxtaposer. TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed Visibility. ACM Trans. on Graphics (Proc. SIGGRAPH) 22:453–462, 2003. Tamara Munzner. Visualization Analysis and Design
  52. What to analyze? • Dataset types: tables (items as rows, attributes as columns, cells containing values; multidimensional tables), networks (nodes as items, links) and trees, fields (continuous; grid of positions, cells with attribute values), geometry (spatial; positions), clusters, sets, lists • Data types: items, attributes, links, positions, grids • Attribute types: categorical; ordered (ordinal, quantitative) • Ordering direction: sequential, diverging, cyclic • Dataset availability: static, dynamic. Tamara Munzner. Visualization Analysis and Design
  53. What to analyze? • Dataset types: tables, networks, trees, fields (continuous), geometry (spatial) • Ordering direction: sequential, diverging, cyclic • Dataset availability: static, dynamic. Tamara Munzner. Visualization Analysis and Design
  54. Types: Datasets and data • Dataset types: tables, networks, trees, fields (continuous), geometry (spatial) • Attribute types: categorical; ordered (ordinal, quantitative) • Ordering direction: sequential, diverging, cyclic. Tamara Munzner. Visualization Analysis and Design
  55. Types: Datasets and data • Dataset types: tables, networks, trees, fields (continuous), geometry (spatial) • Attribute types: categorical; ordered (ordinal, quantitative) • Ordering direction: sequential, diverging, cyclic. Tamara Munzner. Visualization Analysis and Design
  56. Why analyze data? • Actions: analyze (consume: discover, present, enjoy; produce: annotate, record, derive), search (lookup, locate, browse, explore, depending on whether target and location are known), query (identify, compare, summarize) • Targets: all data (trends, outliers, features), attributes (one: distribution, extremes; many: dependency, correlation, similarity), network data (topology, paths), spatial data (shape). Tamara Munzner. Visualization Analysis and Design
  57. Why analyze data? • Targets: all data (trends, outliers, features), attributes (one: distribution, extremes; many: dependency, correlation, similarity), network data (topology, paths), spatial data (shape). Tamara Munzner. Visualization Analysis and Design
  58. 1. Action: Analyze • consume – discover vs present (classic split; aka explore vs explain) – enjoy (newcomer; aka casual, social) • produce – annotate, record – derive (crucial design choice). Tamara Munzner. Visualization Analysis and Design
  59. 2. Action: Search • what does user know? – target, location • target known or unknown, location known or unknown: lookup, locate, browse, explore. Tamara Munzner. Visualization Analysis and Design
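The search actions can be read as a two-by-two decision on what the user already knows. The flattened slide lists the four labels but not how they pair with the two questions; the pairing below follows the table in Munzner's book as commonly presented and should be treated as an assumption. The function name is illustrative.

```python
def search_action(target_known: bool, location_known: bool) -> str:
    """Name the search action for a given state of user knowledge.

    Assumed pairing (not recoverable from the slide text alone):
    target known   + location known   -> lookup
    target known   + location unknown -> locate
    target unknown + location known   -> browse
    target unknown + location unknown -> explore
    """
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"


for target_known in (True, False):
    for location_known in (True, False):
        print(target_known, location_known, search_action(target_known, location_known))
```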
  60. 3. Action: Query • what does user know? – target, location • how much of the data matters? – one, some, all (identify, compare, summarize) • analyze, search, query – independent choices for each. Tamara Munzner. Visualization Analysis and Design
  61. Targets • All data: trends, outliers, features • Attributes: one (distribution, extremes), many (dependency, correlation, similarity) • Network data: topology, paths • Spatial data: shape. Tamara Munzner. Visualization Analysis and Design
  62. How to visualize? • Encode: arrange (express, separate, order, align, use); map from categorical and ordered attributes (color: hue, saturation, luminance; size, angle, curvature, ...; shape; motion: direction, rate, frequency, ...) • Manipulate: change, select, navigate • Facet: juxtapose, partition, superimpose • Reduce: filter, aggregate, embed. Tamara Munzner. Visualization Analysis and Design
  63. How to visualize? • Manipulate: change, select, navigate • Facet: juxtapose, partition, superimpose • Reduce: filter, aggregate, embed. Tamara Munzner. Visualization Analysis and Design
  64. How to visually encode information? • analyze idiom structure. Tamara Munzner. Visualization Analysis and Design
  65. Definition: Marks and channels • marks – geometric primitives: points, lines, areas • channels – control appearance of marks: position (horizontal, vertical, both), color, shape, tilt, size (length, area, volume). Tamara Munzner. Visualization Analysis and Design
  66. How to visually encode information? • analyze idiom structure 


    – as combination of marks and channels • 1: vertical position; mark: line • 2: vertical position, horizontal position; mark: point • 3: vertical position, horizontal position, color hue; mark: point • 4: vertical position, horizontal position, color hue, size (area); mark: point. Tamara Munzner. Visualization Analysis and Design
  67. Channels: Expressiveness Types and Effectiveness Ranks

    • Magnitude channels (ordered attributes), from most to least effective: position on common scale, position on unaligned scale, length (1D size), tilt/angle, area (2D size), depth (3D position), color luminance, color saturation, curvature, volume (3D size) • Identity channels (categorical attributes), from most to least effective: spatial region, color hue, motion, shape. Tamara Munzner. Visualization Analysis and Design
  68. Channels • expressiveness principle – match channel and data

    characteristics • effectiveness principle – encode most important attributes with highest ranked channels. Tamara Munzner. Visualization Analysis and Design
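The two principles can be sketched in a few lines of Python. This is only an illustration: the channel lists follow the ranking shown on the channels slide, while the function name `assign_channels` and the example attributes are hypothetical, and a real design would also weigh the task and the number of items.

```python
# Channel rankings as listed on the slide (most effective first).
MAGNITUDE_CHANNELS = [
    "position on common scale", "position on unaligned scale",
    "length (1D size)", "tilt/angle", "area (2D size)",
    "depth (3D position)", "color luminance", "color saturation",
    "curvature", "volume (3D size)",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Assign one channel per attribute (hypothetical helper).

    attributes: list of (name, kind) tuples ordered by importance,
    where kind is "ordered" or "categorical".
    """
    pools = {
        # Expressiveness: magnitude channels for ordered attributes,
        # identity channels for categorical attributes.
        "ordered": list(MAGNITUDE_CHANNELS),
        "categorical": list(IDENTITY_CHANNELS),
    }
    assignment = {}
    for name, kind in attributes:
        if not pools[kind]:
            raise ValueError(f"no {kind} channel left for {name!r}")
        # Effectiveness: the most important attribute takes the best remaining channel.
        assignment[name] = pools[kind].pop(0)
    return assignment


encoding = assign_channels([
    ("expression level", "ordered"),
    ("tissue type", "categorical"),
    ("sample age", "ordered"),
])
print(encoding)
```

Listing the attributes in order of importance is the design decision here; the code only makes the consequence of that ordering explicit.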
  69. Problem: Visual complexity Tamara Munzner. Visualization Analysis and Design 108

    Four strategies: 1. change view over time 2. facet across multiple views 3. reduce items/attributes within single view 4. derive new data to show within view
  70. Complexity: Strategies

    [Figure: How? Manipulate: change, select, navigate. Facet: juxtapose, partition, superimpose. Reduce: filter, aggregate, embed. Shown together with the actions search, query, consume and produce (annotate, record, derive).] Tamara Munzner. Visualization Analysis and Design
  71. Strategy 1: Change over time

    [Figure: How? Manipulate: change, select, navigate. Facet: juxtapose, partition, superimpose. Reduce: filter, aggregate, embed.] Tamara Munzner. Visualization Analysis and Design
  72. Idiom: Animated transitions

    • smooth transition from one state to another – alternative to jump cuts – support for item tracking when amount of change is limited • example: multilevel matrix views – scope of what is shown narrows down • middle block stretches to fill space, additional structure appears within • other blocks squish down to increasingly aggregated representations [Using Multilevel Call Matrices in Large Software Projects. van Ham. Proc. IEEE InfoVis, pp. 227–232, 2003.]
  73. Strategy 2: Facet

    [Figure: How? Manipulate: change, select, navigate. Facet: juxtapose, partition, superimpose. Reduce: filter, aggregate, embed.] Tamara Munzner. Visualization Analysis and Design
  74. Strategy 2: Facet

    [Figure: Facet: juxtapose, partition, superimpose. Coordinate multiple side-by-side views: share encoding (same/different, linked highlighting), share data (all/subset/none), share navigation. Superimpose layers.] Tamara Munzner. Visualization Analysis and Design
  75. Idiom: Linked Highlighting

    • see how regions contiguous in one view are distributed within another – powerful and pervasive interaction idiom • encoding: different – multiform • data: all shared [Visual Exploration of Large Structured Datasets. Wills. NTTS, pp. 237–246. IOS Press, 1995.]
  76. Idiom: Bird’s-eye maps

    • encoding: same • data: subset shared • navigation: shared – bidirectional linking • differences – viewpoint – (size) • overview-detail • System: Google Maps [A Review of Overview+Detail, Zooming, and Focus+Context Interfaces. Cockburn, Karlson, and Bederson. ACM Computing Surveys 41:1 (2008), 1–31.]
  77. Idiom: Small multiples

    • encoding: same • data: none shared – different attributes for node colors – (same network layout) • navigation: shared • System: Cerebral [Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14:6 (2008), 1253–1260.]
  78. Strategy 2: Facet: Coordinate views

    • benefits: eyes vs memory – lower cognitive load to move eyes between 2 views than remembering previous state with single view • costs: display area, 2 views side by side each have only half the area of one view • design space by shared data (all / subset / none) – same encoding: redundant / overview-detail / small multiples – multiform encoding: multiform / multiform overview-detail / no linkage. Tamara Munzner. Visualization Analysis and Design
  79. Strategy 2: Facet: Partition

    • how to divide data between views – encodes association between items using spatial proximity – major implications for what patterns are visible – split according to attributes • design choices – how many splits • all the way down: one mark per region? • stop earlier, for more complex structure within region? – order in which attributes used to split – how many views. Tamara Munzner. Visualization Analysis and Design
  80. Partition: List alignment

    • single bar chart with grouped bars – split by state into regions • complex glyph within each region showing all ages – compare: easy within state, harder across ages • small multiple bar charts – split by age into regions • one chart per region – compare: easy within age, harder across states [Figure: values per state and age group (Under 5 Years to 65 Years and Over), shown once as a grouped bar chart and once as small multiples.] Tamara Munzner. Visualization Analysis and Design
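The effect of the split order can be sketched with plain dictionaries. The records and the helper `partition` below are hypothetical; the point is only that the same rows yield different regions depending on which attribute is used first.

```python
from collections import defaultdict

# Hypothetical (state, age group, value) records.
records = [
    ("CA", "under 5", 2.5), ("CA", "65 and over", 4.0),
    ("NY", "under 5", 1.2), ("NY", "65 and over", 2.6),
    ("FL", "under 5", 1.1), ("FL", "65 and over", 3.3),
]


def partition(rows, outer, inner):
    """Split rows into regions by the `outer` key, then index items by the `inner` key."""
    regions = defaultdict(dict)
    for state, age, value in rows:
        keys = {"state": state, "age": age}
        regions[keys[outer]][keys[inner]] = value
    return dict(regions)


# Grouped bars: one region per state, easy to compare ages within a state.
by_state = partition(records, outer="state", inner="age")
# Small multiples: one region per age group, easy to compare states within an age.
by_age = partition(records, outer="age", inner="state")

print(by_state["CA"])
print(by_age["under 5"])
```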
  81. Strategy 2: Facet: Partition • split by neighborhood, then by

    type, then by time - years as rows - months as columns • color by price • neighborhood patterns • where it’s expensive • where you pay much more for detached type 120 [Configuring Hierarchical Layouts to Address Research Questions. Slingsby, Dykes, and Wood. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2009) 15:6 (2009), 977–984.]
  82. Strategy 2: Facet: Partition • switch order of splits -

    type then neighborhood • switch color - by price variation • type patterns - within specific type, which neighborhoods are inconsistent 121 [Configuring Hierarchical Layouts to Address Research Questions. Slingsby, Dykes, and Wood. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2009) 15:6 (2009), 977–984.]
  83. Strategy 2: Facet: Partition • different encodings for second-level regions

    - choropleth maps [Configuring Hierarchical Layouts to Address Research Questions. Slingsby, Dykes, and Wood. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis 2009) 15:6 (2009), 977–984.]
  84. Strategy 3: Reduce

    [Figure: How? Manipulate: change, select, navigate. Facet: juxtapose, partition, superimpose. Reduce: filter, aggregate, embed.] Tamara Munzner. Visualization Analysis and Design
  85. Strategy 3: Reduce

    Reducing items and attributes • filter: items, attributes • aggregate: items, attributes • reduce/increase: filters – pro: straightforward and intuitive to understand and compute – con: out of sight, out of mind • aggregation – pro: inform about whole set – con: difficult to avoid losing signal • not mutually exclusive – combine filter, aggregate – combine reduce, facet, change, derive. Tamara Munzner. Visualization Analysis and Design
  86. Idiom: Boxplot

    • static item aggregation • task: find distribution • data: table • derived data – 5 quant attributes • median: central line • lower and upper quartile: boxes • lower and upper fences: whiskers – values beyond which items are outliers – outliers beyond fence cutoffs explicitly shown [40 years of boxplots. Wickham and Stryjewski. 2012. had.co.nz]
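The five derived attributes can be computed with the Python standard library. This is a minimal sketch: the slide does not say how the fences are defined, so the common 1.5 × IQR rule is assumed here, and the function name `boxplot_summary` and the sample values are hypothetical.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Return the derived attributes a boxplot draws.

    Fences use the common k * IQR rule with k = 1.5 (an assumption).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Items beyond the fence cutoffs are shown explicitly as outliers.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Different quartile conventions give slightly different boxes for small samples, which is worth stating in a figure caption.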
  87. Idiom: Dimensionality reduction

    • specifically applied to docs • attribute aggregation – derive low-dimensional target space from high-dimensional measured space • Task 1 (why: produce, derive): in high-dimensional data, out 2D data • Task 2 (why: discover, explore, identify; how: encode, navigate, select): in 2D data, out scatterplot, clusters & points • Task 3 (why: produce, annotate): in scatterplot, clusters & points, out labels for clusters. Tamara Munzner. Visualization Analysis and Design
  88. Let’s take a step back!

    [Figure: nested levels of visualization design: domain, abstraction, idiom, algorithm.] Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009)
  89. Refine graphical characteristics • Pencil and paper • facilitate thinking

    and hypothesis generation • inward reflection • outward expression • constructive activity • thinking specific and explicit • demanding activity • to contextualize our understanding spatially Wong, B., Kjærgaard, R. Pencil and paper. Nat Methods 9, 1037 (2012). https://doi.org/10.1038/nmeth.2223 128 Depict Data Studio. The Data Visualization Design Process: A Step-by-Step Guide for Beginners
  90. Sketch to structure 130 Michele Graffieti & Giorgia Lupi, 2016.

    Sketching with Data Opens the Mind’s Eye
  91. Explore design elements 131 Michele Graffieti & Giorgia Lupi, 2016.

    Sketching with Data Opens the Mind’s Eye
  92. Drawing to refine 132 Michele Graffieti & Giorgia Lupi, 2016.

    Sketching with Data Opens the Mind’s Eye
  93. Pointers • In education, drawing improves comprehension of scientific concepts

    • Students were found to perform markedly better after they had been prompted to generate, justify and refine visual representations of classroom material • Drawing augments our short-term working memory. Wong, B., Kjærgaard, R. Pencil and paper. Nat Methods 9, 1037 (2012). https://doi.org/10.1038/nmeth.2223
  94. Visual working memory 135 Wong, B., Kjærgaard, R. Pencil and

    paper. Nat Methods 9, 1037 (2012). https://doi.org/10.1038/nmeth.2223 • Table describes a simple network where connections between the nodes are indicated by filled cells • Connections are arranged as rows and columns • Try to mentally picture the underlying network!
  95. Color 138 The property possessed by an object of producing

    different sensations on the eye as a result of the way it reflects or emits light
  96. Color as illusion

    [Figure 1: Perception of color can vary. (a,b) The same color can look different (a), and different colors can appear to be nearly the same by changing the background color (b). (c) The rectangles in the heat map indicated by the asterisks (*) are the same color but appear to be different.] Albers, J. Interaction of Color (Yale University Press, New Haven, Connecticut, USA, 1975)
  97. Color as three numbers

    • trichromacy: each type of cone cell responds to one of three wavelength ranges of the photons arriving on its surface • only about 6 to 7 million cones • different cone cell responses are a function of wavelength [Representing Colors as Three Numbers, Stone, IEEE Computer Graphics and Applications, 25(4), July 2005, pp. 78-85]
  98. Color as cone cell responses

    • different cone cell responses are a function of wavelength • for a given spectrum - multiply by response curve - integrate to get response [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015]
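The "multiply, then integrate" recipe can be sketched numerically. Everything specific below is an assumption made for illustration: the bell-shaped curves stand in for measured cone sensitivities, the peak wavelengths and widths are rough illustrative values, and `cone_response` is a hypothetical helper using the trapezoidal rule.

```python
import math


def bell_curve(peak, width):
    """Return a hypothetical bell-shaped curve over wavelength (nm).

    Real cone sensitivities are measured curves, not Gaussians.
    """
    def curve(wavelength):
        return math.exp(-0.5 * ((wavelength - peak) / width) ** 2)
    return curve


def cone_response(spectrum, sensitivity, wavelengths):
    """Multiply the spectrum by the response curve, then integrate (trapezoidal rule)."""
    product = [spectrum(w) * sensitivity(w) for w in wavelengths]
    total = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        total += 0.5 * (product[i] + product[i - 1]) * step
    return total


wavelengths = list(range(380, 801, 5))  # nm

# Assumed, illustrative response curves for the three cone types.
cones = {
    "S": bell_curve(445, 25),
    "M": bell_curve(540, 40),
    "L": bell_curve(565, 45),
}

# A hypothetical narrow-band light source centred at 600 nm.
reddish_light = bell_curve(600, 15)

responses = {name: cone_response(reddish_light, curve, wavelengths)
             for name, curve in cones.items()}
print(responses)
```

The whole spectrum collapses to three numbers, one per cone type, which is why different spectra can produce the same triple.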
  99. Metamerism: Several similar segments

    • brain sees only cone response - different spectra appear the same • spectral sensitivity varies strongly over the wavelength range between 380 and 800 nm [Figure: spectral sensitivity against wavelength (nm), with the visible spectrum between UV and IR.] [Representing Colors as Three Numbers, Stone, IEEE Computer Graphics and Applications, 25(4), July 2005, pp. 78-85] [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015]
  100. Color as three channels

    Luminance, saturation, hue • identity for categorical: hue • magnitude for ordered: luminance, saturation • other common color spaces – RGB: poor choice for visual encoding – HSL: better, but beware • lightness ≠ luminance • transparency – useful for creating visual layers • but cannot combine with luminance or saturation [Figure: saturation, luminance values, hue.] Wong, B. Color coding. Nat Methods 7, 573 (2010)
  101. Color coding

    • adds dimensionality and richness to scientific communications • simplifies a complex analysis task • typically used to differentiate information into classes • challenge of picking colors that are discriminable • need for systematic approach for color coding. [Article excerpt:] ... such a potent differentiator, the appropriate strategy is to choose colors that are discernible from one another but comparable in visibility. Color is a relative medium, and neighboring colors can affect visual perception. For example, it is possible to make the same color look different or different colors appear the same (or nearly the same) by changing only the background color (Fig. 1a,b). The perception of color depends on context, and manipulating the attributes of neighboring colors affects how we see the original color. A heat map requires us to judge the relative brightness of colors in a matrix. The interaction of color can cause a profound effect that makes this graphical representation suffer (Fig. 1c). Every color is described by three properties: hue, saturation and lightness. Hue is the attribute we use to classify a color as red or yellow. Saturation describes the neutrality of a color; a red object with little or no white is said to be very saturated. The lightness of a color tells us about its relative ordering on the dark-to-light scale. Figure 1 | Perception of color can vary. (a,b) The same color can look different (a), and different colors can appear to be nearly the same by changing the background color (b). (c) The rectangles in the heat map indicated by the asterisks (*) are the same color but appear to be different.
  102. Color to categorical data 148 • well suited to represent

    categorical data • distinguish between experimental conditions • if used by assigning intense or weak colors to specific categories, color can bias the reader. • color is a potent differentiator • appropriate strategy: choose colors that are discernible from one another but comparable in visibility.
  103. Categorical color: Discriminability constraints

    How many categories? • noncontiguous small regions of color: only 6-12 bins • have a really good reason to use 10 or more categorical colors [Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. Sinha and Meller. BMC Bioinformatics, 8:82, 2007.]
  104. [Tableau software blog. How we designed the new color palettes

    in Tableau 10? Stone, 2016] Tableau10 150
  105. Color to quantitative data • define key regions or points

    in the data range that we intend to highlight before designing a color-coding scheme that varies the 3 components • need to determine the aspects of the data we want to make apparent • hue is impractical • principally need to rely on color value • reserve hue to indicate different segments of the data range • a meaningful range will be the extremes: min and max values • the zero value may also be interesting • different ranges for different contexts: sea level, absolute zero −273.15°Celsius
  106. Color to quantitative data • color not ideal due to

    ambiguity of how colors should be ordered • is yellow smaller than blue? • could pattern the sequence after the ordering of visible light by wavelength: ROYGBIV • transitions between colors are uneven, which breaks the correspondence between color and numerical value. Gehlenborg, N., Wong, B. Mapping quantitative data to color. Nat Methods 9, 769 (2012)
  107. Color blindness • Tritanopia/Tritanomaly: Missing/malfunctioning S-cone (blue). • Deuteranopia/Deuteranomaly: Missing/malfunctioning

    M-cone (green). • Protanopia/Protanomaly: Missing/malfunctioning L-cone (red). • Monochromatism: either no functioning cones or only one type of cone is available • etc [Figure panels: Normal, Tritanomaly, Deuteranomaly, Protanomaly, Protanopia, Deuteranopia, Tritanopia.]
  108. Disambiguation: Luminance

    • Luminance is measurable • Lightness is perceived luminance • Brightness is perceived lightness relative to some average level in an image or environment • HSV and HSB are the same: V stands for Value and B stands for Brightness • HSL: L is Lightness • All are employed for the same purpose: to make an image lighter or darker!
  109. Color spaces

    • RGB - convenient for machines - three channels are not separable • CIE XYZ - from color matching functions - perceptually based • HSL - a simple transformation from RGB - good: separates out lightness from hue and saturation - bad: lightness not true luminance - careful: only pseudo-perceptual [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015. http://www.cs.ubc.ca/~tmm/courses/547-15]
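The warning that HSL lightness is not true luminance can be checked with the standard library. The sketch below assumes sRGB input and uses the Rec. 709 weights for relative luminance, a common convention that the slides do not specify; pure blue and pure yellow are chosen as an illustrative pair.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(r, g, b):
    """Relative luminance from linearized sRGB with Rec. 709 weights (assumed convention)."""
    rl, gl, bl = (srgb_to_linear(c) for c in (r, g, b))
    return 0.2126 * rl + 0.7152 * gl + 0.0722 * bl


colors = {"blue": (0.0, 0.0, 1.0), "yellow": (1.0, 1.0, 0.0)}

results = {}
for name, rgb in colors.items():
    # colorsys returns hue, lightness, saturation (HLS order).
    _, hsl_lightness, _ = colorsys.rgb_to_hls(*rgb)
    results[name] = (hsl_lightness, relative_luminance(*rgb))
    print(name, results[name])
```

Both colors receive the same HSL lightness, while their relative luminance differs by more than a factor of ten, so an ordered encoding built on HSL lightness alone would treat them as equal.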
  110. Color: Luminance, saturation, hue

    • 3 channels – identity for categorical • hue – magnitude for ordered • luminance • saturation • other common color spaces – RGB: poor choice for visual encoding – HSL: better, but beware • lightness ≠ luminance • transparency – useful for creating visual layers • but cannot combine with luminance or saturation [Figure: corners of the RGB color cube; L from HLS, all the same; luminance values; saturation; hue.] [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015]
  111. Color maps

    • categorical limits: noncontiguous – 6-12 bins hue/color • far fewer if colorblind – 3-4 bins luminance, saturation – size heavily affects salience • use high saturation for small regions, low saturation for large [Figure: categorical, ordered (sequential, diverging) and bivariate color maps, after Color Use Guidelines for Mapping and Visualization. Brewer, 1994. http://www.personal.psu.edu/faculty/c/a/cab38/ColorSch/Schemes.html] [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015. http://www.cs.ubc.ca/~tmm/courses/547-15]
  112. Color usage

    • contrast hierarchy creates layers: context, normal, urgent (from Larry Arend, colorusage.arc.nasa.gov) • rules for managing attention: silent, whisper, indoor voice, shout • get it right in black & white: luminance vs hue & chroma (from Vision and the Art of Seeing by Margaret Livingstone; maps courtesy of the National Park Service, www.nps.gov) [Exploiting the Power of the Human Visual System. Maureen Stone, StoneSoup Consulting, and Jock Mackinlay, Tableau Software, July 21, 2009]
  113. Color usage

    • Bezold effect (from Stephen Few) – spreading: adjacent colors blend • Tufte’s fundamental uses – to label: primarily hue variation, associated with color names – to measure: vary lightness & chroma, map to data distribution [Exploiting the Power of the Human Visual System. Stone and Mackinlay, July 21, 2009]
  114. Ordered color: Rainbow is poor default

    • problems – perceptually unordered – perceptually nonlinear • benefits – fine-grained structure visible and nameable • alternatives – fewer hues for large-scale structure – multiple hues with monotonically increasing luminance for fine-grained – segmented rainbows good for categorical, ok for binned [Transfer Functions in Direct Volume Rendering: Design, Interface, Interaction. Kindlmann. SIGGRAPH 2002 Course Notes] [A Rule-based Tool for Assisting Colormap Selection. Bergman, Rogowitz, and Treinish. Proc. IEEE Visualization (Vis), pp. 118–125, 1995.] [Why Should Engineers Be Worried About Color? Treinish and Rogowitz 1998. http://www.research.ibm.com/people/l/lloydt/color/color.HTM] [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015. http://www.cs.ubc.ca/~tmm/courses/547-15]
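One way to see why the rainbow is perceptually unordered is to track luminance along the map. The sketch below is an illustration under stated assumptions: the rainbow is approximated by an HSV hue sweep from red to blue, luminance is the sRGB relative luminance with Rec. 709 weights, and the single-hue blue ramp is just one possible alternative.

```python
import colorsys


def relative_luminance(rgb):
    """Relative luminance of an sRGB color with channels in [0, 1] (Rec. 709 weights)."""
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb]
    return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]


def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


steps = 9
# Rainbow approximation: sweep hue from red (0) to blue (2/3) at full saturation and value.
rainbow = [colorsys.hsv_to_rgb(2 / 3 * i / (steps - 1), 1.0, 1.0) for i in range(steps)]
# Single-hue ramp: fixed blue hue, value increasing from dark to bright.
blue_ramp = [colorsys.hsv_to_rgb(2 / 3, 1.0, (i + 1) / steps) for i in range(steps)]

rainbow_luminance = [relative_luminance(c) for c in rainbow]
ramp_luminance = [relative_luminance(c) for c in blue_ramp]

print(is_monotonic(rainbow_luminance))
print(is_monotonic(ramp_luminance))
```

Luminance in the hue sweep rises towards yellow and falls again towards blue, so larger data values do not consistently look lighter or darker.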
  115. • what is the color used for? • what type

    of imagery needs to be colored? • what can we assume about the display? • what can we assume about the user? • what can we assume about the task? [What’s so hard about categorical color? Stone. StoneSoup Consulting] 164 Helpful tips
  116. Map other channels!

    • size – length accurate, 2D area ok, 3D volume poor • angle – nonlinear accuracy • horizontal, vertical, exact diagonal • shape – complex combination of lower-level primitives – many bins • motion – highly separable against static • binary: great for highlighting – use with care to avoid irritation [Ch 10: Color. Papers: Colors as Three Numbers. Munzner, 2015. http://www.cs.ubc.ca/~tmm/courses/547-15]
  117. If you can’t use color wisely, it is best to avoid it entirely

    Above all, do no harm. [Exploiting the Power of the Human Visual System. Stone and Mackinlay, July 21, 2009]
  118. Bar charts • typically used to visualize quantities associated with

    a set of items • Bar charts are appropriate for counts • Bar charts encode quantities by length • Stacked bar charts enable comparison across items • Layered bar charts support comparison within categories • Grouped bar charts allow comparison across categories 172 xkcd
  119. Bar charts and box plots 173 Streit, M., Gehlenborg, N.

    Bar charts and box plots. Nat Methods 11, 117 (2014)
  120. Box plots • typically when dealing with quantities sampled from

    a population (vs. a set of counts), the data contain uncertainty! • Bar charts are not well suited to error bars because the result can be misleading • Bar charts start at zero, so the resulting range might not have been observed • Box plots better fit and represent the characteristics of a distribution. xkcd
  121. Helpful tips • for better readability you may order each

    • order bars by heights • order boxes by medians • use zero as base line for bar charts unless there’s a good reason • facilitate interpretation and comparison by adding ticks, marks and, if necessary, grid lines to show smaller differences • fill with solid color and forgo outlines • avoid more than 10 colors 176 xkcd
  122. Sets and intersections • sets are a universal concept •

    examples: 
 - bacteria in a soil sample 
 - enzymes in biochem pathway 
 - variants in a genome 
 - proteins in a serum 
 - genes in a patient cohort • often the task is to identify these sets • other common task: analysis of the similarities and differences of n sets by using the concept of intersection 177 xkcd
  123. Euler diagrams • Euler diagrams represent intersecting sets as overlapping

    shapes: circles, ellipses, etc • They are drawn so that their area is proportional to the number of elements they represent • an effective visualization for encoding all set intersections is a matrix with a binary pattern, with bars rendered above the columns representing each intersection (sorted, scaled, etc) Lex, A., Gehlenborg, N. Sets and intersections. Nat Methods 11, 779 (2014)
  124. Venn diagrams • Venn diagrams are identical with the exception

    that they show all intersections • all intersections include empty sets (which are not drawn in Euler diagrams) 179
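The difference between the two diagram types can be sketched by enumerating intersections directly. The gene sets and the helper `exclusive_intersections` are hypothetical; the helper lists, for every combination of sets, the elements that belong to exactly those sets.

```python
from itertools import combinations

# Hypothetical gene sets from three patient cohorts.
sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4", "g5"},
    "C": {"g4", "g6"},
}


def exclusive_intersections(named_sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    names = sorted(named_sets)
    universe = set().union(*named_sets.values())
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result[combo] = {
                item for item in universe
                if all((item in named_sets[n]) == (n in combo) for n in names)
            }
    return result


# Venn: all 2^n - 1 regions, including the empty ones.
venn_regions = exclusive_intersections(sets)
# Euler: only the non-empty regions are drawn.
euler_regions = {combo: members for combo, members in venn_regions.items() if members}

# Sorted by size, as the bars above a matrix-based view would be.
for combo, members in sorted(euler_regions.items(), key=lambda kv: -len(kv[1])):
    print("&".join(combo), len(members))
```

The number of regions doubles with every added set, which is one reason overlapping shapes become hard to read beyond a handful of sets.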
  125. Heat maps • represent 2D tables of numbers as shades

    of colors • very popular in Biology, to depict gene expression, high-throughput data, multivariate data, etc • dense and intuitive • hundreds of rows can be displayed on a screen • rely on color encoding and on meaningful reordering of the rows and columns. Ramakrishna, C., Corleto, J., Ruegger, P.M. et al. Dominant Role of the Gut Microbiota in Chemotherapy Induced Neuropathic Pain. Sci Rep 9, 20324 (2019)
  126. Clustering 185 Gehlenborg, N., Wong, B. Heat maps. Nat Methods

    9, 213 (2012) • color is a relative medium • clustering reveals patterns and structure in the heat maps • added gaps reveal relationships
  127. Parallel coordinates • all lines pass through a small number

    of points • categorical data is not well suited for parallel coordinates • limited visibility of items when the number gets high • works best for moderate number of dimensions and no more than a few thousand records • quickly recognize patterns • estimate the strength of correlations 188
  128. Temporal data: Line charts • use inherent properties of time

    to create effective visualizations • time is unidirectional, provides a natural order for events and has an inherent semantic structure • temporal data are often cyclic and exhibit repeating patterns • the challenge is that time cannot be directly perceived (unlike spatial dimensions) • common approaches may combine: position, brightness, saturation, animation
  129. Helpful tips Time • very effective visual variable • examples:

    line and bar charts • mapped to the horizontal axis 
 • take into account the inherent cyclicality • break apart the time dimension into time intervals (for example, weeks) • they emphasize a recurring pattern. Streit, M., Gehlenborg, N. Temporal data. Nat Methods 12, 97 (2015)
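Breaking the time dimension into intervals can be sketched with the standard library. The daily series below is made up, and `split_by_week` is a hypothetical helper that turns one long series into one row per ISO week so that the rows can be stacked or overlaid.

```python
from datetime import date, timedelta

# Hypothetical daily measurements over three weeks, starting on a Monday.
start = date(2020, 3, 2)
series = [(start + timedelta(days=i), 10 + (i % 7)) for i in range(21)]


def split_by_week(points):
    """Group (date, value) pairs into ISO weeks; each week is a row indexed by weekday."""
    weeks = {}
    for day, value in points:
        iso_year, iso_week, weekday = day.isocalendar()
        row = weeks.setdefault((iso_year, iso_week), [None] * 7)
        row[weekday - 1] = value  # Monday is weekday 1
    return weeks


weekly = split_by_week(series)
for week, values in weekly.items():
    print(week, values)
```

Plotting each row on a shared Monday-to-Sunday axis makes a weekly pattern visible that a single long line can hide.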
  130. Radar charts • use polar coordinates • project the data

    onto a circular plane • often applied because of visual appeal • produce a continuous curve over the cycles • support comparison of patterns across cycles • harder to interpret due to distortion 1 193 Streit, M., Gehlenborg, N. Temporal data. Nat Methods 12, 97 (2015)
  131. Sparklines • term introduced by E Tufte for illustrating data

    trends • show data in a highly condensed form that still allows for comparison • designed to show qualitative data aspects • don’t require scales or axes • enables integration of a high number of measurements over time 195 Laurence Sterne: The Life and Opinions of Tristram Shandy, Gentleman, 1759–1767, Vol. VI, Chapter Forty. “These were the four lines I moved through my first, second, third, and fourth volumes”.
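A text-only sparkline can be sketched with Unicode block characters. This is a minimal illustration, not Tufte's own construction: the function `sparkline` is hypothetical and simply rescales the values to eight block heights, without axes or scales.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as one line of block characters."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series is drawn at the lowest level.
        return BLOCKS[0] * len(values)
    scale = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * scale)] for v in values)


print(sparkline([1, 3, 5, 8, 6, 4, 2, 1]))
```

Because each series is rescaled to its own minimum and maximum, the result shows the shape of a trend rather than absolute values.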
  132. Animation • maps time to time • alternative approach if

    visual variables such as position and saturation or brightness are already in use • intuitive • limits ability to detect patterns • cannot compare across multiple time points • change blindness makes us lose track of many changing elements 196
  133. Networks • arise from complex biological or other relational data

    • mathematically known as graphs • describe a set of pairwise relationships • common plotting is a node-link diagram • typically molecules as nodes and the connections between the nodes as straight or curved lines (or edges) • directed (asymmetric) or undirected (symmetric) 200 Gehlenborg, N., Wong, B. Networks. Nat Methods 9, 115 (2012)
  134. Networks: Helpful tips (1) • advantage of preserving local network

    detail • easy to identify nearest neighbors of a node • easy to trace paths through the network • layout affects how data is perceived • spring-embedded layout creates hubs and clusters (doesn’t scale) • pitfall of the ‘hairball’ effect • alternative is adjacency matrix • one drawback: difficult to understand for non-connected nodes Gehlenborg, N., Wong, B. Networks. Nat Methods 9, 115 (2012) 201
  135. Networks: Helpful tips (2) • adjacency matrix: reorder nodes such

    that as many filled cells as possible appear next to each other • clusters are evident • connections between clusters appear as clumps of information away from the diagonal • if adj. matrix and node-link diagrams are inadequate: limit the representation to a partial network or rely on a statistical metric to describe a certain data aspect Gehlenborg, N., Wong, B. Networks. Nat Methods 9, 115 (2012) 202
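The effect of node ordering on an adjacency matrix can be sketched with a small undirected network. The edge list, the two orderings and the `mean_diagonal_distance` score are all illustrative assumptions; the score is just one simple way to quantify how close the filled cells sit to the diagonal.

```python
def adjacency_matrix(edges, order):
    """Undirected adjacency matrix with rows/columns in the given node order."""
    index = {node: i for i, node in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix

def mean_diagonal_distance(matrix):
    """Average |row - column| over filled cells; lower means tighter blocks."""
    dists = [abs(i - j) for i, row in enumerate(matrix)
             for j, cell in enumerate(row) if cell]
    return sum(dists) / len(dists)

# Hypothetical network: two triangles (a, b, c) and (x, y, z) joined by c-x.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
scrambled = adjacency_matrix(edges, ["a", "x", "b", "y", "c", "z"])
grouped = adjacency_matrix(edges, ["a", "b", "c", "x", "y", "z"])
```

With the grouped ordering the two triangles form filled blocks along the diagonal and the single c-x link appears as an isolated cell between them, so `mean_diagonal_distance(grouped)` is smaller than for the scrambled ordering.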
  136. Pathways • describe the connectivity and flow of information in

    biological systems • examples: molecules, cells, species, global ecological networks, etc • pathways are networks • elements as nodes and their relationships as edges • requirement: clear depiction of connectivity via pattern 203 Hunnicutt, B., Krzywinski, M. Pathways. Nat Methods 13, 5 (2016)
  137. Pathways: Helpful tips (1) • information flow from left to

    right and top to bottom • diverging from this standard or introducing asymmetry in the layout helps emphasize differences • should be done carefully and sparingly • edges that loop back should be in clockwise direction (b) • placing nodes on grid assists eye movement across (c) • alignment type emphasizes either information flow or source nodes 204 Hunnicutt, B., Krzywinski, M. Pathways. Nat Methods 13, 5 (2016)
  138. Pathways: Helpful tips (2) • strong relationships can be illustrated

    using connection and enclosure • edges group nodes via connection • enclosure can group nodes in shared compartments • associate nodes through similarity (color or shape) or proximity (pixel distance) • highlight parts with grouping • proximity grouping can be done with negative space • need for start and finish to easily identify and examine pathways • labels or names: high visual cost and disrupt grouping 205 Hunnicutt, B., Krzywinski, M. Pathways. Nat Methods 13, 5 (2016)
  139. Neural circuit diagrams • network • nodes: brain regions or

    single neurons • directed edges: axonal connections • edge may encode many variables • designates neurotransmitter type • determines cell excitation, inhibition or modulation of its targets • node position, color and shape encode cell morphology, type, location, etc 206 Hunnicutt, B., Krzywinski, M. Neural circuit diagrams. Nat Methods 13, 189 (2016)
  140. Neural circuits: Helpful tips (1) 207 Hunnicutt, B., Krzywinski, M.

    Neural circuit diagrams. Nat Methods 13, 189 (2016)
  141. Neural circuits: Helpful tips (2) Supplementary Figure 1 Strategies to

    add emphasis and information to the circuit shown in Fig. Full region acronyms are shown as in the original. 208 Hunnicutt, B., Krzywinski, M. Neural circuit diagrams. Nat Methods 13, 189 (2016)
  142. Neural circuits: Helpful tips (3) 209 Hunnicutt, B., Krzywinski, M.

    Neural circuit diagrams. Nat Methods 13, 189 (2016)
  143. Pie charts 211 Drew Skau, Robert Kosara, Arcs, Angles, or

    Areas: Individual Data Encodings in Pie and Donut Charts, EuroVis 2016
  144. Typography: Art and technique • affects perception of credibility •

    frequently conflated with font • Arial is a typeface that includes roman, bold and italic fonts • letterforms: serif, sans serif • primary characteristics • Serif: thinner, formal, easier to read in block text because ‘feet’ help our eyes follow the line (posters, printed documents) • Sans serif: simpler, informal, and appropriate for headings and labels (slides) 220 Wong, B. Points of view: Typography. Nat Methods 8, 277 (2011)
  145. Typography to honor content • pick one and ignore the

    rest • combine with care • reveals the tone of the doc • clarifies structure and meaning • space among ¶ > line spacing 221
  146. Typography • font selection shows quickly if content is stately

    or humble, formal or informal, creative or technical 
 • most docs can be set perfectly with: 
 - one typeface 
 - 2 or 3 type sizes 
 - bold and italics if necessary 
 • typography must draw our attention without interfering with the reading 224
  147. Axes, ticks and grids • figures with quantitative info more

    accurately understood • helpful navigational elements • provide scale and aid to assess lengths & proportions • must be distinct from primary information • Gestalt principles inform us how to use: line width, color and transparency • keep data-to-ink ratio high • least ink for navigational elements 226
  148. Axes • data as foundation • follow its coordinate system

    • figure axes are critical in orienting the reader • avoid bounding by axes on all sides • containment often mistaken for organization (negative space) • multi-panel figures should maintain fixed scales for comparison • variation in axis ranges is easily overlooked • outliers shouldn’t compress the dynamic range of all the data 227 Krzywinski, M. Axes, ticks and grids. Nat Methods 10, 183 (2013)
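Keeping a fixed scale across a multi-panel figure amounts to computing one axis range from all panels before drawing any of them. The sketch below assumes each panel is a plain list of values; `shared_axis_range` and the 5% padding are illustrative choices, not part of the lecture.

```python
def shared_axis_range(panels, pad=0.05):
    """One (low, high) range for all panels, padded by a fraction of the span."""
    lo = min(min(p) for p in panels)
    hi = max(max(p) for p in panels)
    margin = (hi - lo) * pad
    return lo - margin, hi + margin

# Hypothetical values for three panels of the same figure.
panels = [[2, 4, 6], [10, 12, 20], [0, 5]]
low, high = shared_axis_range(panels)
```

Applying `low` and `high` to every panel makes bar heights and point positions directly comparable. If a single outlier stretches the range, it may be better to mark it separately than to let it compress the rest of the data.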
  149. Ticks • densely labeled figures uneasy on the eye •

    axis ticks burden it with repetition • esp. relevant for views of data across large genomes (filled with repeating non significant zeros) • easy strategy to keep tick label complexity low while maintaining usability 228 Krzywinski, M. Axes, ticks and grids. Nat Methods 10, 183 (2013)
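One possible way to reduce tick label repetition, not necessarily the strategy shown on the slide, is to express positions in a larger unit and state the unit only once. The function name, the default unit of one megabase and the sample positions are assumptions.

```python
def compact_tick_labels(positions, unit=1_000_000, suffix="Mb"):
    """Label ticks in a larger unit; only the last label carries the unit.

    Assumes `positions` is a non-empty list of numbers.
    """
    labels = [f"{p / unit:g}" for p in positions]
    labels[-1] += f" {suffix}"
    return labels

# Hypothetical genomic positions in base pairs.
ticks = [0, 50_000_000, 100_000_000, 150_000_000]
labels = compact_tick_labels(ticks)
```

Here the axis reads 0, 50, 100, 150 Mb instead of repeating six non-significant zeros at every tick.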
  150. Grids • establish sight lines to compare proportions and relate

    positions to axis ticks • grid number is suggestive of the scale of differences • dense grid means minor fluctuations in the data and low uncertainty level • dense grid impedes accurate judgement due to high density • no grid may be better than a bad one • use only when needed 229 Krzywinski, M. Axes, ticks and grids. Nat Methods 10, 183 (2013)
  151. Consistency and alignment • deal with complexity by using: 


    - labels to identify components 
 - defining terms and acronyms 
 - focusing reader’s attention 
 • labels are annotations • label position in relation to data points • placement priority scheme 235 Krzywinski, M. Labels and callouts. Nat Methods 10, 275 (2013)
  152. Clarity • keep labels concise but clear • move common

    text to legend • explore ways to present labels in alignment • control spatial variation • if in doubt keep extra space 236 Krzywinski, M. Labels and callouts. Nat Methods 10, 275 (2013)
  153. Integration • design figures to incorporate labels and callouts •

    even the space • group labels intuitively • use a grid system • uniform arrangement/spacing • consistent line lengths, angles, spacing and alignment • limit diversity in length and angle of callout lines 237 Krzywinski, M. Labels and callouts. Nat Methods 10, 275 (2013) Hanahan, D. & Weinberg, R.A. Cell 144, 646–674 (2011)
  154. Symbol diversity • data categories encoded with distinct symbols •

    insufficient symbol contrast impedes their identification • letters as plotting symbols • some draw more attention • bias assignment of category • color as efficient discriminator Krzywinski, M., Wong, B. Plotting symbols. Nat Methods 10, 451 (2013) 240
  155. Natural hierarchies • data points represent genes classified by: 


    - type (gene, non processed pseudogene, processed) 
 - transcription state (on, off) • map salience to relevance • elevate important data using symbols with greater visual weight (fill and/or color) • single color isolates single var
  156. Arrows and meanings • metaphorical uses (increase, decrease) • geometry

    tells us its purpose • arrows in one figure could have a different purpose • label parts, convey mechanical motion, flow, change, movement or causality • functional relationship of elements 245 Wong, B. Arrows. Nat Methods 8, 701 (2011)
  157. ‘There are two goals when presenting data: convey your story

    and establish credibility’ -Edward Tufte 248
  158. Align to a grid 253 PRINCIPLES OF FORM AND DESIGN

    MODULAR GRID GRID GESTALT PRINCIPLES OF GROUPING COLOR CONTRAST HIERARCHY WEIGHT AND SCALE HIERARCHY HIERARCHY SHAPE CONTRAST WEIGHT AND SCALE HIERARCHY GOLDEN RATIO 1.61803399 GOLDEN RECTANGLE RULE OF THIRDS align focal point to one of the four circles
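The two layout aids named on this slide reduce to simple arithmetic. The sketch below assumes a rectangular canvas measured in arbitrary units with the origin in one corner; the function names are hypothetical.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # about 1.618

def rule_of_thirds_points(width, height):
    """The four intersections of the lines that split a canvas into thirds."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]

def golden_rectangle_height(width):
    """Height of a landscape rectangle whose sides are in the golden ratio."""
    return width / GOLDEN_RATIO

# Hypothetical canvas of 300 by 150 units.
focal_points = rule_of_thirds_points(300, 150)
```

The four points in `focal_points` correspond to the four circles on the slide: candidate positions for the focal point of a figure.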
  159. Use form and design principles 256 PRINCIPLES OF FORM AND

    DESIGN POINT GEOMETRIC STATIC LINE ORGANIC ACTIVE/DYNAMIC VOLUME TEXTURE SYMMETRY GRADATION ASYMMETRY SPACE/PLACEMENT RADIAL RHYTHM GROUPING SPACE/SCALE NEGATIVE/POSITIVE TRANSPARENT DIRECTION OPAQUE PLANE LAYERS TENSION SCALE GESTURE PATTERN REPETITION FIGURE/GROUND AMBIGUOUS FIGURE/GROUND REVERSIBLE
  160. … and Gestalt principles 257 GESTALT PRINCIPLES OF GROUPING CLOSURE

    AREA SYMMETRY COLOR CONTRAST HIERARCHY WEIGHT AND SCALE HIERARCHY HIERARCHY SHAPE CONTRAST WEIGHT AND SCALE HIERARCHY PROXIMITY SIMILARITY PROXIMITY CONTINUITY
  161. … and how about some fun? 260 Briscoe et al.

    (2014) Biology Letters Jablonski et al. (2012), Historical Biology, 24:5, 527-536
  162. Understanding graphs • accurate interpretation of visual variables • effective

    graphs should: 
 - accommodate reader needs 
 - focus on human perception strengths • e.g. tough to accurately judge differences between two curves • perceptual system is attuned to detecting min distances • shortcoming of judging relative area • e.g. the usefulness of bubble charts 262 Wong, B. Design of data figures. Nat Methods 7, 665 (2010)
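Since relative area is hard to judge, a bubble chart should at least encode the value in the area and not in the radius. A minimal sketch, assuming non-negative values and a chosen maximum radius (both parameter names are illustrative):

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius such that the bubble's area is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)

def bubble_area(radius):
    """Area of a circular bubble with the given radius."""
    return math.pi * radius ** 2

# A value four times smaller gets half the radius, hence a quarter of the area.
r_small = bubble_radius(25, 100)
r_large = bubble_radius(100, 100)
```

Scaling the radius linearly instead would make the larger bubble sixteen times the area for four times the value, exaggerating the difference.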
  163. Cleveland and McGill Rank Aspect to compare 1 Positions on

    a common scale 2 Positions on the same but nonaligned scales 3 Lengths 4 Angles, slopes 5 Area 6 Volume, color saturation 7 Color hue 265 Wong, B. Design of data figures. Nat Methods 7, 665 (2010)
  164. Visual communication • sci comm with graphs depends on the

    design decisions made by authors • specifically, encoding info for readers to decode • strong visuals to compose better figures • rely on accurate perceptual tasks • support the visual assessment for better interpretation 267
  165. Microscopy system 276 Wong, B. Points of view: Points of

    review (part 1). Nat Methods 8, 101 (2011)
  166. Microscopy system • intended to illustrate 3 parts • redraw

    the figure so the threefold nature is apparent even at a glance • Gestalt principles to organize objects into groups • e.g. line connections, space containment, proximity • compartments for structure • the horizontal feature links the system together • negative space as separator, added uniformity using shapes 277 Wong, B. Points of view: Points of review (part 1). Nat Methods 8, 101 (2011)
  167. Gene expression Wong, B. Points of view: Points of review

    (part 1). Nat Methods 8, 101 (2011) 278
  168. Gene expression • fitting vertical structure to relate parts to

    one another • line up arrows for visual completion (connect and order process) • differentiate the central path from other elements using orientation and alignment to create salience • added reagents misaligned or placed at an angle from central molecules • consistent color encoding (green as barcode) Wong, B. Points of view: Points of review (part 1). Nat Methods 8, 101 (2011) 279
  169. Data graphs • reading graphs to observe individual data points

    • keep each in memory to construct an image • fast process thanks to visual perception • graphical encoding supports detection and assembly process • certain tasks easier • e.g. reading bar chart vs pie 280 Wong, B. Points of view: Points of review (part 2). Nat Methods 8, 189 (2011)
  170. Visual encodings 281 Wong, B. Points of view: Points of

    review (part 2). Nat Methods 8, 189 (2011)
  171. Multivariate scatter plot 282 Wong, B. Points of view: Points

    of review (part 2). Nat Methods 8, 189 (2011)
  172. Color, color, and more color 283 Wong, B. Points of

    view: Points of review (part 2). Nat Methods 8, 189 (2011)
  173. Visualization • effective for spatial data, rarely effective for other

    types • complexity & understandability • higher effectiveness on the 2D plane • rely on non-spatial graphical encodings to add extra dimensions 291 https://3dmapart.com/
  174. 3D representation of abstract data Gehlenborg, N., Wong, B. Into

    the third dimension. Nat Methods 9, 851 (2012) 296
  175. Tips (1) • if one data dimension is categorical and

    there are few categories use shapes • many approaches to represent nD data on 2D plane • a matrix of scatter plots can reveal pairwise correlations • also: heat maps and coordinate plots • dimensionality reduction methods (PCA, MDS, etc) acceptable yet with info loss 297 http://www.turingfinance.com/artificial-intelligence-and-statistics-principal-component-analysis-and-self-organizing-maps/
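The numbers behind a scatter plot matrix can be sketched as one Pearson correlation per pair of dimensions, one per off-diagonal panel. The sketch assumes numeric columns of equal length with non-zero variance; the column names and values are made up for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pairwise_correlations(columns):
    """Correlation for every pair of named columns (one per matrix panel)."""
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(columns, 2)}

# Hypothetical data set with three dimensions.
columns = {"height": [1.0, 2.0, 3.0, 4.0],
           "weight": [2.0, 4.0, 6.0, 8.0],
           "age":    [4.0, 3.0, 2.0, 1.0]}
corr = pairwise_correlations(columns)
```

With n dimensions there are n(n-1)/2 pairs, which is why the matrix grows quickly and dimensionality reduction becomes attractive despite its information loss.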
  176. Tips (2) • minimize impact of occlusion • animated rotation

    of objects of interest is common to show hidden surfaces (interactive) • semitransparent surfaces allow looking through or into objects • problem of having unintended artifacts, esp. with color usage • place labels onto the 2D projection not in 3D scene (distortion and readability) • take into account data properties • support vis goals with depth cues and consistent encodings 298 Nguyen, L. H., Holmes, S. "Ten quick tips for effective dimensionality reduction." PLoS computational biology 15.6 (2019)
  177. Power of the plane • parallel coordinate plots • scatter

    plots • highly useful 2D plot types for high-dimensional data • representation of data using location on a plane • strengths for highlighting different data aspects • data tasks to show: clusters, trends and outliers 300
  178. Parallel coordinate plot • one data set: Iris (R.A. Fisher)

    • multiple visualizations • parallel coordinate plot handles n data types (a) • quantitative multivariate data over time or m conditions (b) • enables accurate comparisons across dimensions • robust graphical encodings • clear data relationships • limited suitability for data dominated by categorical information or small data ranges 301 Gehlenborg, N., Wong, B. Power of the plane. Nat Methods 9, 935 (2012)
  179. Scatter plot • the choice between these two plots depends

    on the analytical task • the difference is how the data is represented • 1 data point in parallel coordinate is 1 line or 1 profile • supports pairwise correlations and other relationships between m dimensions • characteristic shapes of the point clouds • complement each other 302 Gehlenborg, N., Wong, B. Power of the plane. Nat Methods 9, 935 (2012)
  180. Complexity • focus on meaning instead of structure • anchor

    the figure to relevant domain knowledge content (versus method detail) • which findings are interesting? • what representation would communicate them clearly? • project data onto familiar visual paradigms • e.g., network or pathway to show biological effects • dimensions can be encoded as spatial or visual elements, such as along x and y axes or by color, size or symbol 305 Krzywinski, M., Savig, E. Multidimensional data. Nat Methods 10, 595 (2013)
  181. Small multiples • effective method for presentation • example: Study

    of drug effect on a network of signaling proteins 306 Krzywinski, M., Savig, E. Multidimensional data. Nat Methods 10, 595 (2013)
  182. Effective design • effective figures • use of spatial encoding

    to present the data domain (protein network) • small multiple maintains functional relationship between the proteins • assess the impact of n variables • incompatible for quantitative variables as it ‘muddles and confounds the analysis’ • small multiple scales well • original actually shows 392 cell type-drug combinations [Figure 5: Overview of inhibitor impact. (a) Miniaturized signaling network showing the effect of a stimulus or inhibitor on each quantified phosphorylation site. (b) Impact of all inhibitors under all stimulation conditions for IgM+ B cells. (c) Impact of all inhibitors on all cell types.] 307 Krzywinski, M., Savig, E. Multidimensional data. Nat Methods 10, 595 (2013)
  183. Helpful tips • focus the reader’s attention • focus on

    specific elements in displays of complex data • use a light visual style • use row and column numbers to aid data lookup • organize the presentation of high-dimensional data • leverage existing biological conceptual models • scope of data focused with a narrowed range or a table rearrangement 309 https://material.io/design/communication/data-visualization.html
  184. Spatial: China's maritime routes

    [Figure: “China’s maritime-road projects cluster where disruption to its trade would be most costly”. Map of Chinese trade routes, maritime-road projects and the 25 areas most affected by a trade disruption; scale: increase in length of trade routes if closed to Chinese trade, weighted by cargo value, %. The Economist, Graphic detail, September 28th 2019.] 311
  185. Let’s retro-engineer! ➡ Domain knowledge ➡ Task ➡ Data abstraction(s)

    ➡ Visual encoding(s) [Figure: nested model with validation methods per level. Domain situation: observe target users using existing tools; measure adoption. Data/task abstraction: observe target users after deployment. Visual encoding/interaction idiom: justify design with respect to alternatives; analyze results qualitatively; measure human time with lab experiment (lab study). Algorithm: measure system time/memory; analyze computational complexity.] Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009) 312
  186. Spatial: When the lights go out

    [Figure: map of the Korean peninsula, nocturnal luminosity, March average; nocturnal luminosity v GDP per person at PPP, by country, log scales; North Korea, GDP per person at PPP, conventional estimates v luminosity-based estimate. The Economist, Graphic detail, May 4th 2019.] 313
  187. Spatial: Playlists and politics

    [Figure: “Musical preferences mirror America’s demographic and political divides”. Most popular genre relative to national average, by county, share of live music tickets sold, % (country/folk, dance/electronica, Latin, pop, hip-hop/rap/R&B, rock/alternative). Sources: Vivid Seats; US Census Bureau; MIT Elections and Data Science Lab; The Economist. The Economist, Graphic detail, November 16th 2019.] 314
  188. Spatial: Ice would suffice

    [Figure: “The Arctic is the epicentre of global warming”. Arctic sea ice, annual minimum extent and relative volume change; observed temperature change by latitude, °C; stronger v weaker jet stream. Sources: NSIDC; PIOMAS; NASA; Carbon Brief. The Economist, Graphic detail, September 21st 2019.] 315
  189. Non-spatial: Exalted valley

    [Figure: “Today’s biggest tech firms have surpassed their predecessors’ peak”. US technology companies, share of total US stockmarket value, %; top-five technology companies, share of total, % (US non-financial corporate profits, R&D spending among S&P firms, US advertising revenue, federal lobbying spending). Sources: Datastream from Refinitiv; Bloomberg; BEA; eMarketer; Open Secrets; The Economist. The Economist, Graphic detail, August 10th 2019.] 316
  190. Non-spatial: Teenage wasteland

    [Figure: “Teenagers are avoiding Facebook, as older users flock to it”. Share of Americans using platforms at least once per month, estimate, by age group; advertising revenue, $bn; global monthly active users, bn, selected services. Sources: eMarketer; KeyBanc Capital Markets; company reports; press reports. The Economist, Graphic detail, July 20th 2019.] 317
  191. Spatial or non-spatial?

    [Figure: The Economist, Graphic detail, June 29th 2019, "As safe as houses: Residential property". Chart headline: "A decade after the financial crisis, house prices are at new highs". Panels: real house prices since 1990, log scale, actual and forecast; house-price forecast, % change on a year earlier, real terms, showing the median and the 50%, 75%, 90% and 95% confidence intervals. Countries: Australia, Canada, Germany, France, Britain, Ireland, Italy, New Zealand, Spain, United States. Sources: OECD; BIS; IMF; national statistics; The Economist.]
  192. Spatial or non-spatial?

    [Figure: The Economist, Graphic detail, May 25th 2019. Panels: GDP per person v annual emissions per person (tonnes of CO2 equivalent), log scales, with a global trend weighted by population; total annual greenhouse-gas emissions, gigatonnes of CO2 equivalent, 1850 to 2030, including forecast. Groups: China; United States, Britain, France & Germany; India & Indonesia; other countries. Chart annotations: China emits far less greenhouse gas per person than Western countries did at the same stage of economic development; economies get more carbon-efficient once they get rich, causing their emissions per person to level off; nonetheless, China is so large that it has become the world’s biggest emitter, and will only get bigger. Sources: Climate Action Tracker; Climate Watch; University of Groningen Growth and Development Centre; UN Intergovernmental Panel on Climate Change.]

    Legible article excerpt from the figure: [...] $12,000-16,000 in 2019 dollars have produced a population-weighted average of 10.6 tonnes of carbon dioxide-equivalent gases per person per year. In 2016 China’s GDP per head was $14,000, and it emitted just 9.3 tonnes per person. Moreover, China pollutes far less per person than Western countries did at the same stage of development. When America, France, Britain and Germany had incomes similar to modern China’s, they relied on inefficient power stations and cars, and spewed out 16.6 tonnes per person.

    The combination of China’s huge population and rapid GDP growth has nonetheless made it the world’s biggest emitter of carbon. China is predicted to produce 16bn tonnes of greenhouse gases in 2030, four times the entire world’s output in 1900. To prevent the stock of greenhouse gases in the atmosphere from reaching levels likely to cause disastrous warming, China must do better than merely beating the past records of richer countries. Instead, it will need an unprecedented decline in emissions per head, at least to the more carbon-efficient level of similarly rich Latin American economies, and ideally onto the trajectory of poorer Asian giants like India and Indonesia, which rely less on heavy industry and manufacturing. Those countries, perched at the sweltering latitudes where farmers will be most hurt by climate change, must in turn work out how to reach upper-middle-income status without replicating China’s emissions path.

    To their credit, Chinese authorities, spurred by public concern about air pollution, have prioritised green policies, such as switching from coal-fired power stations to renewable sources and setting up an emissions-trading system. China’s annual rate of emissions growth has fallen from 9.3% in 2002-11 to 0.6% in 2012-16. The waning of its cement-intensive construction boom should slow emissions further. But it will take more than incremental gains to stave off severe warming.
  193. Spatial or non-spatial?

    [Figure: The Economist, Graphic detail, August 3rd 2019, "Tossing the coin: The cashless economy". Chart headline: "More-digitised countries use less cash. Enthusiastic governments can speed things along". Panels: cash use v internet penetration (% of total transactions conducted in cash against internet users, % of population), with GDP per person at market prices; number of retail cash transactions per person, 2006 to 2017, for South Korea, Denmark, Netherlands and Sweden. Annotations: Greece, second-largest shadow economy among rich countries; United States, home to the largest technology companies; Japan, credit-card market historically protected from foreign competition; Bank of Korea announces plans for a cashless society by 2020; instant payment system launches; iDEAL, a bank-backed payment system, is rolled out; Amsterdam’s bus system goes cashless; Swish payment system launches. Sources: Bank of England; World Bank.]
  194. Data types

    • each data type has its own inherent structure • each calls for specific visualization techniques • e.g., gene expression matrix values for cell measurements are meaningful as a heat map or a parallel coordinate plot • the challenge is finding a vis that effectively integrates and combines the data types • understanding patterns and processes in research studies relies on data integration
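    The heat map and the parallel coordinate plot mentioned above read the same matrix in two different ways. The sketch below is only an illustration under stated assumptions: the gene names, cell names and values are made up, and the min-max scaling per axis is one common choice, not something the slide prescribes.

```python
# Hypothetical toy matrix: rows are genes, columns are cell measurements.
genes = ["geneA", "geneB", "geneC"]
cells = ["cell1", "cell2", "cell3", "cell4"]
expression = [
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 5.0, 1.0, 1.0],
    [0.0, 3.0, 3.0, 9.0],
]


def heatmap_tiles(matrix):
    """Heat map view: one (row, column, value) tile per matrix entry."""
    return [(r, c, v) for r, row in enumerate(matrix) for c, v in enumerate(row)]


def parallel_coordinates(matrix):
    """Parallel-coordinates view: one polyline per row.

    Each column becomes a vertical axis. Values are min-max scaled per
    column to [0, 1] so that every axis spans the same height.
    """
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    # A constant column would give a zero span; fall back to 1.0.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(axis, (v - lows[axis]) / spans[axis]) for axis, v in enumerate(row)]
        for row in matrix
    ]


tiles = heatmap_tiles(expression)
polylines = parallel_coordinates(expression)

for gene, line in zip(genes, polylines):
    print(gene, [round(height, 2) for _, height in line])
```

    The heat map keeps every value in its matrix position, while the polylines make it easier to compare whole rows against each other.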
  195. Merging 2+ graphical forms

    • need for balance between optimal representation of one data type versus the other • networks are naturally displayed as node-link diagrams or adjacency matrices • Goal: Discover correlations, common trends, or causal relationships • design depends on what the analysis task calls for

    Gehlenborg, N., Wong, B. Integrating data. Nat Methods 9, 315 (2012)
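    A node-link diagram and an adjacency matrix encode the same network. As a minimal sketch, assuming an undirected, unweighted network with made-up node names, an edge list (the input of a node-link drawing) can be turned into the matrix form like this:

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix


# Hypothetical four-node network.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]
matrix = adjacency_matrix(nodes, edges)

for name, row in zip(nodes, matrix):
    print(name, row)
```

    The edge list suits path-following tasks in a node-link diagram; the matrix avoids edge crossings and scales better to dense networks, which is why the choice depends on the analysis task.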
  196. Summary

    • suitability of data vis methods strongly depends on the question • distinct graphing techniques emphasize different data aspects • ability to see data in discrete form enables deeper understanding • useful to have tools that implement all or at least several techniques in one interface • e.g., Cytoscape plugin Cerebral • suited to switching between data views and analysis tasks

    Gehlenborg, N., Wong, B. Integrating data. Nat Methods 9, 315 (2012)
    https://xkcd.com/373/
  197. Data exploration

    The action of exploring the data and discovering patterns using graphical representation
  198. Presentation vs Exploration

    Presentation: • present known data characteristics • emphasis of identified point(s) of interest
    Exploration: • explore data to understand structure • suspicion of regularities or patterns • no knowledge of exactly what they are • provide meaningful overviews to find patterns

    Wikipedia. Pattern Recognition. Alisneaky, svg version by User:Zirguezi
  199. Anscombe’s quartet

    The four sets of numbers in the quartet have many nearly identical summary statistics (means, variances, etc.)

    Shoresh, N., Wong, B. Data exploration. Nat Methods 9, 5 (2012)
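    The claim about the quartet can be checked numerically. The sketch below uses the standard published values of Anscombe’s quartet and only the Python standard library; the helper name `pearson` is my own.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


summary = {
    name: {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

    The four rows of output agree closely, yet scatter plots of the four sets look entirely different (a line with noise, a curve, a line with one outlier, a vertical cluster with one extreme point).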
  200. The process

    • iterative • often overview first, detail later • graphical organization of the data is guided by expectations and hypotheses • observed patterns refine or germinate new hypotheses • Anscombe’s quartet shows why not to rely solely on computational metrics

    [Diagram: Data, Transform, Visualize, Model, Communicate]
  201. High-dimensional data

    • exploratory goal to find ‘classes of behavior’ among multiple components • e.g., genes, populations, samples, etc. • create simple representations of low-dimensional data ‘slices’ • useful strategy that restricts complexity • one plot for each component • makes the visual task of finding commonality between plots simpler • ensure consistency (same scale)

    Shoresh, N., Wong, B. Data exploration. Nat Methods 9, 5 (2012)
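    The "same scale" point can be made concrete: compute one axis range from all components and hand it to every per-component plot. This is a minimal sketch; the component names, values and the 5% padding are assumptions for illustration, and the panel dictionaries stand in for whatever plotting library is used.

```python
def shared_limits(series_by_component, padding=0.05):
    """Return one (low, high) axis range that covers every component."""
    values = [v for series in series_by_component.values() for v in series]
    low, high = min(values), max(values)
    margin = (high - low) * padding
    return (low - margin, high + margin)


# Hypothetical per-gene profiles: one low-dimensional 'slice' per component.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [10.0, 12.0, 11.0],
    "geneC": [0.0, 5.0, 20.0],
}

limits = shared_limits(profiles)

# One panel description per component, all with the same y range.
panels = [
    {"title": name, "values": values, "ylim": limits}
    for name, values in profiles.items()
]

for panel in panels:
    print(panel["title"], panel["ylim"])
```

    With per-panel automatic scaling, geneA and geneC would look similar in shape; with the shared range, the difference in magnitude stays visible.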
  202. Helpful tips

    • visual burden by simultaneous representation of all the data • limit observation number • sample a subset of the data • less important features may be removed • focus on small number of features • attention on essential info • add features to support story

    Geologic Maps of Henry Lake Quadrangle, Idaho and Montana, 1972 by Irvin J. Whitkind. USGS
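    Two of these tips, sampling a subset of observations and keeping only a few features, are easy to sketch. The record fields and the fixed seed below are assumptions chosen for illustration; a fixed seed simply makes the sample reproducible between runs.

```python
import random


def subsample(rows, k, seed=0):
    """Draw k observations at random without replacement (reproducible)."""
    if k >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, k)


def keep_features(rows, features):
    """Drop every field except the ones needed for the figure."""
    return [{name: row[name] for name in features} for row in rows]


# Hypothetical observations with more fields than the figure needs.
observations = [
    {"id": i, "x": i * 0.5, "y": (i * 7) % 11, "batch": i % 3, "note": "raw"}
    for i in range(1000)
]

subset = subsample(observations, k=50)
plot_data = keep_features(subset, ["x", "y"])

print(len(plot_data), plot_data[0])
```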
  203. The story

    Separation: A hero ventures forth from the world of common day into a region of supernatural wonder. Initiation: Fabulous forces are there encountered and a decisive victory is won. Return: The hero comes back from this mysterious adventure with the power to bestow boons on his fellow man.

    Info we trust, RJ Andrews. Joseph Campbell, 1949
  204. Nested model of visualization

    • domain situation
      - who are the target users?
    • abstraction: translate from specifics of domain to vocabulary of vis
      - what is shown? data abstraction
      - why is the user looking at it? task abstraction
    • idiom
      - how is it shown?
        + visual encoding idiom: how to draw
        + interaction idiom: how to manipulate
    • algorithm: efficient computation

    Munzner. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009)
  205. Categorical data

    Tableau10 [Tableau software blog. How we designed the new color palettes in Tableau 10. Stone, 2016]
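    A categorical palette such as Tableau10 assigns one distinct hue per category. The sketch below is an assumption-laden illustration: the hex values are the ones listed as `schemeTableau10` in d3-scale-chromatic and may differ slightly from Tableau’s own published values, and the category names are made up.

```python
from itertools import cycle

# Hex values as listed in d3-scale-chromatic's schemeTableau10 (assumption:
# treat them as an approximation of Tableau's palette and verify before use).
TABLEAU10 = [
    "#4e79a7", "#f28e2c", "#e15759", "#76b7b2", "#59a14f",
    "#edc949", "#af7aa1", "#ff9da7", "#9c755f", "#bab0ab",
]


def assign_colors(categories, palette=TABLEAU10):
    """Map each distinct category to a color, in order of first appearance.

    Colors repeat once there are more categories than palette entries,
    which is a signal that the data has too many classes for color alone.
    """
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    return dict(zip(unique, cycle(palette)))


# Hypothetical categorical column.
labels = ["control", "treated", "control", "knockout", "treated"]
colors = assign_colors(labels)
point_colors = [colors[label] for label in labels]

print(colors)
```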
  206. Natural hierarchies

    • data points represent genes classified by:
      - type (gene, non-processed pseudogene, processed pseudogene)
      - transcription state (on, off)
    • map salience to relevance
    • elevate important data using symbols with greater visual weight (fill and/or color)
    • a single color isolates a single variable
  207. Complexity

    • focus on meaning instead of structure • anchor the figure to relevant domain knowledge content (versus method detail) • which findings are interesting? • what representation would communicate them clearly? • project data onto familiar visual paradigms • e.g., network or pathway to show biological effects • dimensions can be encoded as spatial or visual elements, such as along x and y axes or by color, size or symbol

    Krzywinski, M., Savig, E. Multidimensional data. Nat Methods 10, 595 (2013)
  208. Small multiples

    • effective method for presentation • example: Study of drug effect on a network of signaling proteins

    Krzywinski, M., Savig, E. Multidimensional data. Nat Methods 10, 595 (2013)
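    Small multiples repeat the same plot design once per condition on a regular grid. As a minimal sketch of the layout step only, assuming made-up condition names and an arbitrary limit of three columns, the grid position of each panel can be computed as follows:

```python
import math


def grid_shape(n_panels, max_cols=4):
    """Return (rows, cols) for a grid that holds n_panels panels."""
    if n_panels < 1:
        raise ValueError("need at least one panel")
    cols = min(n_panels, max_cols)
    rows = math.ceil(n_panels / cols)
    return rows, cols


def panel_positions(names, max_cols=4):
    """Assign each panel a (row, column) cell, filling row by row."""
    _, cols = grid_shape(len(names), max_cols)
    return {name: divmod(i, cols) for i, name in enumerate(names)}


# Hypothetical conditions, one network panel per condition.
conditions = ["control", "drug 1h", "drug 4h", "drug 24h", "washout"]
shape = grid_shape(len(conditions), max_cols=3)
positions = panel_positions(conditions, max_cols=3)

print(shape)
for name, cell in positions.items():
    print(name, cell)
```

    Every panel should then use the same node positions, axes and color mapping, so that the only thing that changes between cells is the data.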
  209. The process

    • iterative • often overview first, detail later • graphical organization of the data is guided by expectations and hypotheses • observed patterns refine or germinate new hypotheses • Anscombe’s quartet shows why not to rely solely on computational metrics

    [Diagram: Data, Transform, Visualize, Model, Communicate]