April 18, 2024
Visual Encoding of Datasets
▪ So far, we have seen the presentation (visual encoding) of data types (see lecture 5)
▪ Visualisation techniques and visual encoding of datasets
  ▪ tables
  ▪ spatial data
    - geometry
    - fields
  ▪ networks and trees
▪ We will see different idioms (ways) in which information can be visualised
Arrange Tables …
▪ Why the arrange design choice?
  ▪ most crucial visual encoding since the use of space dominates a user's mental model of a dataset
  ▪ spatial position covers the three most effective channels for ordered attributes → primacy of spatial position channels
    - planar position against a common scale
    - planar position along an unaligned scale
    - length
  ▪ best channel for categorical attributes (grouping items in the same region) is also about spatial position
Arrange by Keys and Values
▪ Distinction between key and value attributes introduced earlier is highly relevant for visually encoding table data
▪ Core design choices (idioms) for visually encoding tables depend on the semantics of the table attributes (key or value)
  ▪ scatterplot: visually encoding two value attributes
  ▪ bar chart: visually encoding one key and one value attribute
  ▪ heatmap: visually encoding two keys and one value attribute
  ▪ …
▪ Keys typically used to define a spatial region for each item in which one or multiple value attributes are shown
Express Values
▪ Spatial position channel can be used to visually encode quantitative attributes
  ▪ each item is encoded with a mark at some position along an axis
  ▪ additional attributes encoded on the same mark via other non-spatial channels (e.g. colour or size)
  ▪ glyphs can be used for more complex cases (multiple marks) as shown later in this course
Scatterplot Example (Bubble Plot)
▪ Each point mark represents a country
  ▪ horizontal and vertical positions encoding life expectancy and infant mortality
    - colour channel for categorical country attribute
    - size channel for quantitative population attribute (bubble plot)
▪ Highly negatively correlated dataset
  ▪ downward sloping diagonal
Scatterplot Example
▪ Relation between diamond price and weight
▪ Derived attributes (logarithmically scaled) in right figure
  ▪ strongly positively correlated attributes
    - calculated regression line often superimposed to support the task
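The derived, logarithmically scaled attributes and the superimposed regression line can be sketched as follows. This is a minimal illustration using ordinary least squares on log10-transformed values; the function name `log_regression` and the sample data are hypothetical and are not taken from the diamond dataset in the figure.

```python
import math


def log_regression(weights, prices):
    """Fit log10(price) = slope * log10(weight) + intercept by least squares."""
    xs = [math.log10(w) for w in weights]   # derived attribute: log weight
    ys = [math.log10(p) for p in prices]    # derived attribute: log price
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows price = 1000 * weight^2 exactly.
weights = [0.5, 1.0, 2.0, 4.0]
prices = [1000 * w ** 2 for w in weights]
slope, intercept = log_regression(weights, prices)
```

For this made-up power-law data the fitted line in log-log space has a slope of about 2 and an intercept of about 3, so a curved relation in the original attributes becomes a straight line in the derived ones.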
Scatterplots
What (Data): Table: two quantitative value attributes.
Why (Task): Find trends, outliers, distribution, correlation; locate clusters.
How (Encode): Express values with horizontal and vertical spatial position and point marks.
Scale: Hundreds of items.
Separate, Order & Align Regions
▪ Categorical attributes have unordered identity semantics
  ▪ encoding them with spatial position would violate the principle of expressiveness
▪ Use spatial regions to group similar categorical attributes
  ▪ distribution of regions via three operations
    - separation into regions (based on categorical attribute)
    - ordering of regions (based on ordered attribute)
    - aligning of regions (optional, based on ordered attribute)
▪ One-dimensional list alignment often used for a single key (separation with one region per item)
  ▪ view itself covers a two-dimensional area
    - aligned list of items on one spatial dimension
    - region in which the values are shown on a second dimension (e.g. bar charts)
Bar Chart Example
▪ Categorical species key attribute separates the marks along the horizontal spatial axis
  ▪ alphabetical ordering (left picture) → easy lookup by name
  ▪ data-driven ordering by weight (right picture) → easier to see data trends
▪ Separate line marks for weight attribute in each region
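The two orderings of the key attribute can be sketched in a few lines. The species names and weights below are hypothetical placeholders, not the values shown in the figure.

```python
# Hypothetical table: categorical key (species) -> quantitative value (weight in kg).
species_weight = {"cat": 4.0, "ant": 0.001, "dog": 20.0, "bee": 0.0001}


def alphabetical_order(table):
    """Order the regions by key name: supports lookup by name."""
    return sorted(table)


def data_driven_order(table):
    """Order the regions by descending value: makes trends easier to see."""
    return sorted(table, key=table.get, reverse=True)


alphabetical = alphabetical_order(species_weight)
by_weight = data_driven_order(species_weight)
```

Both orderings use the same separate and align operations; only the ordering attribute changes (the key itself versus the value attribute).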
Bar Charts
What (Data): Table: one quantitative value attribute, one categorical key attribute.
Why (Task): Lookup and compare values.
How (Encode): Line marks, express value attribute with aligned vertical position, separate key attribute with horizontal position.
Scale: Key attribute: dozens to hundreds of levels.
Stacked Bar Chart Example
▪ Bars distributed along the x-axis based on combination of processor and procedure
▪ More complex glyph for each bar
  ▪ multiple sub-bars stacked vertically
    - uses colour (type of cache miss) as well as length (number of misses) coding
    - common scale only for lowest bar component → order of stacking is relevant
Stacked Bar Charts
What (Data): Multidimensional table: one quantitative value attribute, two categorical key attributes.
Why (Task): Part-to-whole relationship, lookup values, find trends.
How (Encode): Bar glyph with length-coded subcomponents of value attribute for each category of secondary key attribute. Separate bars by category of primary key attribute.
Scale: Key attribute (main axis): dozens to hundreds of levels. Key attribute (stacked glyph axis): several to one dozen.
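Why only the lowest sub-bar sits on a common scale becomes visible when the stacking is computed explicitly. The sketch below is an illustration with made-up values; `stack_segments` is a hypothetical helper name.

```python
def stack_segments(values):
    """Return the (bottom, top) extent of each sub-bar, in stacking order."""
    segments = []
    bottom = 0.0
    for value in values:
        segments.append((bottom, bottom + value))
        bottom += value  # the next sub-bar starts where this one ends
    return segments


# Hypothetical counts for three sub-categories within one bar.
segments = stack_segments([3, 2, 5])
```

Only the first segment starts at 0. The bottoms of all other segments depend on the values stacked below them, so those sub-bars have to be compared by length along unaligned scales, which is why the stacking order matters.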
Streamgraph Example
▪ Music listening history example
  ▪ one time series per artist counting the number of times their music was listened to each week
▪ Continuity of the horizontal layers
  ▪ emphasises legibility of individual streams
  ▪ deliberate organic silhouette (instead of x-axis as baseline)
Streamgraphs
What (Data): Multidimensional table: one quantitative value attribute (e.g. counts), one ordered key attribute (time), one categorical key attribute.
What (Derived): One quantitative attribute (for computation of layer ordering).
Why (Task): Find trends.
How (Encode): Use derived geometry showing artist layers across time, layer height encodes counts.
Scale: Key attributes (time, main axis): hundreds of time points. Key attributes (artists, not always over entire time axis): dozens to hundreds.
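The derived layer geometry can be approximated with a simple symmetric baseline, where the stack is centred around the horizontal axis instead of resting on it. This is a deliberately simplified sketch: actual streamgraph implementations choose the baseline and layer ordering with more elaborate criteria, and the function name and data below are hypothetical.

```python
def symmetric_layers(series):
    """series: one list of counts per layer, all of equal length.

    Returns, for each layer, a list of (bottom, top) pairs per time step,
    with the whole stack centred around y = 0.
    """
    n_steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(n_steps):
        total = sum(layer[t] for layer in series)
        bottom = -total / 2.0  # centred baseline instead of the x-axis
        for i, layer in enumerate(series):
            top = bottom + layer[t]  # layer height encodes the count
            layers[i].append((bottom, top))
            bottom = top
    return layers


# Two hypothetical artists over two weeks.
layers = symmetric_layers([[2, 4], [2, 0]])
```

The layer height still encodes the count at each time point, but no layer has a flat baseline, which produces the organic silhouette.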
Dot Chart vs. Line Chart Example
▪ Dot chart (dot plot) and line chart for the same dataset showing a cat's weight over time
  ▪ trends are emphasised by line charts
  ▪ dot chart is like a bar chart where the quantitative attribute is encoded with point marks rather than line marks
Dot Charts
What (Data): Table: one quantitative value attribute, one ordered key attribute.
Why (Task): Lookup and compare values.
How (Encode): Express value attribute with aligned vertical position and point marks. Separate/order into horizontal regions by key attribute.
Scale: Key attribute: dozens to hundreds of levels.
Line Charts
▪ Line charts should only be used for ordered key attributes but not for categorical key attributes!
  ▪ would imply trends that do not exist (violation of expressiveness principle)
▪ Aspect ratio (width/height)
  ▪ we are better at judging angles close to 45°
  ▪ banking to 45° idiom computes the best aspect ratio with as many lines as possible close to 45°
What (Data): Table: one quantitative value attribute, one ordered key attribute.
Why (Task): Show trends.
How (Encode): Dot chart with connection marks between dots.
Scale: Key attribute: hundreds of levels.
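One simple way to bank to 45° is to choose the aspect ratio so that the median absolute slope of the line segments is drawn at exactly 45°. The slide does not say which banking method is meant, so the median-slope variant below is an assumption, and `banked_aspect_ratio` is a hypothetical name.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Return width/height such that the median absolute slope is drawn at 45 degrees.

    A segment with data slope s is drawn with slope
    s * (x_range / y_range) * (height / width); setting this to 1 for the
    median slope gives width / height = median(|s|) * x_range / y_range.
    """
    points = list(zip(xs, ys))
    slopes = [abs((y1 - y0) / (x1 - x0))
              for (x0, y0), (x1, y1) in zip(points, points[1:])
              if x1 != x0]  # skip vertical segments
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    return median(slopes) * x_range / y_range


# Hypothetical zigzag series with three equally steep segments.
ratio = banked_aspect_ratio([0, 1, 2, 3], [0, 10, 0, 10])
```

For this zigzag the result is 3.0: in a view three times as wide as it is high, each segment spans one third of the width and the full height, so it is drawn at 45°.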
Matrix Alignment
▪ Datasets with two keys often arranged in a two-dimensional matrix alignment
  ▪ one key distributed along the rows and one key along the columns
  ▪ rectangular cell in the matrix shows item values
▪ Examples of matrix alignments are heatmaps or the scatterplot matrix (SPLOM)
Heatmaps
What (Data): Table: two categorical key attributes, one quantitative value attribute.
Why (Task): Find clusters, outliers; summarise.
How (Encode): 2D matrix alignment of area marks, diverging colourmap.
Scale: Items: one million. Categorical attribute levels: hundreds. Quantitative attribute levels: 3 to 11.
Scatterplot Matrix (SPLOM)
▪ Usually only the lower or upper triangle of the matrix is shown (avoid redundancy)
What (Data): Table.
What (Derived): Ordered key attribute: list of original attributes.
Why (Task): Find correlation, trends, outliers.
How (Encode): Scatterplots in 2D matrix alignment.
Scale: Attributes: one dozen. Items: dozens to hundreds.
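The derived key (the list of original attributes) and the restriction to one triangle of the matrix can be sketched as follows. The attribute names are hypothetical, and `splom_cells` is an illustrative helper rather than part of any plotting library.

```python
from itertools import combinations


def splom_cells(attributes):
    """Return the (row, column) attribute pairs of the lower triangle.

    Each unordered pair of attributes appears exactly once, so the
    redundant mirrored scatterplots and the diagonal are skipped.
    """
    return [(row, col) for col, row in combinations(attributes, 2)]


cells = splom_cells(["price", "weight", "depth"])
```

For n attributes this yields n(n-1)/2 scatterplots instead of n², which is why the number of attributes is limited to about a dozen.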
Volumetric Grids and Recursive Subdivision
▪ Volumetric grid aligns data in three dimensions based on three key attributes
  ▪ typically not recommended for non-spatial (abstract) data due to perceptual problems (e.g. occlusion or perspective distortion)
▪ Recursive Subdivision
  ▪ recursively subdivides a cell via a list or matrix and thereby supports multiple keys
  ▪ discussed later in the course
Spatial Axis Orientation
▪ Rectilinear layouts
  ▪ regions or items are distributed along an orthogonal horizontal and vertical axis
▪ Parallel layouts
  ▪ parallel coordinates can be used to visualise many quantitative attributes at once
  ▪ provides overview over all attributes
    - also shows the range of values for individual attributes
▪ Radial layouts
  ▪ items distributed around a circle using the angle channel in addition to one or multiple linear spatial channels
    - more efficient in showing periodic patterns
Parallel Coordinates
What (Data): Table: many value attributes.
Why (Task): Find trends, outliers, extremes, correlation.
How (Encode): Parallel layout: horizontal spatial position used to separate axes, vertical spatial position used to express value along each aligned axis with connection line marks as segments between them.
Scale: Attributes: dozens along secondary axis. Items: hundreds.
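The encoding can be sketched by turning each item into a polyline with one vertex per axis. In this minimal sketch every attribute is scaled to its own [0, 1] range, which is one common choice and an assumption here; the attribute names and values are hypothetical.

```python
def polylines(items, attributes):
    """Map each item (a dict) to a list of (axis_index, normalised_value) vertices."""
    lo = {a: min(item[a] for item in items) for a in attributes}
    hi = {a: max(item[a] for item in items) for a in attributes}
    lines = []
    for item in items:
        line = []
        for axis_index, a in enumerate(attributes):
            span = hi[a] - lo[a]
            # Vertical position along this axis; constant attributes sit mid-axis.
            y = (item[a] - lo[a]) / span if span else 0.5
            line.append((axis_index, y))
        lines.append(line)
    return lines


# Two hypothetical items with two quantitative attributes.
cars = [{"mpg": 10, "hp": 200}, {"mpg": 30, "hp": 100}]
lines = polylines(cars, ["mpg", "hp"])
```

The two polylines cross between the axes, which is the typical visual signature of a negative correlation between neighbouring attributes.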
Radial Bar Chart Example
Radial Bar Charts
What (Data): Table: one quantitative attribute, one categorical attribute.
Why (Task): Find periodic patterns.
How (Encode): Length coding of line marks; radial layout.
Pie Chart Examples
[Figure panels: pie chart, bar chart, polar area chart, stacked bar chart, normalised stacked bar chart. Captions: relative contributions of parts to a whole; pie chart versus bar chart accuracy]
Pie Charts and Polar Area Charts
Pie Charts
What (Data): Table: one quantitative attribute, one categorical attribute.
Why (Task): Part-whole relationship.
How (Encode): Area marks (wedges) with angle channel; radial layout.
Scale: One dozen categories.
Polar Area Charts
What (Data): Table: one quantitative attribute, one categorical attribute.
Why (Task): Part-whole relationship.
How (Encode): Area marks (wedges) with length channel; radial layout.
Scale: One dozen categories.
Normalised Stacked Bar Charts
What (Data): Multidimensional table: one quantitative value attribute, two categorical key attributes.
What (Derived): One quantitative value attribute (normalised version of original attribute).
Why (Task): Part-whole relationship.
How (Encode): Line marks with length channel; rectilinear layout.
Scale: One dozen categories for stacked attribute. Several dozen categories for axis attribute.
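The derived attribute is the share of each value within its bar. A minimal sketch, with a hypothetical table layout (primary key mapped to a list of values, one per category of the stacked key):

```python
def normalise_bars(table):
    """Replace each value by its share of the bar total, so every bar has length 1."""
    normalised = {}
    for key, values in table.items():
        total = sum(values)
        normalised[key] = [v / total if total else 0.0 for v in values]
    return normalised


# Hypothetical data: two bars with very different totals.
shares = normalise_bars({"A": [1, 1, 2], "B": [30, 10, 0]})
```

After normalisation the bars are equally long, so the part-whole relationship is comparable across bars even though the absolute totals (4 versus 40) are no longer visible.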
Spatial Layout Density
▪ Layout can be dense or sparse
▪ Dense layout uses small and densely packed marks to provide an overview of as many items as possible
  ▪ maximally dense layout uses a single pixel for each point mark
    - only planar position and colour channels can be used
▪ Space-filling layout fills all available space in the view
  ▪ typically uses area marks for items and containment marks for relationships (e.g. treemaps discussed later)
  ▪ maximises the available room for colour coding and might offer space for labels
  ▪ disadvantage: cannot make use of white space in the layout
Dense Software Overviews
What (Data): Text with numbered lines (source code, test results log).
What (Derived): Two quantitative attributes (test execution results).
Why (Task): Locate faults, summarise results and coverage.
How (Encode): Dense layout. Spatial position and line length from text ordering, colour channels of hue and brightness.
Scale: Lines of text: ten thousand.
Arrange Spatial Data …
▪ Two main spatial data types
  ▪ Geometry
  ▪ Spatial Fields
    - scalar fields with single value associated with each cell in the field
    - vector fields with multiple values associated with each cell (e.g. computational fluid dynamics)
    - tensor fields with matrix associated with each cell capturing more complex structure (e.g. for stress, conductivity etc.)
▪ Given spatial position is the attribute of primary importance
  ▪ use provided position as the substrate for the visual layout
Geometry
▪ Geometric data does not necessarily have attributes associated with it
  ▪ can be derived from raw source data (e.g. geographic data about the earth)
▪ Geographic Data
  ▪ derive a geometry dataset based on abstractions (e.g. filtering, aggregation or level of detail) on the underlying raw data
  ▪ cartographic data (non-spatial information) can be used to size code the marks
    - e.g. size of point marks representing cities by their population
▪ Other derived geometry data
  ▪ e.g. based on computations on spatial fields
Choropleth Map
What (Data): Geographic geometry data. Table with one quantitative attribute per region.
Why (Task): Find clusters.
How (Encode): Space: use given geometry for area mark boundaries. Colour: sequential segmented colourmap.
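A sequential segmented colourmap assigns each region's value to one of a small number of ordered colour classes. The sketch below uses equal-width classes, which is only one of several possible classification schemes and an assumption here; the colour names, region names and values are hypothetical placeholders.

```python
# Hypothetical 5-step sequential ramp, from low to high values.
COLOURS = ["very light", "light", "medium", "dark", "very dark"]


def segment_index(value, lo, hi, n_segments):
    """Bin a value from [lo, hi] into one of n_segments equal-width classes."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    # The maximum value falls into the last class instead of a class of its own.
    return min(int(fraction * n_segments), n_segments - 1)


def colour_regions(values_by_region):
    """Assign one colour class to each region based on its quantitative attribute."""
    lo = min(values_by_region.values())
    hi = max(values_by_region.values())
    return {region: COLOURS[segment_index(value, lo, hi, len(COLOURS))]
            for region, value in values_by_region.items()}


regions = colour_regions({"North": 0, "East": 50, "South": 100})
```

The given region geometry is left untouched; only the colour of each area mark is derived from the table.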
Scalar Fields
▪ Scalar spatial field has a single value associated with each spatially defined cell
  ▪ e.g. data from medical scans with radio-opacity (CT scan) or proton density (MRI scan)
▪ Isocontours use isolines to represent the contours of a particular level of the scalar value
  ▪ isolines close together for regions with fast change
Topographic Terrain Maps
What (Data): 2D spatial field; geographic data.
What (Derived): Geometry: set of isolines computed from the field.
Why (Task): Query shape.
How (Encode): Use given geographic data geometry of points, lines, and region marks. Use derived geometry as line marks (blue in example).
Scale: Dozens of contour levels.
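Why isolines lie closer together where the field changes quickly can be illustrated on a one-dimensional cut through a height field. This is a simplified sketch using linear interpolation between samples, not a full 2D contouring algorithm; the function name and the sample profiles are hypothetical.

```python
def level_crossings(samples, level):
    """Positions (in sample-index units) where a 1D profile crosses a contour level.

    Uses linear interpolation between neighbouring samples; samples that
    lie exactly on the level are ignored in this simplified sketch.
    """
    crossings = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if (a - level) * (b - level) < 0:  # level lies strictly between a and b
            crossings.append(i + (level - a) / (b - a))
    return crossings


levels = [5, 15, 25, 35]
# Gentle slope: height rises by 40 over four sample intervals.
gentle = [level_crossings([0, 10, 20, 30, 40], lv)[0] for lv in levels]
# Steep slope: the same rise within a single sample interval.
steep = [level_crossings([0, 40], lv)[0] for lv in levels]
```

On the gentle profile the crossings are one sample interval apart, on the steep profile only a quarter of an interval, which corresponds to densely packed isolines on a terrain map.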
Arrange Networks and Trees …
▪ Connections (links) can be represented in different ways
  ▪ node-link diagrams with explicit connection marks
    - well-suited for tasks that involve the understanding of the network topology (e.g. shortest path between two nodes or finding all adjacent nodes)
  ▪ adjacency matrix
  ▪ enclosure with containment marks
    - only works for trees but not for networks
Force-Directed Placement Example
[Figure: force-directed placement with size coding of edges]
[Figure: force-directed placement of larger network with size coding of nodes]
Force-directed Placement
▪ Increased scalability via multilevel network idioms
  ▪ e.g. multilevel scalable force-directed placement (SFDP) algorithm
What (Data): Network.
Why (Task): Explore topology, locate paths.
How (Encode): Point marks for nodes, connection marks for links.
Scale: Nodes: dozens to hundreds. Links: hundreds. Node/link density: L<4N.
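The basic idea of force-directed placement (all nodes repel each other, links pull their end nodes together) can be sketched as follows. This is a minimal illustration in the spirit of spring-embedder algorithms, not the SFDP algorithm; the force laws, the cooling schedule, the parameter `k` and all names are assumptions made for this example.

```python
import math


def force_directed(nodes, links, positions, iterations=100, k=1.0):
    """Return new 2D positions after a fixed number of force-directed iterations."""
    pos = {n: list(positions[n]) for n in nodes}
    for step in range(iterations):
        temperature = 1.0 - step / iterations  # caps the movement per iteration
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (magnitude k^2 / distance).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along links (magnitude distance^2 / k).
        for u, v in links:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0:
                scale = min(length, temperature) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical network: a and b are linked, c is unconnected.
start = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (2.0, 3.0)}
layout = force_directed(["a", "b", "c"], [("a", "b")], start)
```

Every iteration computes the repulsion between all pairs of nodes, so the cost grows quadratically with the number of nodes. This is one reason why the plain idiom is limited to small networks and why multilevel variants are used for larger ones.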
Multilevel Force-Directed Placement (SFDP)
What (Data): Network.
What (Derived): Cluster hierarchy on top of original network.
Why (Task): Explore topology, locate paths and clusters.
How (Encode): Point marks for nodes, connection marks for links.
Scale: Nodes: 1000-10'000. Links: 1000-10'000. Node/link density: L<4N.
Adjacency Matrix View Example
▪ Network nodes laid out along the horizontal and vertical edges of a square region
  ▪ links between nodes indicated by colouring an area mark in the cell forming the intersection of the two nodes' row and column
  ▪ additional attribute can be visualised by colouring the matrix cells
Adjacency Matrix View
▪ Better scalability than node-link diagrams
▪ Predictability and stability and support for reordering
▪ Drawback: hard to investigate topological structure (e.g. tracing paths)
What (Data): Network.
What (Derived): Table: network nodes as keys, link status between two nodes as values.
Why (Task): "Explore" strongly connected networks.
How (Encode): Area marks in 2D matrix alignment.
Scale: Nodes: 1000. Links: one million.
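The derived table can be computed directly from a list of links. A minimal sketch for an undirected network with hypothetical node names; the order of `nodes` determines the row and column order, which is where reordering comes in.

```python
def adjacency_matrix(nodes, links):
    """Derive the table: nodes as keys on both axes, link status (0/1) as value."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in links:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected network: symmetric matrix
    return matrix


links = [("a", "b"), ("b", "c")]
matrix = adjacency_matrix(["a", "b", "c"], links)
reordered = adjacency_matrix(["b", "a", "c"], links)  # same network, other ordering
```

A matrix for N nodes has N² cells, so 1000 nodes correspond to the one million possible links given in the scale row.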
Enclosure
▪ Containment marks are effective for showing complete information about hierarchical structure (instead of pairwise relationships only)
▪ Treemaps as an alternative to node-link diagrams
  ▪ hierarchical relationships shown via containment rather than connection marks
Treemaps
What (Data): Tree.
Why (Task): Query attributes at leaf nodes.
How (Encode): Area marks and containment, with rectilinear layout.
Scale: Leaf nodes: one million. Links: one million.
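How containment and area marks encode a tree can be sketched with the simple slice-and-dice layout, which alternates between horizontal and vertical splits at each level. This is just one treemap layout and not necessarily the one used in the lecture's figures; the tree representation (a number for a leaf, a list for an inner node) is an assumption made for this example.

```python
def size(node):
    """Size of a leaf is its value; size of an inner node is the sum of its children."""
    if isinstance(node, (int, float)):
        return node
    return sum(size(child) for child in node)


def slice_and_dice(node, x, y, w, h, horizontal=True):
    """Return one (x, y, width, height) rectangle per leaf, in tree order."""
    if isinstance(node, (int, float)):
        return [(x, y, w, h)]
    rects = []
    total = size(node)
    offset = 0.0
    for child in node:
        share = size(child) / total  # child area is proportional to its size
        if horizontal:
            rects += slice_and_dice(child, x + offset * w, y, share * w, h, False)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, share * h, True)
        offset += share
    return rects


# Hypothetical tree: one leaf of size 2 and one inner node with two leaves of size 1.
rects = slice_and_dice([2, [1, 1]], 0, 0, 4, 2)
areas = [w * h for (_, _, w, h) in rects]
```

The rectangles of the two small leaves are nested inside the half of the view that belongs to their parent, and all leaf areas (4, 2 and 2) are proportional to the leaf sizes (2, 1 and 1).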
GrouseFlocks Example
▪ Visualisation for compound networks (tree on top of a network)
  ▪ network nodes form the leaves of the tree
  ▪ tree encoded via containment on top of the original graph (picture on the right)
GrouseFlocks
What (Data): Network.
What (Derived): Cluster hierarchy on top of the original network.
How (Encode): Connection marks for original network, containment marks for cluster hierarchy.
Further Reading
▪ This lecture is mainly based on the book Visualization Analysis & Design
  ▪ chapter 7 - Arrange Tables
  ▪ chapter 8 - Arrange Spatial Data
  ▪ chapter 9 - Arrange Networks and Trees
References
▪ Visualization Analysis & Design, Tamara Munzner, Taylor & Francis Inc (Har/Psc edition), November 2014, ISBN-13: 978-1466508910