February 29, 2024 Data Representation and Abstraction ▪ Detailed look at the what part of the what-why-how analysis framework introduced earlier ▪ Provide a language that is meaningful and useful for vis design ▪ Data is typically described with domain language ▪ in order to find suitable visual representations, we have to translate the data into more abstract structures that we know how to encode ▪ Data abstraction helps to narrow down the design space
Semantics and Types ▪ Many aspects of vis design driven by the kind of data ▪ semantics (real-world meaning) ▪ types (data as well as datasets) ▪ What do the following datasets represent? ▪ 15, 2.7, 27, 27, 15, 10021 ▪ Basil, 7, S, Pear
Data Types ▪ Item ▪ individual discrete entity ▪ e.g. table row or network node ▪ Attribute ▪ also referred to as variable or dimension ▪ property that can be measured, observed or logged ▪ e.g. price or temperature ▪ Link ▪ relationship between items ▪ e.g. between items (nodes) in a network
Data Types … ▪ Position ▪ spatial data ▪ e.g. location in two-dimensional or three-dimensional space ▪ Grids ▪ sampling continuous data in terms of geometric and topological relationships between its cells
Dataset Types ▪ Dataset ▪ collection of information to be analysed ▪ made out of the five data types ▪ complex combinations of basic dataset types are common
Tables ▪ Flat table ▪ row represents an item of data ▪ column represents an attribute of the dataset ▪ a cell contains the value for a given item and attribute ▪ Multidimensional table ▪ indexing into a cell via multiple keys
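The difference between the two table types can be sketched in a few lines of Python. This is only an illustration: the attribute names (`name`, `age`, `shirt_size`, `favourite_fruit`, `region`, `quarter`) and all values are assumptions made up for the example, not part of the lecture data.

```python
# Flat table: each row is an item, each column an attribute.
# Attribute names and values are hypothetical.
flat_table = [
    {"name": "Basil", "age": 7, "shirt_size": "S", "favourite_fruit": "Pear"},
    {"name": "Amy", "age": 9, "shirt_size": "M", "favourite_fruit": "Apple"},
]


def flat_cell(table, row_index, attribute):
    """Look up one cell: the value for a given item (row) and attribute (column)."""
    return table[row_index][attribute]


# Multidimensional table: a cell is indexed via multiple keys,
# here the hypothetical pair (region, quarter).
multi_table = {
    ("north", "Q1"): 10,
    ("north", "Q2"): 12,
    ("south", "Q1"): 7,
    ("south", "Q2"): 9,
}


def multi_cell(table, *keys):
    """Look up one cell using the full combination of keys."""
    return table[tuple(keys)]
```

In the flat table a single row index is enough to reach an item, whereas in the multidimensional table no single key identifies a cell on its own.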
Networks and Trees ▪ Network (graph) ▪ defines relationships between two or more nodes (items) via links ▪ nodes can have associated attributes ▪ links can have associated attributes ▪ e.g. people and their friendships or gene interaction network ▪ Trees ▪ hierarchical structure without cycles ▪ each child node has one parent node ▪ e.g. company organisation chart or biological tree of life
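As a rough sketch, a network can be stored as a set of nodes plus a list of links, and the tree constraints listed above (no cycles, one parent per child) can be checked directly. The function name `is_tree`, the node names and the attributes are assumptions for illustration; links are assumed to be given as (parent, child) pairs that only refer to known nodes.

```python
# Network: nodes and links, both of which may carry attributes (values are hypothetical).
nodes = {"ann": {"age": 31}, "bob": {"age": 28}, "cy": {"age": 40}}
links = [("ann", "bob", {"since": 2015}), ("bob", "cy", {"since": 2020})]


def is_tree(node_ids, parent_child_links):
    """Return True if the directed links form a single rooted tree over node_ids."""
    parents = {}
    children = {n: [] for n in node_ids}
    for parent, child in parent_child_links:
        if child in parents:
            return False  # a child with two parents violates the tree property
        parents[child] = parent
        children[parent].append(child)
    roots = [n for n in node_ids if n not in parents]
    if len(roots) != 1:
        return False  # zero roots implies a cycle, several roots imply a forest
    # Every node must be reachable from the root; nodes on a detached cycle are not.
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return len(seen) == len(node_ids)
```

For example, a small organisation chart with one head and two direct reports passes the check, while any structure containing a cycle fails it.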
Fields ▪ Field ▪ each cell contains measurements or calculations from a continuous domain ▪ continuous data brings along the issues of sampling and interpolation
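To make the interpolation issue concrete, here is a minimal sketch of linear interpolation in a one-dimensional field sampled at regular intervals. Linear interpolation is just one possible choice, and the function name `interpolate` and its parameters `origin` and `spacing` are assumptions for this example.

```python
import math


def interpolate(samples, origin, spacing, x):
    """Estimate the field value at x from samples taken at origin + i * spacing.

    Assumes at least two samples and linear behaviour between neighbouring samples.
    """
    t = (x - origin) / spacing  # position of x measured in sample intervals
    if t < 0 or t > len(samples) - 1:
        raise ValueError("x lies outside the sampled domain")
    i = min(math.floor(t), len(samples) - 2)  # index of the left neighbour
    frac = t - i  # 0.0 at the left sample, 1.0 at the right sample
    return (1 - frac) * samples[i] + frac * samples[i + 1]
```

The value between two samples is only an estimate: a different interpolation scheme, or a denser sampling, could give a different answer for the same position.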
Fields … ▪ Spatial fields ▪ sampling at spatial positions ▪ e.g. medical scan of a human body or measurements in wind tunnel ▪ if spatial position is given with dataset, we talk about scientific visualisation (scivis) (in contrast to information visualisation (infovis) where the use of space is chosen by the designer) ▪ Grid types ▪ uniform grid: sampling at regular intervals without any need to store the grid geometry or grid topology (connection of cells) ▪ rectilinear grid: supports non-uniform sampling - efficient storage of information with high complexity in some areas and low complexity in others (also store grid geometry)
Fields … ▪ Grid types … ▪ structured grid: enables curvilinear shapes where the geometric location of each cell needs to be specified ▪ unstructured grid: complete flexibility but grid geometry as well as grid topology has to be stored explicitly
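The storage difference between the first two grid types can be sketched as follows. In a uniform grid the position of a cell is computed from an origin and a spacing, whereas a rectilinear grid has to store the coordinates along each axis. The function names and parameters are assumptions chosen for this illustration.

```python
def uniform_position(origin, spacing, index):
    """Uniform grid: geometry is implicit and computed from origin and spacing."""
    return tuple(o + s * i for o, s, i in zip(origin, spacing, index))


def rectilinear_position(axis_coordinates, index):
    """Rectilinear grid: one explicit, possibly non-uniform coordinate list per axis."""
    return tuple(coords[i] for coords, i in zip(axis_coordinates, index))


# Hypothetical 2D rectilinear grid: dense sampling near x = 0, sparse further away.
x_coords = [0.0, 0.1, 0.2, 1.0]
y_coords = [0.0, 5.0]
```

Structured and unstructured grids go further and need per-cell geometry, and for unstructured grids also explicit topology, so they cannot be reduced to a formula or a few coordinate lists in this way.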
Geometry ▪ Information about the shape of items with spatial positions ▪ points and one-dimensional lines or curves ▪ two-dimensional surfaces or regions ▪ three-dimensional volumes ▪ Geometry datasets do not necessarily have attributes ▪ e.g. contours derived from a spatial field or shapes generated from raw geographic data (e.g. boundaries of a forest) ▪ Shown alone or as backdrop for other data
Other Combinations ▪ Cluster ▪ grouping items based on similarity of attributes ▪ Set ▪ unordered group of items ▪ List (array) ▪ ordered group of items ▪ Path ▪ ordered set of segments formed by links connecting nodes in a network ▪ Compound network (multilevel network) ▪ network combined with superimposed tree (with all the nodes of the network as leaves)
Dataset Availability ▪ Static file (offline) ▪ entire dataset is available all at once ▪ Dynamic stream (online) ▪ dataset information trickles in over time ▪ addition, update or deletion of items ▪ adds complexity to the vis process - no longer have all data at a given time
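A dynamic stream can be thought of as a sequence of change events applied to the current state of the dataset. The sketch below assumes a hypothetical event format `(kind, item_id, attributes)` with the three kinds named above; real streaming sources will use their own formats.

```python
def apply_event(items, event):
    """Apply one stream event to the current dataset state (a dict of items)."""
    kind, item_id, attributes = event
    if kind == "add":
        items[item_id] = dict(attributes)
    elif kind == "update":
        items[item_id].update(attributes)
    elif kind == "delete":
        del items[item_id]
    else:
        raise ValueError(f"unknown event kind: {kind}")
    return items


# Hypothetical stream: the vis only ever sees the state built up so far.
stream = [
    ("add", "a", {"price": 10}),
    ("add", "b", {"price": 20}),
    ("update", "a", {"price": 12}),
    ("delete", "b", {}),
]

state = {}
for event in stream:
    apply_event(state, event)
```

At any point during the loop, `state` holds only what has arrived so far, which is exactly why a vis built on a stream cannot assume that it has all data at a given time.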
Attribute Types ▪ Categorical (nominal) attributes ▪ no implicit ordering (but often hierarchical structure) ▪ external ordering can be superimposed ▪ e.g. different types of fruits
Attribute Types … ▪ Ordered attributes ▪ ordinal data - well-defined ordering but cannot do full-fledged arithmetic - e.g. t-shirt size ▪ quantitative data - measurement of magnitude that supports arithmetic comparison (integers as well as real numbers) - e.g. height, temperature or stock price ▪ Ordering directions ▪ sequential - homogeneous range from minimum to maximum value - e.g. mountain heights (from sea level to height of Mount Everest) ▪ diverging - e.g. valleys in the sea and mountains on land
Attribute Types … ▪ Ordering directions … ▪ cyclic - values wrap around back to the starting point - e.g. time measurements like the hour of the day or the day of the week ▪ Hierarchical attributes ▪ hierarchical structures within or between multiple attributes ▪ e.g. time series of daily stock prices where time can be aggregated hierarchically (from days to weeks, months and years)
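The practical consequences of the attribute types can be illustrated with two small helpers: ordinal values can be compared but not subtracted, and for cyclic values the distance has to take the wrap-around into account. The size list `SIZE_ORDER` and both function names are assumptions for this sketch.

```python
# Hypothetical ordering for the ordinal t-shirt size attribute.
SIZE_ORDER = ["S", "M", "L", "XL"]


def ordinal_less_than(a, b, order=SIZE_ORDER):
    """Ordinal data supports comparison via its ordering, but no arithmetic."""
    return order.index(a) < order.index(b)


def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle, e.g. hours of the day."""
    d = abs(a - b) % period
    return min(d, period - d)
```

For the hour of the day (period 24), 23:00 and 01:00 are two hours apart rather than 22, which a purely sequential treatment of the attribute would get wrong.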
Key Versus Value Semantics ▪ Type of an attribute does not tell us about its semantics ▪ key attribute (independent attribute) represents an index that is used to look up value attributes (dependent attributes) ▪ key attributes can be categorical or ordinal ▪ value attributes can be categorical, ordinal or quantitative ▪ Flat tables ▪ key might be implicit (simply the index of the row) or explicit (attribute within table with unique values) ▪ Multidimensional tables ▪ multiple keys are required to look up an item ▪ combination of all keys must be unique for each item
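The uniqueness requirement for keys can be checked mechanically. The sketch below tests whether a combination of attributes identifies every item uniquely; the function name `is_key` and the example table with `region`, `quarter` and `sales` are assumptions for illustration.

```python
def is_key(rows, key_attributes):
    """Return True if the combination of key_attributes is unique for each item."""
    seen = set()
    for row in rows:
        key = tuple(row[attribute] for attribute in key_attributes)
        if key in seen:
            return False  # two items share the same key combination
        seen.add(key)
    return True


# Hypothetical multidimensional table flattened into rows.
rows = [
    {"region": "north", "quarter": "Q1", "sales": 10},
    {"region": "north", "quarter": "Q2", "sales": 12},
    {"region": "south", "quarter": "Q1", "sales": 7},
]
```

Here neither `region` nor `quarter` works as a key on its own, but the combination of both does, which matches the multidimensional table case described above.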
Key Versus Value Semantics … ▪ Fields ▪ independent variable to look up dependent variable ▪ multivariate structure - depends on number of value attributes - scalar field: one attribute per cell - vector field: two or more attributes per cell - tensor field: many attributes per cell ▪ multidimensional structure - depends on number of keys - e.g. 2D or 3D fields
Temporal Semantics ▪ Temporal attribute is any kind of information that is related to time ▪ Data about time is complicated to handle ▪ time hierarchy is deeply multiscale (from nanoseconds to hours, decades or millennia) ▪ temporal scales do not all fit into a strict hierarchy (e.g. weeks do not cleanly fit into months) ▪ transformation and aggregation become complex ▪ Time-varying semantics ▪ time is one of the key attributes (as opposed to being a value) ▪ Time-series dataset ▪ ordered sequence of time-value pairs
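The point that weeks do not fit cleanly into months shows up as soon as a time series is aggregated. The following sketch sums a small time series of time-value pairs once per month and once per ISO week, using Python's standard `datetime` module; the helper names and the values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def aggregate(series, key_of):
    """Sum the values of a time series per aggregation key (e.g. month or week)."""
    totals = defaultdict(float)
    for day, value in series:
        totals[key_of(day)] += value
    return dict(totals)


def month_key(day):
    return (day.year, day.month)


def iso_week_key(day):
    iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return (iso[0], iso[1])


# Hypothetical daily values around a month boundary.
series = [
    (date(2024, 2, 28), 1.0),
    (date(2024, 2, 29), 2.0),
    (date(2024, 3, 1), 4.0),
]

by_month = aggregate(series, month_key)
by_week = aggregate(series, iso_week_key)
```

The three days fall into two different months but into the same ISO week, so weekly totals cannot be rolled up into monthly totals without going back to the daily values.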
Task Abstraction ▪ Next we have to investigate the why part of the what-why-how analysis framework ▪ what is the goal of using the vis? ▪ Transform task description from domain-specific language into abstract form ▪ enables reasoning about similarities ▪ Who has the goal? ▪ designer of the vis or the end user?
Actions ▪ User goals can be defined by actions at three levels of abstraction ▪ Analyse - consume existing or also produce additional data ▪ Search - what kind of search is involved (are the target and location known)? ▪ Query - need to identify one target, compare some targets or summarise all of the targets?
Consume ▪ Discover (Explore) ▪ use vis to find new knowledge that was not previously known ▪ serendipitous observation of unexpected data ▪ may be motivated by theories, models or hypotheses ▪ outcome is to generate a new hypothesis or verify (or disconfirm) an existing hypothesis ▪ need for sophisticated interactive vis idioms since we do not know in advance what the user will need to see ▪ note that why the vis is being used does not dictate how it is designed ▪ Present (Explain) ▪ communication of information, telling a story with data or guiding an audience through a series of cognitive operations - decision making, planning, forecasting or instructional processes ▪ e.g. Gapminder application shown earlier
Consume … ▪ Present (Explain) … ▪ output of a discover session might become input for a present session ▪ Enjoy ▪ casual encounter with vis - not driven by need to verify or generate a hypothesis ▪ [Figure: Name Voyager]
Produce ▪ Generate new material which is often immediately used as input for a next instance ▪ Annotate ▪ graphical or textual annotations of existing visualisation elements - annotations of data items might be stored as a new attribute ▪ typically a manual user action ▪ Record ▪ save or capture visualisation elements ▪ screenshots, bookmarks, parameter settings or interaction logs ▪ e.g. graphical history with a snapshot of the output of each task
Produce … ▪ Derive ▪ produce new data elements based on existing data elements ▪ strong relationship between the form of the data (attribute and dataset types) and the vis idioms that are effective at presenting it ▪ derived attributes can be used to extend the dataset - from quantitative to ordinal data (water temperature → cold, warm or hot) - adding latitude and longitude to city names (via lookup in separate DB) - arithmetic operations on existing attributes
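The water temperature example can be sketched as a derived attribute that turns a quantitative value into an ordinal one. The thresholds of 15 and 30 degrees Celsius, the attribute names and both function names are arbitrary assumptions for this illustration; suitable bin boundaries depend on the domain.

```python
def temperature_category(celsius, warm_from=15.0, hot_from=30.0):
    """Derive an ordinal category from a quantitative temperature (thresholds are assumed)."""
    if celsius < warm_from:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


def add_derived_attribute(rows, name, derive):
    """Return new rows extended by one derived attribute, leaving the input untouched."""
    return [{**row, name: derive(row)} for row in rows]


# Hypothetical measurements.
measurements = [{"site": "A", "water_temp": 8.5}, {"site": "B", "water_temp": 31.0}]
extended = add_derived_attribute(
    measurements, "temp_category", lambda row: temperature_category(row["water_temp"])
)
```

The original quantitative attribute is kept alongside the derived one, so the dataset is extended rather than replaced.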
Targets ▪ Three high-level targets ▪ Trends ▪ high-level characterisation of a pattern in the data ▪ e.g. increases, decreases, peaks, plateaus, …
Search … ▪ Locate ▪ user knows what they are looking for but does not know where it is ▪ Browse ▪ user does not know exactly what they are looking for but has a location in mind where to look for it ▪ Explore ▪ user does not know what they are looking for and where to search ▪ often beginning from an overview of everything ▪ e.g. searching for outliers in a scatterplot
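The search actions differ only in whether the target and its location are known, so they can be summarised as a two-by-two classification. The function name `search_action` is an assumption for this sketch; the lookup case (both target and location known) is taken from the Brehmer and Munzner typology listed in the references.

```python
def search_action(target_known, location_known):
    """Classify a search by whether the target and its location are known."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"
```

For example, searching for outliers in a scatterplot without knowing what or where they are corresponds to `search_action(False, False)`, which yields explore.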
Query … ▪ Identify ▪ if the search returns known targets (lookup or locate) then identify returns their characteristics ▪ if the search returns targets matching particular characteristics (browse or explore) then identify returns specific references ▪ Compare ▪ comparing multiple targets ▪ more difficult than the identify task and requires more sophisticated vis idioms to support the user ▪ Summarise (overview) ▪ scope is all possible targets
Further Reading ▪ This lecture is mainly based on the book Visualization Analysis & Design ▪ chapter 2 - What: Data Abstraction ▪ chapter 3 - Why: Task Abstraction
References ▪ T. Munzner, Visualization Analysis & Design, Taylor & Francis Inc, 2014, ISBN-13: 978-1466508910 ▪ Name Voyager ▪ https://www.babynamewizard.com/voyager/ ▪ M. Brehmer and T. Munzner, A Multi-Level Typology of Abstract Visualization Tasks, IEEE Transactions on Visualization and Computer Graphics 19(12), 2013 ▪ https://doi.org/10.1109/TVCG.2013.124