Slide 1

Slide 1 text

2 December 2005
Information Visualisation
Data Representation
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
beatsigner.com

Slide 2

Slide 2 text

Beat Signer - Department of Computer Science - [email protected] 2 February 27, 2025
Information Visualisation Process
[Figure: process diagram with the labels Data, Data Representation, Data Presentation, Interaction, mapping, and perception and visual thinking]

Slide 3

Slide 3 text

Data Representation and Abstraction
▪ Detailed look at the what part of the earlier what-why-how question → what-why-how analysis framework
▪ Provide a language that is meaningful and useful for vis design
▪ Data is typically described with domain language
  ▪ in order to find suitable visual representations, we have to translate the data into more abstract structures that we know how to encode
▪ Data abstraction helps to narrow down the design space

Slide 4

Slide 4 text

Semantics and Types
▪ Many aspects of vis design are driven by the kind of data
  ▪ semantics (real-world meaning)
  ▪ types (data as well as datasets)
▪ What do the following datasets represent?
  15, 2.7, 27, 27, 15, 10021
  Basil, 7, S, Pear

Slide 5

Slide 5 text

Semantics and Types …
[Visualization Analysis & Design, Tamara Munzner, 2014]

Slide 6

Slide 6 text

Data Types
▪ Item
  ▪ individual discrete entity
  ▪ e.g. table row or network node
▪ Attribute
  ▪ also referred to as variable or dimension
  ▪ property that can be measured, observed or logged
  ▪ e.g. price or temperature
▪ Link
  ▪ relationship between items
  ▪ e.g. between items (nodes) in a network

Slide 7

Slide 7 text

Data Types …
▪ Position
  ▪ spatial data
  ▪ e.g. location in two-dimensional or three-dimensional space
▪ Grids
  ▪ strategy for sampling continuous data in terms of the geometric and topological relationships between its cells

Slide 8

Slide 8 text

Dataset Types
▪ Dataset
  ▪ collection of information to be analysed
  ▪ composed of the five data types (items, attributes, links, positions and grids)
  ▪ complex combinations of basic dataset types are common

Slide 9

Slide 9 text

Tables
▪ Flat table
  ▪ row represents an item of data
  ▪ column represents an attribute of the dataset
  ▪ a cell contains the value for a given item and attribute
▪ Multidimensional table
  ▪ indexing into a cell via multiple keys
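
The two table types can be sketched in a few lines of Python. This is only an illustration with made-up fruit data; the names `flat_table` and `sales` and the key pair (product, year) are assumptions, not part of the lecture.

```python
# Flat table: each row is an item, each column an attribute.
flat_table = [
    {"name": "apple", "price": 0.50, "size": "M"},
    {"name": "pear", "price": 0.75, "size": "L"},
]

# A cell is the value for a given item (row) and attribute (column).
cell = flat_table[1]["price"]

# Multidimensional table: a cell is indexed via multiple keys,
# here the hypothetical key pair (product, year).
sales = {
    ("apple", 2023): 120,
    ("apple", 2024): 150,
    ("pear", 2023): 80,
}
apple_2024 = sales[("apple", 2024)]

print(cell, apple_2024)
```

In the flat table the row index acts as the only key, whereas the multidimensional table needs both keys to reach a cell.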

Slide 10

Slide 10 text

Networks and Trees
▪ Network (graph)
  ▪ defines relationships between two or more nodes (items) via links
  ▪ nodes can have associated attributes
  ▪ links can have associated attributes
  ▪ e.g. people and their friendships or gene interaction network
▪ Trees
  ▪ hierarchical structure without cycles
  ▪ each child node has one parent node
  ▪ e.g. company organisation chart or biological tree of life
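
A minimal sketch of both structures, assuming made-up people and a made-up organisation chart. The child-to-parent dictionary is one possible tree representation (it enforces a single parent per child by construction), and `is_tree` is a hypothetical helper that assumes the root itself has no entry in the dictionary.

```python
# Network: nodes (items) and links, both with associated attributes.
nodes = {"ann": {"age": 31}, "bob": {"age": 27}, "cid": {"age": 45}}
links = [
    ("ann", "bob", {"since": 2019}),
    ("bob", "cid", {"since": 2021}),
]


def is_tree(parent_of, root):
    """Check that every node reaches the root via parent links without a cycle."""
    for node in parent_of:
        seen = set()
        current = node
        while current != root:
            # A repeated node means a cycle; a missing parent means a dangling node.
            if current in seen or current not in parent_of:
                return False
            seen.add(current)
            current = parent_of[current]
    return True


# Tree stored as child -> parent, so each child has exactly one parent.
org_chart = {"cto": "ceo", "cfo": "ceo", "dev": "cto"}
cyclic = {"a": "b", "b": "a"}

print(is_tree(org_chart, "ceo"), is_tree(cyclic, "r"))
```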

Slide 11

Slide 11 text

Fields
▪ Field
  ▪ each cell contains measurements or calculations from a continuous domain
  ▪ continuous data brings along the issues of sampling and interpolation
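
Sampling and interpolation can be illustrated with a one-dimensional field. The sketch below assumes a hypothetical continuous quantity `temperature` and uses linear interpolation, which is just one of many possible interpolation schemes.

```python
def sample(f, start, step, count):
    """Sample a continuous function f at regular intervals."""
    return [f(start + i * step) for i in range(count)]


def interpolate(samples, start, step, x):
    """Linearly interpolate between the two samples surrounding x."""
    pos = (x - start) / step
    # Clamp the cell index so that x at the last sample still has a right neighbour.
    i = min(max(int(pos), 0), len(samples) - 2)
    t = pos - i
    return samples[i] * (1 - t) + samples[i + 1] * t


def temperature(x):
    """Hypothetical continuous quantity, only known to us through its samples."""
    return 2.0 * x + 10.0


samples = sample(temperature, start=0.0, step=1.0, count=5)
estimate = interpolate(samples, start=0.0, step=1.0, x=2.5)
print(samples, estimate)
```

Because the assumed quantity is linear, the interpolated estimate matches the true value here; for other functions the estimate depends on the sampling density.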

Slide 12

Slide 12 text

Fields …
▪ Spatial fields
  ▪ sampling at spatial positions
  ▪ e.g. medical scan of a human body or measurements in a wind tunnel
  ▪ if the spatial position is given with the dataset, we talk about scientific visualisation (scivis), in contrast to information visualisation (infovis) where the use of space is chosen by the designer
▪ Grid types
  ▪ uniform grid: sampling at regular intervals without any need to store the grid geometry or grid topology (connection of cells)
  ▪ rectilinear grid: supports non-uniform sampling
    - efficient storage of information with high complexity in some areas and low complexity in others (grid geometry also has to be stored)
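
The storage difference between a uniform and a rectilinear grid can be sketched along a single axis. The function names and coordinates below are assumptions chosen for illustration.

```python
from bisect import bisect_right


def uniform_position(origin, spacing, index):
    """Uniform grid: geometry is implicit, origin and spacing are enough."""
    return origin + index * spacing


# Rectilinear grid: spacing is non-uniform, so the coordinates are stored.
# Hypothetical axis that is dense near 0 and sparse further out.
x_coords = [0.0, 0.1, 0.2, 0.5, 1.0, 5.0]


def rectilinear_cell(coords, x):
    """Return the index i of the cell [coords[i], coords[i + 1]) containing x."""
    if not coords[0] <= x < coords[-1]:
        raise ValueError("x lies outside the grid")
    return bisect_right(coords, x) - 1


print(uniform_position(0.0, 0.25, 4), rectilinear_cell(x_coords, 0.3))
```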

Slide 13

Slide 13 text

Fields …
▪ Grid types …
  ▪ structured grid: enables curvilinear shapes where the geometric location of each cell needs to be specified
  ▪ unstructured grid: complete flexibility but grid geometry as well as grid topology has to be stored explicitly

Slide 14

Slide 14 text

Geometry
▪ Information about the shape of items with spatial positions
  ▪ points and one-dimensional lines or curves
  ▪ two-dimensional surfaces or regions
  ▪ three-dimensional volumes
▪ Geometry datasets do not necessarily have attributes
  ▪ e.g. contours derived from a spatial field or shapes generated from raw geographic data (e.g. boundaries of a forest)
▪ Shown alone or as backdrop for other data

Slide 15

Slide 15 text

Other Combinations
▪ Cluster
  ▪ grouping items based on similarity of attributes
▪ Set
  ▪ unordered group of items
▪ List (array)
  ▪ ordered group of items
▪ Path
  ▪ ordered set of segments formed by links connecting nodes in a network
▪ Compound network (multilevel network)
  ▪ network combined with superimposed tree (with all the nodes of the network as leaves)

Slide 16

Slide 16 text

Dataset Availability
▪ Static file (offline)
  ▪ entire dataset is available all at once
▪ Dynamic stream (online)
  ▪ dataset information trickles in over time
  ▪ addition, update or deletion of items
  ▪ adds complexity to the vis process
    - we no longer have all data at a given time
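
The difference between the two availability modes can be sketched with a simple aggregate. The example assumes made-up numeric values and only covers the addition of items; updates and deletions would need extra bookkeeping.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one update per arriving item."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Static file: the entire dataset is available, so one pass gives the answer.
static_data = [4.0, 8.0, 6.0]
static_mean = sum(static_data) / len(static_data)

# Dynamic stream: items trickle in and the result has to be revised each time.
streamed_means = list(running_mean(iter(static_data)))

print(static_mean, streamed_means)
```

A visualisation of the stream would have to be redrawn, or incrementally updated, after every intermediate result.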

Slide 17

Slide 17 text

Attribute Types
▪ Categorical (nominal) attributes
  ▪ no implicit ordering (but often hierarchical structure)
  ▪ external ordering can be superimposed
  ▪ e.g. different types of fruits

Slide 18

Slide 18 text

Attribute Types …
▪ Ordered attributes
  ▪ ordinal data
    - well-defined ordering but cannot do full-fledged arithmetic
    - e.g. t-shirt size
  ▪ quantitative data
    - measurement of magnitude that supports arithmetic comparison (integers as well as real numbers)
    - e.g. height, temperature or stock price
▪ Ordering directions
  ▪ sequential
    - homogeneous range from minimum to maximum value
    - e.g. mountain heights (from sea level to height of Mount Everest)
  ▪ diverging
    - two sequences pointing in opposite directions that meet at a common zero point
    - e.g. valleys in the sea and mountains on land
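
The practical difference between ordinal and quantitative data is which operations make sense. A small sketch with made-up t-shirt sizes and heights; `SIZES` and `size_rank` are hypothetical names.

```python
# Ordinal: the order is well defined, but distances between values are not.
SIZES = ["S", "M", "L", "XL"]


def size_rank(size):
    """Position of a size in the ordering (usable for sorting, not arithmetic)."""
    return SIZES.index(size)


shirts = ["L", "S", "XL", "M"]
sorted_shirts = sorted(shirts, key=size_rank)

# Quantitative: differences and means are meaningful.
heights_cm = [172.0, 181.5, 165.0]
difference = heights_cm[1] - heights_cm[0]
mean_height = sum(heights_cm) / len(heights_cm)

print(sorted_shirts, difference, round(mean_height, 1))
```

Sorting the sizes is legitimate, whereas something like "L minus S" has no defined meaning even though the ranks are integers.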

Slide 19

Slide 19 text

Attribute Types …
▪ Ordering directions …
  ▪ cyclic
    - values wrap around back to the starting point
    - e.g. time measurements like the hour of the day or the day of the week
▪ Hierarchical attributes
  ▪ hierarchical structures within or between multiple attributes
  ▪ e.g. time series of daily stock prices where time can be aggregated hierarchically (from days to weeks, months and years)
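
Both ideas can be sketched briefly: a distance that respects the wrap-around of a cyclic attribute, and an aggregation of daily values one level up the time hierarchy. The prices are made up and both helper functions are hypothetical.

```python
from collections import defaultdict
from datetime import date


def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def monthly_mean(daily_prices):
    """Aggregate daily values from days to months."""
    buckets = defaultdict(list)
    for day, price in daily_prices.items():
        buckets[(day.year, day.month)].append(price)
    return {month: sum(prices) / len(prices) for month, prices in buckets.items()}


# Hour of the day is cyclic: 23:00 and 01:00 are two hours apart, not 22.
hours_apart = cyclic_distance(23, 1, period=24)

# Made-up daily stock prices.
daily_prices = {
    date(2024, 1, 30): 10.0,
    date(2024, 1, 31): 12.0,
    date(2024, 2, 1): 11.0,
}
monthly = monthly_mean(daily_prices)

print(hours_apart, monthly)
```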

Slide 20

Slide 20 text

Key Versus Value Semantics
▪ Type of an attribute does not tell us about its semantics
  ▪ key attribute (independent attribute) represents an index that is used to look up value attributes (dependent attributes)
  ▪ key attributes can be categorical or ordinal
  ▪ value attributes can be categorical, ordinal or quantitative
▪ Flat tables
  ▪ key might be implicit (simply the index of the row) or explicit (attribute within table with unique values)
▪ Multidimensional tables
  ▪ multiple keys are required to look up an item
  ▪ combination of all keys must be unique for each item
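
Whether an attribute, or a combination of attributes, can serve as a key comes down to a uniqueness check. The sketch below uses a made-up order table (not the one from the book) and a hypothetical helper `is_valid_key`.

```python
def is_valid_key(rows, key_attributes):
    """True if the combination of key attributes is unique for every item."""
    seen = set()
    for row in rows:
        key = tuple(row[attribute] for attribute in key_attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


# Made-up order table.
orders = [
    {"order_id": 1, "product": "pear", "year": 2023, "quantity": 5},
    {"order_id": 2, "product": "pear", "year": 2024, "quantity": 3},
    {"order_id": 3, "product": "basil", "year": 2024, "quantity": 3},
]

print(is_valid_key(orders, ["order_id"]))         # explicit key of a flat table
print(is_valid_key(orders, ["product"]))          # repeated values, not a key
print(is_valid_key(orders, ["product", "year"]))  # keys of a multidimensional table
```

Note that the check only shows that a combination is unique in the data at hand; whether it is meant to act as a key is a question of semantics.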

Slide 21

Slide 21 text

Example: Order Table
[Visualization Analysis & Design, Tamara Munzner, 2014]

Slide 22

Slide 22 text

Key Versus Value Semantics …
▪ Fields
  ▪ independent variable to look up dependent variable
  ▪ multivariate structure
    - depends on number of value attributes
    - scalar field: one attribute per cell
    - vector field: two or more attributes per cell
    - tensor field: many attributes per cell
  ▪ multidimensional structure
    - depends on number of keys
    - e.g. 2D or 3D fields
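
A small sketch of a 2D scalar field and a 2D vector field, assuming made-up temperature and wind values. In both cases the two keys are the row and column indices; only the number of value attributes per cell differs.

```python
import math

# Scalar field: one value attribute per cell, looked up via two keys (row, column).
temperature = [
    [20.0, 21.0],
    [19.5, 22.0],
]

# Vector field: two or more value attributes per cell, here a (u, v) wind vector.
wind = [
    [(1.0, 0.0), (0.0, 2.0)],
    [(3.0, 4.0), (0.5, 0.5)],
]


def wind_speed(field, row, col):
    """Derive a scalar (the magnitude) from the vector stored in one cell."""
    u, v = field[row][col]
    return math.hypot(u, v)


value = temperature[1][0]
speed = wind_speed(wind, 1, 0)
print(value, speed)
```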

Slide 23

Slide 23 text

Temporal Semantics
▪ Temporal attribute is any kind of information that is related to time
▪ Data about time is complicated to handle
  ▪ time hierarchy is deeply multiscale (from nanoseconds to hours, decades or millennia)
  ▪ temporal scales do not all fit into a strict hierarchy (e.g. weeks do not cleanly fit into months)
  ▪ transformation and aggregation become complex
▪ Time-varying semantics
  ▪ time is one of the key attributes (as opposed to being a value)
▪ Time-series dataset
  ▪ ordered sequence of time-value pairs
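
The mismatch between weeks and months is easy to demonstrate with Python's standard `datetime` module (`date.fromisocalendar` requires Python 3.8 or later). The helper name and the sample values are assumptions for illustration.

```python
from datetime import date, timedelta


def months_in_iso_week(year, week):
    """Return the set of calendar months touched by one ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return {(monday + timedelta(days=i)).month for i in range(7)}


# ISO week 5 of 2024 runs from 29 January to 4 February and spans two months.
print(months_in_iso_week(2024, 5))

# Time-series dataset: time-value pairs, ordered by the time key.
series = sorted([
    (date(2024, 2, 1), 11.0),
    (date(2024, 1, 30), 10.0),
    (date(2024, 1, 31), 12.0),
])
print(series[0])
```

Aggregating such a series by week and then by month therefore cannot be done by simply nesting the two levels.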

Slide 24

Slide 24 text

Task Abstraction
▪ Next we have to investigate the why part of the what-why-how analysis framework
  ▪ what is the goal of using the vis?
▪ Transform task description from domain-specific language into abstract form
  ▪ enables reasoning about similarities
▪ Who has the goal?
  ▪ designer of the vis or the end user?

Slide 25

Slide 25 text

Actions
▪ User goals can be defined by actions at three levels of abstraction
  ▪ Analyse
    - consume existing data or produce additional data
  ▪ Search
    - what kind of search is involved (are the target and location known)?
  ▪ Query
    - need to identify one target, compare some targets or summarise all of the targets?

Slide 26

Slide 26 text

Analyse
▪ Most common use case for vis is to consume information that has already been generated

Slide 27

Slide 27 text

Consume
▪ Discover (Explore)
  ▪ use vis to find new knowledge that was not previously known
  ▪ serendipitous observation of unexpected data
  ▪ may be motivated by theories, models or hypotheses
  ▪ outcome is to generate a new hypothesis or verify (or disconfirm) an existing hypothesis
  ▪ need for sophisticated interactive vis idioms since we do not know in advance what the user will need to see
  ▪ note that why the vis is being used does not dictate how it is designed
▪ Present (Explain)
  ▪ communication of information, telling a story with data or guiding an audience through a series of cognitive operations
    - decision making, planning, forecasting or instructional processes
  ▪ e.g. Gapminder application shown earlier

Slide 28

Slide 28 text

Consume …
▪ Present (Explain) …
  ▪ output of a discover session might become input for a present session
▪ Enjoy
  ▪ casual encounter with vis
    - not driven by need to verify or generate a hypothesis
  ▪ e.g. Name Voyager

Slide 29

Slide 29 text

Produce
▪ Generate new material which is often immediately used as input for a next instance
▪ Annotate
  ▪ graphical or textual annotations of existing visualisation elements
    - annotations of data items might be stored as a new attribute
  ▪ typically a manual user action
▪ Record
  ▪ save or capture visualisation elements
  ▪ screenshots, bookmarks, parameter settings or interaction logs
  ▪ e.g. graphical history with a snapshot of the output of each task

Slide 30

Slide 30 text

Produce …
▪ Derive
  ▪ produce new data elements based on existing data elements
  ▪ strong relationship between the form of the data (attribute and dataset types) and the vis idioms that are effective at presenting it
  ▪ derived attributes can be used to extend the dataset
    - from quantitative to ordinal data (water temperature → cold, warm or hot)
    - adding latitude and longitude to city names (via lookup in separate DB)
    - arithmetic operations on existing attributes
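
The three kinds of derived attributes can be sketched in one small function. The thresholds, the attribute names and the in-memory dictionary standing in for a separate database are all assumptions made for illustration.

```python
def temperature_category(celsius, cold_below=15.0, hot_from=30.0):
    """Derive an ordinal attribute from a quantitative one (hypothetical thresholds)."""
    if celsius < cold_below:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


# Stand-in for a lookup in a separate database (approximate coordinates).
CITY_COORDINATES = {"Brussels": (50.85, 4.35)}


def derive(row):
    """Return a copy of the row extended with three derived attributes."""
    derived = dict(row)
    derived["water_category"] = temperature_category(row["water_temp"])
    derived["lat"], derived["lon"] = CITY_COORDINATES[row["city"]]
    derived["temp_diff"] = row["air_temp"] - row["water_temp"]
    return derived


row = {"city": "Brussels", "water_temp": 12.0, "air_temp": 20.0}
extended = derive(row)
print(extended)
```

The original row is left untouched, so the derived attributes extend the dataset rather than replace existing values.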

Slide 31

Slide 31 text

Targets
▪ Three high-level targets
▪ Trends
  ▪ high-level characterisation of a pattern in the data
  ▪ e.g. increases, decreases, peaks, plateaus, …

Slide 32

Slide 32 text

Targets …
▪ Outliers
  ▪ data that does not fit well with the backdrop
▪ Features
  ▪ task-dependent structures of interest
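
Trends and outliers are usually spotted visually, but a rough computational counterpart can help to make the two targets concrete. The sketch below assumes made-up values; the median absolute deviation rule and its threshold are one common heuristic, not a method prescribed by the lecture.

```python
from statistics import median


def trend(values):
    """Very coarse trend characterisation based on consecutive differences."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(step > 0 for step in steps):
        return "increasing"
    if all(step < 0 for step in steps):
        return "decreasing"
    return "mixed"


def outliers(values, threshold=3.0):
    """Values whose distance from the median exceeds threshold times the
    median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if abs(v - med) / mad > threshold]


print(trend([1, 2, 3]), outliers([10, 11, 9, 10, 12, 50]))
```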

Slide 33

Slide 33 text

Targets …
▪ Single attributes
  ▪ individual values, minimum or maximum, …
▪ Multiple attributes
  ▪ dependencies, correlations and similarities

Slide 34

Slide 34 text

Targets …
▪ network topology as well as specific paths

Slide 35

Slide 35 text

Targets …
▪ understanding and comparing geometric shapes

Slide 36

Slide 36 text

Search
▪ Lookup
  ▪ user knows what they are looking for and where it is

Slide 37

Slide 37 text

Search …
▪ Locate
  ▪ user knows what they are looking for but does not know where it is
▪ Browse
  ▪ user does not know exactly what they are looking for but has a location in mind where to look for it
▪ Explore
  ▪ user knows neither what they are looking for nor where to search
  ▪ often beginning from an overview of everything
  ▪ e.g. searching for outliers in a scatterplot
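
The four search types (lookup, locate, browse and explore) follow from two yes/no questions: is the target known, and is its location known? A minimal sketch with a hypothetical function name:

```python
def search_type(target_known, location_known):
    """Classify a search action by what the user already knows."""
    if target_known and location_known:
        return "lookup"
    if target_known:
        return "locate"
    if location_known:
        return "browse"
    return "explore"


for target_known in (True, False):
    for location_known in (True, False):
        print(target_known, location_known, search_type(target_known, location_known))
```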

Slide 38

Slide 38 text

Query
▪ Once a target or set of targets is found, query these targets to identify, compare or summarise the data

Slide 39

Slide 39 text

Query …
▪ Identify
  ▪ if the search returns known targets (lookup or locate), then identify returns their characteristics
  ▪ if the search returns targets matching particular characteristics (browse or explore), then identify returns specific references
▪ Compare
  ▪ comparing multiple targets
  ▪ more difficult than the identify task and requires more sophisticated vis idioms to support the user
▪ Summarise (overview)
  ▪ scope is all possible targets
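
The three query scopes (one target, some targets, all targets) can be mirrored by three small functions. The data and function names are made up, and the sketch only covers the case where identify returns the characteristics of a known target.

```python
# Made-up targets with a single attribute.
targets = {
    "apple": {"price": 0.5},
    "pear": {"price": 0.75},
    "basil": {"price": 1.0},
}


def identify(targets, name):
    """One target: return the characteristics of a known target."""
    return targets[name]


def compare(targets, names, attribute):
    """Some targets: put the same attribute of several targets side by side."""
    return {name: targets[name][attribute] for name in names}


def summarise(targets, attribute):
    """All targets: give an overview of one attribute."""
    values = [target[attribute] for target in targets.values()]
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}


print(identify(targets, "pear"))
print(compare(targets, ["apple", "basil"], "price"))
print(summarise(targets, "price"))
```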

Slide 40

Slide 40 text

Exercise 3
▪ Preprocessing and Data Analysis Using Python

Slide 41

Slide 41 text

Further Reading
▪ This lecture is mainly based on the book Visualization Analysis & Design
  ▪ chapter 2 - What: Data Abstraction
  ▪ chapter 3 - Why: Task Abstraction

Slide 42

Slide 42 text

References
▪ Visualization Analysis & Design, Tamara Munzner, Taylor & Francis Inc, (Har/Psc edition), May, November 2014, ISBN-13: 978-1466508910
▪ Name Voyager
  ▪ https://www.babynamewizard.com/voyager/
▪ M. Brehmer and T. Munzner, A Multi-Level Typology of Abstract Visualization Tasks, IEEE Transactions on Visualization and Computer Graphics 19(12), 2013
  ▪ https://doi.org/10.1109/TVCG.2013.124

Slide 43

Slide 43 text

2 December 2005
Next Lecture
Analysis and Validation