Data Representation - Lecture 3 - Information Visualisation (4019538FNR)

Beat Signer
February 28, 2024

This lecture forms part of the course Information Visualisation given at the Vrije Universiteit Brussel.

Transcript

  1. 2 December 2005 Information Visualisation Data Representation Prof. Beat Signer

    Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - [email protected] 2

    February 29, 2024 Information Visualisation Process Data Representation Data Data Presentation Interaction mapping perception and visual thinking
  3. Data Representation and Abstraction

    ▪ Detailed look at the what part of the earlier what-why-how question → what-why-how analysis framework
    ▪ Provide a language that is meaningful and useful for vis design
    ▪ Data is typically described in domain language
      ▪ in order to find suitable visual representations, we have to translate the data into more abstract structures that we know how to encode
    ▪ Data abstraction helps to narrow down the design space
  4. Semantics and Types

    ▪ Many aspects of vis design driven by the kind of data
      ▪ semantics (real-world meaning)
      ▪ types (data as well as datasets)
    ▪ What do the following datasets represent?
      15, 2.7, 27, 27, 15, 10021
      Basil, 7, S, Pear
  5. Semantics and Types …

    [Visualization Analysis & Design, Tamara Munzner, 2014]
  6. Data Types

    ▪ Item
      ▪ individual discrete entity
      ▪ e.g. table row or network node
    ▪ Attribute
      ▪ also referred to as variable or dimension
      ▪ property that can be measured, observed or logged
      ▪ e.g. price or temperature
    ▪ Link
      ▪ relationship between items
      ▪ e.g. between items (nodes) in a network
  7. Data Types …

    ▪ Position
      ▪ spatial data
      ▪ e.g. location in two-dimensional or three-dimensional space
    ▪ Grids
      ▪ sampling continuous data in terms of geometric and topological relationships between its cells
  8. Dataset Types

    ▪ Dataset
      ▪ collection of information to be analysed
      ▪ made out of the five data types
      ▪ complex combinations of basic dataset types are common
  9. Tables

    ▪ Flat table
      ▪ row represents an item of data
      ▪ column represents an attribute of the dataset
      ▪ a cell contains the value for a given item and attribute
    ▪ Multidimensional table
      ▪ indexing into a cell via multiple keys
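The two table types above can be sketched in a few lines of Python. This is only an illustration: the fruit data, the `cell` helper and the `(fruit, year)` key pair are made-up assumptions, not part of the lecture.

```python
# Flat table: each row is an item, each column an attribute.
flat_table = [
    {"name": "apple", "colour": "red", "price": 0.5},
    {"name": "pear", "colour": "green", "price": 0.7},
]


def cell(table, row_index, attribute):
    """Return the value for a given item (row) and attribute (column)."""
    return table[row_index][attribute]


# Multidimensional table: a cell is indexed via multiple keys,
# here the hypothetical key pair (fruit, year).
sales = {
    ("apple", 2022): 120,
    ("apple", 2023): 135,
    ("pear", 2022): 80,
}

pear_price = cell(flat_table, 1, "price")
apple_2023 = sales[("apple", 2023)]
```

In the flat table the row index acts as the only key, whereas the dictionary with tuple keys needs both key values to reach a cell.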
  10. Networks and Trees

    ▪ Network (graph)
      ▪ defines relationships between two or more nodes (items) via links
      ▪ nodes can have associated attributes
      ▪ links can have associated attributes
      ▪ e.g. people and their friendships or gene interaction network
    ▪ Trees
      ▪ hierarchical structure without cycles
      ▪ each child node has one parent node
      ▪ e.g. company organisation chart or biological tree of life
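As a rough sketch of these two dataset types, a network can be stored as nodes and links with attributes, and a tree as a child-to-parent mapping. The names, the sample data and the requirement of exactly one root are assumptions made for this example.

```python
# Network: nodes (items) and links, both of which may carry attributes.
nodes = {"ann": {"age": 31}, "bob": {"age": 27}, "cy": {"age": 45}}
links = [("ann", "bob", {"since": 2015}), ("bob", "cy", {"since": 2020})]


def is_tree(parent_of):
    """Check the tree properties for a child -> parent mapping.

    The dict already guarantees one parent per child. We additionally
    require exactly one root (an assumption of this sketch) and verify
    that following parent links never runs into a cycle.
    """
    all_nodes = set(parent_of) | set(parent_of.values())
    roots = all_nodes - set(parent_of)
    if len(roots) != 1:
        return False
    for node in parent_of:
        seen = set()
        while node in parent_of:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = parent_of[node]
    return True


org_chart = {"cto": "ceo", "cfo": "ceo", "dev": "cto"}
cyclic = {"a": "b", "b": "a"}
org_chart_is_tree = is_tree(org_chart)
cyclic_is_tree = is_tree(cyclic)
```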
  11. Fields

    ▪ Field
      ▪ each cell contains measurements or calculations from a continuous domain
      ▪ continuous data brings along the issues of sampling and interpolation
  12. Fields …

    ▪ Spatial fields
      ▪ sampling at spatial positions
      ▪ e.g. medical scan of a human body or measurements in a wind tunnel
      ▪ if the spatial position is given with the dataset, we talk about scientific visualisation (scivis), in contrast to information visualisation (infovis) where the use of space is chosen by the designer
    ▪ Grid types
      ▪ uniform grid: sampling at regular intervals without any need to store the grid geometry or grid topology (connection of cells)
      ▪ rectilinear grid: supports non-uniform sampling
        - efficient storage of information with high complexity in some areas and low complexity in others (grid geometry also has to be stored)
  13. Fields …

    ▪ Grid types …
      ▪ structured grid: enables curvilinear shapes where the geometric location of each cell needs to be specified
      ▪ unstructured grid: complete flexibility but grid geometry as well as grid topology have to be stored explicitly
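To make the difference between a uniform and a rectilinear grid more concrete, the following sketch reduces both to one dimension and adds linear interpolation between samples. The class names, the 1D simplification and the sample values are assumptions for illustration only.

```python
from bisect import bisect_right


class UniformGrid1D:
    """Samples at regular intervals: only origin and spacing are stored."""

    def __init__(self, origin, spacing, values):
        self.origin, self.spacing, self.values = origin, spacing, values

    def position(self, i):
        # The geometry is implicit and can be computed from the index.
        return self.origin + i * self.spacing


class RectilinearGrid1D:
    """Non-uniform sampling: the sample positions are stored explicitly."""

    def __init__(self, positions, values):
        self.positions, self.values = positions, values

    def position(self, i):
        return self.positions[i]


def interpolate(grid, x):
    """Linear interpolation between the two samples surrounding x."""
    n = len(grid.values)
    xs = [grid.position(i) for i in range(n)]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled domain")
    i = min(bisect_right(xs, x) - 1, n - 2)
    x0, x1 = xs[i], xs[i + 1]
    t = (x - x0) / (x1 - x0)
    return (1 - t) * grid.values[i] + t * grid.values[i + 1]


uniform = UniformGrid1D(origin=0.0, spacing=1.0, values=[0.0, 10.0, 20.0])
rectilinear = RectilinearGrid1D(positions=[0.0, 0.5, 4.0], values=[0.0, 10.0, 20.0])
u_mid = interpolate(uniform, 1.5)
r_mid = interpolate(rectilinear, 0.25)
```

The rectilinear grid can place more samples where the data changes quickly, at the cost of storing the positions.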
  14. Geometry

    ▪ Information about the shape of items with spatial positions
      ▪ points and one-dimensional lines or curves
      ▪ two-dimensional surfaces or regions
      ▪ three-dimensional volumes
    ▪ Geometry datasets do not necessarily have attributes
      ▪ e.g. contours derived from a spatial field or shapes generated from raw geographic data (e.g. boundaries of a forest)
    ▪ Shown alone or as backdrop for other data
  15. Other Combinations

    ▪ Cluster
      ▪ grouping items based on similarity of attributes
    ▪ Set
      ▪ unordered group of items
    ▪ List (array)
      ▪ ordered group of items
    ▪ Path
      ▪ ordered set of segments formed by links connecting nodes in a network
    ▪ Compound network (multilevel network)
      ▪ network combined with superimposed tree (with all the nodes of the network as leaves)
  16. Dataset Availability

    ▪ Static file (offline)
      ▪ entire dataset is available all at once
    ▪ Dynamic stream (online)
      ▪ dataset information trickles in over time
      ▪ addition, update or deletion of items
      ▪ adds complexity to the vis process
        - no longer have all data at a given time
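One way to picture the extra complexity of a dynamic stream is a statistic that has to stay correct while items are added, updated or deleted. The `StreamingMean` class below is a hypothetical helper written for this example, not something from the lecture.

```python
class StreamingMean:
    """Running mean over items that are added, updated or deleted over time."""

    def __init__(self):
        self.items = {}  # item id -> current value
        self.total = 0.0

    def add(self, item_id, value):
        # Assumes item_id has not been seen before.
        self.items[item_id] = value
        self.total += value

    def update(self, item_id, value):
        self.total += value - self.items[item_id]
        self.items[item_id] = value

    def delete(self, item_id):
        self.total -= self.items.pop(item_id)

    @property
    def mean(self):
        return self.total / len(self.items) if self.items else None


stream = StreamingMean()
stream.add("a", 10.0)
stream.add("b", 20.0)
mean_after_adds = stream.mean  # based only on what has arrived so far
stream.update("a", 30.0)
stream.delete("b")
mean_after_changes = stream.mean
```

With a static file the mean could simply be computed once over the whole dataset.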
  17. Attribute Types

    ▪ Categorical (nominal) attributes
      ▪ no implicit ordering (but often hierarchical structure)
      ▪ external ordering can be superimposed
      ▪ e.g. different types of fruits
  18. Attribute Types …

    ▪ Ordered attributes
      ▪ ordinal data
        - well-defined ordering but cannot do full-fledged arithmetic
        - e.g. t-shirt size
      ▪ quantitative data
        - measurement of magnitude that supports arithmetic comparison (integers as well as real numbers)
        - e.g. height, temperature or stock price
    ▪ Ordering directions
      ▪ sequential
        - homogeneous range from minimum to maximum value
        - e.g. mountain heights (from sea level to height of Mount Everest)
      ▪ diverging
        - e.g. valleys in the sea and mountains on land
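The practical difference between categorical, ordinal and quantitative attributes is which operations make sense on them. The following sketch uses made-up values, and the `SIZE_ORDER` list is an assumed ordering chosen for illustration.

```python
from collections import Counter

# Categorical: only equality is meaningful, so we can count but not rank.
fruits = ["apple", "pear", "apple", "kiwi"]
fruit_counts = Counter(fruits)

# Ordinal: a well-defined order, but rank differences carry no magnitude.
SIZE_ORDER = ["S", "M", "L", "XL"]  # assumed ordering for illustration


def size_rank(size):
    return SIZE_ORDER.index(size)


sizes = ["L", "S", "XL", "M"]
sorted_sizes = sorted(sizes, key=size_rank)
largest_size = max(sizes, key=size_rank)

# Quantitative: arithmetic such as differences and means is meaningful.
heights_cm = [172.0, 180.5, 165.0]
mean_height = sum(heights_cm) / len(heights_cm)
height_difference = heights_cm[1] - heights_cm[0]
```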
  19. Attribute Types …

    ▪ Ordering directions …
      ▪ cyclic
        - values wrap around back to the starting point
        - e.g. time measurements like the hour of the day or the day of the week
    ▪ Hierarchical attributes
      ▪ hierarchical structures within or between multiple attributes
      ▪ e.g. time series of daily stock prices where time can be aggregated hierarchically (from days to weeks, months and years)
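Both ideas on this slide can be sketched briefly in Python: a distance that respects the wrap-around of a cyclic attribute, and an aggregation of daily values along the time hierarchy. The function names and the price values are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


# Hypothetical aggregation levels of the time hierarchy.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "year": lambda d: (d.year,),
}


def aggregate_mean(daily_values, level):
    """Average daily values per month or per year."""
    groups = defaultdict(list)
    for day, value in daily_values.items():
        groups[LEVELS[level](day)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


hours_apart = cyclic_distance(23, 1, period=24)
daily_prices = {
    date(2024, 1, 30): 10.0,
    date(2024, 1, 31): 12.0,
    date(2024, 2, 1): 14.0,
}
monthly = aggregate_mean(daily_prices, "month")
yearly = aggregate_mean(daily_prices, "year")
```

A plain subtraction would report 23:00 and 01:00 as 22 hours apart, while the cyclic distance treats them as neighbours.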
  20. Key Versus Value Semantics

    ▪ Type of an attribute does not tell us about its semantics
      ▪ key attribute (independent attribute) represents an index that is used to look up value attributes (dependent attributes)
      ▪ key attributes can be categorical or ordinal
      ▪ value attributes can be categorical, ordinal or quantitative
    ▪ Flat tables
      ▪ key might be implicit (simply the index of the row) or explicit (attribute within table with unique values)
    ▪ Multidimensional tables
      ▪ multiple keys are required to look up an item
      ▪ combination of all keys must be unique for each item
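The uniqueness requirement for keys can be checked mechanically. The sketch below tests whether a combination of attributes is a valid key and then builds a lookup index from keys to value attributes. The helper names and the sample rows are made up for illustration.

```python
def is_unique_key(rows, key_attributes):
    """True if the combination of key attributes is unique for every item."""
    seen = set()
    for row in rows:
        key = tuple(row[name] for name in key_attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


def build_index(rows, key_attributes):
    """Map each key combination to the remaining (value) attributes."""
    if not is_unique_key(rows, key_attributes):
        raise ValueError("key attributes do not identify items uniquely")
    index = {}
    for row in rows:
        key = tuple(row[name] for name in key_attributes)
        index[key] = {k: v for k, v in row.items() if k not in key_attributes}
    return index


rows = [
    {"city": "A", "year": 2022, "sales": 10},
    {"city": "A", "year": 2023, "sales": 12},
    {"city": "B", "year": 2022, "sales": 7},
]
city_alone_is_key = is_unique_key(rows, ["city"])
index = build_index(rows, ["city", "year"])
```

Here `city` alone is not a key because it repeats, but the pair `(city, year)` identifies each item, which makes this a multidimensional table with two keys.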
  21. Example: Order Table

    [Visualization Analysis & Design, Tamara Munzner, 2014]
  22. Key Versus Value Semantics …

    ▪ Fields
      ▪ independent variable to look up dependent variable
      ▪ multivariate structure
        - depends on number of value attributes
        - scalar field: one attribute per cell
        - vector field: two or more attributes per cell
        - tensor field: many attributes per cell
      ▪ multidimensional structure
        - depends on number of keys
        - e.g. 2D or 3D fields
  23. Temporal Semantics

    ▪ Temporal attribute is any kind of information that is related to time
    ▪ Data about time is complicated to handle
      ▪ time hierarchy is deeply multiscale (from nanoseconds to hours, decades or millennia)
      ▪ temporal scales do not all fit into a strict hierarchy (e.g. weeks do not cleanly fit into months)
      ▪ transformation and aggregation become complex
    ▪ Time-varying semantics
      ▪ time is one of the key attributes (as opposed to being a value)
    ▪ Time-series dataset
      ▪ ordered sequence of time-value pairs
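The remark that weeks do not cleanly fit into months can be checked with Python's standard `datetime` module. The two dates and the small time series below are arbitrary examples chosen for this sketch.

```python
from datetime import date

a, b = date(2024, 1, 31), date(2024, 2, 1)

# ISO (year, week number) pair versus (year, month) pair for both dates.
same_week = a.isocalendar()[:2] == b.isocalendar()[:2]
same_month = (a.year, a.month) == (b.year, b.month)

# A time series is an ordered sequence of time-value pairs:
# sorting the tuples orders them by their time component first.
unordered = [(date(2024, 2, 1), 14.0), (date(2024, 1, 31), 12.0)]
time_series = sorted(unordered)
```

The two dates fall into the same ISO week but into different months, so an aggregation by week cannot simply be rolled up into an aggregation by month.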
  24. Task Abstraction

    ▪ Next we have to investigate the why part of the what-why-how analysis framework
      ▪ what is the goal of using the vis?
    ▪ Transform task description from domain-specific language into abstract form
      ▪ enables reasoning about similarities
    ▪ Who has the goal?
      ▪ designer of the vis or the end user?
  25. Actions

    ▪ User goals can be defined by actions at three levels of abstraction
      ▪ Analyse
        - consume existing or also produce additional data
      ▪ Search
        - what kind of search is involved (are the target and location known)?
      ▪ Query
        - need to identify one target, compare some targets or summarise all of the targets?
  26. Analyse

    ▪ Most common use case for vis is to consume information that has already been generated
  27. Consume

    ▪ Discover (Explore)
      ▪ use vis to find new knowledge that was not previously known
      ▪ serendipitous observation of unexpected data
      ▪ may be motivated by theories, models or hypotheses
      ▪ outcome is to generate a new hypothesis or verify (or disconfirm) an existing hypothesis
      ▪ need for sophisticated interactive vis idioms since we do not know in advance what the user will need to see
      ▪ note that the why (the reason the vis is being used) does not dictate the how
    ▪ Present (Explain)
      ▪ communication of information, telling a story with data or guiding an audience through a series of cognitive operations
        - decision making, planning, forecasting or instructional processes
      ▪ e.g. Gapminder application shown earlier
  28. Consume …

    ▪ Present (Explain) …
      ▪ output of a discover session might become input for a present session
    ▪ Enjoy
      ▪ casual encounter with vis
        - not driven by need to verify or generate a hypothesis
      ▪ e.g. Name Voyager
  29. Produce

    ▪ Generate new material which is often immediately used as input for a next instance
    ▪ Annotate
      ▪ graphical or textual annotations of existing visualisation elements
        - annotations of data items might be stored as a new attribute
      ▪ typically a manual user action
    ▪ Record
      ▪ save or capture visualisation elements
      ▪ screenshots, bookmarks, parameter settings or interaction logs
      ▪ e.g. graphical history with a snapshot of the output of each task
  30. Produce …

    ▪ Derive
      ▪ produce new data elements based on existing data elements
      ▪ strong relationship between the form of the data (attribute and dataset types) and the vis idioms that are effective at presenting it
      ▪ derived attributes can be used to extend the dataset
        - from quantitative to ordinal data (water temperature → cold, warm or hot)
        - adding latitude and longitude to city names (via lookup in separate DB)
        - arithmetic operations on existing attributes
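Two of the derive operations listed above can be sketched as follows: binning a quantitative attribute into an ordinal one, and computing a new attribute arithmetically. The thresholds of 15 and 30 degrees Celsius, the attribute names and the sample rows are assumptions for this example.

```python
def temperature_class(celsius, cold_below=15.0, hot_from=30.0):
    """Derive an ordinal class from a quantitative water temperature.

    The two thresholds are arbitrary assumptions for illustration.
    """
    if celsius < cold_below:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


rows = [
    {"water_temp": 8.0, "revenue": 100.0, "cost": 60.0},
    {"water_temp": 22.5, "revenue": 80.0, "cost": 90.0},
    {"water_temp": 41.0, "revenue": 50.0, "cost": 20.0},
]

for row in rows:
    # Quantitative -> ordinal.
    row["temp_class"] = temperature_class(row["water_temp"])
    # Arithmetic operation on existing attributes.
    row["profit"] = row["revenue"] - row["cost"]

temp_classes = [row["temp_class"] for row in rows]
profits = [row["profit"] for row in rows]
```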
  31. Targets

    ▪ Three high-level targets
    ▪ Trends
      ▪ high-level characterisation of a pattern in the data
      ▪ e.g. increases, decreases, peaks, plateaus, …
  32. Targets …

    ▪ Outliers
      ▪ data that does not fit well with the backdrop
    ▪ Features
      ▪ task-dependent structures of interest
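The lecture does not prescribe how outliers are found. As one possible illustration, the sketch below uses a simple z-score rule, which is a technique chosen for this example rather than taken from the slides. The threshold of 2.0 and the readings are assumptions as well.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
outliers = z_score_outliers(readings)
```

What counts as the backdrop depends on the task, so a fixed numerical rule like this is only a starting point.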
  33. Targets …

    ▪ Single attributes
      ▪ individual values, minimum or maximum, …
    ▪ Multiple attributes
      ▪ dependencies, correlations and similarities
  34. Targets …

    ▪ network topology as well as specific paths
  35. Targets …

    ▪ understanding and comparing geometric shapes
  36. Search

    ▪ Lookup
      ▪ user knows what they are looking for and where it is
  37. Search …

    ▪ Locate
      ▪ user knows what they are looking for but does not know where it is
    ▪ Browse
      ▪ user does not know exactly what they are looking for but has a location in mind where to look for it
    ▪ Explore
      ▪ user does not know what they are looking for and where to search
      ▪ often beginning from an overview of everything
      ▪ e.g. searching for outliers in a scatterplot
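The four search actions follow from two yes/no questions: is the target known, and is its location known? A small sketch of that classification, with a function name chosen for this example:

```python
def search_type(target_known: bool, location_known: bool) -> str:
    """Classify a search action by what the user already knows."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"


all_cases = {
    (target, location): search_type(target, location)
    for target in (True, False)
    for location in (True, False)
}
```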
  38. Query

    ▪ Once a target or set of targets is found, query these targets to identify, compare or summarise the data
  39. Query …

    ▪ Identify
      ▪ if the search returns known targets (lookup or locate) then identify returns their characteristics
      ▪ if the search returns targets matching particular characteristics (browse or explore) then identify returns specific references
    ▪ Compare
      ▪ comparing multiple targets
      ▪ more difficult than the identify task and requires more sophisticated vis idioms to support the user
    ▪ Summarise (overview)
      ▪ scope is all possible targets
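The three query actions differ mainly in scope: one target, some targets or all targets. The sketch below mirrors that on a tiny made-up dataset. The function names and values are assumptions, and a real vis would answer these queries visually rather than numerically.

```python
items = {"a": 3.0, "b": 7.5, "c": 5.0}  # hypothetical targets with one value each


def identify(data, target):
    """One target: return its characteristics."""
    return data[target]


def compare(data, first, second):
    """Some targets: return the difference between two of them."""
    return data[first] - data[second]


def summarise(data):
    """All targets: return an overview of the whole set."""
    values = list(data.values())
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


value_of_b = identify(items, "b")
b_minus_a = compare(items, "b", "a")
overview = summarise(items)
```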
  40. Exercise 3

    ▪ Preprocessing and Data Analysis Using Python
  41. Further Reading

    ▪ This lecture is mainly based on the book Visualization Analysis & Design
      ▪ chapter 2 - What: Data Abstraction
      ▪ chapter 3 - Why: Task Abstraction
  42. References

    ▪ Visualization Analysis & Design, Tamara Munzner, Taylor & Francis Inc, (Har/Psc edition), May, November 2014, ISBN-13: 978-1466508910
    ▪ Name Voyager
      ▪ https://www.babynamewizard.com/voyager/
    ▪ M. Brehmer and T. Munzner, A Multi-Level Typology of Abstract Visualization Tasks, IEEE Transactions on Visualization and Computer Graphics 19(12), 2013
      ▪ https://doi.org/10.1109/TVCG.2013.124