Data Representation - Lecture 3 - Information Visualisation (4019538FNR)

This lecture forms part of the course Information Visualisation given at the Vrije Universiteit Brussel.

Beat Signer

February 25, 2021

Transcript

  1. 2 December 2005 Information Visualisation Data Representation Prof. Beat Signer

    Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2

    February 25, 2021 Information Visualisation Process [figure: process diagram with the labels Data, Representation, Presentation, Interaction, mapping, and perception and visual thinking]
  3. Data Representation and Abstraction

    ▪ Detailed look at the what part of the earlier what-why-how question → what-why-how analysis framework ▪ Provide a language that is meaningful and useful for vis design ▪ Data is typically described in domain language ▪ to find suitable visual representations, we have to translate the data into more abstract structures that we know how to encode ▪ Data abstraction helps to narrow down the design space
  4. Semantics and Types

    ▪ Many aspects of vis design are driven by the kind of data ▪ semantics (real-world meaning) ▪ types (data as well as datasets) ▪ What do the following datasets represent? ▪ 15, 2.7, 27, 27, 15, 10021 ▪ Basil, 7, S, Pear
  5. Semantics and Types …

    [Visualization Analysis & Design, Tamara Munzner, 2014]
  6. Data Types

    ▪ Item ▪ individual discrete entity ▪ e.g. table row or network node ▪ Attribute ▪ also referred to as variable or dimension ▪ property that can be measured, observed or logged ▪ e.g. price or temperature ▪ Link ▪ relationship between items ▪ e.g. between items (nodes) in a network
  7. Data Types …

    ▪ Position ▪ spatial data ▪ e.g. location in two-dimensional or three-dimensional space ▪ Grids ▪ sampling continuous data in terms of geometric and topological relationships between its cells
  8. Dataset Types

    ▪ Dataset ▪ collection of information to be analysed ▪ made up of the five data types ▪ complex combinations of basic dataset types are common
  9. Tables

    ▪ Flat table ▪ each row represents an item of data ▪ each column represents an attribute of the dataset ▪ a cell contains the value for a given item and attribute ▪ Multidimensional table ▪ indexing into a cell via multiple keys
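The two table forms can be sketched in a few lines of Python. This is only an illustration under assumed data: the attribute names, keys and values below are made up and are not taken from the lecture.

```python
# Flat table: each row (item) maps attribute names to values.
# All names and values here are hypothetical.
flat_table = [
    {"id": 1, "fruit": "Pear", "price": 0.40},
    {"id": 2, "fruit": "Apple", "price": 0.35},
    {"id": 3, "fruit": "Cherry", "price": 1.20},
]


def cell(table, row_index, attribute):
    """Return the value stored for one item (row) and one attribute (column)."""
    return table[row_index][attribute]


# Multidimensional table: a cell is addressed by several keys at once,
# here an assumed (city, year) pair.
multi_table = {
    ("Brussels", 2020): 10.9,
    ("Brussels", 2021): 10.3,
    ("Zurich", 2020): 10.5,
}


def lookup(table, *keys):
    """Index into a multidimensional table via multiple keys."""
    return table[tuple(keys)]


pear_price = cell(flat_table, 0, "price")
brussels_2021 = lookup(multi_table, "Brussels", 2021)
```

In the flat table a single row index is enough to reach an item, whereas the multidimensional table needs the full key combination.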
  10. Networks and Trees

    ▪ Network (graph) ▪ defines relationships between two or more nodes (items) via links ▪ nodes can have associated attributes ▪ links can have associated attributes ▪ e.g. people and their friendships or gene interaction network ▪ Trees ▪ hierarchical structure without cycles ▪ each child node has one parent node ▪ e.g. company organisation chart or biological tree of life
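As a rough sketch of these two dataset types, a network can be stored as nodes and links that both carry attributes, and a tree as a child-to-parent mapping. The names, attributes and the helper functions `neighbours` and `is_tree` are assumptions made for this example.

```python
# Nodes (items) with attributes; names and values are illustrative only.
nodes = {
    "ann": {"age": 31},
    "bob": {"age": 27},
    "eve": {"age": 45},
}

# Links between nodes, each carrying its own attributes.
links = [
    ("ann", "bob", {"since": 2015}),
    ("bob", "eve", {"since": 2019}),
]


def neighbours(node, links):
    """Return all nodes connected to `node` by an undirected link."""
    result = set()
    for source, target, _attrs in links:
        if source == node:
            result.add(target)
        elif target == node:
            result.add(source)
    return result


def is_tree(parent_of, root):
    """Check a child -> parent mapping: every node must reach the root
    by following parent links without ever revisiting a node (no cycles).
    Using a dict already guarantees one parent per child."""
    if root in parent_of:
        return False  # the root must not have a parent
    for node in parent_of:
        seen = {node}
        current = node
        while current != root:
            current = parent_of.get(current)
            if current is None or current in seen:
                return False
            seen.add(current)
    return True


org_chart = {"cto": "ceo", "cfo": "ceo", "dev": "cto"}
cyclic = {"a": "b", "b": "a"}
```

Here `is_tree(org_chart, "ceo")` holds, while the `cyclic` mapping is rejected because following the parent links never reaches a root.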
  11. Fields

    ▪ Field ▪ each cell contains measurements or calculations from a continuous domain ▪ continuous data brings along the issues of sampling and interpolation
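The sampling and interpolation issue can be made concrete with a small sketch. It assumes a one-dimensional field (the sine function stands in for a continuous phenomenon) and linear interpolation, which is only one of several possible interpolation schemes.

```python
import math


def sample(function, start, stop, count):
    """Sample a continuous function at `count` evenly spaced positions."""
    step = (stop - start) / (count - 1)
    return [(start + i * step, function(start + i * step)) for i in range(count)]


def interpolate(samples, x):
    """Linearly interpolate a value at x from sorted (position, value) samples."""
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the sampled range")


# The same continuous domain sampled coarsely and finely.
coarse = sample(math.sin, 0.0, math.pi, 5)
fine = sample(math.sin, 0.0, math.pi, 50)

# Error of the reconstructed value at a position that was not sampled.
error_coarse = abs(interpolate(coarse, 1.0) - math.sin(1.0))
error_fine = abs(interpolate(fine, 1.0) - math.sin(1.0))
```

The coarser the sampling, the more the interpolated value deviates from the underlying continuous value.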
  12. Fields …

    ▪ Spatial fields ▪ sampling at spatial positions ▪ e.g. medical scan of a human body or measurements in wind tunnel ▪ if the spatial position is given with the dataset, we talk about scientific visualisation (scivis) (in contrast to information visualisation (infovis) where the use of space is chosen by the designer) ▪ Grid types ▪ uniform grid: sampling at regular intervals without any need to store the grid geometry or grid topology (connection of cells) ▪ rectilinear grid: supports non-uniform sampling - efficient storage of information with high complexity in some areas and low complexity in others (have to store geometric location of each row)
  13. Fields …

    ▪ Grid types … ▪ structured grid: enables curvilinear shapes where the geometric location of each cell needs to be specified ▪ unstructured grid: complete flexibility but grid geometry as well as grid topology has to be stored explicitly
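The difference in what has to be stored for a uniform and a rectilinear grid can be sketched as follows. The function names, the 2D setting and the coordinate values are assumptions chosen for illustration.

```python
def uniform_position(origin, spacing, index):
    """Uniform grid: a cell position follows from origin, spacing and index
    alone, so no per-cell geometry has to be stored."""
    return tuple(o + s * i for o, s, i in zip(origin, spacing, index))


def rectilinear_position(axis_coordinates, index):
    """Rectilinear grid: one stored coordinate list per axis allows
    non-uniform spacing between rows and columns."""
    return tuple(axis[i] for axis, i in zip(axis_coordinates, index))


# Hypothetical 2D example: dense sampling near x = 0, sparse further away.
x_coords = [0.0, 0.1, 0.2, 0.5, 2.0]
y_coords = [0.0, 1.0, 2.0]

p_uniform = uniform_position(origin=(0.0, 0.0), spacing=(0.5, 1.0), index=(3, 2))
p_rect = rectilinear_position((x_coords, y_coords), index=(3, 2))
```

Structured and unstructured grids would need a stored location per cell, and the unstructured case additionally an explicit list of cell connections.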
  14. Geometry

    ▪ Information about the shape of items with spatial positions ▪ points and one-dimensional lines or curves ▪ two-dimensional surfaces or regions ▪ three-dimensional volumes ▪ Geometry datasets do not necessarily have attributes ▪ e.g. contours derived from a spatial field or shapes generated from raw geographic data (e.g. boundaries of a forest) ▪ Shown alone or as backdrop for other data
  15. Other Combinations

    ▪ Cluster ▪ grouping items based on similarity of attributes ▪ Set ▪ unordered group of items ▪ List (array) ▪ ordered group of items ▪ Path ▪ ordered set of segments formed by links connecting nodes in a network ▪ Compound network (multilevel network) ▪ network combined with superimposed tree (with all the nodes of the network as leaves)
  16. Dataset Availability

    ▪ Static file (offline) ▪ entire dataset is available all at once ▪ Dynamic stream (online) ▪ dataset information trickles in over time ▪ addition, update or deletion of items ▪ adds complexity to the vis process
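A dynamic stream can be pictured as a sequence of add, update and delete events applied to the dataset over time. The event format `(action, item_id, attributes)` and the function names below are assumptions for this sketch.

```python
def apply_event(dataset, event):
    """Apply one stream event (add, update or delete) to a dict of items."""
    action, item_id, attributes = event
    if action == "add":
        dataset[item_id] = dict(attributes)
    elif action == "update":
        dataset[item_id].update(attributes)
    elif action == "delete":
        del dataset[item_id]
    else:
        raise ValueError(f"unknown action: {action}")


def replay(events):
    """Yield a snapshot of the dataset after each incoming event."""
    dataset = {}
    for event in events:
        apply_event(dataset, event)
        yield {item_id: dict(attrs) for item_id, attrs in dataset.items()}


# Hypothetical stream of events trickling in over time.
events = [
    ("add", 1, {"price": 10.0}),
    ("add", 2, {"price": 12.5}),
    ("update", 1, {"price": 11.0}),
    ("delete", 2, None),
]
snapshots = list(replay(events))
```

A static file corresponds to seeing only the final snapshot, while a streaming vis has to handle every intermediate state.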
  17. Attribute Types

    ▪ Categorical (nominal) attributes ▪ no implicit ordering (but often hierarchical structure) ▪ external ordering can be superimposed ▪ e.g. different types of fruits
  18. Attribute Types …

    ▪ Ordered attributes ▪ ordinal data - well-defined ordering but cannot do full-fledged arithmetic - e.g. t-shirt size ▪ quantitative data - measurement of magnitude that supports arithmetic comparison (integers as well as real numbers) - e.g. height, temperature or stock price ▪ Ordering directions ▪ sequential - homogeneous range from minimum to maximum value - e.g. mountain heights (from sea level to height of Mount Everest) ▪ diverging - e.g. valleys in the sea and mountains on land
  19. Attribute Types …

    ▪ Ordering directions … ▪ cyclic - values wrap around back to the starting point - e.g. time measurements like the hour of the day or the day of the week ▪ Hierarchical attributes ▪ hierarchical structures within or between multiple attributes ▪ e.g. time series of daily stock prices where time can be aggregated hierarchically (from days to weeks, months and years)
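The ordinal, cyclic and diverging cases behave differently in code, which the following sketch tries to illustrate. The size scale, the function names and the choice of sea level as the zero point are assumptions based on the examples on the slides.

```python
# Ordinal: a well-defined order, but arithmetic on the values is not meaningful.
SIZES = ["S", "M", "L", "XL"]


def sort_sizes(values):
    """Sort t-shirt sizes by their position on the ordinal scale."""
    return sorted(values, key=SIZES.index)


def cyclic_distance(a, b, period):
    """Cyclic: values wrap around, so distance is measured around the circle."""
    difference = abs(a - b) % period
    return min(difference, period - difference)


def side_of_sea_level(elevation_m):
    """Diverging: two ranges that meet at a common zero point (sea level)."""
    if elevation_m < 0:
        return "below sea level"
    if elevation_m > 0:
        return "above sea level"
    return "sea level"


sorted_sizes = sort_sizes(["L", "S", "XL", "M"])
hours_apart = cyclic_distance(23, 1, period=24)
```

Treating the hour of the day as sequential would put 23:00 and 01:00 22 hours apart, whereas the cyclic distance is 2 hours.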
  20. Key Versus Value Semantics

    ▪ Type of an attribute does not tell us about its semantics ▪ key attribute (independent attribute) represents an index that is used to look up value attributes (dependent attributes) ▪ key attributes can be categorical or ordinal ▪ value attributes can be categorical, ordinal or quantitative ▪ Flat tables ▪ key might be implicit (simply the index of the row) or explicit (attribute within table with unique values) ▪ Multidimensional tables ▪ multiple keys are required to look up an item ▪ combination of all keys must be unique for each item
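The uniqueness requirement for keys can be checked mechanically. The sketch below assumes rows stored as dictionaries; the attribute names and values are hypothetical.

```python
from collections import Counter


def is_valid_key(rows, key_attributes):
    """Return True if the combination of key attributes is unique per row."""
    combos = Counter(tuple(row[k] for k in key_attributes) for row in rows)
    return all(count == 1 for count in combos.values())


# Hypothetical table: "city" and "year" act as keys, "temp" is a value.
rows = [
    {"city": "Brussels", "year": 2020, "temp": 11.0},
    {"city": "Brussels", "year": 2021, "temp": 10.5},
    {"city": "Zurich", "year": 2020, "temp": 10.0},
]

city_alone_is_key = is_valid_key(rows, ["city"])
city_and_year_is_key = is_valid_key(rows, ["city", "year"])
```

In this example neither attribute is a key on its own, but the combination of both identifies each item.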
  21. Example: Order Table

    [Visualization Analysis & Design, Tamara Munzner, 2014]
  22. Key Versus Value Semantics …

    ▪ Fields ▪ independent variable is used to look up the dependent variable ▪ multivariate structure - depends on number of value attributes - scalar field: one attribute per cell - vector field: two or more attributes per cell - tensor field: many attributes per cell ▪ multidimensional structure - depends on number of keys - e.g. 2D or 3D fields
  23. Temporal Semantics

    ▪ Temporal attribute is any kind of information that is related to time ▪ Data about time is complicated to handle ▪ time hierarchy is deeply multiscale (from nanoseconds to hours, decades or millennia) ▪ temporal scales do not all fit into a strict hierarchy (e.g. weeks do not cleanly fit into months) ▪ transformation and aggregation become complex ▪ Time-varying semantics ▪ time is one of the key attributes (as opposed to being a value) ▪ Time-series dataset ▪ ordered sequence of time-value pairs
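The remark that weeks do not cleanly fit into months can be demonstrated with the standard `datetime` module. The time series below is made up (one unit per day over two ISO weeks in 2021), and grouping by ISO week is just one possible aggregation choice.

```python
from collections import defaultdict
from datetime import date, timedelta


def iso_week(day):
    """Return the (ISO year, ISO week number) of a date."""
    return tuple(day.isocalendar()[:2])


def aggregate(series, key):
    """Sum the values of (date, value) pairs grouped by key(date)."""
    totals = defaultdict(float)
    for day, value in series:
        totals[key(day)] += value
    return dict(totals)


def months_in_week(series, week):
    """List the (year, month) pairs touched by the days of one ISO week."""
    return sorted({(d.year, d.month) for d, _ in series if iso_week(d) == week})


# Hypothetical time series: value 1.0 per day from 22 March to 4 April 2021.
start = date(2021, 3, 22)
series = [(start + timedelta(days=i), 1.0) for i in range(14)]

by_month = aggregate(series, lambda d: (d.year, d.month))
by_week = aggregate(series, iso_week)
split_week = months_in_week(series, (2021, 13))
```

ISO week 13 of 2021 contains days from both March and April, so weekly totals cannot simply be rolled up into monthly totals.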
  24. Task Abstraction

    ▪ Next we have to investigate the why part of the what-why-how analysis framework ▪ what is the goal of using the vis? ▪ Transform task description from domain-specific language into abstract form ▪ enables reasoning about similarities ▪ Who has the goal? ▪ designer of the vis or the end user?
  25. Actions

    ▪ User goals can be defined by actions at three levels of abstraction ▪ Analyse - consume existing or also produce additional data ▪ Search - what kind of search is involved (are the target and location known)? ▪ Query - need to identify one target, compare some targets or summarise all of the targets?
  26. Analyse

    ▪ Most common use case for vis is to consume information that has already been generated
  27. Consume

    ▪ Discover (Explore) ▪ use vis to find new knowledge that was not previously known ▪ serendipitous observation of unexpected data ▪ may be motivated by theories, models or hypotheses ▪ outcome is to generate a new hypothesis or verify (or disconfirm) an existing hypothesis ▪ need for sophisticated interactive vis idioms since we do not know in advance what the user will need to see ▪ note that the why of using the vis does not dictate the how ▪ Present (Explain) ▪ communication of information, telling a story with data or guiding an audience through a series of cognitive operations - decision making, planning, forecasting or instructional processes ▪ e.g. Gapminder video shown earlier
  28. Consume …

    ▪ Present (Explain) … ▪ output of a discover session might become input for a present session ▪ Enjoy ▪ casual encounter with vis - not driven by need to verify or generate a hypothesis [figure: Name Voyager]
  29. Produce

    ▪ Generate new material which is often immediately used as input for a next instance ▪ Annotate ▪ graphical or textual annotations of existing visualisation elements - annotations of data items might be stored as a new attribute ▪ typically a manual user action ▪ Record ▪ save or capture visualisation elements ▪ screenshots, bookmarks, parameter settings or interaction logs ▪ e.g. graphical history with a snapshot of the output of each task
  30. Produce …

    ▪ Derive ▪ produce new data elements based on existing data elements ▪ strong relationship between the form of the data (attribute and dataset types) and the vis idioms that are effective at presenting it ▪ derived attributes can be used to extend the dataset - from quantitative to ordered data (water temperature → cold, warm or hot) - adding latitude and longitude to city names (via lookup in separate DB) - arithmetic operations on existing attributes
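The three kinds of derived attributes listed above can be sketched in one small function. The temperature thresholds, the attribute names, the trade balance example and the in-memory `CITY_COORDINATES` table (standing in for a separate database, with approximate coordinates) are all assumptions.

```python
# Stand-in for a lookup in a separate database; coordinates are approximate.
CITY_COORDINATES = {
    "Brussels": (50.85, 4.35),
    "Zurich": (47.37, 8.54),
}


def temperature_category(celsius, cold_below=15.0, hot_from=25.0):
    """Derive an ordinal attribute from a quantitative one.
    The thresholds are assumed values, not taken from the lecture."""
    if celsius < cold_below:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


def derive(row):
    """Return a copy of the row extended with three derived attributes."""
    derived = dict(row)
    # quantitative -> ordered
    derived["temp_category"] = temperature_category(row["water_temp_c"])
    # lookup of additional attributes by city name
    derived["lat"], derived["lon"] = CITY_COORDINATES[row["city"]]
    # arithmetic on existing attributes
    derived["trade_balance"] = row["exports"] - row["imports"]
    return derived


original = {"city": "Brussels", "water_temp_c": 12.0, "exports": 5, "imports": 3}
extended = derive(original)
```

The original row stays untouched, so the derived attributes extend the dataset rather than replace existing values.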
  31. Targets

    ▪ Three high-level targets ▪ Trends ▪ high-level characterisation of a pattern in the data ▪ e.g. increases, decreases, peaks, plateaus, …
  32. Targets …

    ▪ Outliers ▪ data that does not fit well with the backdrop ▪ Features ▪ task-dependent structures of interest
  33. Targets …

    ▪ Single attributes ▪ individual values, minimum or maximum, … ▪ Multiple attributes ▪ dependencies, correlations and similarities
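For attribute targets, the extremes of a single attribute and the correlation between two attributes are easy to compute. The sketch below uses the Pearson correlation coefficient as one possible measure of dependency; the height and weight values are made up.

```python
import math


def extremes(values):
    """Single-attribute target: minimum and maximum value."""
    return min(values), max(values)


def pearson(xs, ys):
    """Multi-attribute target: Pearson correlation between two attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical measurements of two quantitative attributes.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 80, 86]

height_range = extremes(heights_cm)
height_weight_correlation = pearson(heights_cm, weights_kg)
```

A coefficient close to 1 or -1 points to a strong linear dependency between the two attributes, while a value near 0 suggests none.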
  34. Targets …

    ▪ network topology as well as specific paths
  35. Targets …

    ▪ understanding and comparing geometric shapes
  36. Search

    ▪ Lookup ▪ user knows what they are looking for and where it is
  37. Search …

    ▪ Locate ▪ user knows what they are looking for but does not know where it is ▪ Browse ▪ user does not know exactly what they are looking for but has a location in mind where to look for it ▪ Explore ▪ user does not know what they are looking for or where to search ▪ often beginning from an overview of everything ▪ e.g. searching for outliers in a scatterplot
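The four search actions form a two-by-two classification over whether the target and the location are known. A minimal sketch, with a function name chosen for this example:

```python
def search_type(target_known, location_known):
    """Classify a search action by what the user already knows."""
    if target_known and location_known:
        return "lookup"
    if target_known:
        return "locate"
    if location_known:
        return "browse"
    return "explore"


# All four combinations of target and location knowledge.
classification = {
    (target, location): search_type(target, location)
    for target in (True, False)
    for location in (True, False)
}
```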
  38. Query

    ▪ Once a target or set of targets is found, query these targets to identify, compare or summarise the data
  39. Query …

    ▪ Identify ▪ if the search returns known targets (lookup or locate) then identify returns their characteristics ▪ if the search returns targets matching particular characteristics (browse or explore) then identify returns specific references ▪ Compare ▪ comparing multiple targets ▪ more difficult than identify task and requires more sophisticated vis idioms to support the user ▪ Summarise (Overview) ▪ scope is all possible targets
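The three query scopes (one target, some targets, all targets) can be sketched on a small dictionary of items. The function names, the dataset and the approximate height values are assumptions for illustration.

```python
from statistics import mean

# Hypothetical items with one quantitative attribute (approximate heights).
mountains = {
    "Everest": {"height_m": 8849},
    "Mont Blanc": {"height_m": 4806},
    "Matterhorn": {"height_m": 4478},
}


def identify(items, name):
    """Identify: return the characteristics of a single known target."""
    return items[name]


def compare(items, names, attribute):
    """Compare: order several targets by one attribute, largest first."""
    return sorted(names, key=lambda name: items[name][attribute], reverse=True)


def summarise(items, attribute):
    """Summarise: give an overview across all possible targets."""
    values = [item[attribute] for item in items.values()]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


one = identify(mountains, "Matterhorn")
some = compare(mountains, ["Matterhorn", "Everest"], "height_m")
overview = summarise(mountains, "height_m")
```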
  40. Exercise 3

    ▪ Preprocessing and Data Analysis Using Python
  41. Further Reading

    ▪ This lecture is mainly based on the book Visualization Analysis & Design ▪ chapter 2 - What: Data Abstraction ▪ chapter 3 - Why: Task Abstraction
  42. References

    ▪ Visualization Analysis & Design, Tamara Munzner, Taylor & Francis Inc, (Har/Psc edition), May, November 2014, ISBN-13: 978-1466508910 ▪ Name Voyager ▪ https://www.babynamewizard.com/voyager/ ▪ M. Brehmer and T. Munzner, A Multi-Level Typology of Abstract Visualization Tasks, IEEE Transactions on Visualization and Computer Graphics 19(12), 2013 ▪ https://doi.org/10.1109/TVCG.2013.124
  43. 2 December 2005 Next Lecture Analysis and Validation