Data Representation - Lecture 3 - Information Visualisation (4019538FNR)

Beat Signer
February 25, 2020

This lecture forms part of the course Information Visualisation given at the Vrije Universiteit Brussel.

Transcript

  1. 2 December 2005 Information Visualisation Data Representation Prof. Beat Signer

    Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2

    February 25, 2020 Information Visualisation Process [Figure: information visualisation process; labels: Data, Data Representation, Data Presentation, Interaction, mapping, perception and visual thinking]
  3. Data Representation and Abstraction

    - Detailed look at the what part of the earlier what-why-how question
      - what-why-how analysis framework
    - Provide a language that is meaningful and useful for vis design
    - Data is typically described with domain language
      - in order to find suitable visual representations, we have to translate the data into more abstract structures that we know how to encode
    - Data abstraction helps to narrow down the design space
  4. Semantics and Types

    - Many aspects of vis design are driven by the kind of data
      - semantics (real-world meaning)
      - types (data as well as datasets)
    - What do the following datasets represent?
      - 15, 2.7, 27, 27, 15, 10021
      - Basil, 7, S, Pear
  5. Semantics and Types …

    [Figure: Visualization Analysis & Design, Tamara Munzner, 2014]
  6. Data Types

    - Item
      - individual discrete entity
      - e.g. table row or network node
    - Attribute
      - also referred to as variable or dimension
      - property that can be measured, observed or logged
      - e.g. price or temperature
    - Link
      - relationship between items
      - e.g. between items (nodes) in a network
  7. Data Types …

    - Position
      - spatial data
      - e.g. location in two-dimensional or three-dimensional space
    - Grids
      - sampling continuous data in terms of geometric and topological relationships between its cells
  8. Dataset Types

    - Dataset
      - collection of information to be analysed
      - made out of the five data types
      - complex combinations of basic dataset types are common
  9. Tables

    - Flat table
      - row represents an item of data
      - column represents an attribute of the dataset
      - a cell contains the value for a given item and attribute
    - Multidimensional table
      - indexing into a cell via multiple keys
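The two table kinds can be sketched in a few lines of Python. This is only an illustration: the fruit names, prices and the helper `cell` are made-up assumptions, not part of the lecture.

```python
# Flat table: each row (dict) is an item, each dict key is an attribute.
# The data below is hypothetical and only serves as an illustration.
flat_table = [
    {"name": "apple", "price": 0.5, "stock": 120},
    {"name": "pear", "price": 0.8, "stock": 40},
]


def cell(table, row_index, attribute):
    """Return the value for a given item (row) and attribute (column)."""
    return table[row_index][attribute]


# Multidimensional table: a cell is indexed via multiple keys,
# here the (hypothetical) combination of fruit name and year.
price_by_fruit_and_year = {
    ("apple", 2019): 0.45,
    ("apple", 2020): 0.5,
    ("pear", 2020): 0.8,
}

pear_price = cell(flat_table, 1, "price")
apple_2020 = price_by_fruit_and_year[("apple", 2020)]
```

In the flat table the row index acts as an implicit key, whereas the multidimensional table needs both keys to reach a single cell.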
  10. Networks and Trees

    - Network (graph)
      - defines relationships between two or more nodes (items) via links
      - nodes can have associated attributes
      - links can have associated attributes
      - e.g. people and their friendships or gene interaction network
    - Trees
      - hierarchical structure without cycles
      - each child node has one parent node
      - e.g. company organisation chart or biological tree of life
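As a rough sketch, the difference between a general network and a tree can be checked programmatically. The function `is_tree` and the example data are hypothetical; the sketch additionally assumes directed (parent, child) links and exactly one root, which the slide does not state explicitly.

```python
def is_tree(nodes, links):
    """Check whether directed (parent, child) links form a single rooted tree.

    Assumptions of this sketch: links are directed from parent to child
    and a tree has exactly one root node.
    """
    parents = {}
    for parent, child in links:
        if child in parents:
            return False  # a child node may have only one parent node
        parents[child] = parent

    roots = [node for node in nodes if node not in parents]
    if len(roots) != 1:
        return False

    # Walking up from any node must reach the root without revisiting a node.
    for node in nodes:
        seen = set()
        while node in parents:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = parents[node]
    return True


# Hypothetical company organisation chart (a tree).
org_nodes = ["CEO", "CTO", "CFO", "Engineer"]
org_links = [("CEO", "CTO"), ("CEO", "CFO"), ("CTO", "Engineer")]

# Hypothetical friendship network containing a cycle (not a tree).
friend_nodes = ["Ann", "Bob", "Cat"]
friend_links = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Ann")]

org_is_tree = is_tree(org_nodes, org_links)
friends_is_tree = is_tree(friend_nodes, friend_links)
```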
  11. Fields

    - Field
      - each cell contains a measurement or calculation from a continuous domain
      - continuous data brings along the issues of sampling and interpolation
  12. Fields …

    - Spatial fields
      - sampling at spatial positions
      - e.g. medical scan of a human body or measurements in a wind tunnel
      - if the spatial position is given with the dataset, we talk about scientific visualisation (scivis), in contrast to information visualisation (infovis), where the use of space is chosen by the designer
    - Grid types
      - uniform grid: sampling at regular intervals without any need to store the grid geometry or grid topology (connection of cells)
      - rectilinear grid: supports non-uniform sampling
        - efficient storage of information with high complexity in some areas and low complexity in others (have to store the geometric location of each row)
  13. Fields …

    - Grid types …
      - structured grid: enables curvilinear shapes where the geometric location of each cell needs to be specified
      - unstructured grid: complete flexibility but grid geometry as well as grid topology has to be stored explicitly
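The storage difference between the first two grid types can be illustrated with a small sketch. The function names and sample values are assumptions chosen for this example: a uniform grid only needs an origin and a spacing per axis, while a rectilinear grid stores the coordinates along each axis.

```python
def uniform_position(index, origin, spacing):
    """Position of a cell in a uniform grid.

    No geometry needs to be stored: the position follows from the
    cell index, the grid origin and the regular spacing per axis.
    """
    return tuple(o + i * s for i, o, s in zip(index, origin, spacing))


def rectilinear_position(index, axis_coords):
    """Position of a cell in a rectilinear grid.

    The (possibly non-uniform) coordinates along each axis have to be
    stored explicitly and are looked up by index.
    """
    return tuple(coords[i] for i, coords in zip(index, axis_coords))


# Uniform grid: regular sampling every 0.5 units in x and 1.0 units in y.
uniform_cell = uniform_position((2, 3), origin=(0.0, 0.0), spacing=(0.5, 1.0))

# Rectilinear grid: dense sampling near 0 in x, coarser sampling elsewhere.
x_coords = [0.0, 0.1, 1.0]
y_coords = [0.0, 2.0, 2.5]
rectilinear_cell = rectilinear_position((1, 2), (x_coords, y_coords))
```

Structured and unstructured grids would need per-cell geometry (and, for unstructured grids, explicit topology) and are therefore not covered by this sketch.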
  14. Geometry

    - Information about the shape of items with spatial positions
      - points and one-dimensional lines or curves
      - two-dimensional surfaces or regions
      - three-dimensional volumes
    - Geometry datasets do not necessarily have attributes
      - e.g. contours derived from a spatial field
    - Shown alone or as backdrop for other data
  15. Other Combinations

    - Cluster
      - grouping based on similarity of attributes
    - Set
      - unordered group of items
    - List (array)
      - ordered group of items
    - Path
      - ordered set of segments formed by links connecting nodes in a network
    - Compound network (multilevel network)
      - network combined with superimposed tree (with all the nodes of the network as leaves)
  16. Dataset Availability

    - Static file (offline)
      - entire dataset is available all at once
    - Dynamic stream (online)
      - dataset information trickles in over time
      - addition, update or deletion of items
      - adds complexity to the vis process
  17. Attribute Types

    - Categorical attributes
      - no implicit ordering (but often hierarchical structure)
      - external ordering can be superimposed
      - e.g. different types of fruits
  18. Attribute Types …

    - Ordered attributes
      - ordinal data
        - well-defined ordering but cannot do full-fledged arithmetic
        - e.g. t-shirt size
      - quantitative data
        - measurement of magnitude that supports arithmetic comparison (integers as well as real numbers)
        - e.g. height, temperature or stock price
    - Ordering directions
      - sequential
        - homogeneous range from minimum to maximum value
        - e.g. mountain heights (from sea level to height of Mount Everest)
      - diverging
        - e.g. valleys in the sea and mountains on land
  19. Attribute Types …

    - Ordering directions
      - cyclic
        - values wrap around back to the starting point
        - e.g. time measurements like the hour of the day or the day of the week
    - Hierarchical attributes
      - hierarchical structures within or between multiple attributes
      - e.g. time series of daily stock prices where time can be aggregated hierarchically (from days to weeks, months and years)
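A small sketch may help to see how ordinal and cyclic attributes behave differently from plain numbers. The size list `SIZE_ORDER` and both helper functions are assumptions made for this illustration.

```python
# Ordinal attribute: a well-defined ordering, but no meaningful arithmetic.
# The concrete list of sizes is an assumption for this example.
SIZE_ORDER = ["S", "M", "L", "XL"]


def ordinal_rank(value, order=SIZE_ORDER):
    """Rank of an ordinal value; usable for sorting and comparing only."""
    return order.index(value)


def cyclic_distance(a, b, period):
    """Shortest distance between two values of a cyclic attribute."""
    d = abs(a - b) % period
    return min(d, period - d)


sorted_sizes = sorted(["L", "S", "XL", "M"], key=ordinal_rank)

# On a 24-hour clock, 23:00 and 01:00 are only two hours apart.
hours_apart = cyclic_distance(23, 1, period=24)
```

Sorting by rank is valid for ordinal data, whereas averaging the ranks would imply distances between sizes that the data does not define.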
  20. Key Versus Value Semantics

    - Type of an attribute does not tell us about its semantics
      - key attribute (independent attribute) represents an index that is used to look up value attributes (dependent attributes)
    - Flat tables
      - key might be implicit (simply the index of the row) or explicit (attribute within table with unique values)
    - Multidimensional tables
      - multiple keys are required to look up an item
      - combination of all keys must be unique for each item
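The uniqueness requirement for keys can be checked with a short sketch. The helper `is_valid_key` and the sales rows are hypothetical and only illustrate the idea.

```python
def is_valid_key(rows, key_attributes):
    """Return True if the combination of key attributes is unique per item."""
    seen = set()
    for row in rows:
        key = tuple(row[attribute] for attribute in key_attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical table: region and year act as keys, revenue is a value.
sales = [
    {"region": "north", "year": 2019, "revenue": 10},
    {"region": "north", "year": 2020, "revenue": 12},
    {"region": "south", "year": 2020, "revenue": 7},
]

region_alone_is_key = is_valid_key(sales, ["region"])
region_and_year_is_key = is_valid_key(sales, ["region", "year"])
```

Here `region` alone is not a valid key because it repeats, while the combination of `region` and `year` identifies each item uniquely.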
  21. Example: Order Table

    [Figure: Visualization Analysis & Design, Tamara Munzner, 2014]
  22. Key Versus Value Semantics …

    - Fields
      - independent variable to look up dependent variable
      - multivariate structure
        - depends on number of value attributes
        - scalar field: one attribute per cell
        - vector field: two or more attributes per cell
        - tensor field: many attributes per cell
      - multidimensional structure
        - depends on number of keys
        - e.g. 2D or 3D fields
  23. Temporal Semantics

    - Temporal attribute is any kind of information that relates to time
    - Data about time is complicated to handle
      - time hierarchy is deeply multiscale (from nanoseconds to hours, decades or millennia)
      - temporal scales do not all fit into a strict hierarchy (e.g. weeks do not cleanly fit into months)
      - transformation and aggregation become complex
    - Time-varying semantics
      - time is one of the key attributes (as opposed to being a value)
    - Time-series dataset
      - ordered sequence of time-value pairs
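The remark that weeks do not cleanly fit into months can be made concrete with a small sketch using Python's standard `datetime` module. The helper names and the constant daily values are assumptions for this example; the week shown is the ISO week starting on Monday, 24 February 2020.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(series, key_of):
    """Sum the values of a time series of (date, value) pairs per key."""
    totals = defaultdict(float)
    for day, value in series:
        totals[key_of(day)] += value
    return dict(totals)


def month_key(day):
    return (day.year, day.month)


def iso_week_key(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


# Time-series dataset: ordered sequence of time-value pairs for one week,
# with a hypothetical value of 1.0 per day.
start = date(2020, 2, 24)
series = [(start + timedelta(days=i), 1.0) for i in range(7)]

by_week = aggregate(series, iso_week_key)
by_month = aggregate(series, month_key)
```

All seven days fall into a single ISO week, but the monthly aggregation splits them across February and March, so a weekly total cannot be derived from monthly totals or vice versa.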
  24. Task Abstraction

    - Next we have to investigate the why part of the what-why-how analysis framework
      - what is the goal of using the vis?
    - Transform task description from domain-specific language into abstract form
      - enables reasoning about similarities
    - Who has the goal?
      - designer of the vis or the end user?
  25. Actions

    - User goals can be defined by actions at three levels of abstraction
      - Analyse
        - consume existing or also produce additional data
      - Search
        - what kind of search is involved (are the target and location known)?
      - Query
        - need to identify one target, compare some targets or summarise all of the targets?
  26. Analyse

    - Most common use case for vis is to consume information that has already been generated
  27. Consume

    - Discover (Explore)
      - use vis to find new knowledge that was not previously known
      - serendipitous observation of unexpected data
      - may be motivated by theories, models or hypotheses
      - outcome is to generate a new hypothesis or verify (or disconfirm) an existing hypothesis
      - need for sophisticated interactive idioms since we do not know in advance what the user will need to see
      - note that why the vis is being used does not dictate how
    - Present (Explain)
      - communication of information, telling a story with data or guiding an audience through a series of cognitive operations
        - decision making, planning, forecasting or instructional processes
      - e.g. Gapminder video shown earlier
  28. Consume …

    - Present (Explain) …
      - output of a discover session might become input for a present session
    - Enjoy
      - casual encounter with vis
        - not driven by need to verify or generate a hypothesis
      - e.g. Name Voyager
  29. Produce

    - Generate new material which is often immediately used as input for a next instance
    - Annotate
      - graphical or textual annotations of existing visualisation elements
        - annotations of data items might be stored as a new attribute
      - typically a manual user action
    - Record
      - save or capture visualisation elements
      - screenshots, bookmarks, parameter settings or interaction logs
      - e.g. graphical history with a snapshot of the output of each task
  30. Produce …

    - Derive
      - produce new data elements based on existing data elements
      - strong relationship between the form of the data (attribute and dataset types) and the vis idioms that are effective at presenting it
      - derived attributes can be used to extend the dataset
        - from quantitative to ordered data (water temperature → cold, warm or hot)
        - adding latitude and longitude to city names (via lookup in separate DB)
        - arithmetic operations on existing attributes
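Two of the derive examples above can be sketched as follows. The temperature thresholds, attribute names and sample records are assumptions for this illustration and are not taken from the lecture.

```python
def temperature_category(celsius, cold_below=15.0, hot_from=25.0):
    """Derive an ordered category from a quantitative water temperature.

    The thresholds are arbitrary assumptions for this sketch.
    """
    if celsius < cold_below:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


def derive_attributes(record):
    """Return a copy of the record extended with derived attributes."""
    extended = dict(record)
    extended["category"] = temperature_category(record["temperature"])
    # Arithmetic operation on existing attributes: daily temperature range.
    extended["range"] = record["max_temperature"] - record["min_temperature"]
    return extended


# Hypothetical measurements.
measurements = [
    {"temperature": 12.0, "min_temperature": 10.0, "max_temperature": 14.0},
    {"temperature": 28.0, "min_temperature": 24.0, "max_temperature": 30.0},
]

extended_measurements = [derive_attributes(m) for m in measurements]
```

The original records stay untouched, so the derived attributes extend the dataset rather than replace the measured values.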
  31. Targets

    - Three high-level targets
    - Trend
      - high-level characterisation of a pattern in the data
      - e.g. increases, decreases, peaks, plateaus, …
  32. Targets …

    - Outliers
      - data that does not fit well with the backdrop
    - Features
      - task-dependent structures of interest
  33. Targets …

    - Single attributes
      - individual values, minimum or maximum, …
    - Multiple attributes
      - dependencies, correlations and similarities
  34. Targets …

    - network topology as well as specific paths
  35. Targets …

    - understanding and comparing geometric shapes
  36. Search

    - Lookup
      - user knows what they are looking for and where it is
  37. Search …

    - Locate
      - user knows what they are looking for but does not know where it is
    - Browse
      - user does not know exactly what they are looking for but has a location in mind where to look for it
    - Explore
      - user knows neither what they are looking for nor where to search
      - often beginning from an overview of everything
      - e.g. searching for outliers in a scatterplot
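The four search actions differ only in whether the target and its location are known, which can be written down as a tiny decision function. The function name `search_type` is a hypothetical choice for this sketch.

```python
def search_type(target_known, location_known):
    """Classify a search action by what the user already knows."""
    if target_known and location_known:
        return "lookup"
    if target_known:
        return "locate"
    if location_known:
        return "browse"
    return "explore"


# Searching for outliers in a scatterplot: neither target nor location known.
outlier_search = search_type(target_known=False, location_known=False)
```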
  38. Query

    - Once a target or set of targets is found, query these targets to identify, compare or summarise the data
  39. Query …

    - Identify
      - if the search returns known targets (lookup or locate), then identify returns their characteristics
      - if the search returns targets matching particular characteristics (browse or explore), then identify returns specific references
    - Compare
      - comparing multiple targets
      - more difficult than the identify task and requires more sophisticated vis idioms to support the user
    - Summarise (Overview)
      - scope is all possible targets
  40. Exercise 3

    - Frameworks Hands-on
  41. Further Reading

    - This lecture is mainly based on the book Visualization Analysis & Design
      - chapter 2 - What: Data Abstraction
      - chapter 3 - Why: Task Abstraction
  42. References

    - Visualization Analysis & Design, Tamara Munzner, Taylor & Francis Inc, (Har/Psc edition), May, November 2014, ISBN-13: 978-1466508910
    - Name Voyager
      - https://www.babynamewizard.com/voyager#prefix=&sw=both&exact=false
    - M. Brehmer and T. Munzner, A Multi-Level Typology of Abstract Visualization Tasks, IEEE Transactions on Visualization and Computer Graphics 19(12), 2013
      - https://doi.org/10.1109/TVCG.2013.124
  43. 2 December 2005 Next Lecture Data Presentation