Data Representation - Lecture 3 - Information Visualisation (4019538FNR)

This lecture forms part of the course Information Visualisation given at the Vrije Universiteit Brussel.

Beat Signer

March 02, 2023

Transcript

  1. 2 December 2005
    Information Visualisation
    Data Representation
    Prof. Beat Signer
    Department of Computer Science
    Vrije Universiteit Brussel
    beatsigner.com

  2. Beat Signer - Department of Computer Science - [email protected] 2
    March 2, 2023
    Information Visualisation Process
    [Figure: process diagram with the stages Data, Data Representation, Data Presentation and Interaction; labels: mapping, perception and visual thinking]

  3. Data Representation and Abstraction
    ▪ Detailed look at the what part of the earlier
    what-why-how question
    → what-why-how analysis framework
    ▪ Provide a language that is meaningful and useful for
    vis design
    ▪ Data is typically described with domain language
    ▪ in order to find suitable visual representations, we have to
    translate the data into more abstract structures that we know
    how to encode
    ▪ Data abstraction helps to narrow down the design space

  4. Semantics and Types
    ▪ Many aspects of vis design driven by the kind of data
    ▪ semantics (real-world meaning)
    ▪ types (data as well as datasets)
    ▪ What do the following datasets represent?
    15, 2.7, 27, 27, 15, 10021
    Basil, 7, S, Pear

  5. Semantics and Types …
    [Visualization Analysis & Design, Tamara Munzner, 2014]

  6. Data Types
    ▪ Item
    ▪ individual discrete entity
    ▪ e.g. table row or network node
    ▪ Attribute
    ▪ also referred to as variable or dimension
    ▪ property that can be measured, observed or logged
    ▪ e.g. price or temperature
    ▪ Link
    ▪ relationship between items
    ▪ e.g. between items (nodes) in a network

  7. Data Types …
    ▪ Position
    ▪ spatial data
    ▪ e.g. location in two-dimensional or three-dimensional space
    ▪ Grids
    ▪ sampling continuous data in terms of geometric and topological
    relationships between its cells

  8. Dataset Types
    ▪ Dataset
    ▪ collection of information to be analysed
    ▪ made up of the five data types (items, attributes, links, positions and grids)
    ▪ complex combinations of basic dataset types are common

  9. Tables
    ▪ Flat table
    ▪ row represents an item of data
    ▪ column represents an attribute of the dataset
    ▪ a cell contains the value for a given item and attribute
    ▪ Multidimensional table
    ▪ indexing into a cell via multiple keys
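
The two table types above can be sketched with plain Python structures. This is only an illustration: the fruit data and the names `flat_table`, `sales` and `column` are made up and do not come from the lecture.

```python
# Flat table: each row is an item, each column an attribute.
# Hypothetical example data; the implicit key is the row index.
flat_table = [
    {"name": "apple", "colour": "red", "price": 0.5},
    {"name": "pear", "colour": "green", "price": 0.7},
    {"name": "plum", "colour": "purple", "price": 0.3},
]


def column(table, attribute):
    """Return all values of one attribute (one column of the flat table)."""
    return [row[attribute] for row in table]


# Multidimensional table: a cell is indexed via multiple keys,
# here the hypothetical pair (product, year).
sales = {
    ("apple", 2021): 120,
    ("apple", 2022): 135,
    ("pear", 2021): 80,
}

cell = flat_table[1]["price"]        # value for item 1 and attribute "price"
apple_2022 = sales[("apple", 2022)]  # lookup that needs both keys
```

A list of dictionaries keeps the item and attribute roles visible; a dictionary keyed by tuples is one simple way to express that several keys are needed to reach a cell.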

  10. Networks and Trees
    ▪ Network (graph)
    ▪ defines relationships between two or more nodes (items) via links
    ▪ nodes can have associated attributes
    ▪ links can have associated attributes
    ▪ e.g. people and their friendships or gene interaction network
    ▪ Trees
    ▪ hierarchical structure without cycles
    ▪ each child node has one parent node
    ▪ e.g. company organisation chart or biological tree of life
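
A minimal sketch of both structures, assuming links are stored as simple pairs. The people, the organisation chart and the helper names `adjacency` and `is_tree` are hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical friendship network: nodes are people, links are friendships.
links = [("Ann", "Bob"), ("Bob", "Cid"), ("Cid", "Ann")]
node_attributes = {"Ann": {"age": 31}, "Bob": {"age": 27}, "Cid": {"age": 45}}


def adjacency(edges):
    """Build an undirected adjacency list from a list of links."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def is_tree(parent_child_links):
    """Check the tree properties: one root, one parent per child, no cycles."""
    parent = {}
    nodes = set()
    for p, c in parent_child_links:
        nodes.update((p, c))
        if c in parent:  # a second parent violates the tree property
            return False
        parent[c] = p
    roots = nodes - set(parent)
    if len(roots) != 1:
        return False
    # Walking up from any node must reach the root without revisiting a node.
    for node in nodes:
        seen = set()
        while node in parent:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


# Hypothetical organisation chart given as (parent, child) links.
org_chart = [("CEO", "CTO"), ("CEO", "CFO"), ("CTO", "Engineer")]
```

Read as (parent, child) pairs, the friendship links form a cycle and therefore fail the tree check, while the organisation chart passes it.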

  11. Fields
    ▪ Field
    ▪ each cell contains measurements or calculations from a continuous
    domain
    ▪ continuous data brings along the issues of sampling and
    interpolation
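
Sampling and interpolation can be illustrated with a one-dimensional field. The sketch below assumes regular sampling and linear interpolation, which is only one of several possible choices; the function names are made up for this example.

```python
import math


def sample(f, start, stop, n):
    """Sample a continuous function f at n regularly spaced positions."""
    step = (stop - start) / (n - 1)
    xs = [start + i * step for i in range(n)]
    return xs, [f(x) for x in xs]


def interpolate(xs, ys, x):
    """Linearly interpolate between the two samples that enclose x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the sampled domain")


# Five samples of sin(x) on [0, pi]; values in between must be estimated.
xs, ys = sample(math.sin, 0.0, math.pi, 5)
estimate = interpolate(xs, ys, 1.0)
error = abs(estimate - math.sin(1.0))  # non-zero: information was lost
```

The non-zero error shows why the number of samples and the interpolation scheme both matter when a continuous domain is stored in discrete cells.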

  12. Fields …
    ▪ Spatial fields
    ▪ sampling at spatial positions
    ▪ e.g. medical scan of a human body or measurements in
    wind tunnel
    ▪ if spatial position is given with dataset, we talk about scientific
    visualisation (scivis) (in contrast to information visualisation
    (infovis) where the use of space is chosen by the designer)
    ▪ Grid types
    ▪ uniform grid: sampling at regular intervals without any need to
    store the grid geometry or grid topology (connection of cells)
    ▪ rectilinear grid: supports non-uniform sampling
    - efficient storage of information with high complexity in some
    areas and low complexity in others (also store grid geometry)

  13. Fields …
    ▪ Grid types …
    ▪ structured grid: enables curvilinear shapes where the geometric
    location of each cell needs to be specified
    ▪ unstructured grid: complete flexibility but grid geometry as well as
    grid topology has to be stored explicitly
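
The storage requirements of the grid types can be compared in a small sketch. All coordinates, spacings and names below are assumptions chosen for illustration; the structured (curvilinear) grid is omitted for brevity.

```python
# Uniform grid: only origin and spacing are stored; positions are computed.
def uniform_position(index, origin=0.0, spacing=0.5):
    return origin + index * spacing


# Rectilinear grid: the (non-uniform) coordinates along each axis are stored.
x_coords = [0.0, 0.1, 0.2, 0.4, 0.8, 1.6]  # denser sampling near the origin
y_coords = [0.0, 1.0, 2.0]


def rectilinear_position(i, j):
    return (x_coords[i], y_coords[j])


# Unstructured grid: geometry (vertex positions) and topology (which
# vertices form each cell) are both stored explicitly.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.3, 0.9), (1.2, 1.1)]
cells = [(0, 1, 2), (1, 3, 2)]  # two triangles sharing the edge 1-2


def cell_corners(cell_index):
    return [vertices[v] for v in cells[cell_index]]
```

The amount of explicitly stored information grows from the uniform grid (two numbers) to the unstructured grid (every vertex and every cell).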

  14. Geometry
    ▪ Information about the shape of items with spatial
    positions
    ▪ points and one-dimensional lines or curves
    ▪ two-dimensional surfaces or regions
    ▪ three-dimensional volumes
    ▪ Geometry datasets do not necessarily have attributes
    ▪ e.g. contours derived from a spatial field or shapes generated
    from raw geographic data (e.g. boundaries of a forest)
    ▪ Shown alone or as backdrop for other data

  15. Other Combinations
    ▪ Cluster
    ▪ grouping items based on similarity of attributes
    ▪ Set
    ▪ unordered group of items
    ▪ List (array)
    ▪ ordered group of items
    ▪ Path
    ▪ ordered set of segments formed by links connecting nodes in a
    network
    ▪ Compound network (multilevel network)
    ▪ network combined with superimposed tree (with all the nodes of
    the network as leaves)

  16. Dataset Availability
    ▪ Static file (offline)
    ▪ entire dataset is available all at once
    ▪ Dynamic stream (online)
    ▪ dataset information trickles in over time
    ▪ addition, update or deletion of items
    ▪ adds complexity to the vis process
    - no longer have all data at a given time

  17. Attribute Types
    ▪ Categorical (nominal) attributes
    ▪ no implicit ordering (but often hierarchical structure)
    ▪ external ordering can be superimposed
    ▪ e.g. different types of fruits

  18. Attribute Types …
    ▪ Ordered attributes
    ▪ ordinal data
    - well-defined ordering but cannot do full-fledged arithmetic
    - e.g. t-shirt size
    ▪ quantitative data
    - measurement of magnitude that supports arithmetic comparison (integers as
    well as real numbers)
    - e.g. height, temperature or stock price
    ▪ Ordering directions
    ▪ sequential
    - homogeneous range from minimum to maximum value
    - e.g. mountain heights (from sea level to height of Mount Everest)
    ▪ diverging
    - e.g. valleys in the sea and mountains on land
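
The difference between ordinal and quantitative attributes, and between sequential and diverging orderings, can be sketched as follows. The size scale, the height values and the function names are illustrative assumptions.

```python
# Ordinal attribute: a well-defined order, but no meaningful arithmetic.
SIZES = ["S", "M", "L", "XL"]  # hypothetical t-shirt size scale


def sort_by_size(shirts):
    """Order t-shirt sizes by their position on the ordinal scale."""
    return sorted(shirts, key=SIZES.index)


# Quantitative attribute: arithmetic such as differences is meaningful.
heights_m = [8849, 4808, 2962]  # illustrative mountain heights in metres
height_range = max(heights_m) - min(heights_m)


# Diverging ordering: two sequences that meet at a common zero point,
# here elevations relative to sea level.
def side_of_sea_level(elevation_m):
    if elevation_m > 0:
        return "land"
    if elevation_m < 0:
        return "sea"
    return "sea level"
```

Sorting works for the ordinal sizes, but an expression such as "XL minus S" has no meaning, whereas the height range is a valid quantity.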

  19. Attribute Types …
    ▪ Ordering directions …
    ▪ cyclic
    - values wrap around back to the starting point
    - e.g. time measurements like the hour of the day or the day of the week
    ▪ Hierarchical attributes
    ▪ hierarchical structures within or between multiple attributes
    ▪ e.g. time series of daily stock prices where time can be
    aggregated hierarchically (from days to weeks, months and years)
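
A short sketch of a cyclic attribute and of hierarchical aggregation over time. The prices are invented, and `cyclic_distance` and `monthly_mean` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date


def cyclic_distance(a, b, period=24):
    """Shortest distance between two values of a cyclic attribute."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def monthly_mean(daily_prices):
    """Aggregate (date, price) pairs from days up to months."""
    buckets = defaultdict(list)
    for day, price in daily_prices:
        buckets[(day.year, day.month)].append(price)
    return {month: sum(v) / len(v) for month, v in buckets.items()}


# Hypothetical daily closing prices
prices = [
    (date(2023, 1, 30), 10.0),
    (date(2023, 1, 31), 12.0),
    (date(2023, 2, 1), 11.0),
]
```

With a period of 24, the hours 23 and 1 are two hours apart rather than 22, which is exactly the wrap-around behaviour of a cyclic ordering.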

  20. Key Versus Value Semantics
    ▪ Type of an attribute does not tell us about its semantics
    ▪ key attribute (independent attribute) represents an index
    that is used to look up value attributes (dependent
    attributes)
    ▪ key attributes can be categorical or ordinal
    ▪ value attributes can be categorical, ordinal or quantitative
    ▪ Flat tables
    ▪ key might be implicit (simply the index of the row) or explicit
    (attribute within table with unique values)
    ▪ Multidimensional tables
    ▪ multiple keys are required to look up an item
    ▪ combination of all keys must be unique for each item
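
The uniqueness requirement for keys can be checked programmatically. The sketch below uses invented weather measurements; `is_valid_key` and the attribute names are assumptions made for this example.

```python
def is_valid_key(rows, key_attributes):
    """True if the combination of key attributes is unique for every item."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in key_attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical measurements: 'station' and 'day' are keys, 'temp' is a value.
rows = [
    {"station": "A", "day": 1, "temp": 4.5},
    {"station": "A", "day": 2, "temp": 5.1},
    {"station": "B", "day": 1, "temp": 3.9},
]

# Index the multidimensional table by its composite key.
lookup = {(r["station"], r["day"]): r["temp"] for r in rows}
```

Here `station` alone is not a valid key because station A appears twice, while the combination of `station` and `day` identifies each item uniquely.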

  21. Example: Order Table
    [Visualization Analysis & Design, Tamara Munzner, 2014]

  22. Key Versus Value Semantics …
    ▪ Fields
    ▪ independent variable to look up dependent variable
    ▪ multivariate structure
    - depends on number of value attributes
    - scalar field: one attribute per cell
    - vector field: two or more attributes per cell
    - tensor field: many attributes per cell
    ▪ multidimensional structure
    - depends on number of keys
    - e.g. 2D or 3D fields

  23. Temporal Semantics
    ▪ Temporal attribute is any kind of information that is
    related to time
    ▪ Data about time is complicated to handle
    ▪ time hierarchy is deeply multiscale (from nanoseconds to hours,
    decades or millennia)
    ▪ temporal scales do not all fit into a strict hierarchy (e.g. weeks do
    not cleanly fit into months)
    ▪ transformation and aggregation become complex
    ▪ Time-varying semantics
    ▪ time is one of the key attributes (as opposed to being a value attribute)
    ▪ Time-series dataset
    ▪ ordered sequence of time-value pairs
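
Two of the points above can be made concrete with the standard `datetime` module: weeks that straddle month boundaries, and a time series as an ordered sequence of time-value pairs. The helper `iso_week_months` and the sample values are assumptions for illustration; `date.fromisocalendar` requires Python 3.8 or later.

```python
from datetime import date, timedelta


def iso_week_months(year, week):
    """Return the set of months touched by the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return {(monday + timedelta(days=i)).month for i in range(7)}


# Time-series dataset: ordered sequence of (time, value) pairs.
# Hypothetical values, deliberately given out of order.
series = [
    (date(2023, 3, 2), 7.0),
    (date(2023, 2, 27), 5.0),
    (date(2023, 3, 1), 6.0),
]
series.sort(key=lambda pair: pair[0])  # time acts as the key attribute
```

ISO week 9 of 2023 runs from 27 February to 5 March and therefore belongs to two months, so aggregating weekly values into monthly ones needs an explicit rule.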

  24. Task Abstraction
    ▪ Next we have to investigate the why part of the
    what-why-how analysis framework
    ▪ what is the goal of using the vis?
    ▪ Transform task description from domain-specific
    language into abstract form
    ▪ enables reasoning about similarities
    ▪ Who has the goal?
    ▪ designer of the vis or the end user?

  25. Actions
    ▪ User goals can be defined by actions at three levels of
    abstraction
    ▪ Analyse
    - consume existing or also produce additional data
    ▪ Search
    - what kind of search is involved (are the target and location known)?
    ▪ Query
    - need to identify one target, compare some targets or summarise
    all of the targets?

  26. Analyse
    ▪ Most common use case for vis is to consume information
    that has already been generated

  27. Consume
    ▪ Discover (Explore)
    ▪ use vis to find new knowledge that was not previously known
    ▪ serendipitous observation of unexpected data
    ▪ may be motivated by theories, models or hypotheses
    ▪ outcome is to generate a new hypothesis or verify (or disconfirm)
    an existing hypothesis
    ▪ need for sophisticated interactive vis idioms since we do not know
    in advance what the user will need to see
    ▪ note that why the vis is being used does not dictate how it is designed
    ▪ Present (Explain)
    ▪ communication of information, telling a story with data or guiding
    an audience through a series of cognitive operations
    - decision making, planning, forecasting or instructional processes
    ▪ e.g. Gapminder application shown earlier

  28. Consume …
    ▪ Present (Explain) …
    ▪ output of a discover session might become input for
    a present session
    ▪ Enjoy
    ▪ casual encounter with vis
    - not driven by need to verify or generate a hypothesis
    ▪ e.g. Name Voyager

  29. Produce
    ▪ Generate new material which is often immediately used
    as input for a next instance
    ▪ Annotate
    ▪ graphical or textual annotations of existing visualisation elements
    - annotations of data items might be stored as a new attribute
    ▪ typically a manual user action
    ▪ Record
    ▪ save or capture visualisation elements
    ▪ screenshots, bookmarks, parameter settings or interaction logs
    ▪ e.g. graphical history with a snapshot of the output of each task

  30. Produce …
    ▪ Derive
    ▪ produce new data elements based on existing data elements
    ▪ strong relationship between the form of the data (attribute and
    dataset types) and the vis idioms that are effective at presenting it
    ▪ derived attributes can be used to extend the dataset
    - from quantitative to ordinal data (water temperature → cold, warm or hot)
    - adding latitude and longitude to city names (via lookup in separate DB)
    - arithmetic operations on existing attributes
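
The three kinds of derived attributes listed above can be sketched in one function. The temperature thresholds, the coordinate table and all attribute names are assumptions made for this example; a real application would look up coordinates in a separate database.

```python
def temperature_category(celsius, cold_below=15.0, hot_from=30.0):
    """Derive an ordinal attribute from a quantitative one.

    The thresholds are arbitrary assumptions.
    """
    if celsius < cold_below:
        return "cold"
    if celsius < hot_from:
        return "warm"
    return "hot"


# Hypothetical lookup table standing in for a separate database
# (approximate latitude and longitude).
CITY_COORDINATES = {"Brussels": (50.85, 4.35), "Zurich": (47.37, 8.54)}


def derive(rows):
    """Return new rows extended with derived attributes."""
    derived = []
    for row in rows:
        lat, lon = CITY_COORDINATES[row["city"]]
        derived.append({
            **row,
            "category": temperature_category(row["water_temp"]),
            "latitude": lat,
            "longitude": lon,
            # arithmetic operation on existing attributes
            "range": row["max_temp"] - row["min_temp"],
        })
    return derived
```

The original rows are left untouched; the derived attributes extend a copy of each item, so the dataset can be visualised with or without them.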

  31. Targets
    ▪ Three high-level targets
    ▪ Trends
    ▪ high-level characterisation of a pattern in the data
    ▪ e.g. increases, decreases, peaks, plateaus, …

  32. Targets …
    ▪ Outliers
    ▪ data that does not fit well with the backdrop
    ▪ Features
    ▪ task-dependent structures of interest

  33. Targets …
    ▪ Single attributes
    ▪ individual values, minimum or maximum, …
    ▪ Multiple attributes
    ▪ dependencies, correlations and similarities

  34. Targets …
    ▪ network topology as well as specific paths

  35. Targets …
    ▪ understanding and comparing geometric shapes

  36. Search
    ▪ Lookup
    ▪ user knows what they are looking for and where it is

  37. Search …
    ▪ Locate
    ▪ user knows what they are looking for but does not know where it is
    ▪ Browse
    ▪ user does not know exactly what they are looking for but
    has a location in mind where to look for it
    ▪ Explore
    ▪ user knows neither what they are looking for nor where to search
    ▪ often beginning from an overview of everything
    ▪ e.g. searching for outliers in a scatterplot
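
The four search actions (lookup, locate, browse and explore) differ only in whether the target and the location are known, which can be written as a small decision function. The function name and its parameters are made up for this sketch.

```python
def search_type(target_known, location_known):
    """Classify a search action by what the user already knows."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"
```

For example, `search_type(True, False)` yields `"locate"`: the user knows what they are looking for but not where it is.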

  38. Query
    ▪ Once a target or set of targets is found, query these
    targets to identify, compare or summarise the data

  39. Query …
    ▪ Identify
    ▪ if the search returns known targets (lookup or locate) then
    identify returns their characteristics
    ▪ if the search returns targets matching particular characteristics
    (browse or explore) then identify returns specific references
    ▪ Compare
    ▪ comparing multiple targets
    ▪ more difficult than identify task and requires more sophisticated
    vis idioms to support the user
    ▪ Summarise (overview)
    ▪ scope is all possible targets

  40. Exercise 3
    ▪ Preprocessing and Data Analysis Using Python

  41. Further Reading
    ▪ This lecture is mainly based on the
    book Visualization Analysis & Design
    ▪ chapter 2
    - What: Data Abstraction
    ▪ chapter 3
    - Why: Task Abstraction

  42. References
    ▪ Visualization Analysis & Design, Tamara
    Munzner, Taylor & Francis Inc (Har/Psc edition),
    November 2014,
    ISBN-13: 978-1466508910
    ▪ Name Voyager
    ▪ https://www.babynamewizard.com/voyager/
    ▪ M. Brehmer and T. Munzner, A Multi-Level Typology of
    Abstract Visualization Tasks, IEEE Transactions on
    Visualization and Computer Graphics 19(12), 2013
    ▪ https://doi.org/10.1109/TVCG.2013.124

  43. 2 December 2005
    Next Lecture
    Analysis and Validation
