Accommodating Big Data in Visual Analytics: Pairing Computation with Cognition

This talk covers opportunities and challenges related to big data visual analytics, with an emphasis on the kinds of questions developers, designers, and analysts alike must consider in the era of big data visualization.

Presented at AAG 2013 in Los Angeles, CA.

Joshua Stevens

April 12, 2013

Transcript

  1. Accommodating Big Data in Visual Analytics:
    Pairing Computation with Cognition
    Joshua Stevens, Alan MacEachren
    Plus: Big Data Social Science @ Penn State

  2. Outline
    • Introduction + Context
    • Computation: Considering End-users & Cognition
    • Big Data @ Penn State
    • Relating These Topics
    • Questions


  3. Big data are more than big.
    “Big data is more than simply a matter of
    size; it is an opportunity to find insights
    in new and emerging types of data and
    content...”
    - IBM


  4. Big data are more than big.
    Volume Variety Velocity Vinculation

  5. Introduction + Context
    • The 4 V’s of ‘big data’ are the norm for GIScientists
    • Ex: Climate models, terrain, networks, mobility patterns...
    • Many platforms are ready for cloud, AWS, and HPC

  6. Introduction + Context
    • Cartographers and their tools are ready, too.
    “Among many capabilities the
    HTML5 standard provides,
    there is one crucial for
    improving GIS, and that is
    HTML5 Canvas”
    Ravnić, D. HTML5 Canvas: An Open Standard for High
    Performing GIS Map Visualization in Web Browsers.
    Directions Magazine, April 5, 2012.

  7. Computation
    • Computation will influence visualization from (at
    least) 2 angles:
    1. Data wrangling, calculation, and storage
    2. Choosing the right technology for
    representation and visualization

  8. Computation: Calculation + Storage
    • “Each day, we create 2.5 quintillion bytes of data...” - IBM
    • 12 TB/day are tweets
    • ...more than 2 billion copies of Wikipedia, or
    (Stacked floppy disks x 19)
    • Algorithms and statistical techniques are essential
    • Basic example:
    • Clustering is simple and common in GIS, but scales
    poorly with N (i.e., not O(n) or O(n log n))
    • Typical solution: multiple machines (parallel and
    distributed GIS)
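
    To make the scaling point about clustering concrete, here is a minimal Python sketch, not taken from the talk. It assumes a naive clustering step that compares every pair of points, which is one common reason such methods grow quadratically. The function names and the idea of an even split across workers are illustrative assumptions; a real distributed GIS also pays communication and load-balancing costs that this ignores.

```python
from itertools import combinations
from math import dist


def pairwise_distance_count(n: int) -> int:
    """Number of point pairs a naive all-pairs step must examine: n(n-1)/2."""
    return n * (n - 1) // 2


def naive_neighbor_pairs(points, radius):
    """Return index pairs (i, j) no farther apart than `radius`.

    Every pair is checked, so the work grows as O(n^2).
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= radius
    ]


def pairs_per_worker(n: int, workers: int) -> int:
    """Pairs each worker handles under an idealized, perfectly even split.

    Assumption: no communication or coordination overhead.
    """
    total = pairwise_distance_count(n)
    return -(-total // workers)  # ceiling division


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, "points:", pairwise_distance_count(n), "pairs")
```

    Adding machines divides the pair count by a constant, but the quadratic growth itself remains, which is why the choice of algorithm matters as much as the hardware.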

  9. Computation: Representation
    • Tools matter.
    • But...choose toolchains over tools.
    From Stevens, J., Smith, J., and M. Idris (2012). NVizABLE: A Network
    Visualization and Big Data Learning Environment.

  10. Attack computation from 3
    perspectives:
    1: Machine-level
    2: HPC
    3: Representation technology
    Find success in the middle.

  11. Attack computation from 3
    perspectives:
    1: Machine-level
    2: HPC
    3: Representation technology
    Find success in the middle.
    (maybe)

  12. Approaching Computation
    • Important questions (I think):
    1. Should we divide efforts between machine-level
    and HPC? Or focus on both simultaneously?
    2. Bigger is (probably) not always better. How do we
    determine when big is big enough?
    3. How should visualization goals influence our
    answers?

  13. Computation Cognition
    • Many analytical evaluations and UI/UX case
    studies focus on carefully controlled data and
    settings (for obvious reasons)
    • But... such scenarios are rare in big data.
    Bakshy, E. Showing Support for Marriage Equality on Facebook. Facebook Data Science.
    March 29, 2013.

  14. Computation Cognition
    • What this means for cognition...
    • Designers must expect the unexpected, then
    build tools that support edge cases (and beyond)
    • Think through computational issues,
    visualization goals, and users’ POV
    • Example: Incremental Visualization
    • Danyel Fisher, Igor Popov, Steven M. Drucker, and MC Schraefel, Trust
    Me, I'm Partially Right: Incremental Visualization Lets Analysts Explore
    Large Datasets Faster, in Proceedings of the 2012 Conference on
    Human Factors in Computing Systems (CHI 2012), 5 May 2012.
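
    The idea behind incremental visualization can be sketched in a few lines of Python. This is a generic illustration, not the system described by Fisher et al.: it assumes rows arrive in random order, processes them in batches, and after each batch reports a running mean with an approximate 95% confidence half-width that a chart could draw as an error bar. The function name, batch size, and synthetic data are all hypothetical.

```python
import math
import random


def incremental_estimates(values, batch_size, z=1.96):
    """Yield (rows_seen, running_mean, ci_half_width) after each batch.

    Uses Welford's online update, so earlier batches are never revisited.
    The interval is only meaningful if rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for start in range(0, len(values), batch_size):
        for x in values[start:start + batch_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
        else:
            std_err = float("inf")
        yield n, mean, z * std_err


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a large table column (assumption).
    data = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
    for seen, mean, half in incremental_estimates(data, batch_size=20_000):
        print(f"{seen:>7} rows: mean = {mean:.2f} +/- {half:.2f}")
```

    An analyst could stop as soon as the interval is narrow enough for the decision at hand, which is the trade of precision for speed that the slide points to.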

  15. Computation Cognition
    Fisher et al. (2012).

  16. Computation Cognition
    • Key considerations and questions emerge
    1. How do non-experts interpret visualizations that
    continuously change?
    2. Are incremental approaches effective in
    geographic displays?
    3. What should it be like to interact with ≥ millions
    of data points?
    • When, if ever, should points be interactive? Tie to computation and
    representation (e.g., Canvas vs SVG)


  17. Overview first, zoom and filter,
    then details on demand.
    - Shneiderman (1996)


  18. Overview first, zoom and filter,
    then details on demand.
    - Shneiderman (1996)
    Still applicable in the era of big
    data?

  19. Big Data @ Penn State
    • NSF IGERT in Big Data Social Science
    • PI: Burt Monroe (Poli Sci)
    • Co-PIs:
    • Alan MacEachren (Geog)
    • Lee Giles (IST)
    • Melissa Hardy (Soc and Demography)
    • Aleksandra Slavkovic (Stats and Public Health)
    • Project Coordinator: Dee Bagshaw
    • www.bdss.psu.edu

  20. Big Data @ Penn State
    • 7 PhD Fellows in initial cohort
    • Dual-degree program w/ BDSS-specific courses
    • Years 2 and 3
    • Research rotations in year 3
    • Summer externships (at least 1 non-academic)
    Beatrice Abiero, Health Policy and Demographics
    Molly Ariotti, Political Science
    Muhammed Idris, Political Science
    Jennifer (Smith) Mason, Geography
    Joshua Stevens, Geography
    Stephanie Wilson, Human Development and Family Studies
    Mo Yu, Information Science and Technology

  21. Big Data @ Penn State
    • Workshops, demos, and hackathons
    • Projects and publications
    Stevens, J., Smith, J., and M. Idris (2012). NVizABLE: A
    Network Visualization and Big Data Learning
    Environment.
    Yanomine, J, and J. Stevens (2012). Political Events in
    Afghanistan: Analysis of 200 Million Events in the
    GDELT Database.
    Presented at AAG by M. Idris on Tuesday and
    upcoming NetMob @ MIT, May 1-3, 2013.
    Reported in Foreign Policy, April 10 2013.
    http://ideas.foreignpolicy.com/posts/2013/04/10/what_can_we_learn_from_the_last_200_million_things_that_happened_in_the_world

  22. How does this all relate?
    • Computation must consider visualization goals
    and cognitive impacts
    • No single approach is best; allocation of efforts
    remains unclear
    • Must prepare students to deal with these issues
    early in their careers

  23. Thank you.
    [email protected] | @jscarto
    https://speakerdeck.com/jscarto
