Enhancing Geospatial Data Collection and Visualisation via Custom Toolkits, Consumer Devices and Mass Participation

Steven Gray
December 04, 2013

PhD Upgrade Seminar - Steven Gray - 04/12/2013

Transcript

  1. Enhancing Geospatial Data
    Collection and Visualisation
    via Custom Toolkits,
    Consumer Devices and
    Mass Participation
    Steven Gray, Research Associate
    [email protected]
    UCL Centre for Advanced Spatial Analysis

  2. About Me
    Research Associate (UCL CASA)
    September 2009 -- Present
    Research Associate (University of Glasgow - GIST Dept Computing Science)
    January 2008 -- September 2009
    Part-time PhD - Started January 2011
    Projects worked on at CASA
    National e-Infrastructure for Social Science (NeISS)
    JISC (GEMMA)
    Talisman

  3. Research Question
    Main Question
    Can targeted data collection and aggregation enhance data visualisation?
    Sub Questions
    Can mining data from multiple sources derive meaningful patterns in social behaviour?
    Can we mine large data sets in realtime for specific insights to reduce the problem set before building visualisations?

  4. Background
    Data Collection
    Utilising the Cloud
    Analysing Data
    New Methods

  5. Background
    Data Collection
    Utilising the Cloud
    Analysing Data
    New Methods

  6. Collecting data from the crowd.

  7. Enter the Social Revolution
    Rise of Social Media

  8. Traditional Technological Approach:
    Build an application to collect specific data
    Collecting Apps: Mappiness, EpiCollect, Nature Locator

  9. Turning APIs into meaning is challenging

  10. New Insights for Social Science
    http://www.cosmosproject.net

  11. Background
    Data Collection
    Utilising the Cloud
    Analysing Data
    New Methods

  12. Talisman Project Goals
    • Develop and extend state of the art geospatial methods in the form of
    new data analysis techniques and new simulation models.
    • Build new methods of data acquisition and visualisation that will help
    illuminate and address key policy challenges at local, national and
    global levels.
    • Improve the uptake and dissemination of skills in geospatial analysis
    through a comprehensive suite of training and capacity-building
    activities.
    • Contribute to the success of the NCRM programme and participate
    fully in its activities. See our Past Events and Upcoming Events pages
    for some examples.

  13. Usage of Toolkit
    [Diagram labels]
    Custom Endpoint, Carling Cup, Internet of Schools, iPad Video Wall, ESRA2013,
    Tweet-o-Meter, CityDashboard, Internet of Me, QRator, UKSnow Maps, Physical TOM,
    New City Landscapes, AV Referendum, GEMMA, Textal, Analogies, Olympic Collection,
    AWS 200 server collection, Mobile, Websites, Twitter, EE Project,
    DataSift Data Comparison, 10 Cities Collection, Collections, SurveyMapper, #5Acts,
    London Mayor, Park Survey, BBC Old Age, Scottish Water, Services, Grant, Petrie,
    Popup, Brands

  14. BigDataToolkit
    Aims

    • A single toolkit for collecting and analysing data

    • Easy to set up, run and collect data

    • Leverage Cloud Computing to power advanced analytics

    • Create a toolkit for the public to collect and process data

    • Analysing unstructured and unlinked data

    • Feed data into models, large processing platforms for further analysis

    Open Source Data Collection Platform
    which is platform agnostic and easy to use

  15. SurveyMapper.com
    World/Nation/City/Borough/Ward/Street
    Survey Anything
    Realtime Mapping/Data Download
    Used by The Mayor of London
    BBC - 5000 responses in 1 hour
    Scottish Water
    The Public....

  16. BBC Look East Survey - Broadband Speed Test

  17. 3 hours - Search: Walkman 7,948 tweets
    Monday 25th 2010 15:00 - Monday 25th 2010 18:00
    Raw Data - Tweets per minute
    Search API vs Streaming API running mean
    Search API Streaming API

  18. 3 hours - Search: Walkman 7,948 tweets
    Monday 25th 2010 15:00 - Monday 25th 2010 18:00
    Mean Results - Tweets per minute
    Search API vs Streaming API running mean
    Search API Streaming API
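
A minimal sketch of how the per-minute counts and the running mean on these two slides could be computed, assuming the collected tweets are available as a list of timestamps. The function names and the window size are illustrative assumptions, not taken from the toolkit, and "running mean" is read here as a trailing moving average.

```python
from collections import Counter
from datetime import timedelta


def tweets_per_minute(timestamps, start, end):
    """Count tweets in each one-minute bucket from start (inclusive) to end (exclusive).

    start is assumed to be aligned to a whole minute.
    """
    # Truncate every timestamp to its minute and count how many fall in each.
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    series = []
    minute = start
    while minute < end:
        series.append(counts.get(minute, 0))  # minutes with no tweets count as 0
        minute += timedelta(minutes=1)
    return series


def running_mean(values, window):
    """Trailing moving average; early points average over the values seen so far."""
    means = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        means.append(total / min(i + 1, window))
    return means
```

Running both functions once on the Search API sample and once on the Streaming API sample gives two series that can be plotted on the same axes for comparison.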

  19. In more detail

  20. What is the BigDataToolkit
    Collection of tools to mine data from APIs
    [Architecture diagram]
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare

  21. What is the BigDataToolkit
    [Architecture diagram]
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)
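
The diagram shows each collector running as its own process with a PID, sitting behind a Process Proxy. A rough sketch of that idea follows. The slides say the toolkit's proxy is written in Node JS; this Python version, the `ProcessProxy` class and its method names are hypothetical and only illustrate starting collector processes and tracking their PIDs.

```python
import subprocess


class ProcessProxy:
    """Hypothetical helper: starts collector processes and remembers each PID."""

    def __init__(self):
        self._procs = {}

    def start(self, name, argv):
        """Launch a collector as a child process and return its PID."""
        proc = subprocess.Popen(argv)
        self._procs[name] = proc
        return proc.pid

    def status(self):
        """Report the PID and liveness of every collector started so far."""
        # poll() returns None while the child process is still running.
        return {
            name: {"pid": proc.pid, "running": proc.poll() is None}
            for name, proc in self._procs.items()
        }

    def stop_all(self):
        """Terminate any collector that is still running and wait for it to exit."""
        for proc in self._procs.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
```

Keeping each collector in a separate process means one collector failing does not take the others down, and the proxy can report which PIDs are still alive.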

  22. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  23. Collecting on Local Cloud
    Our Setup - Inside the Virtual Machine Manager

  24. Collecting on the Local Cloud
    EE Collection: 32 Collectors on 8 servers, 9,647,651 records
    Olympic Collection: 24 Collectors on 6 servers, 1,497,696 records

  25. 4 collectors per machine - 200 machines on Amazon EC2

  26. BigData Toolkit in the Cloud
    [Architecture diagram]
    Web Interface
    Desktop App Wrapper
    BDTK Job Server
    BDTK Host Proxy
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)

  27. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  28. What is Textal?
    • iPhone App for Text Analysis
    • Explore the relationships between words in the text
    • Tool for the Public (non experts)
    • Launched July 2013
    http://www.textal.org

  29. What is Textal?
    • Create Word Clouds from Text
    • Websites
    • Twitter + Social Media
    • Books
    • Own text (Emails, Documents, etc.)

  30. What is Textal?
    • More than just a Word Cloud
    • Interactive and Dynamic
    • Generates Stats for Each Word
    • Collocations
    • Common Pairs
    • Scrabble Scores
    • Frequency Counts

  31. What is Textal?
    • More than just a Word Cloud
    • Interactive and Dynamic
    • Generates Stats for Each Word
    • Collocations
    • Common Pairs
    • Scrabble Scores
    • Frequency Counts

  32. What is Textal?
    • More than just a Word Cloud
    • Interactive and Dynamic
    • Generates Stats for Each Word
    • Collocations
    • Common Pairs
    • Scrabble Scores
    • Frequency Counts
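
The per-word statistics listed on these slides can be illustrated with a short sketch. This is not Textal's implementation: the tokeniser, the function names and the reading of "Common Pairs" as adjacent word pairs are assumptions, and the letter values are the standard English Scrabble tile values.

```python
import re
from collections import Counter

# Standard English Scrabble tile values, grouped by score.
_TILE_GROUPS = [
    ("aeilnorstu", 1), ("dg", 2), ("bcmp", 3), ("fhvwy", 4),
    ("k", 5), ("jx", 8), ("qz", 10),
]
SCRABBLE = {ch: value for letters, value in _TILE_GROUPS for ch in letters}


def tokenize(text):
    """Lower-case the text and keep runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def scrabble_score(word):
    """Sum the tile values of the letters; other characters score 0."""
    return sum(SCRABBLE.get(ch, 0) for ch in word.lower())


def text_stats(text):
    """Return (frequency counts, counts of adjacent word pairs)."""
    tokens = tokenize(text)
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def stats_for(word, frequency, pairs):
    """Collect the statistics for one word, most frequent pairs first."""
    return {
        "count": frequency[word],
        "scrabble": scrabble_score(word),
        "common_pairs": [(p, n) for p, n in pairs.most_common() if word in p],
    }
```

A word cloud would size each word by its `count`; tapping a word could then show the remaining fields from `stats_for`.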

  33. So what’s the connection?

  34. Introducing Smart Collectors
    Data Feedback Loop
    collect data
    process data from each area collected
    alert user to changes in collection
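
One way to picture the feedback loop is the sketch below, which assumes each collected record carries an `area` field and that a "change in collection" means the record count for an area moved by at least a given fraction between two batches. The field name, the threshold and the function names are hypothetical and are not part of the toolkit.

```python
from collections import Counter


def process_batch(records):
    """Process step: count the collected records for each area."""
    return Counter(record["area"] for record in records)


def detect_changes(previous, current, threshold=0.5):
    """Alert step: list areas whose count changed by at least `threshold`.

    The change is measured relative to the previous count (treated as 1
    when the area was not seen before, to avoid dividing by zero).
    Returns (area, before, after) tuples sorted by area name.
    """
    alerts = []
    for area in sorted(set(previous) | set(current)):
        before = previous.get(area, 0)
        after = current.get(area, 0)
        change = (after - before) / max(before, 1)
        if abs(change) >= threshold:
            alerts.append((area, before, after))
    return alerts
```

Each new batch from the collectors would be passed through `process_batch`, compared with the previous result, and any alerts shown to the user, who can then adjust what is being collected.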

  35. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  36. New Methods for Data Collection

  37. SurveyMapperLive

  38. SurveyMapperLive

  39. Why are Smart Collectors Important?

  40. Realtime Processing of Large Datasets
    [Architecture diagram]
    Web Interface
    Desktop App Wrapper
    BDTK Job Server
    BDTK Host Proxy
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)

  41. Realtime Processing of Large Datasets
    [Diagram: query serving tree]
    Labels: Mixer 0; Mixer 1 (x2); Leaf (x4); Distributed Storage; 10 GB / s
    Query fragments shown on the diagram:
    SELECT collector_message, m_id
    COUNT (id) GROUP BY collector_message WHERE timestamp > CUTOFF
    ORDER BY timestamp DESC
    COUNT (m_id) GROUP BY collector_message (x2)
    Spanner: a Globally-Distributed Database
    James C. Corbett, Jeffrey Dean, et al.
    Published in the Proceedings of OSDI'12: Tenth Symposium on Operating System Design and Implementation, October, 2012

  42. Realtime Processing of Large Datasets
    [Diagram: query serving tree]
    Labels: Mixer 0; Mixer 1 (x2); Leaf (x4); Distributed Storage; 10 GB / s
    Query fragments shown on the diagram:
    SELECT collector_message, m_id
    COUNT (id) GROUP BY collector_message WHERE timestamp > CUTOFF
    ORDER BY timestamp DESC
    COUNT (m_id) GROUP BY collector_message (x2)
    Spanner: a Globally-Distributed Database
    James C. Corbett, Jeffrey Dean, et al.
    Published in the Proceedings of OSDI'12: Tenth Symposium on Operating System Design and Implementation, October, 2012
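
The mixer and leaf labels suggest a tree in which each leaf counts its own share of the rows and the mixers merge the partial counts. The sketch below illustrates that pattern for the `GROUP BY collector_message` and `WHERE timestamp > CUTOFF` fragments on the slide. It is a plain in-memory illustration under assumed row fields (`collector_message`, `timestamp`) and hypothetical function names, not the system shown in the diagram.

```python
from collections import Counter


def leaf_count(rows, cutoff):
    """Leaf step: count rows per collector_message, keeping only recent rows."""
    return Counter(
        row["collector_message"] for row in rows if row["timestamp"] > cutoff
    )


def mix(partials):
    """Mixer step: merge partial counts by summing them key by key."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def tree_count(shards, cutoff):
    """Run four leaves, merge them pairwise in two mixers, then merge at the root."""
    if len(shards) != 4:
        raise ValueError("this sketch assumes exactly four shards, one per leaf")
    leaves = [leaf_count(shard, cutoff) for shard in shards]
    left = mix(leaves[:2])    # first "Mixer 1"
    right = mix(leaves[2:])   # second "Mixer 1"
    return mix([left, right])  # "Mixer 0"
```

Because counts are summed, the merged result is the same as counting over all rows at once, which is what lets the work be spread across many leaves.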

  43. Ethical Issues

  44. Evaluation of Success

  45. Chapter Outline
    1. Introduction
    1.1 A brief history of Geospatial systems
    1.2 Where do they come from
    1.3 Visualisations, Infographics and the web
    1.4 Rise of Open Data and Crowd Sourcing
    1.5 Open Data movement and effects on the Geo community
    1.6 The advent of the API
    1.7 Restrictions of collecting data via APIs
    1.8 Need for Open Data
    2. Public Participation and Data Collection
    2.1 The problems associated with Data collection
    2.2 Affecting policy change with participation
    2.3 Growth of the Web
    2.4 Traditional Data collection to the Social Network
    2.5 The rise of the social network
    2.6 Commercial Services for collecting social data
    2.7 The problems associated with collection
    2.8 Data Pervasiveness and Automated Participation
    2.9 History of Big Data

  46. Chapter Outline
    3. Data Collection, Data Analysis, and Mining
    3.1 Introduction to Applications
    3.2 Application Type Overview
    3.3 Automating Collection
    3.4 Linking to Public Participation
    3.5 Updating the Ladder of Participation
    3.6 Introducing the Geography Engine
    4. Geography Engine
    4.1 Building a Geography Engine
    4.2 Data behind the Geography Engine
    4.3 How the system was built
    5. Applications of the Engine
    5.1 Tweet-o-Meter
    5.2 SurveyMapper
    5.3 How the system was built
    5.4 SurveyMapper Live
    5.5 SurveyMapper Mobile
    5.6 Social Media Collection Suite
    5.7 GEMMA - Geospatial Engine for Mass Mapping Applications

  47. Chapter Outline
    6. Leveraging Big Data and the Cloud
    6.1 Distributed Systems
    6.2 Feedback Loop
    6.3 Pulling together the Engine and Cloud Computing
    6.4 Communication Methods between Servers
    6.5 Real-time analysis of live data to influence collection
    7. Humanities and the Engine
    7.1 QRator
    7.2 Textal
    7.3 Feedback loop into the Engine
    8. Real-time Data and Exhibition Visualisation
    8.1 CityDashboard
    8.2 iPad Video Wall
    8.3 Tweet-o-Meter wall
    8.4 Real-time Video to Policy change

  48. Chapter Outline
    9. Making Sense of Data and the System
    9.1 Impact of Policy
    9.2 Impact of Data
    9.3 Impact of Applications
    10. Conclusions and Future Work

  49. Publications
    Exploring the Geography of Communities in Social Networks
    A Comber, M Batty, C Brunsdon, A Hudson-Smith, F Neuhaus, S Gray

    Calibration of a spatial simulation model with volunteered geographical information
    M Birkin, N Malleson, A Hudson-Smith, S Gray, R Milton
    International Journal of Geographical Information Science 25 (8), 1221-1239

    Geographic Analysis of Social Network Data
    M Batty, A Hudson-Smith, F Neuhaus, S Gray
    Proceedings of the Agile 2012 International Conference on Geographic Information Science, 2012

    A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic.
    A Comber, M Batty, C Brunsdon, A Hudson-Smith, F Neuhaus, S Gray

    Text mining with Textal
    S Gray, M Terras
    National Centre for Research Methods

    GEMMA–Making Maps Even Easier
    O O’Brien, S Gray, A Hudson-Smith
    1st European State of the Map

  50. Publications
    Enhancing Museum Narratives: Tales of Things and UCL's Grant Museum
    C Ross, M Carnall, A Hudson-Smith, C Warwick, M Terras, S Gray
    Routledge (Book Chapter)

    Engaging the Museum Space: Mobilising Visitor Engagement with Digital Content Creation
    C Ross, S Gray, C Warwick, A Hudson-Smith, M Terras
    24th Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities - Digital Humanities 2012

    Experiments with the internet of things in museum space: QRator
    A Hudson-Smith, S Gray, C Ross, R Barthel, M de Jode, C Warwick, M Terras
    Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 1183-1184

    Enhancing Museum Narratives with the QRator Project: a Tasmanian devil, a Platypus and a Dead Man in a Box
    S Gray, C Ross, A Hudson-Smith, M Terras, C Warwick
    Museums and the Web 2012

    The QRator Project: Promoting Personal Meaning Making in Museums
    S Gray, C Ross, A Hudson-Smith, M Terras, C Warwick
    Dimensions May-June 2013

  51. Thank you
    [email protected]
    Twitter: @frogo
    Google+: +StevenGray
    http://www.stevenjamesgray.com
    Any Questions?
