Enhancing Geospatial Data Collection and Visualisation via Custom Toolkits, Consumer Devices and Mass Participation

Steven Gray
December 04, 2013

PhD Upgrade Seminar - Steven Gray - 04/12/2013

Transcript

  1. Enhancing Geospatial Data
    Collection and Visualisation
    via Custom Toolkits,
    Consumer Devices and
    Mass Participation
    Steven Gray, Research Associate
    [email protected]
    UCL Centre for Advanced Spatial Analysis

  2. About Me
    Research Associate (UCL CASA)
    September 2009 -- Present
    Research Associate (University of Glasgow - GIST Dept Computing Science)
    January 2008 -- September 2009
    Part-time PhD - Started January 2011
    Projects worked on at CASA
    National e-Infrastructure for Social Science (NeISS)
    JISC (GEMMA)
    Talisman

  3. Research Question
    Main Question
    Can targeted data collection and aggregation enhance data visualisation?
    Sub Questions
    Can mining data from multiple sources derive meaningful
    patterns in social behaviour?
    Can we mine large data sets in realtime for specific insights to
    reduce the problem set before building visualisations?

  4. Background
    Data Collection
    Utilising the Cloud
    Analysing Data
    New Methods

  6. Collecting data from the crowd.

  7. Enter the Social Revolution
    Rise of Social Media

  9. Collecting Apps
    Technological Traditional Approach:
    Build an application to collect specific data
    Mappiness, EpiCollect, Nature Locator

  10. Turning APIs into meaning is challenging

  11. New Insights for Social Science
    http://www.cosmosproject.net

  12. Background
    Data Collection
    Utilising the Cloud
    Analysing Data
    New Methods

  14. Talisman Project Goals
    • Develop and extend state of the art geospatial methods in the form of
    new data analysis techniques and new simulation models.
    • Build new methods of data acquisition and visualisation that will help
    illuminate and address key policy challenges at local, national and
    global levels.
    • Improve the uptake and dissemination of skills in geospatial analysis
    through a comprehensive suite of training and capacity-building
    activities.
    • Contribute to the success of the NCRM programme and participate
    fully in its activities. See our Past Events and Upcoming Events pages
    for some examples.

  16. Usage of Toolkit
    Custom Endpoint
    Carling Cup
    Internet of Schools
    iPad Video Wall
    ESRA2013
    Tweet-o-Meter
    CityDashboard
    Internet of Me
    QRator
    UKSnow Maps
    Physical TOM
    New City Landscapes
    AV Referendum
    GEMMA
    Textal
    Analogies
    Olympic Collection
    AWS 200 server collection
    Mobile
    Websites
    Twitter
    EE Project
    DataSift Data Comparison
    10 Cities Collection
    Collections
    SurveyMapper
    #5Acts
    London Mayor
    Park Survey
    BBC Old Age
    Scottish Water
    Services
    Grant
    Petrie
    Popup
    Brands

  17. BigDataToolkit
    Aims

    • A single toolkit for collecting and analysing data

    • Easy to set up, run and collect data

    • Leverage Cloud Computing to power advanced analytics

    • Create a toolkit for the public to collect and process data

    • Analysing unstructured and unlinked data

    • Feed data into models, large processing platforms for further analysis

    Open Source Data Collection Platform
    which is platform agnostic and easy to use

  23. SurveyMapper.com
    World/Nation/City/Borough/Ward/Street
    Survey Anything
    Realtime Mapping/Data Download
    Used by The Mayor of London
    BBC - 5000 responses in 1 hour
    Scottish Water
    The Public....

  24. BBC Look East Survey - Broadband Speed Test

  25. 3 hours - Search: Walkman 7,948 tweets
    Monday 25th 2010 15:00 - Monday 25th 2010 18:00
    Raw Data - Tweets per minute
    Search API vs Streaming API running mean
    Search API Streaming API

  26. 3 hours - Search: Walkman 7,948 tweets
    Monday 25th 2010 15:00 - Monday 25th 2010 18:00
    Mean Results - Tweets per minute
    Search API vs Streaming API running mean
    Search API Streaming API
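
The two charts above plot tweets per minute for each API, first as raw counts and then as a running mean. A minimal Python sketch of that calculation follows, assuming every collected tweet carries a timestamp. The function names, the window size and the sample timestamps are illustrative assumptions and are not taken from the toolkit.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_minute(timestamps, start, end):
    """Count tweets in each one-minute bucket of [start, end)."""
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    minutes = int((end - start).total_seconds() // 60)
    return [counts.get(start + timedelta(minutes=i), 0) for i in range(minutes)]


def running_mean(values, window=5):
    """Trailing mean over at most the `window` most recent values."""
    means = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        means.append(total / min(i + 1, window))
    return means


# Hypothetical sample data: three tweets in minute 0 and one in minute 2.
start = datetime(2000, 1, 1, 15, 0)
sample = [start + timedelta(seconds=s) for s in (5, 20, 59, 130)]

raw = tweets_per_minute(sample, start, start + timedelta(minutes=4))
smoothed = running_mean(raw, window=2)
print(raw)
print(smoothed)
```

Running the same two functions over the Search API and Streaming API collections would give the two series being compared.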

  28. In more detail

  33. What is the BigDataToolkit
    Collection of tools to mine data from APIs
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare

  34. What is the BigDataToolkit
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)
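
The diagram shows a Process Proxy sitting between the local server and the collector modules, with each running collector identified by a PID. The sketch below illustrates that idea in Python rather than Node JS, purely as an assumption-laden illustration: the `ProcessProxy` class, its method names and the placeholder child command are hypothetical and do not come from the toolkit.

```python
import subprocess
import sys


class ProcessProxy:
    """Starts collector modules as child processes and tracks their PIDs."""

    def __init__(self):
        self.collectors = {}  # collector name -> subprocess.Popen

    def start(self, name, command):
        """Launch one collector and return the PID of its process."""
        process = subprocess.Popen(command)
        self.collectors[name] = process
        return process.pid

    def status(self):
        """Map each collector name to (pid, still running?)."""
        return {
            name: (process.pid, process.poll() is None)
            for name, process in self.collectors.items()
        }

    def stop_all(self):
        """Terminate any collector that is still running and reap it."""
        for process in self.collectors.values():
            if process.poll() is None:
                process.terminate()
            process.wait()


# Hypothetical stand-in for a real collector: a child that only sleeps.
placeholder = [sys.executable, "-c", "import time; time.sleep(30)"]

proxy = ProcessProxy()
twitter_pid = proxy.start("Twitter Collector", placeholder)
facebook_pid = proxy.start("Facebook Collector", placeholder)
running = proxy.status()
proxy.stop_all()
stopped = proxy.status()
print(running)
```

Keeping each collector in its own process means one failing module can be restarted without touching the others, which is one plausible reason for exposing PIDs in the interface.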

  35. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  37. Collecting on the Local Cloud
    Our Setup - Inside the Virtual Machine Manager

  38. Collecting on the Local Cloud
    EE Collection: 32 Collectors on 8 servers, 9,647,651 records
    Olympic Collection: 24 Collectors on 6 servers, 1,497,696 records

  44. 4 collectors per machine - 200 machines on Amazon EC2
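
The local collections on slide 38 (32 collectors on 8 servers, 24 on 6) and the EC2 run on this slide share the same density of 4 collectors per machine. The short Python sketch below works through that arithmetic and lays out collector slots per machine. The function names and the numbering of collectors are illustrative assumptions only.

```python
def collectors_per_machine(collectors, machines):
    """Return the even share of collectors per machine, or fail if uneven."""
    per_machine, remainder = divmod(collectors, machines)
    if remainder:
        raise ValueError("collectors do not divide evenly across machines")
    return per_machine


def layout(machines, per_machine):
    """Map each machine index to the collector indices it would host."""
    return {
        machine: [machine * per_machine + slot for slot in range(per_machine)]
        for machine in range(machines)
    }


ee = collectors_per_machine(32, 8)        # EE Collection
olympic = collectors_per_machine(24, 6)   # Olympic Collection
ec2 = layout(machines=200, per_machine=4) # Amazon EC2 run
ec2_total = sum(len(ids) for ids in ec2.values())
print(ee, olympic, ec2_total)
```

Under these figures the EC2 deployment amounts to 800 collectors in total.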

  45. BigData Toolkit in the Cloud
    BDTK Host Proxy
    BDTK Job Server
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)

  46. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  48. What is Textal?
    • iPhone App for Text Analysis
    • Explore the relationships between words in the text
    • Tool for the Public (non experts)
    • Launched July 2013
    http://www.textal.org

  49. What is Textal?
    • Create Word Clouds from Text
    • Websites
    • Twitter + Social Media
    • Books
    • Own text (Emails, Documents, etc.)

  50. What is Textal?
    • More than just a Word Cloud
    • Interactive and Dynamic
    • Generates Stats for Each Word
    • Collocations
    • Common Pairs
    • Scrabble Scores
    • Frequency Counts
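
The per-word statistics listed above can be illustrated with a small Python sketch. It is not Textal's implementation: the tokeniser, the window-based definition of a collocation and the sample sentence are assumptions made for illustration, while the letter values are the standard English Scrabble ones.

```python
import re
from collections import Counter

# Standard English Scrabble letter values.
SCRABBLE_POINTS = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_counts(words):
    """Count how often each word occurs."""
    return Counter(words)


def common_pairs(words):
    """Count adjacent word pairs."""
    return Counter(zip(words, words[1:]))


def collocations(words, target, window=2):
    """Count words appearing within `window` positions of `target`."""
    nearby = Counter()
    for i, word in enumerate(words):
        if word == target:
            nearby.update(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return nearby


def scrabble_score(word):
    """Sum the Scrabble value of each letter; other characters score 0."""
    return sum(SCRABBLE_POINTS.get(letter, 0) for letter in word)


words = tokenize("The cat sat on the mat and the cat slept")
freq = frequency_counts(words)
pairs = common_pairs(words)
near_cat = collocations(words, "cat", window=1)
print(freq.most_common(2), pairs.most_common(1), scrabble_score("cat"))
```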

  55. So what’s the connection?

  59. Introducing Smart Collectors
    Data Feedback Loop
    collect data
    process data from each area collected
    alert user to changes in collection
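
The feedback loop on this slide (collect, process per area, alert on changes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: records are dictionaries with an `area` key, a "change" means a count rising or falling by a factor of at least `threshold`, and the area names in the sample are hypothetical.

```python
from collections import Counter


def process(records):
    """Count the collected records for each area."""
    return Counter(record["area"] for record in records)


def detect_changes(previous, current, threshold=2.0):
    """Return (area, before, after) for areas whose count changed sharply."""
    alerts = []
    for area in sorted(set(previous) | set(current)):
        before, after = previous.get(area, 0), current.get(area, 0)
        low, high = sorted((before, after))
        # An area appearing or vanishing always counts as a change.
        if high > 0 and (low == 0 or high / low >= threshold):
            alerts.append((area, before, after))
    return alerts


def feedback_loop(batches, threshold=2.0):
    """Run collect -> process -> alert over successive batches."""
    previous, alerts = Counter(), []
    for step, records in enumerate(batches):          # collect data
        current = process(records)                    # process data per area
        if step > 0:
            for alert in detect_changes(previous, current, threshold):
                alerts.append((step, *alert))         # alert user to changes
        previous = current
    return alerts


# Hypothetical sample: activity in one area jumps between two batches.
batches = [
    [{"area": "Camden"}] * 2 + [{"area": "Hackney"}] * 2,
    [{"area": "Camden"}] * 5 + [{"area": "Hackney"}] * 2,
]
alerts = feedback_loop(batches)
print(alerts)
```

In a live system the alert could equally be fed back to the collectors themselves, for example to widen or narrow a search, rather than only shown to the user.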

  61. Background
    Why are we doing this
    Utilising the Cloud
    Analysing Data
    New Methods

  62. New Methods for Data Collection

  63. SurveyMapperLive

  67. Why are Smart Collectors Important?

  68. Realtime Processing of Large Datasets
    BDTK Host Proxy
    BDTK Job Server
    Web Interface
    Desktop App Wrapper
    Local Server (Node JS)
    Process Proxy (Node JS)
    Stats Service (Node JS)
    Local Database
    Collector Modules: Twitter, Facebook, Google+, Foursquare
    Twitter Collector (PID: 6198)
    Facebook Collector (PID: 5390)

  69. Realtime Processing of Large Datasets
    Mixer 0
    Mixer 1 Mixer 1
    Leaf Leaf Leaf Leaf
    Distributed Storage
    10 GB / s
    SELECT collector_message, m_id
    COUNT (id)
    GROUP BY collector_message
    WHERE timestamp > CUTOFF
    ORDER BY timestamp DESC
    COUNT(m_id)
    GROUP BY collector_message
    COUNT(m_id)
    GROUP BY collector_message
    Spanner: a Globally-Distributed Database
    James C. Corbett, Jeffrey Dean, et al.
    Published in the Proceedings of OSDI'12: Tenth Symposium on Operating Systems Design and Implementation, October 2012
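
The mixer and leaf tree on this slide suggests a grouped count that is computed in parts and merged upwards. The Python sketch below illustrates that pattern for a count per `collector_message` with a timestamp cutoff. It is an assumption-based illustration, not the system shown on the slide: the row layout, the function names and the fan-in of two are hypothetical.

```python
from collections import Counter


def leaf_query(rows, cutoff):
    """Leaf: partial count per collector_message for rows newer than cutoff."""
    return Counter(
        row["collector_message"] for row in rows if row["timestamp"] > cutoff
    )


def mixer_merge(partials):
    """Mixer: add up the partial counts received from child nodes."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


def run_query(shards, cutoff, fan_in=2):
    """Scan every shard at a leaf, then merge level by level up to the root."""
    level = [leaf_query(rows, cutoff) for rows in shards]
    while len(level) > 1:
        level = [
            mixer_merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)
        ]
    return level[0] if level else Counter()


# Hypothetical sample: four shards, matching the four leaves in the diagram.
shards = [
    [{"collector_message": "twitter", "timestamp": 5},
     {"collector_message": "twitter", "timestamp": 1}],
    [{"collector_message": "facebook", "timestamp": 7}],
    [{"collector_message": "twitter", "timestamp": 9}],
    [],
]
result = run_query(shards, cutoff=2)
print(result)
```

Because each leaf returns only a small table of counts, the amount of data passed up the tree stays small even when the shards themselves are large.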

  71. Ethical Issues

  72. Evaluation of Success

  73. Chapter Outline
    1. Introduction
    1.1 A brief history of Geospatial systems
    1.2 Where do they come from
    1.3 Visualisations, Infographics and the web
    1.4 Rise of Open Data and Crowd Sourcing
    1.5 Open Data movement and effects on the Geo community
    1.6 The advent of the API
    1.7 Restrictions of collecting data via APIs
    1.8 Need for Open Data
    2. Public Participation and Data Collection
    2.1 The problems associated with Data collection
    2.2 Affecting policy change with participation
    2.3 Growth of the Web
    2.4 Traditional Data collection to the Social Network
    2.5 The rise of the social network
    2.6 Commercial Services for collecting social data
    2.7 The problems associated with collection
    2.8 Data Pervasiveness and Automated Participation
    2.9 History of Big Data

  74. Chapter Outline
    3. Data Collection, Data Analysis, and Mining
    3.1 Introduction to Applications
    3.2 Application Type Overview
    3.3 Automating Collection
    3.4 Linking to Public Participation
    3.5 Updating the Ladder of Participation
    3.6 Introducing the Geography Engine
    4. Geography Engine
    4.1 Building a Geography Engine
    4.2 Data behind the Geography Engine
    4.3 How the system was built
    5. Applications of the Engine
    5.1 Tweet-o-Meter
    5.2 SurveyMapper
    5.3 How the system was built
    5.4 SurveyMapper Live
    5.5 SurveyMapper Mobile
    5.6 Social Media Collection Suite
    5.7 Gemma - Geospatial Engine for Mass Mapping Applications

  75. Chapter Outline
    6. Leveraging Big Data and the Cloud
    6.1 Distributed Systems
    6.2 Feedback Loop
    6.3 Pulling together the Engine and Cloud Computing
    6.4 Communication Methods between Servers
    6.5 Real-time analysis of live data to influence collection
    7. Humanities and the Engine
    7.1 QRator
    7.2 Textal
    7.3 Feedback loop into the Engine
    8. Real-time Data and Exhibition Visualisation
    8.1 CityDashboard
    8.2 iPad Video Wall
    8.3 Tweet-o-Meter wall
    8.4 Real-time Video to Policy change

  76. Chapter Outline
    9. Making Sense of Data and the System
    9.1 Impact of Policy
    9.2 Impact of Data
    9.3 Impact of Applications
    10. Conclusions and Future Work

  77. Publications
    • Exploring the Geography of Communities in Social Networks
    A Comber, M Batty, C Brunsdon, A Hudson-Smith, F Neuhaus, S Gray
    • Calibration of a spatial simulation model with volunteered geographical information
    M Birkin, N Malleson, A Hudson-Smith, S Gray, R Milton
    International Journal of Geographical Information Science 25 (8), 1221-1239
    • Geographic Analysis of Social Network Data
    M Batty, A Hudson-Smith, F Neuhaus, S Gray
    Proceedings of the Agile 2012 International Conference on Geographic Information Science, 2012
    • A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic
    A Comber, M Batty, C Brunsdon, A Hudson-Smith, F Neuhaus, S Gray
    • Text mining with Textal
    S Gray, M Terras
    National Centre for Research Methods
    • GEMMA–Making Maps Even Easier
    O O’Brien, S Gray, A Hudson-Smith
    1st European State of the Map

  78. Publications
    • Enhancing Museum Narratives: Tales of Things and UCL's Grant Museum
    C Ross, M Carnall, A Hudson-Smith, C Warwick, M Terras, S Gray
    Routledge (Book Chapter)
    • Engaging the Museum Space: Mobilising Visitor Engagement with Digital Content Creation
    C Ross, S Gray, C Warwick, A Hudson-Smith, M Terras
    24th Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities - Digital Humanities 2012
    • Experiments with the internet of things in museum space: QRator
    A Hudson-Smith, S Gray, C Ross, R Barthel, M de Jode, C Warwick, M Terras
    Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 1183-1184
    • Enhancing Museum Narratives with the QRator Project: a Tasmanian devil, a Platypus and a Dead Man in a Box
    S Gray, C Ross, A Hudson-Smith, M Terras, C Warwick
    Museums and the Web 2012
    • The QRator Project: Promoting Personal Meaning Making in Museums
    S Gray, C Ross, A Hudson-Smith, M Terras, C Warwick
    Dimensions May-June 2013

  79. Thank you
    [email protected]
    Twitter: @frogo
    Google+: +StevenGray
    http://www.stevenjamesgray.com
    Any Questions?
