Making Sense of IoT Data w/ Big Data + Data Science

charles-cai
November 25, 2015

IoT = Big Data

In this talk we discuss the latest developments in Big Data, Machine Learning and Data Science, and recent IoT use cases in healthcare (new drug trials), geospatial mapping, disaster relief, retail and insurance, covering the life cycle of IoT data analytics: capturing, storing, cleansing, analysing, predicting and maintaining…
- There will be an estimated 30 to 50 billion Internet-connected devices within five years
- How IoT can drive innovation across industries
- IoT = Big Data: how the open-source big data ecosystem supports IoT-driven business cases

Transcript

  1. Making Sense of IoT Data w/ Big Data + Data Science

    Charles Cai. The views expressed here are my own and not those of my employer.
  2. Making Sense of IoT w/ Big Data + Data Science

    u IoT = Big Data u  In this talk we are going to discuss the latest development in Big Data, Machine Learning and Data Science and the latest IoT use cases in healthcare new drug trial, geospatial mapping, disaster relief, retails and insurance etc to cover life cycle of IoT data analytics: capturing, storing, cleansing, analysing, predicting and maintaining… u  There will be 30~50 billion Internet connected devices in 5 years u  How IoT can drive innovations in various industries u  IoT = Big Data, how open source big data eco-system supports IoT Driven business cases
  3. Making Sense of IoT Data with Big Data + Data Science

    Big Data Week Conference 2015. Charles Cai, Big Data + Data Science, Leading Oil and Gas Trading Company
    - Innovating with Disruptive Technologies
    - Data Center Operation System
    - Data Operation System 2.0
    - Data Science Maturity Model
    - D-I-K-W: Data – Information – Knowledge – Wisdom
    - Crowdsourcing: The Power of Crowdsourcing
    - MOOC / OSH / OSS: Open Source Hardware / Software
    - Big Data DevOps / Data Scientist Shortage
    - Operating BDA: Microservices
    - Graph Database / Graph Computing
  4. Intro

    - Bio
    - #FO #FICC: Investment Banking Front Office: FX/Commodities
    - #ETRM: Energy Trading & Risk Management
    - #entrepreneur #innovator #disruptor
    - Voted as one of the UK’s Top 50 Data Leaders & Influencers
    - Twitter: @caidong
    - #big-data #IoT #data-science #MOOC #Mobile #Cloud #UX
    - LinkedIn: http://uk.linkedin.com/in/charlescai/en
  5. Where are we with Big Data Analytics?

    By Thomas Davenport – Harvard Business Review
  6. Use Case: Parkinson’s Disease New Drug Trial

    - There’s no cure for Parkinson’s disease
    - New medicine trials are an extremely slow process; daily doses x8!
    - Traditional feedback from patients is infrequent
    - Wireless-enabled wearable device + IMU sensors
    - Classification of wearer activities: sitting, standing, walking, running, sleeping…
    - Detect patterns of Parkinson’s disease symptoms
      - predicting deterioration / improvement speed
      - new trial medicine effectiveness
    - Sensor data at 10 Hz sampling = 1 GB / day / patient
    IoT = Big Data
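The slide's data-rate figure can be sanity-checked, and the activity-classification step sketched, in a few lines of Python. This is a toy illustration under stated assumptions, not the method used in the trial: it assumes a 3-axis accelerometer stream of (ax, ay, az) readings in g, a hypothetical 5-second window, and hand-picked centroids rather than ones learned from labelled data.

```python
import math
from statistics import mean, pstdev

SAMPLE_RATE_HZ = 10                 # from the slide: 10 Hz sampling
SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_SECONDS = 5                  # hypothetical window length
WINDOW_SIZE = SAMPLE_RATE_HZ * WINDOW_SECONDS   # 50 samples per window

def samples_per_day(rate_hz=SAMPLE_RATE_HZ):
    """Number of samples one patient's device produces per day."""
    return rate_hz * SECONDS_PER_DAY

def implied_bytes_per_sample(bytes_per_day=10**9, rate_hz=SAMPLE_RATE_HZ):
    """Bytes each sample must occupy to reach 1 GB/day (assumes 1 GB = 10^9 bytes)."""
    return bytes_per_day / samples_per_day(rate_hz)

def magnitude(sample):
    """Euclidean norm of one (ax, ay, az) accelerometer reading."""
    return math.sqrt(sum(c * c for c in sample))

def windows(samples, size=WINDOW_SIZE):
    """Split a stream into non-overlapping windows, dropping a short tail."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]

def features(window):
    """Mean and population standard deviation of acceleration magnitude."""
    mags = [magnitude(s) for s in window]
    return (mean(mags), pstdev(mags))

def nearest_centroid(feature, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(feature, centroids[label])),
    )

# Hypothetical centroids (mean |a|, std |a|) in g; real ones would be learned.
CENTROIDS = {
    "sitting": (1.0, 0.02),
    "walking": (1.1, 0.25),
    "running": (1.4, 0.60),
}

def classify_stream(samples):
    """Label each complete window of the stream with the nearest activity."""
    return [nearest_centroid(features(w), CENTROIDS) for w in windows(samples)]
```

At 10 Hz a day holds 864,000 samples, so 1 GB per day works out to roughly 1,157 bytes per sample.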
  7. Open Source Data Science Toolbox

    Open Source Big Data / Data Science Platform: Hadoop / Mesos distributed storage + scalable computation
    - Wider Big Data Analytics ecosystem:
      - Shell/APIs: HDFS, Hive, Spark, HBase, Sqoop, JDBC/ODBC
      - Languages: Julia, Python, R, Scala
    - COTS apps (Excel, Tableau, Qlik...)
    - Statistical time series analysis
    - NLTK (Natural Language Toolkit)
    - Distributed time series / geospatial / graph databases
    - HDFS, Hive/Pig w/ geospatial, JDBC via Phoenix
    - Geospatial data, time series data, public data, market data, real-time streaming, Open Gov data
    - Git repo, data products, WebSocket, drag + drop (CZML/GeoJSON), web browser (collaboration), export to CSV / Excel
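The toolbox lists GeoJSON and CSV/Excel among its output formats. As a hedged sketch of such an export using only Python's standard library, the functions below turn hypothetical geotagged sensor readings into a GeoJSON FeatureCollection and a CSV string. The field names `device_id`, `ts`, `lat`, `lon` and `value` are assumptions for illustration. GeoJSON (RFC 7946) orders point coordinates as longitude, then latitude.

```python
import csv
import io
import json

# Hypothetical record layout for one geotagged sensor reading.
FIELDS = ["device_id", "ts", "lat", "lon", "value"]

def to_geojson(readings):
    """Serialise readings as a GeoJSON FeatureCollection of Point features."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"device_id": r["device_id"], "ts": r["ts"], "value": r["value"]},
        }
        for r in readings
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

def to_csv(readings):
    """Serialise readings as CSV text with a header row (opens in Excel)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(readings)
    return buf.getvalue()
```

The same records feed both outputs, so a map layer and a spreadsheet stay consistent with each other.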
  8. Key Sub-systems in a Modern Big Data Analytics Stack

    Data Analytics, Streaming, Graph Computing, Machine Learning …
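Streaming is listed as one of the key sub-systems. A core building block of stream processing is windowed aggregation; the sketch below keeps a running mean over a sliding time window. It is a single-process illustration with a hypothetical class name, assuming events arrive in non-decreasing timestamp order, and is not the API of any particular streaming engine.

```python
from collections import deque

class SlidingWindowMean:
    """Running mean of values whose timestamps fall in (now - width, now]."""

    def __init__(self, width):
        self.width = width          # window length, same unit as timestamps
        self.events = deque()       # (timestamp, value) pairs inside the window
        self.total = 0.0            # running sum, so each update is O(1) amortised

    def add(self, ts, value):
        """Ingest one event and return the mean of the current window."""
        self.events.append((ts, value))
        self.total += value
        # Evict events that have fallen out of the window (assumes ordered input).
        while self.events and self.events[0][0] <= ts - self.width:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)
```

Keeping a running total avoids re-summing the window on every event, which matters at IoT ingest rates.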
  9. Data Science Maturity Map – where we are, where we can go

    Data – Information – Knowledge – Wisdom / Intelligence
    “Note: The current version focuses mainly on data / machine learning; a new version for cross-industry use cases, with more coverage of IoT, containers, data flow etc., is being developed (ETA Dec 2015 / Jan 2016). Please follow Twitter: @caidong to receive the latest version.”
  10. From Classic to Modern Architecture

    - Full Text Search, Keyword Search, Faceted Search, Geospatial Search, Semantic Search, Named Entity Extraction, N-Grams, Q&A
    - Natural Language Processing; CCTV / Voice: Computer Vision + Q&A; Deep Learning (CNN/RNN)
    - RDBMS / DW → KV + GraphDB + BD DW
    - Business Intelligence → Big Data, Machine Learning
    - n-tier architecture (Database, App Server, Web Front) → Lightweight Container + Microservices + API Harvesting
    - Cloud: Distributed and Fault Tolerant, “Data Centre as One Computer”
    - Tables, Primary Keys, Foreign Keys → Node / Vertex, Label, Edge / Relationship, Properties
    - What happened? → What’s happening? → Predictive Analytics → Prescriptive Analysis: “Make the trend!”
    - Unstructured: Colours, Shapes, Complex Shapes, Textiles, Accessories, Context
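The slide contrasts tables with primary and foreign keys against a graph model of nodes (vertices) with labels, edges (relationships) and properties. A minimal sketch of that mapping, using hypothetical `Device` and `Reading` tables: each row becomes a labelled node, its non-key columns become properties, and each foreign key becomes an edge. Real graph databases ship their own import tooling; this only illustrates the modelling idea.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str                       # globally unique id, e.g. "device:d1"
    label: str                    # node label, e.g. "Device"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str                      # id of the source node
    dst: str                      # id of the target node
    rel_type: str                 # relationship type, e.g. "MEASURED_BY"
    properties: dict = field(default_factory=dict)

def rows_to_graph(devices, readings):
    """Map two relational tables onto a property graph.

    devices:  rows with primary key "device_id"
    readings: rows with primary key "reading_id" and foreign key "device_id"
    """
    nodes, edges = [], []
    for row in devices:
        props = {k: v for k, v in row.items() if k != "device_id"}
        nodes.append(Node(f"device:{row['device_id']}", "Device", props))
    for row in readings:
        props = {k: v for k, v in row.items() if k not in ("reading_id", "device_id")}
        nodes.append(Node(f"reading:{row['reading_id']}", "Reading", props))
        # The foreign key becomes an explicit relationship instead of a join column.
        edges.append(Edge(f"reading:{row['reading_id']}",
                          f"device:{row['device_id']}", "MEASURED_BY"))
    return nodes, edges
```

In the graph form, "which readings came from device d1" becomes a traversal along `MEASURED_BY` edges rather than a join on `device_id`.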
  11. Working with the HR Training team

    - VTA Training Sessions
    - Big Data Bootcamp
    - Lunch and Learn KT Sessions
    Big Data technology is evolving so fast… here are the Hadoop-related sessions:
    - Big Data ELT with Apache Sqoop
    - BI vs Data Science
    - Data Scientist Career Path
    - MOOC and Machine Learning
    - Machine Learning with Apache Spark
    - Map Reduce 101
    - Big Data Security: Kerberos/Knox/Sentry
    - Deep Learning and Use Cases
    - Time Series and Geospatial Big Data Analytics with Impala
    - HBase: Distributed Key-value BigTable
    - Distributed Time Series DB: OpenTSDB
    - Machine Learning with Hadoop and R
    - Advanced Machine Bayesian Network
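Among the sessions is “Map Reduce 101”. The classic teaching example is word count; the sketch below simulates the three phases (map, shuffle/sort, reduce) in plain Python on a single machine. It is not Hadoop’s Java API: on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one key."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each mapper sees only its own line and each reducer only its own key, both phases can be spread across machines; the shuffle is the only step that needs to move data between them.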
  12. Big Data / Data Science Learning Resources: free e-Books

    - Data Jujitsu: The Art of Turning Data into Product
    - Data Mining Algorithms In R
    - A Programmer's Guide to Data Mining
    - Data Mining and Analysis: Fundamental Concepts and Algorithms
    - Mining of Massive Datasets
    - The School of Data Handbook
    - Theory and Applications for Advanced Text Mining
    - An Introduction to Data Science
  13. Big Data / Data Science Certifications: EMC, Cloudera, …

    CCP: Data Scientist:
    - elite level
    - real-world designing and developing
    - production-ready data science solutions
    - peer-evaluated for accuracy, scalability, and robustness
    EMC Data Science Associate:
    - Data Analytics Lifecycle
    - Analyzing / exploring data w/ R
    - Statistical modelling, theory and advanced methods
    - Advanced technology & tools
    - Operationalizing
  14. BI vs DS: from Descriptive to Prescriptive

    SAP – Analytics Maturity by Competitive Advantage
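The step from descriptive analytics (“What happened?”) to predictive analytics can be illustrated with a deliberately simple example. Summary statistics describe a series, while an ordinary least-squares trend line extrapolates it forward. This is a toy sketch, not the SAP maturity model itself; real predictive models for IoT data would need to handle seasonality, noise and uncertainty.

```python
from statistics import mean

def describe(values):
    """Descriptive: summarise what happened."""
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def fit_line(values):
    """Ordinary least-squares fit of value against time index 0..n-1 (needs n >= 2)."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = (n - 1) / 2, mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def forecast(values, steps_ahead=1):
    """Predictive: extrapolate the fitted linear trend beyond the last index."""
    slope, intercept = fit_line(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)
```

For a steadily rising series such as 1, 2, 3, 4, `describe` reports a mean of 2.5, while `forecast` projects the next value as 5.0. Prescriptive analytics would go one step further and recommend an action based on that projection.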