Building the Right Data & Analytics Team for the Big Impact

PRESENTING: BUILDING THE RIGHT BIG DATA AND ANALYTICS TEAM FOR THE BIG IMPACT - 9 MAR 2016, 02:45

Charles has pioneered Big Data journeys in the enterprise world for many years. He has been instrumental in massive Big Data initiatives to revolutionize supply and demand chains in the Finance and Oil & Gas industries and, more importantly, his evangelism and community work have earned him a place among the UK’s 50 Top Data Leaders and Influencers.
In this talk, Charles shares best practices for strategizing from ground zero, involving the CFO/CIO early, gaining buy-in, and developing in-house and “virtual” Big Data and Data Science teams in new ways. These disciplines are highly transferable, and the market is expected to remain short of them for the foreseeable future.
By now, most industries have realized that the right data science team can turn silos of diverse data into game-changing business insights. The next stage, working out how to piece together the right combination of technical skills and personality traits, is not easy.
This session will cover:
* How to define an enterprise-wide Big Data/Data Science maturity model
* Assessing the characteristics and skills of winning big data teams
* Fostering the right culture to integrate new data science strategies and practices into your existing organisational structure
* Highlighting proven big impacts on a few vertical industries using real examples of data strategy to win big on the bottom line

http://munich.pi.tv

charles-cai

March 09, 2016

Transcript

  1. Building the Right Data & Analytics Team for the Big Impact

    PI Munich 2016, 8-9th March 2016
    Charles Cai, Senior Advisor to the President, Deputy GM, Chief Architect, IDC, Wanda Finance Group
    • Innovating with Disruptive Technologies
    Data Center Operation System; Data Operation System 2.0; Data Science Maturity Model; D - I - K - W; Crowdsourcing; MOOC / OSH / OSS; Data Science Maturity Model; Operating BDA: Microservices
  2. Brief Intro

    Charles Cai, Senior Advisor to the President, Deputy GM, Chief Architect, IDC Finance Group, Wanda
    • Bio
    • #FO #FICC: Investment Banking Front Office: FX/Commodities
    • #ETRM: Energy Trading & Risk Management
    • #entrepreneur #innovator #disruptor (#data-science + #robotics)
    • Twitter: @caidong
    • #big-data #IoT #data-science #MOOC #Mobile #Cloud #UX
    • LinkedIn: http://uk.linkedin.com/in/charlescai/en
    • My current projects: Software Defined Data Center, IoT, BlockChain, Deep Learning
  3. My Talks on Big Data, Data Science, IoT, BlockChain

    Big Data and Data Science
    • QCon London
    • Chief Data Officer Exchange
    • Cloud Forum
    • Big Data Week
    • MoD Information Symposium
    • IDG Data+ Conference
    IoT + Big Data
    • IoT World Europe 2015
    • IoT Tech Expo
    Hackathons
    • Connected Life IoT Hackathon (winner)
    • Hack Coin – BitCoin / BlockChain Hackathon
    • TFL Urban Traffic Hack
  4. PLM Big Data Case 1: Parkinson’s Disease IoT New Drug Trial

    • Use Case: Parkinson’s Disease New Drug Trial IoT solution
    • There’s no cure for Parkinson’s disease
    • New medicine trial is an extremely slow process, daily doses x8!
    • Traditional feedback from the patients is not frequent at all
    • Solution: wireless enabled wearable device + IMU sensors
    • Classification of wearer activities
      • sitting, standing, walking, running, sleeping…
    • Detect pattern of Parkinson’s Disease symptoms
      • predicting deterioration / improvement speed
      • new trial medicine effectiveness
    • Sensor data 10Hz sampling = 1GB / day / patient
  5. PLM Big Data Case#2 – Connected e-Health Solution for the Elderly

    Home Environmental Sensors Bio-sensing Wearable Devices Personal Bio-metrics Health tracking Speech, face and emotions recognition, interaction Home Environmental Metrics Predictive Home Care Analytics dslogix Internet of Things Reminders, tasks monitor movement tracking Doctors / Carers / MIners Family Other IOT Other Patients Intelligent Algorithms Big Data IoT Gateway w/ Connected e-Health Instruments Smart Medicine Container Smart Cane https://dsrobotix.io e-Diagnostics VR / AR
  6. Where we are at with Big Data Analytics?

    By Thomas Davenport – Harvard Business Review
  7. Open Source Data Science Toolbox

    Hadoop / Mesos Distributed Storage + Scalable Computation Open Source Enterprise Big Data / Data Science Platform 10 COTS Apps (Excel, Tableau, Qlik...) Statistical Time Series Analysis Wider Big Data Analytics ecosystems
    • Shell/APIs: HDFS, Hive, Spark, HBase, Sqoop, JDBC/ODBC
    • Languages: Julia, Python, R, Scala
    - Developed on: - Operated by: NLTK: Natural Language Distributed Time Series / Geospatial / Graph Databases GIT Repo Data Products WebSocket Drag + Drop (CZML/GeoJSON) Web Browser (collaboration) Export to CSV/Excel Geospatial data Time Series data Public Data Market data Real-time Streaming Open Gov Data JDBC via phoenix HDFS Hive/Pig w/ Geospatial
  8. Key Sub-systems in Modern Big Data Analytics Stack

    Data Analytics; Streaming; Graph Computing; Machine Learning; …
  9. Data Science Maturity Map – where we are, where we are going can go

    DIKW Pyramid (top to bottom): WISDOM, KNOWLEDGE, INFORMATION, DATA
    https://en.wikipedia.org/wiki/DIKW_Pyramid
  10. Data Science Maturity Map – where we are, where we are going can go

    Stages: Data, Information, Knowledge, Wisdom / Intelligence
    Maturity levels: Minimum, Basic, Intermediate, Advanced/Complete
    Team tags: MR = Big Data: Map/Reduce | in-memory; DS = Data Science / Machine Learning; PE = Platform Engineering / Data Product; VIZ = Visualization Engineering
    Sourcing: Vendor Products / Services; Open Source Software / Hardware
    Capabilities: RDBMS Ingestion; RESTful / JSON Ingestion; Text Ingestion; HTML/XML Ingestion; API Harvesting; Image / Video Ingestion; Natural Language Processing; Metadata Management; Ontology / Graph Management; Elastics / Cloud Computing; Case Management; Streaming Processing; Topic / Entity Extraction; Feature Extraction; Temporal Identification; Geo-location Detection; Descriptive Analysis; Co-reference Extraction; TF/Inv-TF Analysis; N-Gram Analysis; Sentiment Analysis; Social Influence Network Analysis; Crowd-sourcing Analysis; Time Series Analysis; Differencing Analysis; Event / Causation Analysis; Profile & Trend Analysis; Classification Analysis; Clustering Analysis; Geospatial Analysis; Inference Analysis; Deep Learning (Deep Neural Network); Insight Visualization; Monitoring & Alerting; Collaborative Filtering; Decision Trees / Random Forest; Predictive Analysis; Neural Network Analysis; Simulation & Optimization; Workflow Integration; Dashboards; Recommendation Engine; AI Engine; Anticipatory Intelligence; AI Expert Q&A System; Prescriptive Analysis; Feature Matching; Anomaly Detection; Word2Vec / Doc2Vec Analysis; Sensors / Actuators; Sensor Data Fusion; IPv6 Mesh Wireless Network
  11. From Classic to Modern Architecture

    Full Text Search Natural Language Process CCTV / Voice Computer Vision + Q&A Deep Learning (CNN/RNN) RDBMS / DW KV + GraphDB + BD DW Business Intelligence Big Data, Machine Learning Lightweight Container + Microservices + API Harvesting n-tier architecture Semantic Search Keyword Search Named Entity Extraction Q&A N-Grams Faceted Search Geospatial Search Tables Primary Keys Foreign Keys Node / Vertex Label Edge / Relationship Properties Colours Shapes Complex Shapes Textiles Accessories Context What happened? What’s happening? Predictive Analytics Prescriptive Analysis “Make the trend!” Database App Server Web Front Cloud Distributed and Fault Tolerant “Data Centre as One Computer” Unstructured
  12. • Working with HR Training team

    • VTA Training Sessions
    • Big Data Bootcamp
    • Lunch and Learn KT Sessions
    Big Data Technology is evolving so fast… here’s Hadoop related:
    • Big data ELT with Apache Sqoop
    • BI vs Data Science
    • Data Scientist Career Path
    • MOOC and Machine Learning
    • Machine Learning with Apache Spark
    • Map Reduce 101
    • Big Data Security: Kerberos/Knox/Sentry
    • Deep Learning and Use Cases
    • Time Series and Geospatial
    • Big Data Analytics with Impala
    • HBase: Distributed Key-value BigTable
    • Distributed Time Series DB: OpenTSDB
    • Machine Learning with Hadoop and R
    • Advanced Machine Bayesian Network
  13. Big Data / Data Science Learning Resource: free e-Books

    • Data Jujitsu: The Art of Turning Data into Product
    • Data Mining Algorithms In R
    • A Programmer's Guide to Data Mining
    • Data Mining and Analysis: Fundamental Concepts and Algorithms
    • Mining of Massive Datasets
    • The School of Data Handbook
    • Theory and Applications for Advanced Text Mining
    • An Introduction to Data Science
  14. Big Data / Data Science Certifications: EMC, Cloudera, …

    CCP: Data Scientists:
    - elite level
    - real-world designing and developing
    - production-ready data science solution
    - peer-evaluated for accuracy, scalability, and robustness
    EMC Data Science Associate:
    - Data Analytics Lifecycle
    - Analyzing / exploring data w/ R
    - Statistics modelling, theory and advanced methods
    - Advanced technology & tools
    - Operationalizing
  15. Appendix - More BI vs DS: from Descriptive to Prescriptive - Gartner

    Gartner – Analytical Difficulty by Value
  16. Appendix - More BI vs DS: from Descriptive to Prescriptive - SAP

    SAP – Analytics Maturity by Competitive Advantage
  17. Summary

    • Embrace Open Source Big Data Stack
    • Start with multiple disciplined “virtual teams”
    • Seek top management support
    • Aim high but make small steps to prove values
    • Embrace Open Innovation – Hackathons, Meetups, Open Data Initiative
    • Be Disruptive – business model, connected products - to not be disrupted
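
Slide 4 quotes a sensor data volume of 10Hz sampling = 1GB / day / patient. The short Python sketch below only checks what that figure implies; it is not taken from the deck. It assumes "1GB" means a decimal gigabyte (10^9 bytes) and that sampling runs around the clock. The function names and the 100-patient, 90-day trial used in the usage lines are hypothetical and chosen purely for illustration.

```python
# Back-of-the-envelope check of the slide 4 figure:
# "Sensor data 10Hz sampling = 1GB / day / patient".
# Assumptions (not from the deck): 1 GB = 10**9 bytes, continuous 24h sampling.

SAMPLE_RATE_HZ = 10                 # from slide 4
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_DAY = 1_000_000_000       # "1GB", assumed decimal


def samples_per_day(rate_hz: int) -> int:
    """Number of sensor readings per patient per day at the given rate."""
    return rate_hz * SECONDS_PER_DAY


def implied_bytes_per_sample(bytes_per_day: int, rate_hz: int) -> float:
    """Average payload per reading implied by the daily volume."""
    return bytes_per_day / samples_per_day(rate_hz)


def storage_gb(patients: int, days: int, gb_per_patient_day: float = 1.0) -> float:
    """Raw storage for a trial; patient and day counts are hypothetical inputs."""
    return patients * days * gb_per_patient_day


if __name__ == "__main__":
    print(samples_per_day(SAMPLE_RATE_HZ))
    print(round(implied_bytes_per_sample(BYTES_PER_DAY, SAMPLE_RATE_HZ)))
    # Hypothetical trial size, for illustration only.
    print(storage_gb(patients=100, days=90))
```

Under these assumptions each patient produces 864,000 readings a day, so the quoted volume works out to a little over 1 kB per reading, which suggests each reading carries many sensor channels or that metadata is included. Scaling the same figure to a trial gives a rough raw-storage estimate before any compression.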