Learning Data Science and Apache Spark from Coursera and edX 1

VeryFatBoy
August 20, 2015

Originally presented at:

Eurostaff Connect - Big Data and Advanced Analytics, London, UK, 20 August 2015
http://www.meetup.com/Eurostaff-Big-Data/events/223936870/

Transcript

  1. Learning Data Science and
    Apache Spark from Coursera
    and edX
    Akmal B. Chaudhri
    (艾克摩 曹理)

  2. Abstract
    In this presentation, the speaker will share his
    experiences studying on several MOOCs. The University
    of California at Berkeley recently offered two courses on
    Apache Spark, and Johns Hopkins University has been
    offering a multi-module Data Science Specialization
    course for several years. The courses from both
    universities have been very popular. However, how do
    such courses compare with other forms of training and
    education? What is the time commitment required? What
    is the skill level required? This presentation will answer
    these questions and more.

  3. My background
    •  ~25 years' experience in IT
    –  Developer (Reuters)
    –  Academic (City University)
    –  Consultant (Logica)
    –  Technical Architect (CA)
    –  Senior Architect (Informix)
    –  Senior IT Specialist (IBM)
    –  TI (Hortonworks)
    –  SA (DataStax)
    •  Worked with various technologies
    –  Programming languages
    –  IDEs
    –  Database systems
    •  Client-facing roles
    –  Developers
    –  Senior executives
    –  Journalists
    •  Broad industry experience
    •  Community outreach
    •  University relations
    •  10 books, many presentations

  4. Why data science?
    Data Scientist: The Sexiest Job of the 21st Century
    -- Thomas H. Davenport and D.J. Patil
    Source: “Data Scientist: The Sexiest Job of the 21st Century”, Thomas H. Davenport and D.J. Patil, Harvard Business Review (October 2012)

  5. Why Apache Spark?
    The Sexiest Technology of the 21st Century
    -- Me

  6. Increasing interest in Apache Spark and Data Science
    Source: Shutterstock Image ID 216333160

  7. Apache Spark jobs in the UK (permanent)
    •  Top related IT skills
    –  Hadoop (541)
    –  Big Data (437)
    –  Java (401)
    –  Python (308)
    –  Scala (295)
    –  Agile (240)
    –  SQL (235)
    –  Analytics (228)
    Source: http://www.itjobswatch.co.uk/jobs/uk/apache%20spark.do (18 August 2015)

  8. Apache Spark jobs in the UK (contract)
    •  Top related IT skills
    –  Hadoop (259)
    –  Big Data (231)
    –  Java (154)
    –  Scala (141)
    –  Apache Hive (129)
    –  Agile (113)
    –  HBase (89)
    –  Finance (87)
    Source: http://www.itjobswatch.co.uk/contracts/uk/apache%20spark.do (18 August 2015)

  9. Data scientist jobs in the UK (permanent)
    •  Top related IT skills
    –  Analytics (464)
    –  R (440)
    –  Python (417)
    –  SQL (379)
    –  Big Data (346)
    –  Hadoop (317)
    –  Analytical Skills (292)
    –  Statistics (289)
    Source: http://www.itjobswatch.co.uk/jobs/uk/data%20scientist.do (18 August 2015)

  10. Data scientist jobs in the UK (contract)
    •  Top related IT skills
    –  Python (71)
    –  R (61)
    –  Hadoop (57)
    –  SQL (49)
    –  Analytics (48)
    –  Analytical Skills (43)
    –  Big Data (42)
    –  Data Analysis (34)
    Source: http://www.itjobswatch.co.uk/contracts/uk/data%20scientist.do (18 August 2015)

  11. Education choices ...
    Source: Shutterstock Image ID 159183185

  12. Education choices
    •  Read a book
    •  On-the-job training
    •  Internal training courses
    •  External training courses
    •  University courses
    •  MOOCs
    •  ...

  13. Apache Spark ...
    •  Introduction to Big Data with Apache Spark
    – Advanced undergraduate-level material

  14. Apache Spark ...
    •  Scalable Machine Learning

  15. Apache Spark ...
    •  Prereq: Python (for both), programming, maths, algorithms, ML, probability, linear algebra, calculus
    •  Length: 5 weeks
    •  Effort: 5-7 hours per week
    •  Upgrade: Verified ID for $50
    •  No prior experience with Spark or DC required
    •  Labs use PySpark

  16. Apache Spark ...
    •  Tools: VirtualBox, Vagrant, modern computer
    •  Assessment: setup, weekly quizzes, labs

  17. Apache Spark ...
    •  The Good
    – Discussion forum (great community help)
    •  The Bad
    – Time commitment (more than the stated 5-7 hours per week)
    •  The Ugly
    – Auto-grader
    – Highest mark
    – Piazza

  18. Apache Spark
    •  An additional XSeries certificate is also available if both courses are taken with a verified ID.

  19. Data Science ...
    •  Data Science Specialization

  20. Data Science
    1.  The Data Scientist’s Toolbox
    2.  R Programming
    3.  Getting and Cleaning Data
    4.  Exploratory Data Analysis
    5.  Reproducible Research
    6.  Statistical Inference
    7.  Regression Models
    8.  Practical Machine Learning
    9.  Developing Data Products
    10.  Data Science Capstone

  21. Data Science ...
    •  Prereq: Some programming
    •  Length: 4 weeks
    •  Effort: up to 9 hours per week
    •  Upgrade: Verified ID for £19 - £32
    •  Labs use R

  22. Data Science ...
    •  Tools: Markdown, git, GitHub, R, RStudio
    •  Assessment: quizzes, projects

  23. Data Science ...
    •  The Good
    – Discussion forum (great community help)
    – Course certificates (distinction)
    •  The Bad
    – Peer assessments
    •  The Ugly
    – None

  24. Data Science

  25. Contact details

  26. Find me on
    – http://www.linkedin.com/in/akmalchaudhri/
    – http://twitter.com/akmalchaudhri/
    – http://www.quora.com/Akmal-Chaudhri/
    – http://www.facebook.com/akmal.chaudhri/
    – http://plus.google.com/+AkmalChaudhri/
    – http://www.slideshare.net/VeryFatBoy/
    – http://www.youtube.com/VeryFatBoyVideos/
