Learning Data Science and Apache Spark from Coursera and edX

VeryFatBoy
August 20, 2015

Originally presented at:

Eurostaff Connect - Big Data and Advanced Analytics, London, UK, 20 August 2015
http://www.meetup.com/Eurostaff-Big-Data/events/223936870/

Transcript

  1. Learning Data Science and Apache Spark from Coursera and edX

    Akmal B. Chaudhri (艾克摩 曹理)
  2. Thanks ...

  3. Abstract

    In this presentation, the speaker will share his experiences studying on several MOOCs. The University of California at Berkeley recently offered two courses on Apache Spark. And Johns Hopkins University has been offering a multi-module Data Science Specialization course for several years. The courses from both universities have been very popular. However, how do such courses compare with other forms of training and education? What is the time commitment required? What is the skill level required? This presentation will answer these questions and more.
  4. My background

    • ~25 years experience in IT
      – Developer (Reuters)
      – Academic (City University)
      – Consultant (Logica)
      – Technical Architect (CA)
      – Senior Architect (Informix)
      – Senior IT Specialist (IBM)
      – TI (Hortonworks)
      – SA (DataStax)
    • Worked with various technologies
      – Programming languages
      – IDE
      – Database Systems
    • Client-facing roles
      – Developers
      – Senior executives
      – Journalists
    • Broad industry experience
    • Community outreach
    • University relations
    • 10 books, many presentations
  5. None
  6. Why data science?

    Data Scientist: The Sexiest Job of the 21st Century -- Thomas H. Davenport and D.J. Patil

    Source: “Data Scientist: The Sexiest Job of the 21st Century”, Thomas H. Davenport and D.J. Patil (October 2012)
  7. Why Apache Spark?

    The Sexiest Technology of the 21st Century -- Me
  8. Increasing interest in Apache Spark and Data Science

    Source: Shutterstock Image ID 216333160
  9. Apache Spark jobs in the UK (permanent)

    • Top related IT skills
      – Hadoop (541)
      – Big Data (437)
      – Java (401)
      – Python (308)
      – Scala (295)
      – Agile (240)
      – SQL (235)
      – Analytics (228)

    Source: http://www.itjobswatch.co.uk/jobs/uk/apache spark.do (18 August 2015)
  10. Apache Spark jobs in the UK (contract)

    • Top related IT skills
      – Hadoop (259)
      – Big Data (231)
      – Java (154)
      – Scala (141)
      – Apache Hive (129)
      – Agile (113)
      – HBase (89)
      – Finance (87)

    Source: http://www.itjobswatch.co.uk/contracts/uk/apache spark.do (18 August 2015)
  11. Data scientist jobs in the UK (permanent)

    • Top related IT skills
      – Analytics (464)
      – R (440)
      – Python (417)
      – SQL (379)
      – Big Data (346)
      – Hadoop (317)
      – Analytical Skills (292)
      – Statistics (289)

    Source: http://www.itjobswatch.co.uk/jobs/uk/data scientist.do (18 August 2015)
  12. Data scientist jobs in the UK (contract)

    • Top related IT skills
      – Python (71)
      – R (61)
      – Hadoop (57)
      – SQL (49)
      – Analytics (48)
      – Analytical Skills (43)
      – Big Data (42)
      – Data Analysis (34)

    Source: http://www.itjobswatch.co.uk/contracts/uk/data scientist.do (18 August 2015)
  13. Education choices ... Source: Shutterstock Image ID 159183185

  14. Education choices

    • Read a book
    • On-the-job training
    • Internal training courses
    • External training courses
    • University courses
    • MOOCs
    • ...
  15. Apache Spark ...

    • Introduction to Big Data with Apache Spark
      – Advanced undergraduate-level material
  16. Apache Spark ... •  Scalable Machine Learning

  17. Apache Spark ...

    • Prereq: Python (for both), programming, maths, algorithms, ML, probability, linear algebra, calculus
    • Length: 5 weeks
    • Effort: 5-7 hours per week
    • Upgrade: Verified ID for $50
    • No experience with Spark or DC
    • Labs use PySpark
  18. Apache Spark ...

    • Tools: VirtualBox, Vagrant, modern computer
    • Assessment: setup, weekly quizzes, labs
  19. Apache Spark ...

    • The Good
      – Discussion forum (great community help)
    • The Bad
      – Time commitment (more than 5-7 hours)
    • The Ugly
      – Auto-grader
      – Highest mark
      – Piazza
  20. Apache Spark

    • Extra XSeries certificate also available if both courses taken with verified ID.
  21. Data Science ... •  Data Science Specialization

  22. Data Science

    1. The Data Scientist’s Toolbox
    2. R Programming
    3. Getting and Cleaning Data
    4. Exploratory Data Analysis
    5. Reproducible Research
    6. Statistical Inference
    7. Regression Models
    8. Practical Machine Learning
    9. Developing Data Products
    10. Data Science Capstone
  23. Data Science ...

    • Prereq: Some programming
    • Length: 4 weeks
    • Effort: up to 9 hours per week
    • Upgrade: Verified ID for £19 - £32
    • Labs use R
  24. Data Science ...

    • Tools: Markdown, git, GitHub, R, RStudio
    • Assessment: quizzes, projects
  25. Data Science ...

    • The Good
      – Discussion forum (great community help)
      – Course certificates (distinction)
    • The Bad
      – Peer assessments
    • The Ugly
      – None
  26. Data Science

  27. Contact details

  28. Find me on

    – http://www.linkedin.com/in/akmalchaudhri/
    – http://twitter.com/akmalchaudhri/
    – http://www.quora.com/Akmal-Chaudhri/
    – http://www.facebook.com/akmal.chaudhri/
    – http://plus.google.com/+AkmalChaudhri/
    – http://www.slideshare.net/VeryFatBoy/
    – http://www.youtube.com/VeryFatBoyVideos/

  29. Akmal B. Chaudhri firstname.lastname@live.com

  30. None
  31. Thank you