Learning Data Science and Apache Spark from Coursera and edX

VeryFatBoy
August 20, 2015

Originally presented at:

Eurostaff Connect - Big Data and Advanced Analytics, London, UK, 20 August 2015
http://www.meetup.com/Eurostaff-Big-Data/events/223936870/

Transcript

  1. Abstract

    In this presentation, the speaker will share his experiences studying on several MOOCs. The University of California at Berkeley recently offered two courses on Apache Spark. And Johns Hopkins University has been offering a multi-module Data Science Specialization course for several years. The courses from both universities have been very popular. However, how do such courses compare with other forms of training and education? What is the time commitment required? What is the skill level required? This presentation will answer these questions and more.
  2. My background

    • ~25 years experience in IT
      – Developer (Reuters)
      – Academic (City University)
      – Consultant (Logica)
      – Technical Architect (CA)
      – Senior Architect (Informix)
      – Senior IT Specialist (IBM)
      – TI (Hortonworks)
      – SA (DataStax)
    • Worked with various technologies
      – Programming languages
      – IDE
      – Database Systems
    • Client-facing roles
      – Developers
      – Senior executives
      – Journalists
    • Broad industry experience
    • Community outreach
    • University relations
    • 10 books, many presentations
  3. Why data science?

    Data Scientist: The Sexiest Job of the 21st Century -- Thomas H. Davenport and D.J. Patil

    Source: “Data Scientist: The Sexiest Job of the 21st Century”, Thomas H. Davenport and D.J. Patil (October 2012)
  4. Apache Spark jobs in the UK (permanent)

    • Top related IT skills
      – Hadoop (541)
      – Big Data (437)
      – Java (401)
      – Python (308)
      – Scala (295)
      – Agile (240)
      – SQL (235)
      – Analytics (228)

    Source: http://www.itjobswatch.co.uk/jobs/uk/apache spark.do (18 August 2015)
  5. Apache Spark jobs in the UK (contract)

    • Top related IT skills
      – Hadoop (259)
      – Big Data (231)
      – Java (154)
      – Scala (141)
      – Apache Hive (129)
      – Agile (113)
      – HBase (89)
      – Finance (87)

    Source: http://www.itjobswatch.co.uk/contracts/uk/apache spark.do (18 August 2015)
  6. Data scientist jobs in the UK (permanent)

    • Top related IT skills
      – Analytics (464)
      – R (440)
      – Python (417)
      – SQL (379)
      – Big Data (346)
      – Hadoop (317)
      – Analytical Skills (292)
      – Statistics (289)

    Source: http://www.itjobswatch.co.uk/jobs/uk/data scientist.do (18 August 2015)
  7. Data scientist jobs in the UK (contract)

    • Top related IT skills
      – Python (71)
      – R (61)
      – Hadoop (57)
      – SQL (49)
      – Analytics (48)
      – Analytical Skills (43)
      – Big Data (42)
      – Data Analysis (34)

    Source: http://www.itjobswatch.co.uk/contracts/uk/data scientist.do (18 August 2015)
  8. Education choices

    • Read a book
    • On-the-job training
    • Internal training courses
    • External training courses
    • University courses
    • MOOCs
    • ...
  9. Apache Spark ...

    • Introduction to Big Data with Apache Spark
      – Advanced undergraduate-level material
  10. Apache Spark ...

    • Prereq: Python (for both), programming, maths, algorithms, ML, probability, linear algebra, calculus
    • Length: 5 weeks
    • Effort: 5-7 hours per week
    • Upgrade: Verified ID for $50
    • No experience with Spark or DC
    • Labs use PySpark
  11. Apache Spark ...

    • The Good
      – Discussion forum (great community help)
    • The Bad
      – Time commitment (more than 5-7 hours)
    • The Ugly
      – Auto-grader
      – Highest mark
      – Piazza
  12. Data Science

    1. The Data Scientist’s Toolbox
    2. R Programming
    3. Getting and Cleaning Data
    4. Exploratory Data Analysis
    5. Reproducible Research
    6. Statistical Inference
    7. Regression Models
    8. Practical Machine Learning
    9. Developing Data Products
    10. Data Science Capstone
  13. Data Science ...

    • Prereq: Some programming
    • Length: 4 weeks
    • Effort: up to 9 hours per week
    • Upgrade: Verified ID for £19 - £32
    • Labs use R
  14. Data Science ...

    • The Good
      – Discussion forum (great community help)
      – Course certificates (distinction)
    • The Bad
      – Peer assessments
    • The Ugly
      – None
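
As a rough cross-check on the time-commitment question raised in the abstract, the effort figures quoted on slides 10 and 13 can be multiplied out. This is a minimal Python sketch and not part of the original deck: the helper name `total_hours` is made up for illustration, and reading "4 weeks" as the length of a single Data Science module (rather than the whole ten-module specialization) is an assumption.

```python
def total_hours(weeks, hours_per_week):
    """Multiply course length by weekly effort to get total study hours."""
    return weeks * hours_per_week

# Spark course (slide 10): 5 weeks at a quoted 5-7 hours per week.
spark_low = total_hours(5, 5)
spark_high = total_hours(5, 7)

# Data Science (slide 13): 4 weeks at up to 9 hours per week.
# Assumption: these figures describe one module, not the whole specialization.
ds_module_max = total_hours(4, 9)

print(f"Spark course: {spark_low}-{spark_high} hours (quoted effort)")
print(f"Data Science module: up to {ds_module_max} hours (quoted effort)")
```

These totals use only the quoted figures. Slide 11 reports that the Spark course took more than 5-7 hours per week in practice, so the Spark range is best read as a lower estimate.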