Slide 1

Slide 1 text

Learning Data Science and Apache Spark from Coursera and edX
Akmal B. Chaudhri (艾克摩 曹理)

Slide 2

Slide 2 text

Thanks ...

Slide 3

Slide 3 text

Abstract
In this presentation, the speaker shares his experiences of studying on several MOOCs. The University of California at Berkeley recently offered two courses on Apache Spark, and Johns Hopkins University has been offering a multi-course Data Science Specialization for several years. The courses from both universities have been very popular. However, how do such courses compare with other forms of training and education? What time commitment is required? What skill level is required? This presentation answers these questions and more.

Slide 4

Slide 4 text

My background
•  ~25 years experience in IT
  –  Developer (Reuters)
  –  Academic (City University)
  –  Consultant (Logica)
  –  Technical Architect (CA)
  –  Senior Architect (Informix)
  –  Senior IT Specialist (IBM)
  –  TI (Hortonworks)
  –  SA (DataStax)
•  Worked with various technologies
  –  Programming languages
  –  IDE
  –  Database Systems
•  Client-facing roles
  –  Developers
  –  Senior executives
  –  Journalists
•  Broad industry experience
•  Community outreach
•  University relations
•  10 books, many presentations

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Why data science?
Data Scientist: The Sexiest Job of the 21st Century -- Thomas H. Davenport and D.J. Patil
Source: “Data Scientist: The Sexiest Job of the 21st Century”, Thomas H. Davenport and D.J. Patil (October 2012)

Slide 7

Slide 7 text

Why Apache Spark?
The Sexiest Technology of the 21st Century -- Me

Slide 8

Slide 8 text

Increasing interest in Apache Spark and Data Science
Source: Shutterstock Image ID 216333160

Slide 9

Slide 9 text

Apache Spark jobs in the UK (permanent)
•  Top related IT skills
  –  Hadoop (541)
  –  Big Data (437)
  –  Java (401)
  –  Python (308)
  –  Scala (295)
  –  Agile (240)
  –  SQL (235)
  –  Analytics (228)
Source: http://www.itjobswatch.co.uk/jobs/uk/apache%20spark.do (18 August 2015)

Slide 10

Slide 10 text

Apache Spark jobs in the UK (contract)
•  Top related IT skills
  –  Hadoop (259)
  –  Big Data (231)
  –  Java (154)
  –  Scala (141)
  –  Apache Hive (129)
  –  Agile (113)
  –  HBase (89)
  –  Finance (87)
Source: http://www.itjobswatch.co.uk/contracts/uk/apache%20spark.do (18 August 2015)

Slide 11

Slide 11 text

Data scientist jobs in the UK (permanent)
•  Top related IT skills
  –  Analytics (464)
  –  R (440)
  –  Python (417)
  –  SQL (379)
  –  Big Data (346)
  –  Hadoop (317)
  –  Analytical Skills (292)
  –  Statistics (289)
Source: http://www.itjobswatch.co.uk/jobs/uk/data%20scientist.do (18 August 2015)

Slide 12

Slide 12 text

Data scientist jobs in the UK (contract)
•  Top related IT skills
  –  Python (71)
  –  R (61)
  –  Hadoop (57)
  –  SQL (49)
  –  Analytics (48)
  –  Analytical Skills (43)
  –  Big Data (42)
  –  Data Analysis (34)
Source: http://www.itjobswatch.co.uk/contracts/uk/data%20scientist.do (18 August 2015)

Slide 13

Slide 13 text

Education choices ...
Source: Shutterstock Image ID 159183185

Slide 14

Slide 14 text

Education choices
•  Read a book
•  On-the-job training
•  Internal training courses
•  External training courses
•  University courses
•  MOOCs
•  ...

Slide 15

Slide 15 text

Apache Spark ...
•  Introduction to Big Data with Apache Spark
  –  Advanced undergraduate-level material

Slide 16

Slide 16 text

Apache Spark ...
•  Scalable Machine Learning

Slide 17

Slide 17 text

Apache Spark ...
•  Prereq: Python (for both), programming, maths, algorithms, ML, probability, linear algebra, calculus
•  Length: 5 weeks
•  Effort: 5-7 hours per week
•  Upgrade: Verified ID for $50
•  No experience with Spark or distributed computing (DC) needed
•  Labs use PySpark

Slide 18

Slide 18 text

Apache Spark ...
•  Tools: VirtualBox, Vagrant, modern computer
•  Assessment: setup, weekly quizzes, labs

Slide 19

Slide 19 text

Apache Spark ...
•  The Good
  –  Discussion forum (great community help)
•  The Bad
  –  Time commitment (more than 5-7 hours)
•  The Ugly
  –  Auto-grader
  –  Highest mark
  –  Piazza

Slide 20

Slide 20 text

Apache Spark
•  An additional XSeries certificate is also available if both courses are taken with a verified ID.

Slide 21

Slide 21 text

Data Science ...
•  Data Science Specialization

Slide 22

Slide 22 text

Data Science
1.  The Data Scientist’s Toolbox
2.  R Programming
3.  Getting and Cleaning Data
4.  Exploratory Data Analysis
5.  Reproducible Research
6.  Statistical Inference
7.  Regression Models
8.  Practical Machine Learning
9.  Developing Data Products
10.  Data Science Capstone

Slide 23

Slide 23 text

Data Science ...
•  Prereq: Some programming
•  Length: 4 weeks
•  Effort: up to 9 hours per week
•  Upgrade: Verified ID for £19 - £32
•  Labs use R
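
The length and effort figures quoted on the slides can be turned into a rough estimate of total study time. The sketch below is illustrative only: the helper `total_hours` is a hypothetical name, and it assumes the advertised weekly effort applies evenly to every week of a course. The "The Bad" slide for the Spark courses notes that the real commitment was more than the advertised 5-7 hours per week, so these numbers are best read as a lower bound.

```python
def total_hours(courses, weeks_per_course, hours_per_week):
    """Advertised study time: courses x weeks x hours per week.

    Hypothetical helper for illustration; assumes uniform weekly effort.
    """
    return courses * weeks_per_course * hours_per_week


# Two Apache Spark courses, 5 weeks each, 5-7 hours per week (from the slides).
spark_low = total_hours(2, 5, 5)
spark_high = total_hours(2, 5, 7)

# One Data Science Specialization course, 4 weeks, up to 9 hours per week.
ds_course_max = total_hours(1, 4, 9)

print(f"Apache Spark (both courses): {spark_low}-{spark_high} hours")
print(f"Data Science (per course): up to {ds_course_max} hours")
```

With the slide figures, this gives 50-70 advertised hours for the two Spark courses and up to 36 hours for a single Data Science course.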

Slide 24

Slide 24 text

Data Science ...
•  Tools: Markdown, git, GitHub, R, RStudio
•  Assessment: quizzes, projects

Slide 25

Slide 25 text

Data Science ...
•  The Good
  –  Discussion forum (great community help)
  –  Course certificates (distinction)
•  The Bad
  –  Peer assessments
•  The Ugly
  –  None

Slide 26

Slide 26 text

Data Science

Slide 27

Slide 27 text

Contact details

Slide 28

Slide 28 text

Find me on
–  http://www.linkedin.com/in/akmalchaudhri/
–  http://twitter.com/akmalchaudhri/
–  http://www.quora.com/Akmal-Chaudhri/
–  http://www.facebook.com/akmal.chaudhri/
–  http://plus.google.com/+AkmalChaudhri/
–  http://www.slideshare.net/VeryFatBoy/
–  http://www.youtube.com/VeryFatBoyVideos/

Slide 29

Slide 29 text

Akmal B. Chaudhri [email protected]

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Thank you