Data Science 101 - An overview medley

This is a medley of great talks on data science (see references),
copied together and compressed to fit into a 20-minute overview talk on the topic.

Awesome Incremented

December 04, 2015

Transcript

  1. Disclaimer - What this is
    • A medley of great talks on DS (see references)
    • Copied together & compressed
    → To fit into a 20 min. overview talk on the topic
  2. Agenda
    • What is Data Science?
    • Why Data Science?
    • What is a Data Scientist?
    • How to become a Data Scientist?
  3. Data Science: The Origins
    1970s: Peter Naur introduces “data science” as a synonym for “computer science”.
    1997: Jeff Wu claims “statisticians” are “data scientists”.
    2001: William Cleveland introduces data science as an independent discipline, extending statistics.
    2008: DJ Patil (LinkedIn) and Jeff Hammerbacher (Facebook) describe their job role as that of “Data Scientist”.
  4. What about Big Data?
    • Volume: SQL → HDFS
    • Velocity: complex event processing, Apache Storm, Apache Spark Streaming
    • Variety: structured | semi-structured | unstructured; social graphs, system logs, tweets/blogs, CCTV; many variables, sampling variability (e.g., spatiotemporal)
  5. What about Big Data?
    • Volume • Velocity • Variety
    Nobody wants data. Everybody wants data-driven, reliable, actionable insights.
  6. Really, what is data science?
    According to the NIST Big Data Framework: “Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process.”
  7. Man on the Moon - Small Data!
    • Computer Program: Date: 1969; 64 Kb, 2 Kb RAM; Fortran; Must work 1st time
    • Apollo XI: Speed: 3,500 km/hour; Weight: 13,500 kg; Lots of complex data
    • Man on the Moon: Distance: 356,000 km; Never been there before; Must return to Earth
  8. Think About It - We live in Crazy Times!
    • Apollo XI, 1969: 64 Kb
    • SkyDive Stratos, 2012: Tens of Gigabytes
  9. How to become a Data Scientist? Golden times for autodidacts
    • So much Open Source & Open Data (GitHub)
    • Never been easier to get in touch (Twitter, Social)
    • Low-cost compute resources (Cloud, SaaS, PaaS)
    → However, you will eventually hit a wall.
  10. How to become a Data Scientist?
    At that point, at the latest, get in touch with
    • bootcamps, retreats, courses such as...
    • NYC DS bootcamp, DS Retreat, Udacity, Education.emc
    For now, some references to get started...
  11. References
    • 10 myths about Data Scientists (J. Kobelius)
    • Big Data [sorry] & Data Science (Data Science London)
    • Intro to Data Science (P. Nathan)
    • Data Science in 2016: Moving Up (P. Nathan)
    • How to Become a Data Scientist (R. Orban)
    • Jose Quesada on the skills of Data Scientists (Heise, in German)
    • slideshare.net/urlwolf (Jose Quesada at SlideShare)
  12. [Sort of] Data Scientist Toolkit
    • Java, R, Python... (bonus: Clojure, Haskell, Scala)
    • Hadoop, HDFS & MapReduce... (bonus: Spark, Storm)
    • HBase, Pig & Hive... (bonus: Shark, Impala, Cascalog)
    • ETL, Webscrapers, Flume, Sqoop... (bonus: Hume)
    • SQL, RDBMS, DW, OLAP...
    • Knime, Weka, RapidMiner... (bonus: SciPy, NumPy, scikit-learn, pandas)
    • D3.js, Gephi, ggplot2, Tableau, Flare, Shiny...
    • SPSS, Matlab, SAS... (the enterprise man)
    • NoSQL, MongoDB, Couchbase, Cassandra...
    • And Yes! ... MS-Excel: the most used, most underrated DS tool