Non-technical dive to Data Science

An introduction to Data Science for non-technical people.

Kacper Łukawski

June 27, 2019

Transcript

  1. Data types in Data Science
    • Structured: Excel-like spreadsheets, time series, graphs
    • Unstructured: text, images, audio, video, and more...
  2. Data Analysis (Data Analyst responsibilities and skills)
    • data wrangling
    • basic descriptive statistics
    • data visualization
    • SQL experience
    • knowledge of R/Python
  3. Machine Learning (Machine Learning Engineer responsibilities and skills)
    • using ML algorithms to learn from data and forecast the future
    • data modelling and evaluation
    • probability and statistics knowledge
    • programming skills
  4. Big Data (Data Engineer responsibilities and skills)
    • building infrastructure and architecture for Big Data
    • using databases
    • designing large-scale processing systems
    • integrating different data sources into a Data Lake
    • knowledge of the Hadoop ecosystem: HDFS, Spark, Hive, Kafka, Druid, etc.
    • data importing
  5. Data Science (Data Scientist responsibilities and skills)
    • business & data understanding
    • statistical modelling & machine learning
    • reporting & visualization
    • and many more...
  6. Artificial Intelligence vs Machine Learning
    If it’s written in PowerPoint, it is definitely Artificial Intelligence. However, if it’s written in Python/R/Scala/whatever, it is probably Machine Learning. ML is just one of the attempts to achieve AI - the best we currently have, but surely not good enough to ever reach it.
    "Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time.…" (Winston Churchill)
  7. 5 questions ML may try to answer
    1. Is this A or B? Classification
    2. Is this weird? Anomaly detection
    3. How much / how many? Regression
    4. How is this organized? Clustering
    5. What should I do next? Reinforcement Learning
  8. Data Science relationships
    Data Science is an umbrella term that encompasses different disciplines.
    Disciplines shown on the slide: Big Data, Machine Learning, Deep Learning, Artificial Intelligence
  9. Quick overview of Big Data tools
    The Hadoop ecosystem consists of many different tools, which are used depending on the problem:
    - Kafka - event processing
    - HBase - key-value storage
    - Hive - data warehousing with SQL-like queries
    - Spark - generic framework for distributed computing
    - and many more...
  10. An incomplete list of Data Science tools
    There are two common choices when it comes to Data Science: R and Python. As we mostly have experience with Python, here are some commonly used Python tools:
    - pandas - data manipulation
    - matplotlib, seaborn - data visualization
    - scikit-learn, TensorFlow, Keras - implementations of machine learning algorithms
  11. Applicability of Data Science in different sectors
    Most modern companies collect a lot of data that is never used, although it could, and even should, be. A common myth is that we need lots of data to build a model, but that’s not true: even a small business may become a data-driven organization. At the same time, Data Science shouldn’t be treated as a magical solver for all the problems we have.
  12. Applicability of Data Science in different sectors
    1. healthcare - aging society
    2. process automation - e.g. replacing dangerous jobs with machines
    3. e-commerce and sales - targeting customers
    4. communication - chatbots, support for people with disabilities
    5. funny image manipulation and meme generation
  13. Future trends in Data Science
    1. XAI - eXplainable Artificial Intelligence
    2. AutoML - Machine Learning without programming knowledge
    3. GDPR adoption and bias removal
    4. AGI - Artificial General Intelligence
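The question types on slide 7 can be made a bit more concrete with deliberately tiny, hand-rolled versions of four of them. This is a minimal sketch under our own assumptions, not code from the deck: the helper names (`classify`, `anomalies`, `fit_line`, `kmeans_1d`), the toy data and the choice of methods (1-nearest-neighbour, a z-score rule, a least-squares line, one-dimensional k-means) are all hypothetical, picked because each is about the simplest member of its family. Reinforcement learning (question 5) is left out, since it needs an agent interacting with an environment rather than a fixed dataset.

```python
# Toy, hand-rolled versions of four question types from slide 7.
# Hypothetical helper names and made-up data; standard library only.
import math
import statistics


def classify(point, labelled):
    """Is this A or B? 1-nearest-neighbour: copy the label of the closest example."""
    nearest = min(labelled, key=lambda item: math.dist(point, item[0]))
    return nearest[1]


def anomalies(values, threshold=2.0):
    """Is this weird? Flag values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


def fit_line(xs, ys):
    """How much? Least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def kmeans_1d(values, centers, iterations=10):
    """How is this organized? 1-D k-means: assign to nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep a center in place if no value was assigned to it.
        centers = [statistics.mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


if __name__ == "__main__":
    # Made-up fruit measurements: (weight in g, diameter in cm) -> label.
    fruit = [((150, 7.0), "apple"), ((120, 6.5), "apple"),
             ((10, 1.5), "cherry"), ((8, 1.2), "cherry")]
    print(classify((140, 6.8), fruit))                        # apple
    print(anomalies([20, 21, 19, 22, 20, 21, 95]))            # [95]
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))               # (2.0, 1.0)
    print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))   # [2, 11]
```

In practice these would come from a library such as scikit-learn (listed on slide 10) rather than being written by hand; the point here is only that each question maps onto a fairly small, well-defined computation.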
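Slide 2 lists data wrangling and basic descriptive statistics among a Data Analyst's skills. As a rough illustration, here is a minimal sketch using only Python's standard library; the raw values, the `clean` and `describe` helpers and the cleaning rule (drop anything that does not parse as a number) are hypothetical choices made for this example, not something the deck prescribes. Day to day, pandas (slide 10) would typically handle both steps.

```python
# Hypothetical example: wrangle a messy export, then summarise it.
import statistics

# Made-up raw export: numbers stored as text, with blanks and placeholders.
raw_sales = ["120", "135", "", "128", "n/a", "150", "142"]


def clean(values):
    """Data wrangling step: keep only entries that parse as numbers."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(v))
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
    return cleaned


def describe(values):
    """Basic descriptive statistics, similar in spirit to pandas' DataFrame.describe()."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sales = clean(raw_sales)
    print(describe(sales))
```

Most of the effort in real projects usually goes into the `clean` part: deciding what counts as a missing or invalid value is a business question as much as a technical one.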