Non-technical dive to Data Science

An introduction to Data Science for non-technical people.


Kacper Łukawski

June 27, 2019


  1. Non-technical dive to Data Science Kacper Łukawski

  2. Nomenclature

  3. Data types in Data Science
     • Structured: Excel-like spreadsheets, time series, graphs
     • Unstructured: text, images, audio, video, and more...
  4. Data Analysis
     Data Analyst responsibilities and skills:
     • data wrangling
     • basic descriptive statistics
     • data visualization
     • SQL experience
     • knowledge of R/Python
  6. Machine Learning
     Machine Learning Engineer responsibilities and skills:
     • using ML algorithms to make use of data, learn from it, and forecast the future
     • data modelling and evaluation
     • knowledge of probability and statistics
     • programming skills
  10. Big Data
      Data Engineer responsibilities and skills:
      • building infrastructure and architecture for Big Data
      • using databases
      • designing large-scale processing systems
      • integrating different data sources into a Data Lake
      • knowledge of the Hadoop ecosystem: HDFS, Spark, Hive, Kafka, Druid, etc.
      • data importing
  12. Data Science
      Data Scientist responsibilities and skills:
      • business & data understanding
      • statistical modelling & machine learning
      • reporting & visualization
      • and many more...
  13. Data Science
      Business + Data Analytics + Big Data + Machine Learning + ...
  14. Artificial Intelligence vs Machine Learning
      If it’s written in PowerPoint, it is definitely Artificial Intelligence. However, if it’s written in Python/R/Scala/whatever, it is probably Machine Learning.
      ML is just one of the attempts to achieve AI: the best we currently have, but surely not good enough to ever reach it.
      "Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time.…" (Winston Churchill)
  15. 5 questions ML may try to answer
      1. Is this A or B? Classification
      2. Is this weird? Anomaly detection
      3. How much / how many? Regression
      4. How is this organized? Clustering
      5. What should I do next? Reinforcement Learning
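
The first three questions above can be made concrete with a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the toy numbers, the function names (`classify`, `is_weird`, `fit_line`), and the deliberately simple methods (nearest average, distance from the mean, a straight-line fit). Real projects would normally rely on a library such as scikit-learn instead.

```python
from statistics import mean, stdev

# Toy data invented for this illustration: shoulder heights in centimetres.
HEIGHTS_CM = {"cat": [23.0, 25.0, 24.0], "dog": [55.0, 60.0, 58.0]}


def classify(height_cm):
    """Is this A or B? Pick the label whose average height is closest."""
    averages = {label: mean(values) for label, values in HEIGHTS_CM.items()}
    return min(averages, key=lambda label: abs(averages[label] - height_cm))


def is_weird(value, history, threshold=3.0):
    """Is this weird? Flag values more than `threshold` standard
    deviations away from the mean of past observations."""
    return abs(value - mean(history)) > threshold * stdev(history)


def fit_line(xs, ys):
    """How much / how many? Fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    print(classify(30.0))                      # classification
    print(is_weird(100, [10, 11, 9, 10, 12]))  # anomaly detection
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # regression
```

Each function answers one question: `classify` returns a label, `is_weird` returns a yes/no flag, and `fit_line` returns the two numbers that describe a trend, which can then be used to estimate "how much" for a new input.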
  16. Data Science relationships
      Data Science is an umbrella term that encompasses different disciplines:
      • Big Data
      • Machine Learning
      • Deep Learning
      • Artificial Intelligence
  18. Big Data tools

  19. Quick overview of Big Data tools
      The Hadoop ecosystem consists of many different tools, chosen depending on the problem:
      - Kafka - event processing
      - HBase - key-value storage
      - Hive - SQL-like data warehouse
      - Spark - generic framework for distributed computing
      - and many more...
  21. Data Science tools

  22. An incomplete list of Data Science tools
      There are two common choices when it comes to Data Science: R and Python. As we mostly have experience with Python, here are some commonly used Python tools:
      - pandas - data manipulation
      - matplotlib, seaborn - data visualization
      - scikit-learn, TensorFlow, Keras - implementations of machine learning algorithms
  24. Data Science future

  25. Applicability of Data Science in different sectors
      The majority of modern companies collect a lot of data that is never utilized, although it could, and even should, be. It is a myth that we need lots of data to do any modelling. Even a small business can become a data-driven organization. At the same time, Data Science shouldn’t be treated as a magical solver for every issue we have.
  26. Applicability of Data Science in different sectors
      1. healthcare - aging society
      2. process automation - e.g. replacing dangerous jobs with machines
      3. ecommerce and sales - targeting customers
      4. communication - chatbots, assisting people with disabilities
      5. funny image manipulation and meme generation
  27. Applicability of Data Science in different sectors
      And probably many, many more…
  28. Future trends in Data Science
      1. XAI - eXplainable Artificial Intelligence
      2. AutoML - Machine Learning without programming knowledge
      3. GDPR adoption and bias removal
      4. AGI - Artificial General Intelligence