P8105: What is data science?

Jeff Goldsmith

May 31, 2017

Transcript

  1. Some not great definitions

    • Data science = statistics
    • Data science = computer science
    • Data science = machine learning
    • Data science = statistics + computer science + machine learning
    • Data scientists are big data wranglers
    • “A data scientist is just a sexier word for statistician.” –Nate Silver
    • “A data scientist is a better computer scientist than a statistician and is a better statistician than a computer scientist.”
    • “A data scientist is a statistician who is useful” – Hadley Wickham
    • A data scientist is a good statistical analyst
    • A data scientist is a statistician who codes in Python
  3. Why these definitions are bad

    • “Data science is just …” definitions miss the point
      – If data science is just statistics (or machine learning, or computer science, or engineering) we wouldn’t need a new term, let alone a new discipline
      – The popularity of “data science” suggests that there’s a newly recognized need
    • “A data scientist is a good ” whatever definitions aren’t helpful
      – They’re almost deliberately judgmental
      – A good definition doesn’t depend on opinions
      – There are “data scientists” in each discipline, but some very good statisticians / computer scientists / etc. aren’t “data scientists”
  4. Why these definitions are bad

    • “Data science is the combination of these 40 skills …” definitions are unrealistic

    https://www.youtube.com/watch?v=b9ZLXwAuUyw&app=desktop
  5. Why these definitions are good

    • Kinda like the blind men and the elephant – no one perspective is completely right or completely wrong, but piling them all up isn’t right either
    • They give a sense of what is valued by the data science community – using data in a principled way and coding well
  6. Why these definitions are good

    • Data science is interdisciplinary
      – You do need a breadth of skills
      – You also need a particular mindset – curiosity and engagement are critical
      – You need some domain knowledge to be successful

    https://www.xkcd.com/1831/
  7. For the purpose of this class:

    Data science is the study of formulating and rigorously answering questions using a data-centric process that emphasizes clarity, reproducibility, effective communication, and ethical practices.

    • We’ll focus mostly on process; how to formulate and answer questions through analyses is the focus of other courses
    • This is also a “bad” definition, in that it doesn’t explain where data science came from
  8. First question from the audience

    “What is the point of ‘data science’? Aren’t we already data scientists?”
  10. Response from Hadley Wickham (roughly)

    “A data scientist is a statistician who’s useful”
  11. That question is understandable

    • It’s easy, in 2024, to forget what the statistical identity crisis phase was like
    • But that was a whole thing, for quite a while
  13. What made “data science” happen

    • Data science emerged in parallel to (at least) six broad trends:
      – Big data
      – Emphasis on prediction
      – Reproducibility crisis in science
      – Interdisciplinary research
      – Diversity, equity, and inclusion
      – Everything should be on the internet
    • These weren’t new in 2012 and aren’t unique to data science
    • … but they had a big impact on the “data science” perspective
  14. Connotation >> definition

    • Core data science values aren’t built into the definition, but were critical to the valence of “data science”
  15. Public Health Data Science

    [Public health] data science is the study of formulating and rigorously answering questions [in order to advance health and well-being] using a data-centric process that emphasizes clarity, reproducibility, effective communication, and ethical practices.
  17. “Public Health” is the important part

    • Public health training emphasizes some elements that are critical to data science thinking and work:
      – Study design
      – Sampling process
      – Measurement process
      – Desire vs ability to infer causation
      – Cross-disciplinary collaboration
      – Engagement with data ethics
      – Public dissemination and dialog

    From “Total Survey Error: Past, Present, and Future” (Groves and Lyberg) via “Data Alone Isn’t Ground Truth” by Angela Bassa
  18. How to learn data science

    • Build a broad knowledge base
    • Don’t be embarrassed by what you don’t know
      – Corollary: don’t be a jerk to people who don’t know what you know
    • Ask questions (well) and keep learning
    • Pretty much the same as learning anything, but hard because people don’t like to show their code
  20. How to learn data science

    • All questions are good questions, but sometimes good questions aren’t asked well
    • Think through what you’re trying to ask
    • If your code is broken, create a simple example that illustrates what’s broken
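
    A "simple example that illustrates what's broken" might look something like the sketch below. It is written in Python as an assumption, since the deck itself shows no code, and the data, the variable names, and the missing-value problem are all hypothetical. The idea is only to show the shape of a well-asked question: the smallest data that still fails, the exact code that was run, and the gap between what happened and what was expected.

```python
import math

# Smallest data that still shows the problem: three values, one missing.
# (Hypothetical data, invented for illustration.)
values = [1.0, float("nan"), 3.0]

# What I ran: a plain average. The missing value propagates, so this is nan.
observed = sum(values) / len(values)

# What I expected: the average of the values that are actually present.
present = [v for v in values if not math.isnan(v)]
expected = sum(present) / len(present)

print("observed:", observed)
print("expected:", expected)
```

    Anyone can paste this, run it, and see the same discrepancy, which is usually enough to get a useful answer.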
  21. How to learn data science

    • Build up your “known knowns”
    • Recognize your “known unknowns”
    • Avoid “unknown unknowns”
  22. ChatGPT is “AI”

    • It learns, in ways that are difficult to interrogate, from input data
    • Even with curation, this can go badly

    https://twitter.com/spiantado/status/1599462375887114240
  24. Where is the value in being a “programmer” now?

    Ben has some professional coding experience of his own, but it was brief, shallow, and now about twenty years out of date. GPT-4 on its own is, for the moment, a worse programmer than I am. Ben is much worse. But Ben plus GPT-4 is a dangerous thing.

    I still feel secure in my profession. In fact, I feel somewhat more secure than before. ... The thing I’m relatively good at is knowing what’s worth building, what users like, how to communicate both technically and humanely. I suspect that ... we will think of “the programmer” the way we now look back on “the computer,” when that phrase referred to a person who did calculations by hand.
  25. Reproducibility

    • One concrete emphasis of data science is reproducibility
    • Given the same data and the same code, anyone should be able to produce the same results
      – Code is an important means of communication
      – New tools encourage reproducibility, but the concept is not platform-dependent
  26. Sharing code

    • Openness is valuable – identify errors early and fix them quickly
    • Try to think of sharing code as a gesture of confidence and humility
      – You’ve done your best, and you should feel good about that
      – Everyone makes mistakes sometimes; when you do, that’s fine – fix it and move on
    • Lack of transparency can reflect a lot of things
    • Of these, arrogance is the most dangerous
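
The "same data and same code give the same results" idea from the reproducibility slide can be sketched in a few lines. This is an illustration only, written in Python as an assumption, since the deck shows no code. The helper names (`data_fingerprint`, `bootstrap_mean`), the seed value, and the toy data are all hypothetical. The sketch shows two habits: publishing a checksum so readers can confirm they hold the same data, and fixing the random seed so a rerun of the same code gives the same number.

```python
import hashlib
import random
import statistics


def data_fingerprint(rows):
    """Return a SHA-256 digest of the rows so readers can confirm they hold the same data."""
    joined = "\n".join(",".join(str(x) for x in row) for row in rows)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def bootstrap_mean(values, n_resamples=1000, seed=8105):
    """Average of bootstrap resample means; the fixed seed makes reruns identical."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(resample))
    return statistics.fmean(means)


# Toy data, invented for illustration: (id, measurement) pairs.
rows = [(1, 2.5), (2, 3.1), (3, 4.8), (4, 5.0)]
values = [y for _, y in rows]

first = bootstrap_mean(values)
second = bootstrap_mean(values)

print("data sha256:", data_fingerprint(rows))
print("identical reruns:", first == second)
```

Sharing the script together with the checksum lets someone else rerun the analysis and check for themselves. A fixed seed makes reruns match in the same environment; exact agreement across different Python versions is not guaranteed, so recording the environment is also worth doing.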