ISI Programming Course - 05 - Python Data Science Libraries

Jungwon Seo

October 15, 2018

Transcript

  1. NumPy
    • Support for large, multi-dimensional arrays and matrices.
    • High-level mathematical functions for those arrays.
    • https://github.com/numpy/numpy
  2. Pandas
    • Data manipulation and analysis.
    • Data structures and operations for numerical tables and time series.
    • The name is derived from the term "panel data".
    • https://github.com/pandas-dev/pandas
  3. Question #1
    • You have a data set with 10,000 rows, but many of the rows contain empty cells (missing data).
    • How can you remove the rows where more than half of the cells are empty?
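One way to approach Question #1 is sketched below. The small DataFrame and its column names are made up for illustration and stand in for the 10,000-row table; the idea is to compute the fraction of missing cells per row and keep only the rows at or below one half.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 10,000-row table.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [2.0, np.nan, np.nan, np.nan],
    "c": [3.0, 1.0, np.nan, np.nan],
    "d": [4.0, np.nan, 5.0, 6.0],
})

# Fraction of missing cells in each row (isna() gives booleans, mean() averages them).
missing_fraction = df.isna().mean(axis=1)

# Keep rows where at most half of the cells are missing.
cleaned = df[missing_fraction <= 0.5]

# Equivalent form: dropna(thresh=k) keeps rows with at least k non-missing cells.
cleaned_alt = df.dropna(thresh=math.ceil(df.shape[1] / 2))

print(cleaned)
```

Rows that are exactly half empty are kept here, since the question asks only about rows where more than half of the cells are empty.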
  4. Question #2
    • You want to merge two randomly sorted student information data sets.
    • One table has the course grades and the other has personal information.
    • How will you match the records in the two tables?
    • Imagine sorting both tables in an Excel sheet and then using "Copy and Paste" to put them side by side.
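A minimal sketch for Question #2, assuming both tables share a key column. The column name `student_id` and the sample values are hypothetical; the slides do not say which field identifies a student.

```python
import pandas as pd

# Hypothetical tables, deliberately in different row orders.
grades = pd.DataFrame({
    "student_id": [3, 1, 2],
    "grade": ["B", "A", "C"],
})
info = pd.DataFrame({
    "student_id": [2, 3, 1],
    "name": ["Bo", "Cy", "Al"],
})

# Match rows by key instead of by position; no sorting is needed beforehand.
merged = pd.merge(grades, info, on="student_id", how="inner")

print(merged.sort_values("student_id"))
```

With `how="inner"`, students that appear in only one table are dropped; `how="outer"` would keep them and fill the missing side with NaN.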
  5. Question #3
    • You need to calculate the count, mean, std, min, 25%, 50%, 75%, and max values of a specific column.
    • You need to figure out the number of unique values, the most frequent value, and its frequency.
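The statistics listed in Question #3 match what pandas' `describe()` reports for a single column (a Series). The sketch below uses made-up data; the column names `score` and `major` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one numeric column and one categorical column.
df = pd.DataFrame({
    "score": [70, 80, 80, 90, 100],
    "major": ["CS", "Math", "CS", "CS", "Physics"],
})

# Numeric column: count, mean, std, min, 25%, 50%, 75%, max.
numeric_summary = df["score"].describe()

# Text column: count, unique, top (most frequent value), freq (its frequency).
categorical_summary = df["major"].describe()

print(numeric_summary)
print(categorical_summary)
```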
  6. You can solve these tasks using Pandas without writing a bunch of 'if-else' and 'for' statements.
  7. Matplotlib
    • Plotting library.
    • Its interface is designed to closely resemble that of MATLAB.
    • https://github.com/matplotlib/matplotlib
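As a rough illustration of the MATLAB-like style, the sketch below draws a sine curve with the `pyplot` interface. The data, labels, and the output file name `sine_wave.png` are all made up for this example; the non-interactive `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line to get a window

import matplotlib.pyplot as plt
import numpy as np

# Sample 100 points of sin(x) on [0, 2*pi].
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# State-based calls, similar in spirit to MATLAB's plot/xlabel/ylabel/title.
plt.plot(x, y, label="sin(x)")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sine wave")
plt.legend()

# Hypothetical output file name.
plt.savefig("sine_wave.png")
```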
  8. 1. NumPy
    • Scalar, Vector, Matrix, (Tensor)
    • Array creation
    • Array selection
    • Array operation
    Source Code
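The topics on this slide can be sketched in a few lines. This is not the course's own source code, only a small illustration with made-up values of scalars, vectors, matrices, and tensors as NumPy arrays, followed by basic selection and operations.

```python
import numpy as np

# Array creation: the number of dimensions distinguishes the four kinds.
scalar = np.array(5)                 # 0-D
vector = np.array([1, 2, 3])         # 1-D
matrix = np.arange(6).reshape(2, 3)  # 2-D: [[0, 1, 2], [3, 4, 5]]
tensor = np.zeros((2, 3, 4))         # 3-D

# Array selection: indexing, slicing, and boolean masks.
first_row = matrix[0]                # [0, 1, 2]
last_column = matrix[:, -1]          # [2, 5]
large_values = matrix[matrix > 2]    # [3, 4, 5]

# Array operation: element-wise arithmetic and matrix-vector product.
doubled = vector * 2                 # [2, 4, 6]
product = matrix @ vector            # [8, 26]

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)
print(product)
```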