Adding Jupyter Notebooks in your Data Analytics Toolbag

1 Data Analytics for fun and profit
Harish Pillay, 24 February 2020
[email protected], @harishpillay
https://tinyurl.com/sygs4bm

3 If the only tool you have is a hammer,
everything looks like a nail.

4 Spreadsheets
1. Spreadsheets have been the natural habitat of finance people for the last 30 years.
2. People have lots of experience in using them (pivot tables, for example).
3. Data size is limited to what the desktop can handle; even the SaaS providers place constraints.
4. Scripting is done with spreadsheet macros, not with languages like Python.
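For readers who know pivot tables but have not scripted one, here is a minimal sketch of the same idea in plain Python. The expense records and the `pivot_sum` helper are hypothetical, made up only for illustration; in a notebook a library such as pandas would typically do this job.

```python
from collections import defaultdict

# Hypothetical expense records: (department, quarter, amount).
records = [
    ("Sales", "Q1", 1200.0),
    ("Sales", "Q2", 1500.0),
    ("Engineering", "Q1", 3000.0),
    ("Engineering", "Q1", 500.0),
    ("Engineering", "Q2", 2800.0),
]


def pivot_sum(rows):
    """Sum amounts by (row key, column key), like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row_key, col_key, amount in rows:
        table[row_key][col_key] += amount
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row_key: dict(cols) for row_key, cols in table.items()}


pivot = pivot_sum(records)
for department, totals in sorted(pivot.items()):
    print(department, totals)
```

The grouping logic lives in one named function, so it can be reused on a larger file without copying formulae across cells.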

5 Issues with spreadsheets
1. Cascading errors - one mistake, and it snowballs (https://www.teampay.co/insights/biggest-excel-mistakes-of-all-time/).
2. Scalability - sheets are limited in size.
3. Performance - limited by the system the spreadsheet runs on.
4. Testing - it is almost impossible to test the correctness of a sheet.
5. Traceability/debugging - tiny changes can affect formulae and make problems very difficult to trace.
6. All-inclusive - the data and calculations are all contained within the spreadsheet file and run from a local computer.
7. Operational risk - all spreadsheets start as small, quick-fix calculations, but some turn into permanent enterprise-wide solutions that feed a number of business processes. The lack of visibility of the entire lineage then threatens the integrity of many financial, operational and regulatory processes.
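To make the testing point concrete, here is a small sketch of how a calculation that would normally sit in a cell formula can be checked once it is written as code. The function `compound_value` and its figures are hypothetical examples, not taken from the talk.

```python
def compound_value(principal, annual_rate, years):
    """Future value of a principal with yearly compounding."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return principal * (1 + annual_rate) ** years


def test_compound_value():
    # A zero rate must leave the principal unchanged.
    assert compound_value(1000, 0.0, 10) == 1000
    # One and two years at 5%, rounded to cents.
    assert round(compound_value(1000, 0.05, 1), 2) == 1050.00
    assert round(compound_value(1000, 0.05, 2), 2) == 1102.50


test_compound_value()
print("all checks passed")
```

If someone later changes the formula, rerunning the test flags the error before it cascades into dependent results.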

6 Do please use spreadsheets when:
1. Correctness and accuracy are not a priority.
2. The data is not too big (i.e. no need for scalability).
3. There is no need for real-time updates.
4. You are using them as a scratch pad to quickly put a prototype together.
5. There is no need for long-term maintenance.

7 https://jupyter.org By Cameron Oelsen - https://github.com/jupyter/jupyter.github.io/blob/master/assets/main-logo.svg, BSD, https://commons.wikimedia.org/w/index.php?curid=68763478

8 Why Jupyter?

9 Why?
1. To tap into modern techniques of data analysis via the browser.
2. Polyglot language support via kernels (language engines): Python, R, Julia and 138 others - https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
3. Rapid innovation in software is driven by the tsunami of data being generated and stored. This data needs fundamentally different tools for analytics, tools that have to be lightweight, fast, repeatable, scalable and low cost.
4. A perfect storm of software packages (scikit-learn, SciPy, Matplotlib, pandas, NumPy, TensorFlow, PyTorch etc.) that have added significant value to the Jupyter ecosystem.
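The kernel idea in point 2 shows up in the notebook file itself: a notebook is a JSON document whose metadata names the kernel that should run its code cells. The sketch below builds a minimal notebook dictionary using only the standard library. The helper `make_notebook` and the cell contents are hypothetical, and the kernel names `python3` and `ir` are assumed to match what the Python and R kernels register on a typical install.

```python
import json


def make_notebook(kernel_name, display_name, language, source_lines):
    """Build a minimal notebook as a plain dict (nbformat 4 layout)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            # The kernelspec tells Jupyter which language engine to start.
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": language,
            },
        },
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Sales summary\n"]},
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source_lines,
            },
        ],
    }


# Same cell layout, two different kernels (names are assumptions).
python_nb = make_notebook("python3", "Python 3", "python", ["print(1 + 1)\n"])
r_nb = make_notebook("ir", "R", "R", ["print(1 + 1)\n"])

# Serialised like this, the text could be saved as an .ipynb file.
text = json.dumps(python_nb, indent=1)
print(text[:60])
```

Only the `kernelspec` entry differs between the two notebooks, which is why the same front end can serve many languages.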

10 Very short history of Jupyter
2001: Fernando Perez started the IPython project as an “afternoon hack” (this is what happens when you get bored with your PhD work).
2014: Fernando announced the spin-off of Jupyter from IPython. Helped by modern web frameworks and lots of other open source initiatives, Jupyter (after 5 iterations) became a tool for data analytics.
2018 onwards: Multiple providers of Jupyter notebooks: Jupyter Project, Kaggle, Colaboratory, mybinder.org, OpenShift (https://github.com/jupyter-on-openshift/jupyter-notebooks)

11 Demo time

12 Go ahead and try these:
1. https://mybinder.org/v2/gh/jupyterlab/jupyterlab-demo/master?urlpath=lab/tree/demo
2. https://colab.research.google.com
3. https://kaggle.com
Some tutorial resources:
1. https://github.com/datacamp/datacamp-community-tutorials/
2. https://datajournalism.com/
3. Lots of courses on Coursera and edX.

13 Comments?
Harish Pillay
[email protected], @harishpillay
https://tinyurl.com/sygs4bm
