Adding Jupyter Notebooks to Your Data Analytics Toolbag
Many people rely on spreadsheets as practically their only tool for data work. This talk is about adding Jupyter Notebooks alongside them, to make the analysis of data collected in spreadsheets repeatable and verifiable, as sketched below.
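As a taste of what that looks like, here is a minimal sketch of pulling spreadsheet data into a notebook with pandas (the file name "sales.xlsx", the sheet name, and its columns are placeholders, not from a real project):

```python
import pandas as pd

# Read an existing spreadsheet into a DataFrame.
# "sales.xlsx" and sheet_name="Q1" are placeholder names for this sketch;
# pandas needs the openpyxl package installed to read .xlsx files.
df = pd.read_excel("sales.xlsx", sheet_name="Q1")

# Every transformation from here on lives in the notebook,
# so the whole analysis can be re-run and verified end to end.
print(df.describe())
```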
Spreadsheets today:
1. Used by finance-related people for the last 30 years
2. Lots of accumulated experience in using them (pivot tables, for example; see the pandas equivalent sketched after this list)
3. Data size is limited to the desktop - even the SaaS providers place constraints
4. Scripting is done with spreadsheet macros, not with languages like Python
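For anyone who knows spreadsheet pivot tables, the same operation in a notebook is a single pandas call. A minimal sketch (the DataFrame and column names are invented for illustration):

```python
import pandas as pd

# Toy data standing in for a spreadsheet range.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "sales": [100, 150, 200, 250],
})

# The notebook equivalent of a spreadsheet pivot table:
# rows = region, columns = product, values = summed sales.
pivot = pd.pivot_table(df, index="region", columns="product",
                       values="sales", aggfunc="sum")
print(pivot)
```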
Where spreadsheets fall short:
1. Errors - one small mistake, and it snowballs
2. Scalability - limited by the maximum size of a sheet
3. Performance - limited to the machine the sheet runs on
4. Testing - it is almost impossible to test the correctness of a sheet (notebook code, by contrast, is ordinary code; see the sketch after this list)
5. Traceability/Debugging - tiny changes can ripple through formulae and make things very difficult to trace
6. All-inclusive - the data and the calculations are all contained within the spreadsheet file and run from a local computer
7. Operational risk - all spreadsheets start as small, quick-fix calculations, but some grow into permanent enterprise-wide solutions feeding a number of business processes; the integrity of many financial, operational, and regulatory processes is then threatened by the lack of visibility into the entire lineage
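Point 4 is where notebooks shine: a calculation that would be a cell formula becomes a Python function, which can be exercised with ordinary tests. A minimal sketch (the net_price function and its tax rule are invented for illustration):

```python
# A calculation that would be a formula buried in a spreadsheet cell.
def net_price(gross, tax_rate=0.2):
    """Price after tax, rounded to cents. Hypothetical business rule."""
    return round(gross * (1 + tax_rate), 2)

# Plain asserts (or a pytest suite) verify the logic explicitly,
# something a spreadsheet formula cannot offer.
assert net_price(100) == 120.0
assert net_price(0) == 0.0
assert net_price(10, tax_rate=0.1) == 11.0
print("all checks passed")
```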
When spreadsheets are still the right tool:
1. Repeatability and verifiability are not a priority
2. The data is not too big (i.e. no need for scalability)
3. No need for real-time updates
4. Used as a scratch pad to quickly put a prototype together
5. No need for long-term maintenance
Why Jupyter Notebooks:
1. Interactive analysis via the browser
2. Polyglot language support via kernels (language engines) - Python, R, Julia, and 138 others - https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
3. Rapid innovation in software is driven by the tsunami of data being generated and stored; this demands fundamentally different analytics tools that are lightweight, fast, repeatable, scalable, and low cost
4. A perfect storm of software packages - scikit-learn, SciPy, Matplotlib, pandas, NumPy, TensorFlow, PyTorch, etc. - has added significant value to the Jupyter ecosystem (a small taste of that stack follows this list)
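To illustrate how a few of these packages compose in a single notebook cell, here is a minimal sketch combining NumPy, pandas, and Matplotlib (the data is synthetically generated, not real):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data standing in for measurements loaded from a file.
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "day": np.arange(30),
    "value": np.cumsum(rng.normal(size=30)),
})

# A rolling mean computed with pandas and plotted with Matplotlib;
# in a notebook the figure renders inline below the cell.
df["smoothed"] = df["value"].rolling(window=7).mean()
df.plot(x="day", y=["value", "smoothed"])
plt.title("Raw values vs 7-day rolling mean")
plt.show()
```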
A brief history:
2001: Fernando Pérez started the IPython project (this is what happens when you get bored with your PhD work) as an “afternoon hack”.
2014: Fernando announced the spin-off of IPython into Jupyter. Riding on the modern web stack and many other open-source initiatives, Jupyter (after 5 iterations) became a tool for data analytics.
2018 onwards: Multiple providers of Jupyter notebooks: the Jupyter Project, Kaggle, Colaboratory, mybinder.org, OpenShift (https://github.com/jupyter-on-openshift/jupyter-notebooks)
Where to try Jupyter notebooks:
1. https://jupyter.org
2. https://colab.research.google.com
3. https://kaggle.com

Some tutorial resources:
1. https://github.com/datacamp/datacamp-community-tutorials/
2. https://datajournalism.com/
3. Lots of courses on Coursera and edX