Jupyter hearts PixieDust: Making Jupyter Notebooks Faster, Flexible, and Easier to use

© 2017 IBM Corp. Watson Data Platform @rajrsingh
Jupyter ❤ PixieDust: Making Jupyter Notebooks Faster, Flexible, and Easier to Use
Raj Singh, PhD <[email protected]>, Developer Advocate, IBM Watson Data Platform

The data problems of tomorrow cannot be solved by data scientists alone.
Image courtesy of Quinn Dombrowski: https://www.flickr.com/photos/quinnanya/2722672659

How do we blur the lines between developers and data scientists?
Disclaimer: All characters and events depicted in this story are entirely fictitious. Any similarity to actual use cases, events, or persons is actually intentional.

Organizations are systems of systems:
• Systems of Orchestration
• Systems of Operation
• Systems of Record
• Systems of Engagement

MEET BEN: THE FULL STACK DEVELOPER
“The best line of code is the one I didn't have to write!”
• Holds a master's degree in computer science
• 10 years of experience, 6 years with the company
• Languages of choice: Java, Node.js, HTML5/CSS3
• Data: NoSQL (Cloudant, Mongo), relational
• No major experience with big data

MEET NATASHA: THE DATA SCIENTIST
“In God we trust. All others bring data.” (W. Edwards Deming)
• Holds a PhD in mathematics
• 5 years of experience, 2 years with the company
• Proficient in Python and R
• Expert in machine learning and data visualization
• Software engineering is not her thing

Surprise meeting with the VP of Development!
“We have an urgent need to build an application for marketing that can provide real-time sentiment analysis on Twitter data.”
Image courtesy of Charles Forerunner: https://unsplash.com/photos/3fPXt37X6UQ

KEY CONSTRAINTS
• You have only 6 weeks to build the application
• The target users are marketing staff, so it must be easy to use
• It must scale out of the box: look at using Apache Spark

Ben & Natasha start brainstorming
Ben:
• I'll work on data acquisition from Twitter and on enriching the data with sentiment analysis scores using Spark Streaming
• I know Java very well, but I don't have time to learn Python
• However, I am willing to learn Scala if it helps improve my productivity
Natasha:
• I'll perform the data exploration and analysis
• I know Python and R, but I am not familiar enough with Java or Scala
• I like pandas and NumPy. I'm OK with learning Spark, but I expect the same level of APIs
• I need to work iteratively with the data

“I'll need to do some data exploration too.”
“I'll need APIs to access my data.”

How can we collaborate? Notebooks?

Open source notebooks
(Diagram: a notebook combines text annotations, code, data, visualizations, widgets, and output.)
• Web-based UI for running Apache Spark console commands
• Easy, no-install Spark accelerator
• Best way to start working with Apache Spark
• Multiple flavors:
  • Jupyter
  • Zeppelin
• Local or cloud hosted:
  • IBM Data Science Experience
  • Databricks
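To make "text annotations + code + output" concrete: a Jupyter notebook file (.ipynb) is a JSON document holding an ordered list of cells. The sketch below builds a minimal two-cell notebook with only the standard library. The field names follow the nbformat 4 layout as we understand it; the helper functions and cell contents are hypothetical, made up for illustration.

```python
import json

def markdown_cell(text):
    """A text-annotation cell: rendered as formatted prose in the browser."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(code):
    """A code cell: its source is sent to the kernel; results are stored with it."""
    return {
        "cell_type": "code",
        "metadata": {},
        "source": code,
        "execution_count": None,  # filled in when the kernel runs the cell
        "outputs": [],            # printed text, plots, and widget output land here
    }

def make_notebook(cells):
    """Wrap cells in the top-level notebook structure (nbformat 4 style)."""
    return {
        "cells": cells,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 2,
    }

# Hypothetical content: one annotation cell and one code cell.
nb = make_notebook([
    markdown_cell("# Tweet sentiment exploration"),
    code_cell("print('hello from the kernel')"),
])
text = json.dumps(nb, indent=1)  # what would be written to disk as an .ipynb file
print([cell["cell_type"] for cell in nb["cells"]])
```

Because the whole notebook is plain JSON, it can be versioned, diffed, and shared like any other source file. That is part of why notebooks work well as a collaboration medium.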

What is Jupyter?
(Diagram: the browser sends code to a kernel, and the kernel returns output.)
• "Open source, interactive data science and scientific computing"
• Formerly IPython
• Large, open, growing community and ecosystem
• Very popular:
  • ~2 million users for IPython
  • $6M in funding in 2015
  • 200 contributors to the notebook subproject alone
  • 275,000 public notebooks on GitHub
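The browser/kernel split can be modeled in a few lines: the browser only ships code strings, while the kernel executes them in one long-lived namespace, so variables defined in one cell remain visible in the next. The `ToyKernel` class below is a hypothetical sketch of that idea. Real Jupyter kernels exchange messages with the notebook server over ZeroMQ, and none of that messaging is modeled here.

```python
import contextlib
import io

class ToyKernel:
    """Hypothetical stand-in for a kernel: runs code strings in one persistent
    namespace and returns whatever the code printed."""

    def __init__(self):
        self.namespace = {}       # state shared across every "cell"
        self.execution_count = 0  # mirrors the In [n] counter in the UI

    def execute(self, code):
        self.execution_count += 1
        buffer = io.StringIO()
        # Capture stdout so it can be sent back to the "browser" as output.
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return self.execution_count, buffer.getvalue()

kernel = ToyKernel()
kernel.execute("tweets = ['love it', 'meh', 'great stuff']")  # cell 1 defines state
count, output = kernel.execute("print(len(tweets))")          # cell 2 reuses it
print(count, output.strip())  # prints "2 3"
```

Keeping state in the kernel rather than in the browser is what lets a data scientist work iteratively: load data once, then explore it cell by cell.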

What is Spark?
(Diagram: two ways to use a Spark cluster. A batch job submitted with spark-submit runs a Spark application (driver) that works through the master (cluster manager) with the cluster's worker nodes. In an interactive notebook, the browser talks to a kernel on the notebook server, which works through the master (cluster manager) with the worker nodes. Labeled responsibilities: RDD partitioning, task packaging and dispatching, worker node scheduling.)
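The driver's core job (partition the data, package a task per partition, dispatch the tasks to workers, combine the results) can be mimicked on one machine. The sketch below is a toy model under that assumption: a thread pool plays the worker nodes, and the function names and sample tweets are hypothetical. It is not how Spark is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, num_partitions):
    """Split records into contiguous slices whose sizes differ by at most one."""
    size, remainder = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        partitions.append(records[start:end])
        start = end
    return partitions

def count_words(partition_records):
    """The 'task' shipped to a worker: it only ever sees one partition."""
    return sum(len(text.split()) for text in partition_records)

def run_job(records, num_workers=3):
    """Driver side: partition, dispatch one task per partition, then combine."""
    parts = partition(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partial_counts = list(workers.map(count_words, parts))
    return sum(partial_counts)

tweets = ["spark makes this easy", "notebooks rock", "pixiedust", "real time sentiment"]
print(run_job(tweets))  # prints 10
```

In PySpark the same job would be expressed roughly as `sc.parallelize(tweets, 3).map(lambda t: len(t.split())).sum()`, with Spark handling the partitioning, task packaging, and scheduling across real worker nodes.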

Big Data Analysis
(Diagram: code goes from the notebook to a kernel with Spark support, and results come back. The kernel draws on services (Cognitive, …) and libraries (statistics, math, machine learning, plotting, data access for flat files, relational databases, NoSQL databases, …), and fans work out to Spark workers.)

Notebooks are powerful data science tools
Ben: “But they seem complicated for developers like me.”

Enter PixieDust…
An extensible, open source Python helper library for Jupyter Notebooks: https://github.com/ibm-watson-data-lab/pixiedust
• Visualize data (e.g., tables, charts, maps)
• Full-stack app development with PixieApps
• Download/export data
• Use Scala directly in a notebook
• Install packages into a notebook
• Spark job progress monitor
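Charting a table generally involves choosing key fields, value fields, and an aggregation, and PixieDust's chart options present this kind of choice. The sketch below is not PixieDust code. It is a standard-library illustration of the grouping step a bar chart needs, and the sample rows, field names, and `aggregate` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation menu, similar in spirit to a chart's options.
AGGREGATIONS = {"SUM": sum, "AVG": mean, "COUNT": len, "MIN": min, "MAX": max}

def aggregate(rows, key_field, value_field, aggregation="SUM"):
    """Group rows by key_field and reduce each group's value_field values.
    The result maps each key (sorted) to the bar height it would get."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_field]].append(row[value_field])
    reducer = AGGREGATIONS[aggregation]
    return {key: reducer(values) for key, values in sorted(groups.items())}

# Hypothetical sample data.
rows = [
    {"hashtag": "#spark", "retweets": 10},
    {"hashtag": "#jupyter", "retweets": 6},
    {"hashtag": "#spark", "retweets": 4},
]
print(aggregate(rows, "hashtag", "retweets", "SUM"))  # prints {'#jupyter': 6, '#spark': 14}
```

The point of a helper library is that a user picks keys, values, and aggregation from a menu instead of writing this grouping code by hand.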

What about the line-of-business user?
Natasha: “Expressing everything in code is nice, but LOB users don’t want to run code.”

Enter PixieApps
• Python classes that extend PixieDust, letting you write a UI for your analytics
• Easy to build: mostly HTML and CSS with some custom attributes (microformat style)
• With PixieApps you can:
  • Create different HTML views with routes to invoke them
  • Invoke Python scripts from user interactions
  • Run in the notebook cell output or in a dialog
• Use cases:
  • Dashboards
  • Data browsers
  • Data pipeline management
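The "views with routes" idea can be shown without PixieDust itself. PixieDust's documentation marks PixieApp view methods with a `@route(...)` decorator and uses HTML attributes such as `pd_options` to set state on click. The standard-library sketch below only models that dispatch logic: each view declares the state it responds to, and a dispatcher renders the most specific match. The `route` decorator, `ToyApp` class, and `data-options` attribute here are hypothetical and are not PixieDust's API or implementation.

```python
def route(**conditions):
    """Hypothetical decorator: tag a view method with the state it responds to."""
    def decorator(method):
        method.route_conditions = conditions
        return method
    return decorator

class ToyApp:
    """Toy route-based UI: views return HTML strings, and the state dict
    (updated by simulated user interactions) decides which view renders."""

    def __init__(self):
        self.state = {}

    @route()  # default view: no conditions
    def main(self):
        return '<button data-options="clicked=true">Click me</button>'

    @route(clicked="true")
    def clicked(self):
        return '<button data-options="clicked=false">Go back</button>'

    def views(self):
        """Yield every bound method that carries route conditions."""
        for name in dir(self):
            method = getattr(self, name)
            if callable(method) and hasattr(method, "route_conditions"):
                yield method

    def render(self):
        """Pick the matching view with the most conditions (most specific)."""
        matches = [
            view for view in self.views()
            if all(self.state.get(k) == v for k, v in view.route_conditions.items())
        ]
        return max(matches, key=lambda view: len(view.route_conditions))()

    def interact(self, options):
        """Simulate a click: parse a 'key=value' option into state and re-render."""
        key, value = options.split("=", 1)
        self.state[key] = value
        return self.render()

app = ToyApp()
print(app.render())                  # the "Click me" view
print(app.interact("clicked=true"))  # the "Go back" view
```

Routing on state, rather than on URLs, suits a notebook cell: the whole app lives in one output area, and each interaction simply re-renders it.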

Twitter sentiment analysis with Watson Tone Analyzer and Watson Personality Insights
https://github.com/ibm-watson-data-lab/pixiedust_incubator/tree/master/twitterdemo
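The enrichment step attaches sentiment scores to each tweet, and in this demo the scoring comes from Watson services. So that the shape of the step (tweet in, enriched record out) can be seen without any service credentials, the sketch below swaps in a tiny hand-made word lexicon. That is an assumption and is far cruder than a real tone analyzer. The lexicon, field names, and `score_text`/`enrich` functions are hypothetical.

```python
import re

# Hypothetical toy lexicon standing in for a real sentiment/tone service.
LEXICON = {"love": 1, "great": 1, "easy": 1, "hate": -1, "slow": -1, "broken": -1}

def score_text(text):
    """Average lexicon score over matched words; 0.0 when nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def enrich(tweet):
    """Return a copy of the tweet record with sentiment fields added."""
    score = score_text(tweet["text"])
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**tweet, "sentiment": score, "label": label}

# Hypothetical sample tweets.
tweets = [
    {"id": 1, "text": "I love how easy PixieDust is"},
    {"id": 2, "text": "The stream is slow and broken today"},
    {"id": 3, "text": "Watching the keynote"},
]
for t in map(enrich, tweets):
    print(t["id"], t["label"], t["sentiment"])
```

Because `enrich` is a pure per-record function, the same logic could run inside a Spark `map` over a stream of tweets; only the scoring call would change when a real service replaces the lexicon.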

PixieDust demo: Twitter Sentiment with Watson and PixieDust
https://github.com/ibm-watson-data-lab/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb

Updating the VP
“This is great, but C-suite executives need to be able to select filters and see real-time charts without writing code!”
Image courtesy of Charles Forerunner: https://unsplash.com/photos/3fPXt37X6UQ

PixieApp demo: Sentiment Analysis of Twitter Hashtags with Spark
• https://github.com/ibm-watson-data-lab/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb
• https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
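Real-time hashtag charts need a running aggregate that forgets old data. Spark Streaming provides windowed operations over micro-batches for this; the standard-library sketch below only mimics the idea by keeping per-hashtag sums and counts for the last few micro-batches. The `HashtagWindow` class, window size, and sample batches are hypothetical, and this is not the demo's code.

```python
from collections import defaultdict, deque

class HashtagWindow:
    """Per-hashtag sentiment statistics over the last `window` micro-batches."""

    def __init__(self, window=3):
        self.batches = deque(maxlen=window)  # the oldest batch falls off automatically

    def add_batch(self, records):
        """records: iterable of (hashtag, sentiment_score) pairs from one micro-batch."""
        totals = defaultdict(lambda: [0.0, 0])  # hashtag -> [sum of scores, count]
        for tag, score in records:
            totals[tag][0] += score
            totals[tag][1] += 1
        self.batches.append(dict(totals))

    def averages(self):
        """Average sentiment per hashtag across the batches still in the window."""
        combined = defaultdict(lambda: [0.0, 0])
        for batch in self.batches:
            for tag, (total, count) in batch.items():
                combined[tag][0] += total
                combined[tag][1] += count
        return {tag: total / count for tag, (total, count) in sorted(combined.items())}

w = HashtagWindow(window=2)
w.add_batch([("#spark", 1.0), ("#jupyter", -1.0)])
w.add_batch([("#spark", 0.0)])
w.add_batch([("#jupyter", 1.0)])  # pushes the first batch out of the window
print(w.averages())  # prints {'#jupyter': 1.0, '#spark': 0.0}
```

Storing sums and counts, rather than averages, lets partial results from separate batches be combined exactly. A chart refreshing on a timer would just call `averages()` each time.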

Thanks
• PixieDust: https://github.com/ibm-watson-data-lab/pixiedust
• Project Jupyter: http://jupyter.org/
• IBM Data Science Experience: http://datascience.ibm.com (free 30-day trial)
• Me: [email protected], tweet @rajrsingh
Resources
• https://github.com/ibm-watson-data-lab/pixiedust
• https://ibm-watson-data-lab.github.io/pixiedust
• https://medium.com/ibm-watson-data-lab/i-am-not-a-data-scientist-efe7ca6ceba2
• https://spark.apache.org
• https://www.ibm.com/us-en/marketplace/spark-as-a-service
• http://datascience.ibm.com
• https://www.ibm.com/watson/developercloud/tone-analyzer.html
• https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
• https://gist.github.com/vabarbosa/76d08b1cc6f80d5fc80856a1f3f32014
• https://gist.github.com/vabarbosa/dca176c3a68f0c101cbe475571e56bf7
• https://ibm.biz/pixiedustvis
• https://ibm.biz/pixiedustlab

• https://hbr.org/2016/02/the-rise-of-data-driven-decision-making-is-real-but-uneven
