Visualizing Twitter data with Blaze and Bokeh - PyTexas 2014

"Visualizing Twitter data with Blaze and Bokeh" at PyTexas 2014 by Christine Doig, Data Scientist at Continuum Analytics.

Making nice interactive data visualizations in the browser should be easy and fun! Let's explore tweets with simple IPython notebooks, a Blaze interface and Bokeh plots!

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. http://bokeh.pydata.org/

Blaze provides a uniform and adaptable interface to access a variety of backends, which include streaming Python, Pandas, SQLAlchemy, and Spark. http://blaze.pydata.org/

Christine Doig

October 05, 2014

Transcript

  1. Visualizing Twitter Data with Blaze and Bokeh, Christine Doig, PyTexas 2014

    Outline: Intro, Large scale data analytics, Interactive data visualization, A practical example
    About me: Christine Doig (@ch_doig), Data Scientist, Continuum Analytics
    Background: • Industrial Engineering, UPC • Data Mining and Business Intelligence
    Experience analyzing diverse datasets: • Energy • Manufacturing • Banking • Social media
    … using a diverse set of tools: SQL, Matlab, Excel, SAS, R, Python
  2. About Continuum Analytics

    http://continuum.io/
    We build technologies that enable analysts and data scientists to answer questions from the data all around us.
    Areas of Focus: • Software solutions • Consulting • Training
    Committed to Open Source: • Anaconda: Free Python distribution • Numba, Conda, Blaze, Bokeh, dynd • Sponsor
  3. About this talk

    Objective: Introduction to large-scale data analytics and interactive visualization through a practical example with a tweets dataset.
    Structure: 1. Large scale data analytics - Blaze 2. Interactive data visualization - Bokeh 3. A practical example - Twitter dataset
  4. What’s a Data Scientist?
  6. What’s a Data Scientist?

    • Solid hands-on experience in developing analytical solutions using statistical tools (e.g. R, SAS, or similar)
    • Experience in implementing Machine Learning systems, which may include classification, clustering, natural language processing and time series analysis
    • Hands-on experience in database management (MS SQL, MySQL, PostgreSQL…)
    • Solid hands-on coding experience in Python, Java, C++, or similar
    • Experience in dealing with large data sets and a solid understanding of Big Data technologies and applications (AWS, Hadoop, MapReduce, Hive, HBase, etc.)
    • Sound presentation skills, visualizing complicated data science results in Tableau, MicroStrategy, or similar
    • Comfortable working with front-end development technologies, including HTML, CSS, JavaScript, D3.js, Django, etc.
    … and be human!
  20. “At my company X, we have peta/terabytes of data, just lying around, waiting for someone to explore it” - someone at PyTexas

    Let’s make it easier for users to explore and extract useful insights out of data:
    • Anaconda: Free enterprise-ready Python distribution
    • Conda: Package manager
    • Numba: Power to speed up
    • Blaze: Scale
    • Bokeh: Interactive data visualizations
    • Wakari: Share and deploy
  26. Large scale data analytics - An Overview

    [Diagram: four areas (BI - DB; DM/Stats/ML; Scientific Computing; Distributed Systems) with tool logos including Numba, bcolz and RHadoop]
  27. Blaze

    [Image; source: http://worrydream.com/ABriefRantOnTheFutureOfInteractionDesign/]
  31. Blaze

    [Diagram: Blaze linking Distributed Systems, Scientific Computing, BI - DB and DM/Stats/ML, with logos including bcolz and hdf5]
    • Connecting technologies to users
    • Connecting technologies to each other
  32. Blaze

    [Diagram: Blaze's three layers]
    • Data Storage: csv, HDF5, bcolz, DataFrame, HDFS, json
    • Abstract expressions: selection, filter, group by, join, column wise
    • Computational backend: Pandas, Streaming Python, Spark, MongoDB, SQLAlchemy
  33. Blaze.expressions

    Abstract expressions: selection, filter, group by, join, column wise
  34. Blaze.data

    Data Storage: csv, HDF5, bcolz, DataFrame, HDFS, json
  38. Blaze.compute

    Computational backend: Pandas, Streaming Python, Spark, MongoDB, SQLAlchemy
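Blaze's split between abstract expressions and computational backends can be illustrated with a small, self-contained Python 3 sketch. The classes `Source`, `Filter` and `Project` and the functions `compute_python`, `to_sql` and `compute_sqlite` are hypothetical stand-ins, not Blaze's API: one expression tree describes the query, and two interchangeable backends (plain Python lists, and an in-memory SQLite database through the standard `sqlite3` module) evaluate it.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Tuple


# Hypothetical expression nodes (NOT Blaze's real classes): they only
# describe a query; evaluating it is left to a backend-specific function.
@dataclass(frozen=True)
class Source:
    name: str  # table name (SQL) or key into a dict of row lists (Python)


@dataclass(frozen=True)
class Filter:
    child: Any
    column: str
    op: str  # one of ">", "<", "=="
    value: Any


@dataclass(frozen=True)
class Project:
    child: Any
    columns: Tuple[str, ...]


PY_OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}
SQL_OPS = {">": ">", "<": "<", "==": "="}


def compute_python(expr, data):
    """Backend 1: walk the tree and evaluate it on lists of row dicts."""
    if isinstance(expr, Source):
        return data[expr.name]
    if isinstance(expr, Filter):
        rows = compute_python(expr.child, data)
        test = PY_OPS[expr.op]
        return [r for r in rows if test(r[expr.column], expr.value)]
    if isinstance(expr, Project):
        rows = compute_python(expr.child, data)
        return [{c: r[c] for c in expr.columns} for r in rows]
    raise TypeError(f"unsupported expression: {expr!r}")


def to_sql(expr):
    """Backend 2, step 1: translate the same tree into SQL plus parameters."""
    if isinstance(expr, Source):
        return f"SELECT * FROM {expr.name}", []
    if isinstance(expr, Filter):
        inner, params = to_sql(expr.child)
        sql = f"SELECT * FROM ({inner}) WHERE {expr.column} {SQL_OPS[expr.op]} ?"
        return sql, params + [expr.value]
    if isinstance(expr, Project):
        inner, params = to_sql(expr.child)
        return f"SELECT {', '.join(expr.columns)} FROM ({inner})", params
    raise TypeError(f"unsupported expression: {expr!r}")


def compute_sqlite(expr, conn):
    """Backend 2, step 2: let SQLite run the generated query."""
    sql, params = to_sql(expr)
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# One expression, two backends (toy data for illustration).
rows = [
    {"handle": "a", "retweets": 5},
    {"handle": "b", "retweets": 0},
    {"handle": "c", "retweets": 12},
]
popular = Project(Filter(Source("tweets"), "retweets", ">", 1), ("handle",))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (handle TEXT, retweets INTEGER)")
conn.executemany("INSERT INTO tweets VALUES (:handle, :retweets)", rows)

print(compute_python(popular, {"tweets": rows}))  # [{'handle': 'a'}, {'handle': 'c'}]
print(compute_sqlite(popular, conn))
```

The expression never touches data itself, which is what lets the same analysis code follow a dataset to a different backend.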
  39. Blaze.API: Table

    Using the interactive Table object, we can interact with a variety of computational backends with the familiarity of a local DataFrame.
  44. Blaze.API: Migrations - into

    The into function makes it easy to move data from one container type to another.
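The idea behind `into` can be sketched as a registry of pairwise conversions keyed on (source type, target type). This is a hypothetical Python 3 illustration using only the standard library, not Blaze's implementation; the `converts` decorator, the three converters and the SQLite table name `data` are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical registry: (source type, target type) -> conversion function.
_CONVERTERS = {}


def converts(src, dst):
    """Register a function that turns a `src` container into a `dst` one."""
    def register(fn):
        _CONVERTERS[(src, dst)] = fn
        return fn
    return register


def into(target_type, source):
    """Move data from `source` into a new container of `target_type`."""
    try:
        fn = _CONVERTERS[(type(source), target_type)]
    except KeyError:
        raise NotImplementedError(
            f"no conversion from {type(source).__name__} to {target_type.__name__}"
        ) from None
    return fn(source)


@converts(list, str)
def list_to_csv(rows):
    """List of row dicts -> CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


@converts(str, list)
def csv_to_list(text):
    """CSV text -> list of row dicts (every value comes back as a string)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@converts(list, sqlite3.Connection)
def list_to_sqlite(rows):
    """List of row dicts -> in-memory SQLite database with a table `data`."""
    conn = sqlite3.connect(":memory:")
    cols = list(rows[0])
    conn.execute(f"CREATE TABLE data ({', '.join(cols)})")
    placeholders = ", ".join(f":{c}" for c in cols)
    conn.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
    return conn


tweets = [{"handle": "a", "retweets": 5}, {"handle": "b", "retweets": 0}]
text = into(str, tweets)                 # list -> CSV text
back = into(list, text)                  # CSV text -> list of dicts
conn = into(sqlite3.Connection, tweets)  # list -> SQLite table
print(conn.execute("SELECT SUM(retweets) FROM data").fetchone())  # (5,)
```

Supporting a new container type means registering new pairs, while callers keep writing `into(Target, source)`. The CSV round trip returns every value as a string because CSV carries no type information.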
  48. Blaze notebooks
  49. Why do I like using Blaze?

    • Syntax is very similar to Pandas
    • Easy to scale
    • Easy to find the best computational backend for a particular dataset
    • Easy to adapt my code if someone hands me a dataset in a different format/backend
  50. Want to learn more about Blaze?

    Free Webinar: http://www.continuum.io/webinars/getting-started-with-blaze
    Blog posts: http://continuum.io/blog/blaze-expressions, http://continuum.io/blog/blaze-migrations, http://continuum.io/blog/blaze-hmda
    Docs and source code: http://blaze.pydata.org/, https://github.com/ContinuumIO/blaze
  51. Data visualization - An Overview

    [Diagram: Results presentation vs. Visual analytics; Static vs. Interactive; Small datasets vs. Large datasets; Traditional plots vs. Novel graphics]
  52. Bokeh

    • Interactive visualization
    • Novel graphics
    • Streaming, dynamic, large data
    • For the browser, with or without a server
    • Matplotlib compatibility
    • No need to write JavaScript
    http://bokeh.pydata.org/, https://github.com/ContinuumIO/bokeh
  53. Bokeh - Interactive, Visual analytics

    • Tools (e.g. Pan, Wheel Zoom, Save, Resize, Select, Reset View)
  54. Bokeh - Interactive, Visual analytics

    • Widgets and dashboards
  55. Bokeh - Interactive, Visual analytics

    • Crossfilter
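Crossfilter-style linking can be sketched without any plotting library: each chart is a count-per-value summary of the same records, and brushing one chart becomes a filter that the other charts apply. This Python 3 sketch is a generic illustration, not Bokeh's implementation; `linked_views` and the toy records are hypothetical.

```python
from collections import Counter


def linked_views(records, filters, dimensions):
    """Recompute one count-per-value summary ("view") per dimension.

    `filters` maps a column name to a predicate, e.g. the range a user has
    brushed on that column's chart. As in crossfilter-style dashboards, each
    view applies every active filter except its own, so the brushed chart
    keeps showing its full distribution while the others update.
    """
    views = {}
    for dim in dimensions:
        others = {col: pred for col, pred in filters.items() if col != dim}
        kept = (r for r in records
                if all(pred(r[col]) for col, pred in others.items()))
        views[dim] = Counter(r[dim] for r in kept)
    return views


tweets = [
    {"lang": "en", "hour": 9},
    {"lang": "es", "hour": 10},
    {"lang": "en", "hour": 22},
    {"lang": "en", "hour": 11},
]
# The user brushes the "hour" chart to select hours 9 through 12.
brush = {"hour": lambda h: 9 <= h <= 12}
views = linked_views(tweets, brush, ["lang", "hour"])
print(views["lang"])  # Counter({'en': 2, 'es': 1})
```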
  56. Bokeh - Large datasets

    Server-side downsampling and abstract rendering
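Server-side downsampling means sending the browser a reduced version of a large series that still looks the same when plotted. One common technique is min-max decimation, sketched below in Python 3. It is a generic illustration, not how Bokeh's server implements downsampling or abstract rendering, and `minmax_downsample` is a hypothetical name.

```python
import math


def minmax_downsample(xs, ys, n_bins):
    """Min-max decimation: keep at most two points (lowest and highest y)
    from each of `n_bins` equal-count bins, preserving peaks and troughs."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n <= 2 * n_bins:
        return list(xs), list(ys)  # already small enough to send as-is
    out_x, out_y = [], []
    for b in range(n_bins):
        # Integer bin edges so the last bin ends exactly at n.
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        lo = min(range(start, stop), key=lambda i: ys[i])
        hi = max(range(start, stop), key=lambda i: ys[i])
        for i in sorted({lo, hi}):  # keep x order; one point if lo == hi
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


xs = list(range(100_000))
ys = [math.sin(x / 500) for x in xs]
dx, dy = minmax_downsample(xs, ys, n_bins=500)
print(len(xs), "->", len(dx))
```

Keeping both extremes per bin, rather than an average, is what keeps spikes visible in the reduced series.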
  57. Bokeh - No JavaScript
  59. Bokeh examples
  60. A practical example: Twitter dataset
  61. Setting up your environment

    1. Download Anaconda: https://store.continuum.io/cshop/anaconda/
    2. Create your conda environment: $ conda create -n pytexas python=2.7 blaze bokeh
    3. Activate the environment and install the extra backends: $ source activate pytexas, then $ conda install mongodb pyspark
    4. Start the notebook server: $ ipython notebook
    5. Execute the notebooks
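Before executing the notebooks, a quick check from inside the activated environment can confirm that the Python packages are importable. This is a Python 3 sketch (the environment above pins Python 2.7, where `importlib.util.find_spec` is not available); `missing_packages` is a hypothetical helper, and the module list is an assumption about what the notebooks import. It only checks Python modules, so it says nothing about whether a MongoDB server is running.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the notebooks are assumed to import; adjust as needed.
missing = missing_packages(["blaze", "bokeh", "pyspark"])
print("missing:", missing or "none")
```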
  62. Want to know more about Mining Twitter?

    https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/
  63. Twitter notebooks
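A first pass over a tweets dataset often starts with plain counting. The Python 3 sketch below assumes tweets stored one JSON object per line with Twitter API v1.1 fields (`created_at`, `entities.hashtags[].text`); the function name `summarize` and the three sample tweets are made up for illustration and are not taken from the talk's notebooks.

```python
import json
from collections import Counter
from datetime import datetime

# Twitter API v1.1 timestamp layout, e.g. "Sun Oct 05 14:02:11 +0000 2014".
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"


def summarize(lines):
    """Count hashtags (case-insensitive) and tweets per hour of day."""
    hashtags, per_hour = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            hashtags[tag["text"].lower()] += 1
        created = datetime.strptime(tweet["created_at"], CREATED_AT)
        per_hour[created.hour] += 1
    return hashtags, per_hour


# Made-up sample in line-delimited JSON form.
sample = [
    '{"created_at": "Sun Oct 05 14:02:11 +0000 2014", "entities": {"hashtags": [{"text": "PyTexas"}, {"text": "bokeh"}]}}',
    '{"created_at": "Sun Oct 05 14:45:00 +0000 2014", "entities": {"hashtags": [{"text": "pytexas"}]}}',
    '{"created_at": "Sun Oct 05 15:10:30 +0000 2014", "entities": {"hashtags": []}}',
]
tags, hours = summarize(sample)
print(tags.most_common(1))  # [('pytexas', 2)]
```

The resulting counters are small enough to hand straight to a bar chart, whichever plotting library the notebooks use.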