
Julia meets Data Science

Presentation given to the Bristol Data Science group in April 2017

Malcolm Sherrington

April 06, 2017

Transcript

  1. Which programming language?

    • First released in 1991
    • Does not have native arrays
    • The latest version is not 100% compatible with the previous one
    • Is 30-50 times slower than ‘C’ code
    • Is presently probably the most popular language among Data Scientists
  2. What is the core of Data Science?

    It can’t just be a rehash of statistics, can it?
    Look at academic courses to see what is proposed.
    In the UK there has been an explosion this year of Masters courses, and also BScs, from Nottingham, Warwick, Goldsmiths, Essex …
    However, there is no actual agreement on course content, even for the core subjects, BUT most seem to agree on the following:
    • Data analysis and visualisation
    • Machine Learning
    • Big Data
    • Stochastic and Bayesian processing
    • Deep Learning
    • Information Retrieval and Textual Analysis
  3. What challenges does this present?

    • Increasingly large volumes of data
    • Real-time predictive analytics
    • Stochastic / MCMC estimation
    • Unstructured information
    • Artificial intelligence and robotics
  4. The usual “suspects”

    • Python
    • R
    • Matlab (Octave)
    • JVM (Scala / Clojure)
    • SAS / SPSS
    • Excel
    • Julia
  5. What makes a good Data Science Language?

    • Be a general purpose language with a sizable user community.
    • Have a good set of general purpose libraries.
    • Be free, open-source and platform independent.
    • Be fast and efficient.
    • Have a strong type system, and be statically typed with good compile-time type checking and type safety.
    • Have reasonable type inference.
    • Include immutability and immutable data.
    • Have a REPL for interactive use and also support for IDEs.
    • Provide both imperative and functional programming.
  6. The two language problem

    • Language 1: used in the design / analysis
    • Language 2: needed to speed things up in the Enterprise
  7. Is the two language problem really that bad?

    • Implicit: the ‘2nd’ language does not map exactly onto the analysis done in the 1st.
    • Explicit: the ‘IT’ section are not experts in the details of the domain subject matter.
    • Ownership: the DS section loses control of the project, and the development timescales become much longer.
  8. Julia: History and onwards …

    • Original development team (G4)
      – Jeff Bezanson, Stefan Karpinski
      – Viral Shah, Alan Edelman
    • Started at MIT in 2010
    • First release was February 2012
    • v1.0 to be released Q3 2017
    • Still actively maintained by G4, who are now all involved with Julia Computing
  9. (Some of) Julia's features

    1. Almost all of Julia is written in Julia
    2. Multiple dispatch
    3. Homoiconic (macros)
    4. Fast execution speeds (LLVM / JIT-compiled)
    5. Parallelism built-in
    6. Interoperability with other code
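
Two of the features listed on this slide, multiple dispatch and homoiconicity, are easier to see in code. The following is a minimal sketch only; the names `describe` and `@twice` are hypothetical and are not from the talk.

```julia
# Multiple dispatch: the method is selected from the types of *all* arguments.
# `describe` is a hypothetical function used only for this illustration.
describe(x::Integer, y::Integer) = "two integers"
describe(x::Number, y::Number) = "two numbers"
describe(x, y) = "anything else"

# Homoiconicity: code is represented as ordinary Julia data (an Expr object).
ex = :(1 + 2 * 3)        # a quoted expression, not yet evaluated
head_of_ex = ex.head     # the kind of expression, here :call
value = eval(ex)         # evaluating the expression gives 7

# A macro receives expressions and returns a new expression before execution.
# `@twice` is a hypothetical macro that repeats its argument two times.
macro twice(body)
    return quote
        $(esc(body))
        $(esc(body))
    end
end

counter = Ref(0)         # a mutable container, so the update works in any scope
@twice counter[] += 1    # expands to two increments; counter[] is now 2
```

Calling `describe(1, 2)` picks the most specific method (two integers), while `describe(1, 2.5)` falls back to the `Number` method.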
  10. Why is Julia fast?

    • The big idea is to compile the code using LLVM
    • Translate to an intermediate representation (IR)
    • LLVM is a separate project (~2003) from the University of Illinois at Urbana-Champaign
    • Used at Apple for the Clang compiler and Swift
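
The translation to LLVM IR described on this slide can be inspected from Julia itself. This is a sketch, assuming a recent Julia release where `code_llvm` lives in the `InteractiveUtils` standard library; `add_one` is a hypothetical function chosen for illustration.

```julia
using InteractiveUtils   # provides code_llvm outside the REPL

# `add_one` is a hypothetical function used only for this illustration.
add_one(x) = x + 1

# Capture the LLVM IR generated for the Int64 specialisation of add_one.
buf = IOBuffer()
code_llvm(buf, add_one, (Int64,))
ir_int = String(take!(buf))

# A different argument type gets its own specialisation, and so its own IR.
code_llvm(buf, add_one, (Float64,))
ir_float = String(take!(buf))

println(ir_int)
```

The exact IR text varies between Julia versions, but the integer version operates on `i64` values and the floating-point version on `double` values, which shows one generic function being compiled separately for each argument type.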
  11. What makes Julia special?

    • It is written in Julia - apart from a small core - and the code is available to look at.
    • The designers are data scientists, are not tied to companies such as Google (Go) or Mozilla (Rust), and are still actively involved in the language’s development.
    • It eliminates the two-language problem.
    • It has been designed for parallelism / distributed computation.
    • It is designed for cooperation not confrontation.
    • Julia combines the best from MATLAB, R and Python, is consistent, well designed and fast.
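
As a rough illustration of the parallelism / distributed computation point, here is a minimal sketch using the `Distributed` standard library of current Julia releases. This is an assumption about tooling rather than what the talk showed: at the time of the presentation the equivalent macro was `@parallel`.

```julia
using Distributed   # standard library in current Julia releases

# With no worker processes added (no addprocs call) this runs on the master
# process; the same code spreads the iterations across workers once they exist.
total = @distributed (+) for i in 1:100
    i^2             # each iteration's value is combined with the (+) reducer
end

# pmap applies a function to every element, handing calls to available workers.
squares = pmap(x -> x^2, 1:5)
```

Here `total` is the sum of the squares of 1 to 100 and `squares` holds the first five squares. Adding workers with `addprocs` changes where the work runs, not the code.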
  12. Julia for Data Scientists

    • Julia makes no distinction between analysts, developers and programmers
    • Packages are grouped: JuliaStats, JuliaWeb, JuliaOpt
    • Julia maps analytics to coding seamlessly
    • Easy to call functions in foreign libraries
    • Julia can interface with Python and R modules
    • Julia can read and write R datafiles (amongst others)
    • Common functionality is in Base or Packages
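
The point about calling functions in foreign libraries can be sketched with the built-in `ccall`. This example assumes a platform where the C library's `strlen` is resolvable from the running process (for example Linux or macOS). Python and R interoperability is normally provided by add-on packages such as PyCall and RCall, which are not shown here.

```julia
# Call the C standard library's strlen directly, with no wrapper or glue code.
# Arguments: function name, return type, tuple of argument types, then the values.
len = ccall(:strlen, Csize_t, (Cstring,), "data science")

# The result is an ordinary Julia unsigned integer.
println(Int(len))
```

The string "data science" has 12 characters, so `len` equals 12.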