Julia language: Shooting Star or a Flash in the Pan?

Malcolm Sherrington, founder of the London Julia User Group, talk at the Data Science London meetup

Data Science London

March 31, 2014

Transcript

  1. History
    •  Gang of “four”:
      –  Jeff Bezanson, Viral Shah
      –  Stefan Karpinski, Alan Edelman
    •  Started at MIT in 2010
    •  First release February, 2012
    •  Still actively maintained by G4
    •  MIT uses Julia in courses (available on YouTube)
  2. The Red Queen’s Race
    "Well, in our country," said Alice, still panting a little, "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing."
    "A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
  3. What happened to Ada?
    •  Designed 1977/83 for the US DoD to supersede the hundreds of languages the DoD used.
    •  The DoD mandated its use in 1987.
    •  The mandate was dropped in 1997.
    •  Still used in air traffic control systems such as iFACTS, GNATS.
    •  Nearest meetup group is in Stockholm.
  4. Runners and Riders
    Current field:
    1.  Runners: Matlab, R, Python
    2.  Riders: C/C++, Java
    3.  Outsiders: Scala, Clojure
    4.  Non-starter: Perl
  5. What makes a good Data Science Language? (1)
    •  Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.
    •  Be free, open-source and platform independent.
    •  Be fast and efficient.
    •  Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.
    •  Have a strong type system, and be statically typed with good compile-time type checking and type safety.
    •  Have reasonable type inference.
    •  Have a REPL for interactive use.
  6. What makes a good Data Science Language? (2)
    •  Have good tool support, including build tools, doc tools, testing tools, and an intelligent IDE.
    •  Have excellent support for functional programming, including support for immutability, immutable data structures and “monadic” design.
    •  Allow imperative programming for occasions where it makes sense.
    •  Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.
    •  Have excellent built-in data capabilities.
    •  Have comprehensive math and statistical routines.
  7. Comparison with Matlab
    •  Julia syntax is similar to Matlab but its construction is purposely very different.
    •  Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.
    •  The focus on matrices led to some important differences in Matlab’s design compared to general-purpose (GP) programming languages such as Julia.
    •  Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
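
    A minimal sketch of the last point, written for current Julia (1.x) rather than the 2014 release the talk describes. The variable names are illustrative and not from the slides. It shows Matlab-style matrix literals next to array comprehensions:

```julia
# Matlab-style matrix literals: spaces separate columns, semicolons separate rows.
A = [1 2; 3 4]
B = [A A]      # horizontal concatenation, 2x4
C = [A; A]     # vertical concatenation, 4x2

# Transpose and matrix product use the same operators as Matlab.
G = A' * A

# Array comprehensions build an array from a generating expression.
squares = [i^2 for i in 1:5]
table = [i * j for i in 1:3, j in 1:3]   # 3x3 multiplication table
```

    The comprehension with two iteration variables yields a matrix directly, which in Matlab would typically need preallocation and nested loops or a meshgrid-style call.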
  8. Comparison with R
    •  Origins as open-source clone of S+.
    •  Still seen as a “statistical” DSL.
    •  R is single threaded and hard to speed up.
    •  Introduced the data frame structure, which is also present in Julia.
    •  Julia also has an RDatasets package.
    •  R has very good graphics and data visualisation support.
    •  Julia has a Google group: julia-stats.
    •  Julia can call R modules using the Rif package.
  9. Comparison with Python
    •  Python now seen by many as the Data Science language.
    •  Strength lies in its community support.
    •  Modules such as numpy, scipy, matplotlib and pandas are very powerful.
    •  Speed up using PyPy.
    •  Mature frameworks such as Django.
    •  Julia's approach is co-operation, not confrontation, via the PyCall package and IJulia ⇔ IPython.
  10. What makes Julia special?
    •  It is written in Julia, apart from a small core, and the code is available to look at.
    •  The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).
    •  It has been designed for parallelism / distributed computation.
    •  It takes every opportunity to cooperate rather than confront.
    •  Julia intends to combine the best from MATLAB, R and Python into one language that is consistent, well designed and fast.
  11. Special features
    •  Easy installation
    •  JIT compilation
    •  Built-in package manager
    •  Coroutines and green threads
    •  Multiple dispatch
    •  Dynamic type system
    •  Meta programming with Lisp-like macros
    •  Call C functions directly
    •  Call Python functions: (PyCall)
    •  Best-of-breed C and Fortran libraries
    •  Unicode support
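
    Two of the features listed above, multiple dispatch and Lisp-like macros, can be sketched briefly. This is an illustration under assumptions: it uses Julia 1.x syntax (`abstract type`, `struct`), which differs from the 2014 release, and every name in it (`Shape`, `Circle`, `Square`, `area`, `overlap`, `@twice`) is hypothetical rather than taken from the talk.

```julia
# Multiple dispatch: one generic function with several methods, selected by
# the runtime types of all arguments.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    side::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2

# The method is chosen using both argument types, not only the first.
overlap(a::Circle, b::Circle) = "circle/circle"
overlap(a::Circle, b::Square) = "circle/square"
overlap(a::Shape, b::Shape) = "generic fallback"

# A macro receives the unevaluated expression and returns new code;
# this one emits the given expression twice.
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1   # expands to two increments of the counter
```

    Because `overlap` has no method for `(Square, Circle)`, that call falls through to the `(Shape, Shape)` method, which is how a generic fallback is usually provided.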
  12. The ones to read …
    •  Parallel computing
      –  http://julia.readthedocs.org/en/latest/manual/parallel-computing
    •  Metaprogramming
      –  http://docs.julialang.org/en/latest/manual/metaprogramming
    •  Networking and streams
      –  http://docs.julialang.org/en/latest/manual/networking-and-streams
    •  Calling C and Fortran code
      –  http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code
  13. Modules and packages
    •  Julia has its own built-in package manager
    •  There are (now) 250+ packages.
    •  These include:
      –  Statistics
      –  Graphics
      –  System tools
      –  Database
      –  Web and Cloud
      –  Simulation
    •  It's quite easy to add your own package (via GitHub)
  14. 100+ contributors, 1000+ mailing list subscribers, 175+ packages
    AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units, ValueDispatch, WAV, WebSockets, Winston, YAML, ZMQ, Zlib
  15. Julia does have graphics!
    •  Winston (Standard 2D graphics)
    •  Gadfly (Like 'ggplot2')
    •  Gaston (Uses gnuplot as graphics engine)
    •  PyPlot (Uses IPython/matplotlib.py)
    •  Plotly (http://plot.ly/api)
  16. What’s missing?
    •  Cached package loading
      –  At present all modules are compiled on the fly
      –  Preloading would reduce startup times
    •  Better database connectivity
      –  Uses ODBC
      –  Simple database support via SQLite
      –  No native Oracle, MySQL or PostgreSQL
    •  More comprehensive NoSQL support
      –  Packages for Mongo, Redis
      –  JSON package helps with CouchDB, Neo4j
  17. Familiar syntax for Matlab/Octave users
    function randmatstat(t; n=10)
        v = zeros(t)
        w = zeros(t)
        for i = 1:t
            a = randn(n,n)
            b = randn(n,n)
            c = randn(n,n)
            d = randn(n,n)
            P = [a b c d]      # blocks side by side: n x 4n
            Q = [a b; c d]     # 2x2 block matrix: 2n x 2n
            v[i] = trace((P'*P)^4)
            w[i] = trace((Q'*Q)^4)
        end
        std(v)/mean(v), std(w)/mean(w)
    end
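
    The slide's code targets the 2014 release. A sketch of the same function for Julia 1.x follows, assuming the standard libraries `LinearAlgebra` (where `trace` became `tr`) and `Statistics` (which now provides `mean` and `std`). The seed and the call at the end are illustrative choices, not part of the talk.

```julia
using LinearAlgebra   # tr
using Statistics      # mean, std
using Random          # seed! for a repeatable run

function randmatstat(t; n=10)
    v = zeros(t)
    w = zeros(t)
    for i in 1:t
        a = randn(n, n)
        b = randn(n, n)
        c = randn(n, n)
        d = randn(n, n)
        P = [a b c d]      # blocks side by side: n x 4n
        Q = [a b; c d]     # 2x2 block matrix: 2n x 2n
        v[i] = tr((P' * P)^4)
        w[i] = tr((Q' * Q)^4)
    end
    # Relative spread of the two trace samples.
    return std(v) / mean(v), std(w) / mean(w)
end

Random.seed!(1234)
s1, s2 = randmatstat(100; n=5)
```

    Apart from the two `using` lines and the renamed `tr`, the body is unchanged, which gives a rough sense of how little the Matlab-like core syntax has moved since the talk.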
  18. You can also write low-level code
    function qsort!(a,lo,hi)
        i, j = lo, hi
        while i < hi
            pivot = a[(lo+hi)>>>1]
            while i <= j
                while a[i] < pivot; i = i+1; end
                while a[j] > pivot; j = j-1; end
                if i <= j
                    a[i], a[j] = a[j], a[i]
                    i, j = i+1, j-1
                end
            end
            if lo < j; qsort!(a,lo,j); end
            lo, j = i, hi
        end
        return
    end
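
    A short usage sketch for the quicksort on this slide. The function body is repeated so the block runs on its own, with `return a` in place of the bare `return` so the sorted array is handed back. The one-argument convenience method and the sample data are assumptions added for illustration.

```julia
# In-place quicksort over a[lo:hi], as on the slide.
function qsort!(a, lo, hi)
    i, j = lo, hi
    while i < hi
        pivot = a[(lo + hi) >>> 1]
        while i <= j
            while a[i] < pivot; i = i + 1; end
            while a[j] > pivot; j = j - 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
            end
        end
        if lo < j; qsort!(a, lo, j); end
        lo, j = i, hi
    end
    return a
end

# Hypothetical convenience method: sort the whole array.
qsort!(a) = qsort!(a, 1, length(a))

small = [3, 1, 2, 3, 1]
qsort!(small)

big = rand(1000)
qsort!(big, 1, length(big))
```

    The trailing `!` is the Julia naming convention for functions that modify their arguments, so both arrays are sorted in place.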
  19. Going further …
    •  Start with the julialang.org website
    •  Install Julia and read the documentation
    •  Look at the training material
      –  http://julialang.org/teaching/
    •  Try the Julia Studio
    •  Read/subscribe to Google-groups sites
      –  julia-users, julia-stats, julia-opt, julia-dev
    •  Join the LJuUG
      –  http://www.meetup.com/London-Julia-User-Group
  20. Useful links
    •  London Julia Users Group: http://londonjulia.org
    •  Julia (main) site: http://julialang.org
    •  Julia (docs) site: http://docs.julialang.org
    •  Forio: Julia Studio: http://forio.com/products/julia-studio