
Julia: Shooting star or a flash in the pan?

Talk given to Data Science London in February 2014

Malcolm Sherrington

February 11, 2014


Transcript

  1. History
    • Gang of “four”:
      – Jeff Bezanson, Viral Shah
      – Stefan Karpinski, Alan Edelman
    • Started at MIT in 2010
    • First release February 2012
    • Still actively maintained by G4
    • MIT using Julia in courses (on YouTube)
  2. The Red Queen’s Race
    "Well, in our country," said Alice, still panting a little, "you'd generally get to somewhere else — if you ran very fast for a long time, as we've been doing."
    "A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
  3. What happened to Ada?
    • Designed 1977/83 for the US DoD in order to supersede the hundreds of languages the DoD used.
    • The DoD mandated its use in 1987.
    • The mandate was dropped in 1997.
    • Still used in air traffic control systems such as iFacts, GNATS.
    • Nearest meetup group is in Stockholm.
  4. Runners and Riders
    Current field:
    1. Runners: Matlab, R, Python
    2. Riders: C/C++, Java
    3. Outsiders: Scala, Clojure
    4. Non-starter: Perl
  5. What makes a good Data Science Language? (1)
    • Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.
    • Be free, open-source and platform independent.
    • Be fast and efficient.
    • Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.
    • Have a strong type system, and be statically typed with good compile-time type checking and type safety.
    • Have reasonable type inference.
    • Have a REPL for interactive use.
  6. What makes a good Data Science Language? (2)
    • Have good tool support, including build tools, doc tools, testing tools, and an intelligent IDE.
    • Have excellent support for functional programming, including support for immutability, immutable data structures and “monadic” design.
    • Allow imperative programming for occasions where it makes sense.
    • Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.
    • Have excellent built-in data capabilities.
    • Have comprehensive math and statistical routines.
  7. Comparison with Matlab
    • Julia syntax is similar to Matlab but its construction is purposely very different.
    • Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.
    • The focus on matrices led to some important differences in Matlab’s design compared to general-purpose programming languages such as Julia.
    • Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
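To make the last point concrete, here is a minimal sketch of Matlab-style matrix literals next to list comprehensions. It assumes a current Julia 1.x release rather than the version available at the time of the talk, and all variable names are illustrative.

```julia
# Matlab-style matrix literal: spaces separate columns, semicolons separate rows
A = [1 2; 3 4]

# The transpose and matrix product read much as they would in Matlab
G = A' * A

# List comprehension: squares of 1 to 5
squares = [x^2 for x in 1:5]

# A two-index comprehension builds a matrix (here a 3x3 multiplication table)
table = [i * j for i in 1:3, j in 1:3]

# A comprehension with a filter clause
evens = [x for x in 1:10 if iseven(x)]
```

The comprehension forms have no direct Matlab counterpart; the matrix literal and the `A' * A` product should look familiar to Matlab/Octave users.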
  8. Comparison with R
    • Origins as open-source clone of S+.
    • Still seen as a “statistical” DSL.
    • R is single threaded and hard to speed up.
    • Introduced the data frame structure, which is also present in Julia.
    • Julia also has an RDatasets package.
    • R has very good graphics and data visualisation support.
    • Julia has a Google group: julia-stats.
    • Julia can call R modules using the Rif package.
  9. Comparison with Python
    • Python now seen by many as the Data Science language.
    • Strength lies in its community support.
    • Modules such as numpy, scipy, matplotlib and pandas are very powerful.
    • Speed up using PyPy
    • Mature frameworks such as Django
    • Julia approach is co-operation, not confrontation, via the PyCall package and also IJulia ⬄ IPython
  10. What makes Julia special?
    • It is written in Julia, apart from a small core, and the code is available to look at.
    • The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).
    • It has been designed for parallelism / distributed computation.
    • It takes every opportunity to cooperate rather than confront.
    • Julia intends to combine the best from Matlab, R and Python into one language that is consistent, well designed and fast.
  11. Special features
    • Easy installation
    • JIT compilation
    • Built-in package manager
    • Coroutines and green threads
    • Multiple dispatch
    • Dynamic type system
    • Meta programming with Lisp-like macros
    • Call C functions directly
    • Call Python functions: (PyCall)
    • Best-of-breed C and Fortran libraries
    • Unicode support
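Three of the features listed above (multiple dispatch, Lisp-like macros and direct C calls) can be sketched in a few lines. This is an illustrative sketch only, assuming a current Julia 1.x release; the types `Shape`, `Circle` and `Square`, the function `overlap` and the macro `@twice` are hypothetical names invented for this example, and the C call assumes the C library's `strlen` is visible in the running process.

```julia
# Multiple dispatch: one generic function, with the method chosen
# from the types of *all* arguments (hypothetical example types)
abstract type Shape end
struct Circle <: Shape
    r::Float64
end
struct Square <: Shape
    side::Float64
end

overlap(a::Circle, b::Circle) = "circle-circle"
overlap(a::Circle, b::Square) = "circle-square"
overlap(a::Shape, b::Shape) = "generic"      # fallback for any other pair

first_case = overlap(Circle(1.0), Square(2.0))    # picks the Circle/Square method
second_case = overlap(Square(2.0), Circle(1.0))   # falls back to the Shape/Shape method

# Lisp-like macro: receives the expression itself and emits it twice
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1    # the increment runs two times

# Calling a C function directly, with no wrapper code
# (assumes strlen from the C library is available in the process)
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")
```

Dispatch on every argument, rather than on a single receiver object, is what separates this from method overriding in class-based languages.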
  12. The ones to read …
    • Parallel computing
      – http://julia.readthedocs/en/latest/manual/parallel-computing
    • Metaprogramming
      – http://docs.julialang.org/en/latest/manual/metaprogramming
    • Networking and streams
      – http://docs.julialang.org/en/latest/manual/networking-and-streams
    • Calling C and Fortran code
      – http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code
  13. Modules and packages
    • Julia has its own built-in package manager
    • There are (now) 250+ packages.
    • These include:
      – Statistics
      – Graphics
      – System tools
      – Database
      – Web and Cloud
      – Simulation
    • It's quite easy to add your own package (via GitHub)
  14. 100+ contributors, 1000+ mailing list subscribers, 175+ packages
    AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units,
  15. Julia does have graphics!
    • Winston (Standard 2D graphics)
    • Gadfly (Like 'ggplot2')
    • Gaston (Uses gnuplot as graphics engine)
    • PyPlot (Uses IPython/matplotlib.py)
    • Plotly (http://plot.ly/api)
  16. What’s missing?
    • Cached package loading
      – At present all modules are compiled on the fly
      – Preloading would reduce startup times
    • Better database connectivity
      – Uses ODBC
      – Simple database support via SQLite
      – No native Oracle, MySQL or PostgreSQL
    • More comprehensive NoSQL support
      – Packages for Mongo, Redis
      – JSON package helps with CouchDB, Neo4j
  17. Familiar syntax for Matlab/Octave users

    function randmatstat(t; n=10)
        v = zeros(t)
        w = zeros(t)
        for i = 1:t
            a = randn(n,n)
            b = randn(n,n)
            c = randn(n,n)
            d = randn(n,n)
            P = [a b c d]
            Q = [a b; c d]
            v[i] = trace((P'*P)^4)
            w[i] = trace((Q'*Q)^4)
        end
        std(v)/mean(v), std(w)/mean(w)
    end
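The listing on this slide targets the Julia release current at the time of the talk (early 2014). As a hedged sketch, the same function for Julia 1.x might look as follows, assuming that `trace` is now spelled `tr` and lives in the `LinearAlgebra` standard library, and that `mean` and `std` come from `Statistics`. The call at the end, with `t = 100` and `n = 5`, is an arbitrary illustrative choice and not taken from the slides.

```julia
using LinearAlgebra   # provides tr (spelled trace in the slide's listing)
using Statistics      # provides mean and std

function randmatstat(t; n=10)
    v = zeros(t)
    w = zeros(t)
    for i = 1:t
        a = randn(n, n)
        b = randn(n, n)
        c = randn(n, n)
        d = randn(n, n)
        P = [a b c d]            # horizontal concatenation: n x 4n
        Q = [a b; c d]           # block matrix: 2n x 2n
        v[i] = tr((P' * P)^4)
        w[i] = tr((Q' * Q)^4)
    end
    # Relative spread (standard deviation over mean) of each trace series
    return std(v) / mean(v), std(w) / mean(w)
end

# Illustrative call: the keyword argument n follows the semicolon
s1, s2 = randmatstat(100; n=5)
```

The results vary from run to run because the matrices are random, so no expected values are shown.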
  18. You can also write low-level code

    function qsort!(a,lo,hi)
        i, j = lo, hi
        while i < hi
            pivot = a[(lo+hi)>>>1]
            while i <= j
                while a[i] < pivot; i = i+1; end
                while a[j] > pivot; j = j-1; end
                if i <= j
                    a[i], a[j] = a[j], a[i]
                    i, j = i+1, j-1
                end
            end
            if lo < j; qsort!(a,lo,j); end
            lo, j = i, hi
        end
        return
    end
  19. Going further …
    • Start with the julialang.org website
    • Install Julia and read the documentation
    • Look at the training material
      – http://julialang.org/teaching/
    • Try the Julia Studio
    • Read/subscribe to Google-groups sites
      – julia-users, julia-stats, julia-opt, julia-dev
    • Join the LJuUG
      – http://www.meetup.com/London-Julia-User-Group
  20. Useful links
    • London Julia Users Group: http://londonjulia.org
    • Julia (main) site: http://julialang.org
    • Julia (docs) site: http://docs.julialang.org
    • Forio Julia Studio: http://forio.com/products/julia-studio