speakerdeck-rendering-bug

Fernando Perez
July 12, 2014

This is a re-upload of an older presentation:

https://speakerdeck.com/fperez/1204-biofrontiers-boulder

that shows how recent changes to SpeakerDeck's rasterizer algorithm have sharply degraded the quality of embedded PNGs to the point of making them unusable.

Transcript

  1. Changes Two vignettes Scientific Python IPython Lessons from the open source world Where is this going?
    The scientific Python ecosystem: open source tools for
    better computing in science
    Fernando Pérez
    http://fperez.org
    Helen Wills Neuroscience Institute, UC Berkeley
    BioFrontiers, CU Boulder
    April 2, 2012

  2. Outline
    1 Changes in Science & Computing
    2 Two vignettes
    3 Scientific Python
    4 IPython
    5 Lessons from the open source world
    6 Where is this going?
    FP (UC Berkeley) Python for science 4/2/12 2 / 54

  3. Outline (next: 1 Changes in Science & Computing)

  4. Computing: part of the DNA of science
    Much more than “the third branch” of science
    An avalanche of experimental quantitative data
    Biology, genetics, neuroscience, astronomy, climate modeling...
    All scientists must now do real computing
    “Big Data”, “Cloud computing”, etc.: lots of buzzwords...
    They will NOT automatically produce good science
    Good computing is now a necessary (though not sufficient!) condition
    for good science.
    The rigor, openness, culture of validation, collaboration and other
    aspects of science must also become part of scientific computing.

  6. Not all clouds are necessarily good...

  7. A crisis of credibility and real issues
    The Duke clinical trials scandal - Potti/Nevins
    A compounding of (common and otherwise) data analysis errors.
    No materials allowing validation/reproduction of results.
    Patients were harmed.
    Lawsuits, resignations.
    Major policy reviews and changes: NCI, IOM, ...
    More: see K. Baggerly’s "starter set" page.
    The Duke situation is more common than we’d like to believe!
    Begley & Ellis, Nature, 3/28/12: Drug development: Raise standards
    for preclinical cancer research.
    47 out of 53 “landmark papers” could not be replicated.
    Nature, Feb 2012, Ince et al.: The case for open computer programs
    “The scientific community places more faith in computation than is
    justified”
    “anything less than the release of actual source code is an indefensible
    approach for any scientific results that depend on computation”

  10. Related changes: Open *
    Internet: interactions for humans, code and data
    Open Source Software
    development akin to scientific culture
    viable alternatives to proprietary software
    tools and lessons for improving the scientific process: GitHub
    Open Access
    thecostofknowledge.org: Elsevier boycott
    FRPAA House hearing on March 29th.
    Open Education
    MIT OpenCourseWare, Khan Academy...
    Stanford CS 221 in fall 2011: ~160,000 students.
    Spring 2012:
    Sebastian Thrun leaves Stanford: Udacity.
    Stanford: Coursera.
    MITx, TED-Ed...

  14. Outline (next: 2 Two vignettes)

  15. #1: Adaptive, multiwavelet PDE tools
    Gregory Beylkin, Vani Cheruvu, FP. Applied Math, CU Boulder.
    Fast application of integral kernels for partial differential equations.
    Implementation went from 1 to 3 dimensions directly (extremely
    unusual).
    Complex algorithm: beyond pure numerics.
    Very good performance, thanks to NumPy, F2PY and weave.
    Dynamically generated C++ sources: code as a run-time resource.
    [Figure parameters: $N_{\mathrm{nod}} = 10$, $\epsilon = 10^{-10}$, $N_{\mathrm{blocks}} = 445$]
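The idea of generated sources as a run-time resource can be illustrated without a C++ compiler. The sketch below is only a pure-Python analogue, not the project's weave-based code: a hypothetical helper `make_horner` writes the source of an unrolled Horner evaluator for fixed coefficients, then compiles it with the built-in `compile` and `exec`.

```python
from typing import Callable, Sequence


def make_horner(coeffs: Sequence[float]) -> Callable[[float], float]:
    """Generate, compile and return an unrolled Horner evaluator (hypothetical helper).

    coeffs are ordered from highest to lowest degree, e.g. [2, -3, 1]
    means 2*x**2 - 3*x + 1. The coefficients are baked into the
    generated source as literals.
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    # Build the Horner form (((c0) * x + c1) * x + c2) ... as source text.
    expr = repr(float(coeffs[0]))
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {float(c)!r}"
    source = f"def poly(x):\n    return {expr}\n"
    # Compile the generated text at run time and pull the function out.
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    poly = namespace["poly"]
    poly.source = source  # keep the generated text for inspection
    return poly
```

For example, `make_horner([2, -3, 1])` returns a function computing 2x^2 - 3x + 1, and its `.source` attribute holds the generated text. The design choice is to pay the generation cost once and then reuse the specialized function many times.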

  16. #2: Mining the literature on macaque brain connectivity
    Mark D’Esposito, Rob Blumenfeld, Daniel Bliss, FP; UC Berkeley.
    Anatomical brain connectivity experiments: difficult and expensive.
    An invaluable dataset, available through a web server.

  17. From messy data to graph descriptions
    Programmatically query web server.
    Parse XML into rich graphs.
    NetworkX graph library: Aric Hagberg at LANL.
    [Figure: Full Graph - Sagittal view; nodes are brain areas (V1, V2, TEO, 46, ...)]
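The "parse XML into rich graphs" step can be sketched with the standard library alone. The XML schema below (`<connection from=... to=... strength=...>`) is a hypothetical stand-in, not the format of the actual web server, and a plain adjacency dictionary stands in for a NetworkX graph; the area names come from the figure, but the strengths are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical response format; the real server's schema is not shown on the slide.
SAMPLE_XML = """
<connectivity>
  <connection from="V1" to="V2" strength="3"/>
  <connection from="V2" to="V3D" strength="2"/>
  <connection from="V2" to="TEO" strength="1"/>
</connectivity>
"""


def parse_connections(xml_text: str) -> dict[str, dict[str, int]]:
    """Return a directed adjacency map: source area -> {target area: strength}."""
    root = ET.fromstring(xml_text.strip())
    graph: dict[str, dict[str, int]] = defaultdict(dict)
    for edge in root.iter("connection"):
        src, dst = edge.get("from"), edge.get("to")
        graph[src][dst] = int(edge.get("strength", "1"))
        graph.setdefault(dst, {})  # sink-only areas still appear as nodes
    return dict(graph)


def out_degree(graph: dict[str, dict[str, int]]) -> dict[str, int]:
    """Number of outgoing connections for each area."""
    return {node: len(targets) for node, targets in graph.items()}
```

With a real library such as NetworkX, the loop body would add edges to a graph object instead of nested dictionaries; the parsing side stays the same.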

  18. Outline (next: 3 Scientific Python)

  19. Beyond (floating point) number crunching
    Hardware floating point
    Arbitrary precision integers
    Rationals
    Interval arithmetic
    Symbolic manipulation
    FORTRAN
    Extended precision floating point
    Text processing
    Databases
    Graphical user interfaces
    Web interfaces
    Hardware control
    Multi-language integration
    Data formats: HDF5, XML, ...
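Several items in this diagram are available in Python's standard library itself. The sketch below uses only the built-in `int`, `fractions.Fraction` and `decimal.Decimal` to contrast hardware floating point with arbitrary-precision integers, exact rationals and extended-precision decimals; interval arithmetic and symbolic manipulation need third-party packages and are not covered.

```python
import math
from decimal import Decimal, localcontext
from fractions import Fraction

# Hardware floating point: 0.1 has no exact binary representation.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                 # False

# Exact rationals: the same sum with no rounding at all.
frac_sum = Fraction(1, 10) + Fraction(2, 10)
print(frac_sum == Fraction(3, 10))      # True

# Arbitrary-precision integers: Python ints grow as needed, no overflow.
big = math.factorial(30)
print(big)                              # 265252859812191058636308480000000

# Extended-precision decimals: sqrt(2) to 50 significant digits,
# using a local context so the global precision is left untouched.
with localcontext() as ctx:
    ctx.prec = 50
    root2 = Decimal(2).sqrt()
print(root2)
```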

  20. Python in this context
    Open Source, free, highly portable.
    Extremely readable: “executable pseudo-code”.
    Simple: “fits your brain”.
    Rich types and library: “batteries included”.
    Easy to wrap C, C++ and FORTRAN.
    NumPy: IDL/Matlab-like arrays.

  21. The scientific Python ecosystem (incomplete view)
    IPython
    NetworkX

  22. NumPy: the foundation for array processing
    A flexible, efficient, multidimensional array object.
    Convenient syntax: c = a+b.
    Math library that operates on arrays: y = sin(k*t).
    Basic scientific functionality:
    Linear algebra
    FFTs
    Random number generation
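A short NumPy sketch of the features listed above: elementwise `c = a + b`, a function applied to a whole array as in `y = sin(k*t)`, and one call each into the linear algebra, FFT and random number modules. The value of `k`, the grid size, the 2x2 system and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Elementwise arithmetic on whole arrays, no explicit loops.
a = np.arange(5.0)              # 0, 1, 2, 3, 4
b = np.ones(5)
c = a + b                       # 1, 2, 3, 4, 5

# Math functions act on every element: y = sin(k*t).
k = 3.0
t = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
y = np.sin(k * t)

# Linear algebra: solve A x = rhs.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
x = np.linalg.solve(A, rhs)     # solution (2, 3), up to rounding

# FFT: a sinusoid with exactly k cycles in the window gives one peak at index k.
spectrum = np.abs(np.fft.rfft(y))
peak = int(np.argmax(spectrum))  # 3

# Random number generation with a reproducible seed.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
```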

  23. SciPy: numerical algorithms galore
    linalg : Linear algebra routines (including BLAS/LAPACK)
    sparse : Sparse Matrices (including UMFPACK, ARPACK,...)
    fftpack : Discrete Fourier Transform algorithms
    cluster : Vector Quantization / Kmeans
    odr : Orthogonal Distance Regression
    special : Special Functions (Airy, Bessel, etc.).
    stats : Statistical Functions
    optimize : Optimization Tools
    maxentropy : Routines for fitting maximum entropy models
    integrate : Numerical Integration routines
    ndimage : n-dimensional image package
    interpolate : Interpolation Tools
    signal : Signal Processing Tools
    io : Data input and output
    Lots more...
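A short sketch exercising three of the listed subpackages (`integrate`, `optimize`, `special`). It assumes a recent SciPy release; the integrand, the equation and the brackets are arbitrary illustrative choices.

```python
import numpy as np
from scipy import integrate, optimize, special

# integrate: quadrature of exp(-x**2) over the whole real line (exact value sqrt(pi)).
value, abs_err = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

# optimize: bracketed root finding for cos(x) = x on [0, 1].
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)

# special + optimize: first positive zero of the Bessel function J0.
# The bracket [2, 3] works because J0(2) > 0 and J0(3) < 0.
j0_zero = optimize.brentq(lambda x: special.jv(0, x), 2.0, 3.0)

print(value, root, j0_zero)
```

`brentq` needs a bracket whose endpoints have opposite signs; that requirement is why each bracket above was chosen by checking the sign of the function at both ends.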

  24. Matplotlib: high-quality data visualization

  25. MayaVi: 3d visualization (VTK)

  26. FluidLab: a MayaVi based CFD visualization tool
    K. Julien, P. Schmitt (now NCAR), B. Barrow, F. Pérez (App. Math, CU).

  27. SymPy: symbolic and multiprecision computing

  28. NetworkX: tools for complex networks
    Aric Hagberg, Pieter Swart et al., Los Alamos Theory Division

  29. Scikits Learn: (easy to use) machine learning
    [Figure panels: FastICA on 2D point clouds (True Independent Sources; Observations, PCA, ICA; PCA scores); Linear Discr. Analysis and Quadratic Discr. Analysis (versicolor vs. virginica); SVM with non-linear kernel (RBF), inliers vs. outliers; SVM: Weighted samples]
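The PCA panels in this figure can be reproduced in spirit with a few lines of NumPy. This is a from-scratch sketch via the singular value decomposition, not scikit-learn's own `PCA` class; the `pca` helper is hypothetical, and the synthetic point cloud (its scales, mixing matrix and seed) is an arbitrary assumption for illustration.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2):
    """Principal component analysis of the rows of X via SVD (hypothetical helper).

    Returns (components, scores, explained_variance): components are unit
    row vectors ordered by decreasing variance, and scores are the centered
    data projected onto them.
    """
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance


# Synthetic 2D cloud: two independent sources mixed by an arbitrary matrix.
rng = np.random.default_rng(seed=42)
sources = rng.standard_normal((500, 2)) * np.array([3.0, 0.5])
mixing = np.array([[1.0, 1.0], [0.0, 2.0]])
observations = sources @ mixing.T

components, scores, var = pca(observations)
```

PCA only decorrelates the data (the scores have a diagonal covariance); recovering the original independent sources, as the ICA panels do, requires a different criterion such as non-Gaussianity.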

  30. Outline (next: 4 IPython)

  31. IPython: Interactive Scientific Computing
    A CU Boulder project
    Started when I was a graduate student in Physics (2001).
    Continued as a postdoc in Applied Mathematics.
    Brian Granger: CU Physics.
    In brief
    1 A better Python shell
    2 Embeddable kernel and powerful interactive clients
        1 Terminal
        2 Qt console
        3 Web notebook
    3 Flexible parallel computing

  33. IPython: Matlab/IDL-like interactive use

  34. Qt console: inline plots, html, multiline editing, ...
    Evan Patterson (Enthought)

  35. Microsoft Visual Studio 2010 integrated console
    Dino Viehland and Shahrokh Mortazavi; http://pytools.codeplex.com

  36. Browser-based notebook: rich text, code, plots, ...
    Brian Granger, James Gao (Berkeley), rest of the team

  37. Interactive and high-level parallel APIs
    Min Ragan-Kelley, Brian Granger

  38. A mid-size project by now

  39. Other projects using IPython
    Scientific
    EPD: Enthought Python Distribution.
    Sage: open source mathematics.
    PyRAF: Space Telescope Science Institute
    CASA: Nat. Radio Astronomy Observatory
    Ganga: CERN
    PyMAD: neutron spectrom., Institut Laue-Langevin
    Sardana: European Synchrotron Radiation Facility
    ASCEND: eng. modeling (Carnegie Mellon).
    JModelica: dynamical systems.
    DASH: Denver Aerosol Sources and Health.
    Trilinos: Sandia National Lab.
    DoD: baseline configuration.
    Mayavi: 3d visualization, Enthought.
    NiPype: computational pipelines, MIT.
    PyIMSL Studio, by Visual Numerics.
    ...
    Web/Other
    Visual Studio 2010: MS.
    Django.
    TurboGears.
    Pylons web framework
    Zope and Plone CMS.
    Axon Shell, BBC
    Kamaelia.
    Schevo database.
    Pitz: distributed task/bug tracking.
    iVR (interactive Virtual Reality).
    Movable Python (portable Python environment).
    ...

  40. Support
    Enthought, Austin, TX: Lots!
    Tech-X Corporation, Boulder, CO: Parallel/notebook (previous versions)
    Microsoft: WinHPC support, Visual Studio integration
    NIH: via NiPy grant
    NSF: via Sage compmath grant
    Google: Summer of Code 2005, 2010.
    DoD/HPTi.

  41. (Incomplete) Cast of Characters
    Brian Granger - Cal Poly San Luis Obispo Physics
    Min Ragan-Kelley - UC Berkeley Nuclear engineering.
    Thomas Kluyver - U. Sheffield Plant biology
    Jörgen Stenarson - SP Technical Research Institute of Sweden
    Paul Ivanov - UC Berkeley neuroscience
    Robert Kern - Enthought
    Evan Patterson - Caltech Physics/Enthought
    Stefan van der Walt - UC Berkeley
    John Hunter - TradeLink Securities, Chicago.
    Prabhu Ramachandran - Aerospace Engineering, IIT Bombay
    Satra Ghosh - MIT Neuroscience
    Gaël Varoquaux - Neurospin (Orsay, France)
    Ville Vainio - CS, Tampere University of Technology, Finland
    Barry Wark - Neuroscience, U. Washington.
    Ondrej Certik - Physics, U Nevada Reno
    Darren Dale - Cornell
    Justin Riley - MIT
    Mark Voorhies - UC San Francisco
    Nicholas Rougier - INRIA Nancy Grand Est
    Thomas Spura - Fedora project
    Julian Taylor - Debian/Ubuntu
    Many more! (~140 commit authors)

  42. Outline (next: 5 Lessons from the open source world)

  43. What does it take to get reproducible research results?
    Reproducible research practices!
    Reproducibility at publication time?
    It’s already too late.
    Learn from a community (open source) where
    reproducibility is an everyday practice
    (by necessity)

  46. FOSS better than scientific research?
    FOSS: Free and Open Source Software
    Public distributed version control: provenance tracking

  47. Pull requests: ongoing peer review

  48. Pull requests: back and forth discussion

  49. Branches: exploratory work with control

  50. Automated tests: validation
    The IPython build Dashboard: immediate feedback
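Automated tests turn "does the code still give the right answer?" into something a machine checks on every change. A minimal sketch with the standard library's `unittest`; the function under test, `trapezoid`, and its test cases are hypothetical examples chosen because their correct answers are known in closed form.

```python
import math
import unittest


def trapezoid(f, a: float, b: float, n: int = 1000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TrapezoidTests(unittest.TestCase):
    def test_linear_is_exact(self):
        # The trapezoidal rule integrates straight lines exactly: 12 on [0, 3].
        self.assertAlmostEqual(trapezoid(lambda x: 2 * x + 1, 0.0, 3.0, n=4), 12.0)

    def test_sine_converges(self):
        # The integral of sin over [0, pi] is 2; the error shrinks like h**2.
        self.assertAlmostEqual(trapezoid(math.sin, 0.0, math.pi), 2.0, places=5)

    def test_rejects_bad_n(self):
        with self.assertRaises(ValueError):
            trapezoid(math.sin, 0.0, 1.0, n=0)
```

Saved in a file whose name starts with `test`, the suite is picked up by `python -m unittest discover`, which is the kind of command a build dashboard can run on every commit.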

  51. Public bug trackers

  52. Versioned science
    Git: the tool you didn’t know you needed
    Reproducibility?
    Tracking and recreating every step of your work
    In the software world: it’s called Version Control!
    Git: an enabling technology. Use version control for everything:
    Paper/grant writing (never get paper_v5_john.tex by email again!)
    Everyday research: track your results
    Teaching (never accept an emailed homework assignment again!)

  54. Git for running a course?

  55. One student’s work

  56. Details

  57. Benefits of teaching with Git
    Automatic timestamping of all work.
    Distributed backup: the dog cannot eat their homework!
    They can work from any computer.
    Easy downloading of all class materials without a million clicks.
    The end of the email attachment madness.
    Version control as a natural tool, as common as email.

  58. Outline (next: 6 Where is this going?)

  59. A brief demo of the IPython notebook

  60. IPython and the lifecycle of scientific ideas
    Individual exploration
    Collaboration
    “Google docs with a brain”
    Large-scale parallel production work
    IPython notebook on Amazon EC2: MIT’s StarCluster
    Publication
    Generation of HTML/PDF/EPub...
    “Executable papers”
    Education
    Workshops and bootcamps (UC Berkeley, elsewhere)

  61. The executable paper: Titus Brown (MSU), 3/21/12
    http://arxiv.org/abs/1203.4802

  62. Titus’ IPython notebook, runs on Amazon Cloud

  63. Next steps...
    IPython
    Executable examples in books (with a large US publisher)
    A full book on brain imaging and statistics (JB Poline - Neurospin).
    DoD - classic HPC environments.
    Notebook: a format beyond Python (R, MATLAB, etc.)
    UK: Python in education and the Raspberry Pi.
    Numfocus.org: a foundation
    interface with industry.
    support open source scientific Python
    produce educational materials
    GitHub.com: collaborations on ‘versioned science’.

  64. Things are changing...
    Journal policies...
    Funding agencies...
    Needs of everyday science...
    So we must also change:
    Improve our computational praxis
    Better educate our students
    Acknowledge computational work alongside other metrics of academic
    work.
