Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Learning, teaching, hacking: the journey of a s...

Learning, teaching, hacking: the journey of a scikit-image contributor

Python San Sebastian 2015 keynote

Emmanuelle Gouillart

October 25, 2015
Tweet

More Decks by Emmanuelle Gouillart

Other Decks in Science

Transcript

  1. Learning, teaching, hacking The journey of a scikit-image contributor Emmanuelle

    Gouillart joint Unit CNRS/Saint-Gobain SVI and the scikit-image team @EGouillart
  2. My first experience of programming... >>> cd new experiment >>>

    a c q u i r e t e m p e r a t u r e () >>> name exp = ’ convection ’ >>> c o n t r o l p a r a m e t e r () >>> ... and o t h e r magical s p e l l s x
  3. My first experience of programming... >>> cd new experiment >>>

    a c q u i r e t e m p e r a t u r e () >>> name exp = ’ convection ’ >>> c o n t r o l p a r a m e t e r () >>> ... and o t h e r magical s p e l l s x
  4. Python African Tour Dakar 2009 Project started by Kamon Ayeva

    Python for IT students (web development, etc.) Scientific Python for engineering/science students
  5. Euroscipy conferences Every August: Leipzig, Paris, Brussels, Cambridge Next time

    : Erlangen 2 days of tutorials, beginners and advanced 2 days of conference Help from volunteers always welcome!
  6. Scipy lecture notes Train a lot of people: need tools

    that scale Several weeks of tutorials! Beginners: the core of Scientific Python Advanced: learn more tricks Packages: specific applications and packages Developed and used for Euroscipy conferences Curated and enriched over the years
  7. Docstrings now and then docstring in 2008 D e f

    i n i t i o n : np. d i f f (a, n=1, a x i s =-1) D o c s t r i n g : C a l c u l a t e the n- th o r d e r d i s c r e t e d i f f e r e n c e along g i v e n a x i s . x
  8. Docstrings now and then D e f i n i

    t i o n : np. d i f f (a, n=1, a x i s =-1) Docstring : C a l c u l a t e the n- th o r d e r d i s c r e t e d i f f e r e n c e along given a x i s . The f i r s t o r d e r d i f f e r e n c e i s given by ‘‘ out [n] = a[n+1] - a[n]‘‘ along the given axis , h i g h e r o r d e r d i f f e r e n c e s are c a l c u l a t e d by using ‘ d i f f ‘ r e c u r s i v e l y . Parameters ---------- a : a r r a y l i k e Input a r r a y n : int , o p t i o n a l The number of times v a l u e s are d i f f e r e n c e d . a x i s : int , o p t i o n a l The a x i s along which the d i f f e r e n c e i s taken , d e f a u l t i s the l a s t a x i s . Returns ------- d i f f : ndarray The ‘n‘ o r d e r d i f f e r e n c e s . The shape of the output i s the same as ‘a‘ except along ‘ axis ‘ where the dimension i s s m a l l e r by ‘n‘. See Also -------- gradient , e d i f f 1 d , cumsum Examples -------- >>> x = np. a r r a y ([1, 2, 4, 7, 0]) >>> np. d i f f (x) a r r a y ([ 1, 2, 3, -7]) >>> np. d i f f (x , n=2) a r r a y ([ 1, 1, -10]) much better now! Parameters and their type Suggestion of other functions Simple example
  9. pydocweb and NumPy documentation Marathon Tools by Pauli Virtnanen, with

    enthusiastic cheering from my side Documentation effort led by St´ efan van der Walt Easy as Wikipedia A wiki to improve the docs We didn’t have Github!
  10. NumPy documentation standard https://github.com/numpy/numpy/blob/master/doc/example.py def foo ( var1 , var2

    , long var name = ’ hi ’) : r”””A one−line summary that does not use variable names or the function name. Several sentences providing an extended description . Refer to variables using back−ticks , e . g . ‘var ‘ . Parameters − − − − − − − − − − var1 : array like Array like means all those objects − − lists , nested lists , etc . − − that can be converted to an array . We can also refer to variables like ‘var1 ‘ . var2 : int The type above can either refer to an actual Python type (e . g . ‘ ‘ int ‘ ‘) , or describe the type of the variable in more detail , e . g . ‘ ‘(N,) ndarray ‘ ‘ or ‘ ‘ array like ‘ ‘ . Long variable name : {’ hi ’ , ’ho ’} , optional Choices in brackets , default f i r s t when optional . Returns − − − − − − − type Explanation of anonymous return value of type ‘ ‘type ‘ ‘ . describe : type Explanation of return value named ‘ describe ‘ . out : type Explanation of ‘out ‘ . Other Parameters − − − − − − − − − − − − − − − − only seldom used keywords : type Explanation common parameters listed above : type Explanation
  11. Outcome and impact of documentation marathon # of words in

    Numpy reference: 8600 → 140,000 New contributors: 250 accounts Lower entry barrier to contribute Increased the standard for other packages Made people proud about docs
  12. Outcome and impact of documentation marathon # of words in

    Numpy reference: 8600 → 140,000 New contributors: 250 accounts Lower entry barrier to contribute Increased the standard for other packages Made people proud about docs
  13. A shortage of developers Fernando Perez & Aaron Meurer Gist

    5843625 A low bus factor A few people do most of the work
  14. A flood of images hundreds of terabytes of scientific data

    for scientific experiment http://sdo.gsfc.nasa.gov/
  15. A flood of images hundreds of terabytes of scientific data

    for scientific experiment http://sdo.gsfc.nasa.gov/ Image processing Manipulating images in order to retrieve new images or image characteristics (features, measurements, ...) Often combined with machine learning
  16. What is scikit-image? An open-source (BSD) generic image processing library

    for the Python language (and NumPy data arrays)
  17. What is scikit-image? An open-source (BSD) generic image processing library

    for the Python language (and NumPy data arrays) for 2D & 3D images simple API & gentle learning curve
  18. Manipulating images as numerical (numpy) arrays Pixels are arrays elements

    import numpy as np image = np. ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x
  19. Manipulating images as numerical (numpy) arrays Pixels are arrays elements

    import numpy as np image = np. ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x >>> coffee.shape (400, 600, 3) >>> red channel = coffee[..., 0] >>> image 3d = np.ones((100, 100, 100))
  20. NumPy-native: images as NumPy arrays NumPy arrays as arguments and

    outputs >>> from skimage import io , f i l t e r s >>> c a m e r a a r r a y = i o . imread ( ’ camera image . png ’ ) >>> type( c a m e r a a r r a y ) <type ’numpy . ndarray ’ > >>> c a m e r a a r r a y . dtype dtype ( ’ uint8 ’ ) >>> f i l t e r e d a r r a y = f i l t e r s . g a u s s i a n f i l t e r ( camera array , sigma =5) >>> type( f i l t e r e d a r r a y ) <type ’numpy . ndarray ’ > >>> import m a t p l o t l i b . p y p l o t as p l t >>> p l t .imshow( f i l t e r e d a r r a y , cmap= ’ gray ’ ) x
  21. An API relying mostly on functions skimage . f i

    l t e r s . g a u s s i a n f i l t e r (image , sigma , output = None, mode= ’ n e a r e st ’ , c v a l =0, m u l t i c h a n n e l =None) Multi - d i m e n s i o n a l Gaussian filter Parameters ---------- image : array - l i k e input image ( g r a y s c a l e or c o l o r ) to filter. sigma : s c a l a r or sequence of s c a l a r s st and ard d e v i a t i o n f o r Gaussian k e r n e l . The st and ard d e v i a t i o n s of the Gaussian filter are g i v e n f o r each a x i s as a sequence , or as a s i n g l e number , in which case i t i s equal f o r all axes . output : array , o p t i o n a l The ‘‘ output ‘‘ parameter p a s s e s an a r r a y in which to s t o r e the filter output . mode : { ’ r e f l e c t ’ , ’ constant ’ , ’ n ea re st ’ , ’ mirror ’ , ’ wrap ’ }, o p t i o n a l One filter = one function Use keyword argument for parameter tuning
  22. Denoising tomography images In-situ imaging of phase separation in silicate

    melts From basic (generic) to advanced (specific) filters
  23. Denoising tomography images Histogram of pixel values From basic (generic)

    to advanced (specific) filters bilateral = restoration . denoise bilateral (dat) bilateral = restoration . denoise bilateral (dat, sigma range=2.5, sigma spatial=2) tv = restoration . denoise tv chambolle (dat, weight=0.5)
  24. Mathematical morphology skimage.morphology: binary + grayscale morphology dilation, erosion, closing,

    opening several structural elements remove small objects watershed
  25. Feature extraction followed by classification Combining scikit-image and scikit-learn Extract

    features (skimage.feature) Pixels intensity values (R, G, B) Local gradients More advanced descriptors: HOGs, Gabor, ... Train classifier with known regions here, random forest classifier Classify pixels
  26. Versatile use for 2D, 2D-RGB, 3D... >>> from skimage import

    measure >>> l a b e l s 2 d = measure . l a b e l ( image 2d ) >>> l a b e l s 3 d = measure . l a b e l ( image 3d ) x
  27. Datasheet Package statistics http://scikit-image.org/ Release 0.11 (1 - 2 release

    per year) Among 1000 best ranked packages on PyPi Development model Mature algorithms Only Python + Cython code for easier maintainability Focus on good practices: testing, documentation, version control Hosted on GitHub: thorough code reivew + continuous integration Core team of 5 − 10 persons (close to applications)
  28. Achieving a sustainable growth Balance users’ and contributors’ goals: robustness

    and smooth learning curve vs cool factor and bleeding-edge tools Feature development should not be faster than quality improvement Documentation and training for users Low entry barriers for contributors
  29. Massive data processing and parallelization Competitive environment: some other tools

    use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable
  30. joblib: easy simple parallel computing + lazy re-evaluation >>> from

    skimage import data <>> hubble = data . h u b b l e d e e p f i e l d () >>> width = 10 >>> p i c s = u t i l . view as windows ( hubble , ( width , hubble . shape [1], hubble . shape [2]) , s t e p = width ) >>> from j o b l i b import P a r a l l e l , d e l a y e d >>> # task is an image processing function >>> P a r a l l e l ( n j o b s =4)( d e l a y e d ( t a s k )( p i c ) f o r p i c in p i c s ) x
  31. joblib: easy simple parallel computing + lazy re-evaluation Familiar with

    this mess? from skimage import f i l t e r s # Comment to save some time # filter_im = filters.median(im) # binary_im = filters. threshold_otsu (filter_im) v a l u e s = np. unique (im) x
  32. joblib: easy simple parallel computing + lazy re-evaluation Familiar with

    this mess? from skimage import f i l t e r s # Comment to save some time # filter_im = filters.median(im) # binary_im = filters. threshold_otsu (filter_im) v a l u e s = np. unique (im) x >>> from j o b l i b import Memory >>> mem = Memory( c a c h e d i r = ’ /tmp/ j o b l i b ’ ) >>> square = mem. cache (np. square ) >>> b = square (a) [Memory] C a l l i n g square ... square ( a r r a y ([[ 0., 0., 1.], [ 1., 1., 1.], [ 4., 2., 1.]])) s q u a r e - 0... s , 0.0 min >>> c = square (a) >>> # The above call did not trigger an evaluation x
  33. A platform to build an ecosystem upon Tool for users,

    platform for other tools $ apt-cache rdepends python-matplotlib ... 96 Python packages & applications Specific applications that could build on scikit-image Imaging techniques; microscopy, tomography, ... Fields: cell biology, astronomy, ... Requirements: stable API, good docs
  34. No need to be a programming genius to contribute to

    OSS Social and pedagogical skills useful and welcome You will learn a lot and make friends. P. Hintjens
  35. No need to be a programming genius to contribute to

    OSS Social and pedagogical skills useful and welcome You will learn a lot and make friends. P. Hintjens Try it out! http://scikit-image.org/ Feedback welcome github.com/scikit-image/scikit-image Please cite the paper Let’s talk about scikit-image @EGouillart