Slide 1

Slide 1 text

Learning, teaching, hacking: the journey of a scikit-image contributor
Emmanuelle Gouillart, joint unit CNRS/Saint-Gobain SVI, and the scikit-image team
@EGouillart

Slide 2

Slide 2 text

Introducing myself: researcher and geek

Slide 3

Slide 3 text

Introducing myself: researcher and geek

Slide 4

Slide 4 text

Introducing myself: researcher and geek

Slide 5

Slide 5 text

The Scientific Python ecosystem
Signal processing
Specialized modules
Visualization
Interpreters and IDEs

Slide 6

Slide 6 text

The Scientific Python ecosystem
Integrated distributions
Signal processing
Specialized modules
Visualization
Interpreters and IDEs

Slide 7

Slide 7 text

The long winding road...

Slide 8

Slide 8 text

1. Teaching and documentation

Slide 9

Slide 9 text

My first experience of programming...

Slide 10

Slide 10 text

My first experience of programming...
    >>> cd new_experiment
    >>> acquire_temperature()
    >>> name_exp = 'convection'
    >>> control_parameter()
    >>> ... and other magical spells

Slide 11

Slide 11 text

My first experience of programming...
    >>> cd new_experiment
    >>> acquire_temperature()
    >>> name_exp = 'convection'
    >>> control_parameter()
    >>> ... and other magical spells

Slide 12

Slide 12 text

Python African Tour, Dakar 2009
Project started by Kamon Ayeva
Python for IT students (web development, etc.)
Scientific Python for engineering/science students

Slide 13

Slide 13 text

EuroSciPy conferences
Every August: Leipzig, Paris, Brussels, Cambridge
Next time: Erlangen
2 days of tutorials, beginner and advanced
2 days of conference
Help from volunteers always welcome!

Slide 14

Slide 14 text

Is academic teaching taking over? Engineering departments are still lagging behind.

Slide 15

Slide 15 text

SciPy lecture notes
Training a lot of people requires tools that scale
Several weeks of tutorials!
Beginners: the core of Scientific Python
Advanced: learn more tricks
Packages: specific applications and packages
Developed and used for EuroSciPy conferences
Curated and enriched over the years

Slide 16

Slide 16 text

Docstrings now and then
Docstring in 2008:
    Definition: np.diff(a, n=1, axis=-1)
    Docstring:
    Calculate the n-th order discrete difference along given axis.

Slide 17

Slide 17 text

Docstrings now and then
    Definition: np.diff(a, n=1, axis=-1)
    Docstring:
    Calculate the n-th order discrete difference along given axis.

    The first order difference is given by ``out[n] = a[n+1] - a[n]`` along
    the given axis, higher order differences are calculated by using `diff`
    recursively.

    Parameters
    ----------
    a : array_like
        Input array
    n : int, optional
        The number of times values are differenced.
    axis : int, optional
        The axis along which the difference is taken, default is the last axis.

    Returns
    -------
    diff : ndarray
        The `n` order differences. The shape of the output is the same as `a`
        except along `axis` where the dimension is smaller by `n`.

    See Also
    --------
    gradient, ediff1d, cumsum

    Examples
    --------
    >>> x = np.array([1, 2, 4, 7, 0])
    >>> np.diff(x)
    array([ 1,  2,  3, -7])
    >>> np.diff(x, n=2)
    array([  1,   1, -10])

Much better now!
Parameters and their type
Suggestion of other functions
Simple example
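
The behaviour described in this docstring can be checked directly. The short sketch below reuses the array from the docstring's Examples section and verifies that the second-order difference equals `np.diff` applied twice; the variable names are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 4, 7, 0])

# First-order difference: out[n] = x[n+1] - x[n]
first = np.diff(x)

# Second-order difference, computed directly and by applying diff twice
second = np.diff(x, n=2)
second_recursive = np.diff(np.diff(x))

print(first, second, second_recursive)
```

Each application of `np.diff` shortens the differenced axis by one element, which is why the output is smaller by `n` along `axis`.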

Slide 18

Slide 18 text

pydocweb and the NumPy documentation marathon
Tools by Pauli Virtanen, with enthusiastic cheering from my side
Documentation effort led by Stéfan van der Walt
Easy as Wikipedia
A wiki to improve the docs
We didn't have GitHub!

Slide 19

Slide 19 text

NumPy documentation standard
https://github.com/numpy/numpy/blob/master/doc/example.py
    def foo(var1, var2, long_var_name='hi'):
        r"""A one-line summary that does not use variable names or the
        function name.

        Several sentences providing an extended description. Refer to
        variables using back-ticks, e.g. `var`.

        Parameters
        ----------
        var1 : array_like
            Array_like means all those objects -- lists, nested lists, etc. --
            that can be converted to an array. We can also refer to
            variables like `var1`.
        var2 : int
            The type above can either refer to an actual Python type
            (e.g. ``int``), or describe the type of the variable in more
            detail, e.g. ``(N,) ndarray`` or ``array_like``.
        long_var_name : {'hi', 'ho'}, optional
            Choices in brackets, default first when optional.

        Returns
        -------
        type
            Explanation of anonymous return value of type ``type``.
        describe : type
            Explanation of return value named `describe`.
        out : type
            Explanation of `out`.

        Other Parameters
        ----------------
        only_seldom_used_keywords : type
            Explanation
        common_parameters_listed_above : type
            Explanation
        """

Slide 20

Slide 20 text

Outcome and impact of the documentation marathon
Number of words in the NumPy reference: 8,600 → 140,000
New contributors: 250 accounts
Lower entry barrier to contribute
Raised the standard for other packages
Made people proud of the docs

Slide 21

Slide 21 text

Outcome and impact of the documentation marathon
Number of words in the NumPy reference: 8,600 → 140,000
New contributors: 250 accounts
Lower entry barrier to contribute
Raised the standard for other packages
Made people proud of the docs

Slide 22

Slide 22 text

Documentation at a glance: galleries of examples

Slide 23

Slide 23 text

Documentation at a glance: galleries of examples

Slide 24

Slide 24 text

Documentation at a glance: galleries of examples

Slide 25

Slide 25 text

Galleries of examples Umbrella project: sphinx-gallery

Slide 26

Slide 26 text

2. Hacking

Slide 27

Slide 27 text

A shortage of developers
Fernando Perez & Aaron Meurer, Gist 5843625
A low bus factor
A few people do most of the work

Slide 28

Slide 28 text

At this time I was starting to do a lot of image processing...

Slide 29

Slide 29 text

The revolution of images

Slide 30

Slide 30 text

The revolution of images

Slide 31

Slide 31 text

A flood of images
Several 10^8 images uploaded to Facebook each day

Slide 32

Slide 32 text

A flood of images
Hundreds of terabytes of scientific data for scientific experiments
http://sdo.gsfc.nasa.gov/

Slide 33

Slide 33 text

A flood of images
Hundreds of terabytes of scientific data for scientific experiments
http://sdo.gsfc.nasa.gov/
Image processing
Manipulating images in order to retrieve new images or image characteristics (features, measurements, ...)
Often combined with machine learning

Slide 34

Slide 34 text

What is scikit-image?
An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays)

Slide 35

Slide 35 text

What is scikit-image?
An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays)
for 2D & 3D images
simple API & gentle learning curve

Slide 36

Slide 36 text

Manipulating images as numerical (NumPy) arrays
Pixels are array elements
    import numpy as np
    image = np.ones((5, 5))
    image[0, 0] = 0
    image[2, :] = 0

Slide 37

Slide 37 text

Manipulating images as numerical (NumPy) arrays
Pixels are array elements
    import numpy as np
    image = np.ones((5, 5))
    image[0, 0] = 0
    image[2, :] = 0

    >>> coffee.shape
    (400, 600, 3)
    >>> red_channel = coffee[..., 0]
    >>> image_3d = np.ones((100, 100, 100))
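
As a self-contained illustration of the same idea, the sketch below builds a small synthetic colour image instead of loading the `coffee` picture. The array sizes and variable names are chosen for the example and do not come from the slides.

```python
import numpy as np

# Synthetic stand-in for a colour image: 4 rows, 6 columns, 3 channels (R, G, B)
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = 255          # fill the red channel
rgb[1, 2] = (0, 255, 0)    # set one pixel to pure green

red_channel = rgb[..., 0]  # 2D view of the red channel
top_left = rgb[:2, :3]     # crop: first 2 rows, first 3 columns

# A 3D image (e.g. a tomography volume) is simply a 3D array
volume = np.ones((10, 10, 10))
volume[5] = 0              # blank out one slice along the first axis
```

Cropping, channel selection and masking are all plain NumPy indexing, so no image-specific API is needed for these operations.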

Slide 38

Slide 38 text

NumPy-native: images as NumPy arrays
NumPy arrays as arguments and outputs
    >>> from skimage import io, filters
    >>> camera_array = io.imread('camera_image.png')
    >>> type(camera_array)
    >>> camera_array.dtype
    dtype('uint8')
    >>> filtered_array = filters.gaussian_filter(camera_array, sigma=5)
    >>> type(filtered_array)
    >>> import matplotlib.pyplot as plt
    >>> plt.imshow(filtered_array, cmap='gray')

Slide 39

Slide 39 text

An API relying mostly on functions
    skimage.filters.gaussian_filter(image, sigma, output=None, mode='nearest', cval=0, multichannel=None)

    Multi-dimensional Gaussian filter

    Parameters
    ----------
    image : array-like
        input image (grayscale or color) to filter.
    sigma : scalar or sequence of scalars
        standard deviation for Gaussian kernel. The standard deviations of the
        Gaussian filter are given for each axis as a sequence, or as a single
        number, in which case it is equal for all axes.
    output : array, optional
        The ``output`` parameter passes an array in which to store the filter output.
    mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional

One filter = one function
Use keyword arguments for parameter tuning
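
A minimal sketch of this function-based style follows. It assumes a recent scikit-image release, where the function shown on the slide as `gaussian_filter` (release 0.11) is exposed as `filters.gaussian`. The synthetic test image is an assumption made so that the example needs no image file.

```python
import numpy as np
from skimage import filters

# Synthetic test image: a bright square on a dark background
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0

# One filter = one function; tuning happens through keyword arguments
smooth_small = filters.gaussian(image, sigma=1)
smooth_large = filters.gaussian(image, sigma=4, mode='nearest')

# Input and output are both plain NumPy arrays of the same shape
print(type(smooth_large), smooth_large.shape)
```

A larger `sigma` spreads the square over more pixels, so the peak value of `smooth_large` is lower than that of `smooth_small`.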

Slide 40

Slide 40 text

Filtering: transforming image data skimage.filters, skimage.exposure, skimage.restoration

Slide 41

Slide 41 text

Denoising tomography images
In-situ imaging of phase separation in silicate melts
From basic (generic) to advanced (specific) filters

Slide 42

Slide 42 text

Denoising tomography images
Histogram of pixel values
From basic (generic) to advanced (specific) filters
    bilateral = restoration.denoise_bilateral(dat)
    bilateral = restoration.denoise_bilateral(dat, sigma_range=2.5, sigma_spatial=2)
    tv = restoration.denoise_tv_chambolle(dat, weight=0.5)
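
The effect of such a filter can be sketched on synthetic data. The example below is not the tomography dataset from the talk: it assumes a two-phase, piecewise-constant image with added Gaussian noise, and the noise level and `weight` value are illustrative choices rather than recommended settings.

```python
import numpy as np
from skimage import restoration

rng = np.random.default_rng(0)

# Piecewise-constant "sample": two phases with grey levels 0.2 and 0.8
clean = np.full((64, 64), 0.2)
clean[16:48, 16:48] = 0.8
noisy = clean + rng.normal(scale=0.2, size=clean.shape)

# Total-variation denoising favours piecewise-constant images
tv = restoration.denoise_tv_chambolle(noisy, weight=0.2)


def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a - b) ** 2))


print(mse(noisy, clean), mse(tv, clean))
```

Comparing the error before and after filtering against a known clean image is only possible on synthetic data; on real tomography images the histogram of pixel values plays a similar diagnostic role.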

Slide 43

Slide 43 text

Segmentation: labelling regions skimage.segmentation

Slide 44

Slide 44 text

Mathematical morphology
skimage.morphology: binary + grayscale morphology
dilation, erosion, closing, opening
several structuring elements
remove small objects
watershed
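
The basic operators listed above can be sketched on a tiny synthetic image. Storing the binary image as `uint8` and using `disk(1)` as the structuring element are choices made for this example only.

```python
import numpy as np
from skimage import morphology

# Binary test image stored as uint8: one large object plus a one-pixel speck
image = np.zeros((32, 32), dtype=np.uint8)
image[10:20, 10:20] = 1   # 10 x 10 square
image[2, 2] = 1           # isolated pixel (noise)

footprint = morphology.disk(1)   # structuring element: 3 x 3 cross

eroded = morphology.erosion(image, footprint)    # shrinks objects
dilated = morphology.dilation(image, footprint)  # grows objects
opened = morphology.opening(image, footprint)    # erosion, then dilation
```

Opening removes the isolated pixel while keeping the bulk of the square, which is why it is a common first step for cleaning up a segmentation.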

Slide 45

Slide 45 text

Extracting features skimage.feature, skimage.filters

Slide 46

Slide 46 text

Feature extraction followed by classification
Combining scikit-image and scikit-learn
Extract features (skimage.feature)
  Pixel intensity values (R, G, B)
  Local gradients
  More advanced descriptors: HOGs, Gabor, ...
Train a classifier with known regions (here, a random forest classifier)
Classify pixels
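
The workflow above can be sketched as follows. The two-region synthetic image, the three per-pixel features and the choice of annotated rows are all assumptions made for this example; the talk's actual features and training data are not reproduced here.

```python
import numpy as np
from skimage import filters
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic two-region image: dark left half, bright right half, plus noise
image = np.full((40, 40), 0.2)
image[:, 20:] = 0.8
image += rng.normal(scale=0.05, size=image.shape)
truth = np.zeros(image.shape, dtype=int)
truth[:, 20:] = 1

# Per-pixel features: raw intensity, smoothed intensity, local gradient
features = np.stack(
    [image, filters.gaussian(image, sigma=2), filters.sobel(image)],
    axis=-1,
)
n_features = features.shape[-1]

# "Known regions": pretend that only the top 5 rows were annotated by hand
X_train = features[:5].reshape(-1, n_features)
y_train = truth[:5].ravel()

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)

# Classify every pixel and reshape the prediction back to the image shape
predicted = clf.predict(features.reshape(-1, n_features)).reshape(image.shape)
accuracy = float((predicted == truth).mean())
print(accuracy)
```

Because features are stored as an array of shape (rows, columns, n_features), switching between image layout and the (n_samples, n_features) layout expected by scikit-learn is a single `reshape`.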

Slide 47

Slide 47 text

Measures on images skimage.measure

Slide 48

Slide 48 text

Versatile use for 2D, 2D-RGB, 3D...
    >>> from skimage import measure
    >>> labels_2d = measure.label(image_2d)
    >>> labels_3d = measure.label(image_3d)
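
A runnable version of this snippet needs input arrays, so the sketch below assumes two small synthetic binary images, each containing two separate objects.

```python
import numpy as np
from skimage import measure

# 2D binary image with two separate objects
image_2d = np.zeros((8, 8), dtype=int)
image_2d[1:3, 1:3] = 1
image_2d[5:7, 4:7] = 1

# 3D binary volume with two separate objects
image_3d = np.zeros((6, 6, 6), dtype=int)
image_3d[0:2, 0:2, 0:2] = 1
image_3d[4:6, 3:6, 3:6] = 1

# The same function handles both dimensionalities
labels_2d, n_2d = measure.label(image_2d, return_num=True)
labels_3d, n_3d = measure.label(image_3d, return_num=True)

# regionprops measures each labelled region (here, its size in pixels)
areas_2d = sorted(region.area for region in measure.regionprops(labels_2d))
```

The label arrays have the same shape as their inputs, with 0 for the background and one integer per connected object.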

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Gallery of examples

Slide 51

Slide 51 text

Getting started: finding documentation

Slide 52

Slide 52 text

Datasheet
Package statistics
  http://scikit-image.org/
  Release 0.11 (1-2 releases per year)
  Among the 1000 best-ranked packages on PyPI
Development model
  Mature algorithms
  Only Python + Cython code, for easier maintainability
  Focus on good practices: testing, documentation, version control
  Hosted on GitHub: thorough code review + continuous integration
  Core team of 5-10 people (close to applications)

Slide 53

Slide 53 text

The people

Slide 54

Slide 54 text

The people A quite healthy curve... we can do better!

Slide 55

Slide 55 text

3. Challenges for the future

Slide 56

Slide 56 text

Achieving sustainable growth
Balance users’ and contributors’ goals: robustness and a smooth learning curve vs. cool factor and bleeding-edge tools
Feature development should not be faster than quality improvement
Documentation and training for users
Low entry barriers for contributors

Slide 57

Slide 57 text

Massive data processing and parallelization
Competitive environment: some other tools use GPUs, Spark, etc.; scikit-image uses NumPy!
I/O: large images might not fit into memory; use memory mapping of different file formats (raw binary with NumPy, HDF5 with PyTables)
Divide into blocks: use util.view_as_blocks to iterate conveniently over blocks
Parallel processing: use joblib or dask
Better integration desirable
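
The memory-mapping and block-splitting ideas can be sketched together. The file name, image size and block size below are hypothetical and kept tiny so that the example runs quickly; a real use case would map a file too large for memory.

```python
import os
import tempfile

import numpy as np
from skimage import util

# Write a raw binary "large image" to disk (kept small here)
path = os.path.join(tempfile.mkdtemp(), "large_image.raw")
shape = (8, 12)
np.arange(96, dtype=np.float32).reshape(shape).tofile(path)

# Memory-map the file: data is read from disk only when accessed
image = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Non-overlapping 4 x 4 blocks; the result has shape (2, 3, 4, 4)
blocks = util.view_as_blocks(np.asarray(image), (4, 4))

# Reduce each block independently (here: its mean value)
block_means = blocks.mean(axis=(2, 3))
print(block_means)
```

`view_as_blocks` requires the image shape to be an exact multiple of the block shape, so real images may need cropping or padding first.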

Slide 58

Slide 58 text

joblib: easy, simple parallel computing + lazy re-evaluation
    >>> from skimage import data, util
    >>> hubble = data.hubble_deep_field()
    >>> width = 10
    >>> pics = util.view_as_windows(hubble, (width, hubble.shape[1], hubble.shape[2]), step=width)
    >>> from joblib import Parallel, delayed
    >>> # task is an image processing function
    >>> Parallel(n_jobs=4)(delayed(task)(pic) for pic in pics)
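
A self-contained variant of this pattern is sketched below. It assumes a synthetic array instead of the Hubble image, plain slicing instead of `view_as_windows`, and a hypothetical `task` function; `prefer="threads"` is also a choice made for this example, to avoid starting worker processes.

```python
import numpy as np
from joblib import Parallel, delayed


def task(strip):
    """Hypothetical image-processing step: mean intensity of a strip."""
    return float(strip.mean())


# Synthetic stand-in for a large image: 40 rows x 30 columns
image = np.arange(40 * 30, dtype=float).reshape(40, 30)

# Cut the image into horizontal strips of 10 rows each
width = 10
strips = [image[i:i + width] for i in range(0, image.shape[0], width)]

# Process the strips concurrently; results come back in input order
results = Parallel(n_jobs=2, prefer="threads")(delayed(task)(s) for s in strips)
serial = [task(s) for s in strips]
```

Since `Parallel` preserves the order of its inputs, the parallel results can be compared directly with a serial loop when debugging.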

Slide 59

Slide 59 text

joblib: easy, simple parallel computing + lazy re-evaluation
Familiar with this mess?
    import numpy as np
    from skimage import filters
    # Comment to save some time
    # filter_im = filters.median(im)
    # binary_im = filters.threshold_otsu(filter_im)
    values = np.unique(im)

Slide 60

Slide 60 text

joblib: easy, simple parallel computing + lazy re-evaluation
Familiar with this mess?
    import numpy as np
    from skimage import filters
    # Comment to save some time
    # filter_im = filters.median(im)
    # binary_im = filters.threshold_otsu(filter_im)
    values = np.unique(im)

    >>> from joblib import Memory
    >>> mem = Memory(cachedir='/tmp/joblib')
    >>> square = mem.cache(np.square)
    >>> b = square(a)
    [Memory] Calling square...
    square(array([[ 0.,  0.,  1.],
           [ 1.,  1.,  1.],
           [ 4.,  2.,  1.]]))
    square - 0...s, 0.0min
    >>> c = square(a)
    >>> # The above call did not trigger an evaluation
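
The caching behaviour can be verified with a small sketch. It assumes a recent joblib release, where the cache directory is passed as `location` rather than the `cachedir` argument shown on the slide; the `slow_square` function and the `calls` counter are hypothetical helpers added to make the cache hit observable.

```python
import tempfile

import numpy as np
from joblib import Memory

calls = []  # records every real evaluation of the function


def slow_square(a):
    """Stand-in for an expensive image-processing step."""
    calls.append(1)
    return np.square(a)


# Cache results on disk, keyed by the function and its arguments
mem = Memory(location=tempfile.mkdtemp(), verbose=0)
cached_square = mem.cache(slow_square)

a = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [4.0, 2.0, 1.0]])
b = cached_square(a)  # first call: computed, then stored on disk
c = cached_square(a)  # second call: loaded from the cache, not recomputed
print(len(calls))
```

With the cache in place, intermediate steps no longer need to be commented out to save time: unchanged steps are simply read back from disk.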

Slide 61

Slide 61 text

A platform to build an ecosystem upon
Tool for users, platform for other tools
    $ apt-cache rdepends python-matplotlib
    ...
    96 Python packages & applications
Specific applications that could build on scikit-image
  Imaging techniques: microscopy, tomography, ...
  Fields: cell biology, astronomy, ...
Requirements: stable API, good docs

Slide 62

Slide 62 text

No need to be a programming genius to contribute to OSS
Social and pedagogical skills are useful and welcome
You will learn a lot and make friends.
P. Hintjens

Slide 63

Slide 63 text

No need to be a programming genius to contribute to OSS
Social and pedagogical skills are useful and welcome
You will learn a lot and make friends.
P. Hintjens
Try it out! http://scikit-image.org/
Feedback welcome: github.com/scikit-image/scikit-image
Please cite the paper
Let’s talk about scikit-image: @EGouillart