Slide 1

Slide 1 text

Learning programming & science with Scientific Python Emmanuelle Gouillart joint Unit CNRS/Saint-Gobain SVI and the scikit-image team @EGouillart

Slide 2

Slide 2 text

Thank you from the scikit family Conferences with childcare rock!

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Looking at materials from the inside Can we save energy when melting glass?

Slide 6

Slide 6 text

Some problems I want to solve Scientific / development problems Finding image processing pipelines for extracting scientific information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image

Slide 7

Slide 7 text

Some problems I want to solve Scientific / development problems Finding image processing pipelines for extracting scientific information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image Social and learning problems Working with people with little experience in programming and image processing Helping people to learn programming and image processing How can we have more developers and users of scikit-image?

Slide 8

Slide 8 text

What is image processing? Manipulations of images to Transform them for other purposes: color balance, enhancement, filtering, inpainting, ... Extract information from them: classification, measurements... a) flltering b) feature extraction c) segmentation d) measures non-local means denoising total variation denoising blob detection super-pixels skeleton & local diameter particle properties segmentation from markers ridge detection

Slide 9

Slide 9 text

Traditional image processing: no future?

Slide 10

Slide 10 text

Traditional image processing: no future? Is deep learning the universal solution? Image classification From Keras blog Image segmentation FCN, from Daniil Pakhomov’s blog

Slide 11

Slide 11 text

Still room for other tools! Non-natural images

Slide 12

Slide 12 text

Still room for other tools! Non-natural images

Slide 13

Slide 13 text

Still room for other tools! Non-natural images Avoid sledgehammer approach simple operations: image resiz- ing, histogram equalization, ... pre-processing for machine learning

Slide 14

Slide 14 text

What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays)

Slide 15

Slide 15 text

What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays) for 2D & 3D images simple API & gentle learning curve

Slide 16

Slide 16 text

Segmentation: labelling regions skimage.segmentation

Slide 17

Slide 17 text

Extracting features skimage.feature, skimage.filters

Slide 18

Slide 18 text

Feature extraction followed by classification Combining scikit-image and scikit-learn Daisy features Random Forest classifier Back to image processing: Gaussian filtering, thresholding and mathematical morphology.

Slide 19

Slide 19 text

Datasheet Package statistics http://scikit-image.org/ Release 0.13 (1 - 2 release per year) Among 1000 best ranked packages on PyPi 20000 unique visitors / month

Slide 20

Slide 20 text

Who is your typical user? Windows 54% Linux 26% OS X 20% Not a lot of hardcore geeks Not a lot of time on her plate Learning / finding information is hard

Slide 21

Slide 21 text

The people 2 4 6 8 10 12 Individual Committer 0.0 0.2 0.4 0.6 0.8 1.0 Commit rate Normalized commit rates since 2014-06 matplotlib numpy pandas scikit-image scikit-learn scipy A quite healthy curve... we can do better! Fernando Perez & Aaron Meurer, Gist 5843625

Slide 22

Slide 22 text

Resisting complexity for less confused users

Slide 23

Slide 23 text

Good software solves useful problems, in a simple way Peter Hintjens, Coding Serbia 2014

Slide 24

Slide 24 text

My first experience of programming...

Slide 25

Slide 25 text

My first experience of programming... >>> cd new_experiment >>> acquire_temperature () >>> name_exp = ’← convection ’ >>> control_parameter () >>> ... and other magical← spells x

Slide 26

Slide 26 text

My first experience of programming... >>> cd new_experiment >>> acquire_temperature () >>> name_exp = ’← convection ’ >>> control_parameter () >>> ... and other magical← spells x

Slide 27

Slide 27 text

Mastering new abstractions Python beginners

Slide 28

Slide 28 text

Mastering new abstractions Python beginners data scientists

Slide 29

Slide 29 text

NumpPy: Python objects for numerical arrays Multi-dimensional numerical data container (based on compiled code) + utility functions to create/manipulate them >>> a = np.random. random_integers (0, 1, (2, 2, 2)) >>> a array ([[[0 , 1], [1, 0]], [[0, 0], [0, 1]]]) >>> a.shape , a.dtype ((2, 2, 2), dtype(’int64 ’)) x

Slide 30

Slide 30 text

NumpPy: Python objects for numerical arrays Multi-dimensional numerical data container (based on compiled code) + utility functions to create/manipulate them >>> a = np.random. random_integers (0, 1, (2, 2, 2)) >>> a array ([[[0 , 1], [1, 0]], [[0, 0], [0, 1]]]) >>> a.shape , a.dtype ((2, 2, 2), dtype(’int64 ’)) x Efficient and versatile data access indexing and slicing fancy indexing

Slide 31

Slide 31 text

Manipulating images as numerical (numpy) arrays Pixels are arrays elements import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x

Slide 32

Slide 32 text

Manipulating images as numerical (numpy) arrays Pixels are arrays elements import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x >>> coffee.shape (400, 600, 3) >>> red channel = coffee[..., 0] >>> image 3d =

Slide 33

Slide 33 text

NumPy-native: images as NumPy arrays NumPy arrays as arguments and outputs >>> from skimage import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) >>> camera_array .dtype dtype(’uint8 ’) >>> filtered_array = filters.gaussian(camera_array , ← sigma =5) >>> type( filtered_array ) >>> import matplotlib.pyplot as plt >>> plt.imshow(filtered_array , cmap=’gray ’) x

Slide 34

Slide 34 text

How we simplified the API Before 2013 >>> from skimage import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) Image ... >> camera.max() Image (255 , dtype=uint8) x

Slide 35

Slide 35 text

Versatile use for 2D, 2D-RGB, 3D... >>> from skimage import measure >>> labels_2d = measure.label(image_2d) >>> labels_3d = measure.label(image_3d) x

Slide 36

Slide 36 text

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio =1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x

Slide 37

Slide 37 text

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio =1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x One filter = one function Use keyword argument for parameter tuning

Slide 38

Slide 38 text

API of scikit-image skimage filters restoration segmentation ... denoise_bilateral input array + optional parameters output (array) submodule module function variables

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Object-oriented programming OK for abstractions not adding cognitive load

Slide 41

Slide 41 text

API of scikit-learn simple and consistent

Slide 42

Slide 42 text

Consistent names

Slide 43

Slide 43 text

Consistent names filters . median ( image , selem=None , ← out=None , . . . ) morphology . binary_erosion ( image , ← selem=None , out=None ) restoration . denoise_bilateral (← image , win_size=None , . . . )

Slide 44

Slide 44 text

Consistent names filters . median ( image , selem=None , ← out=None , . . . ) morphology . binary_erosion ( image , ← selem=None , out=None ) restoration . denoise_bilateral (← image , win_size=None , . . . )

Slide 45

Slide 45 text

Learning from mistakes: confusing behavior

Slide 46

Slide 46 text

Learning from mistakes: confusing behavior im = img_as_float ( data . camera ( ) ) hist_1 = exposure . histogram ( im ) im_med = filters . median ( im ) hist_2 = exposure . histogram (← im_med ) 0 50 100 150 200 250 im im_med

Slide 47

Slide 47 text

Learning from mistakes: confusing behavior Is image maximum a useful abstraction?

Slide 48

Slide 48 text

Documentation and teaching

Slide 49

Slide 49 text

Docstrings now and then docstring in 2008 Definition: np.diff(a, n=1, axis =-1) Docstring: Calculate the n-th order discrete difference along ← given axis. x

Slide 50

Slide 50 text

Docstrings now and then Definition: np.diff(a, n=1, axis =-1) Docstring: Calculate the n-th order discrete difference along given axis. The first order difference is given by ‘‘out[n] = a[n+1] - a[n]‘‘ along the given axis , higher order differences are calculated by using ‘diff ‘ recursively . Parameters ---------- a : array_like Input array n : int , optional The number of times values are differenced. axis : int , optional The axis along which the difference is taken , default is the last axis. Returns ------- diff : ndarray The ‘n‘ order differences . The shape of the output is the same as ‘a‘ except along ‘axis ‘ where the dimension is smaller by ‘n‘. See Also -------- gradient , ediff1d , cumsum Examples -------- >>> x = np.array ([1, 2, 4, 7, 0]) >>> np.diff(x) much better now! Parameters and their type Suggestion of other functions Simple example

Slide 51

Slide 51 text

pydocweb and NumPy documentation Marathon Tools by Pauli Virtnanen, with enthusiastic cheering from my side Documentation effort led by St´ efan van der Walt Easy as Wikipedia A wiki to improve the docs We didn’t have Github!

Slide 52

Slide 52 text

NumPy documentation standard https://github.com/numpy/numpy/blob/master/doc/example.py def foo(var1 , var2 , long_var_name =’hi’) : r"""A one -line summary that does not use variable names or the function name. Several sentences providing an extended description . Refer to variables using back -ticks , e.g. ‘var ‘. Parameters ---------- var1 : array_like Array_like means all those objects -- lists , nested lists , etc. -- that can be converted to an array. We can also refer to variables like ‘var1 ‘. var2 : int The type above can either refer to an actual Python type (e.g. ‘‘int ‘‘), or describe the type of the variable in more detail , e.g. ‘‘(N,) ndarray ‘‘ or ‘‘array_like ‘‘. Long_variable_name : {’hi ’, ’ho ’}, optional Choices in brackets , default first when optional. Returns ------- type Explanation of anonymous return value of type ‘‘type ‘‘. describe : type Explanation of return value named ‘describe ‘. out : type Explanation of ‘out ‘. Other Parameters ---------------- only_seldom_used_keywords : type Explanation

Slide 53

Slide 53 text

Outcome and impact of documentation marathon # of words in Numpy reference: 8600 → 140,000 New contributors: 250 accounts Lower entry barrier to contribute Increased the standard for other packages Made people proud about docs

Slide 54

Slide 54 text

Outcome and impact of documentation marathon # of words in Numpy reference: 8600 → 140,000 New contributors: 250 accounts Lower entry barrier to contribute Increased the standard for other packages Made people proud about docs

Slide 55

Slide 55 text

Documentation at a glance: galleries of examples

Slide 56

Slide 56 text

Documentation at a glance: galleries of examples

Slide 57

Slide 57 text

Getting started: finding documentation

Slide 58

Slide 58 text

Umbrella project: sphinx-gallery

Slide 59

Slide 59 text

Auto documenting your API with links to examples

Slide 60

Slide 60 text

Auto documenting your API with links to examples

Slide 61

Slide 61 text

The machine learning flow chart

Slide 62

Slide 62 text

Learning by yourself

Slide 63

Slide 63 text

Gamifying learning

Slide 64

Slide 64 text

Example: segmentation of low-constrast regions In-situ imaging of glass batch reactive melting Non-local means denoising to preserve texture Histogram-based markers extraction Random walker segmentation Non-local means: average similar patches Random walker:anisotropic diffusion from markers Random walker less sensitive to noise than watershed, but slower

Slide 65

Slide 65 text

Towards a more interactive documentation? https://mybinder.org/v2/gh/scikit-image/skimage-tutorials/master

Slide 66

Slide 66 text

Towards a more interactive documentation?

Slide 67

Slide 67 text

More interaction for faster discovery: widgets

Slide 68

Slide 68 text

Keeping interaction easy for large data from joblib import Memory memory = Memory ( cachedir=’ . / c a c h e d i r ’ , verbose=0) @memory . cache def mem_label ( x ) : r e t u r n measure . label ( x ) @memory . cache def mem_threshold_otsu ( x ) : r e t u r n filters . threshold_otsu ( x ) [ . . . ] val = mem_threshold_otsu ( dat ) objects = dat > val median_dat = mem_median_filter ( dat , 3) val2 = mem_threshold_otsu ( median_dat [ objects ] ) liquid = median_dat > val2 segmentation_result = np . copy ( objects ) . astype ( np . uint8 ) segmentation_result [ liquid ] = 2 aggregates = mem_binary_fill_holes ( objects ) aggregates_ds = np . copy ( aggregates [ : : 4 , : : 4 , : : 4 ] ) cores = mem_binary_erosion ( aggregates_ds , np . ones ((10 , 10 ,← 10) ) )

Slide 69

Slide 69 text

Euroscipy conferences Leipzig, Paris, Brussels, Cambridge, Erlangen 2018: Trento 2 days of tutorials, beginners and advanced 2 days of conference Help from volunteers always welcome!

Slide 70

Slide 70 text

Euroscipy 2018

Slide 71

Slide 71 text

Scipy lecture notes Train a lot of people: need tools that scale Several weeks of tutorials! Beginners: the core of Scientific Python Advanced: learn more tricks Packages: specific applications and packages Developed and used for Euroscipy conferences Curated and enriched over the years

Slide 72

Slide 72 text

Learning science

Slide 73

Slide 73 text

Picture denoising

Slide 74

Slide 74 text

Picture denoising

Slide 75

Slide 75 text

Denoising tomography images In-situ imaging of phase separation in silicate melts From basic (generic) to advanced (specific) filters

Slide 76

Slide 76 text

Denoising tomography images Histogram of pixel values From basic (generic) to advanced (specific) filters bilateral = restoration.denoise_bilateral(dat) bilateral = restoration.denoise_bilateral(dat, sigma_range← =2.5, sigma_spatial=2) tv = restoration.denoise_tv_chambolle(dat, weight=0.5)

Slide 77

Slide 77 text

Boosted learning: contributing to open-source Learning more about programming Learning more about image processing

Slide 78

Slide 78 text

Challenges for the future

Slide 79

Slide 79 text

Achieving a sustainable growth Balance users’ and contributors’ goals: robustness and smooth learning curve vs cool factor and bleeding-edge tools Feature development should not be faster than quality improvement Documentation and training for users Low entry barriers for contributors

Slide 80

Slide 80 text

Massive data processing and parallelization Competitive environment: some other tools use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable

Slide 81

Slide 81 text

Massive data processing and parallelization Competitive environment: some other tools use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable

Slide 82

Slide 82 text

joblib: easy simple parallel computing + lazy re-evaluation import numpy as np from sklearn . externals . joblib import Parallel , delayed def apply_parallel ( func , data , ∗args , chunk =100, overlap =10, n_jobs=4, ∗∗kwargs ) : ””” Apply a f u n c t i o n i n p a r a l l e l to o v e r l a p p i n g chunks of an a r r a y . j o b l i b i s used f o r p a r a l l e l p r o c e s s i n g . [ . . . ] Examples − − − − − − − − > > > from skimage import data , f i l t e r s > > > c o i n s = data . c o i n s () > > > r e s = a p p l y p a r a l l e l ( f i l t e r s . gaussian , coins , 2) ””” sh0 = data . shape [ 0 ] nb_chunks = sh0 // chunk end_chunk = sh0 % chunk arg_list = [ data [ max (0 , i∗chunk − overlap ) : min (( i+1)∗chunk + overlap , sh0 ) ] f o r i i n range (0 , nb_chunks ) ] i f end_chunk > 0 : arg_list . append ( data[−end_chunk − overlap : ] ) res_list = Parallel ( n_jobs=n_jobs ) ( delayed ( func ) ( sub_im , ∗args , ∗∗kwargs ) f o r sub_im i n arg_list ) output_dtype = res_list [ 0 ] . dtype out_data = np . empty ( data . shape , dtype=output_dtype ) f o r i i n range (1 , nb_chunks ) : out_data [ i∗chunk : ( i+1)∗chunk ] = res_list [ i ] [ overlap : overlap+chunk ] out_data [ : chunk ] = res_list [0][: − overlap ] i f end_chunk > 0 : out_data[−end_chunk : ] = res_list [ −1][ overlap : ] r e t u r n out_data

Slide 83

Slide 83 text

A platform to build an ecosystem upon Tool for users, platform for other tools $ apt-cache rdepends python-matplotlib ... 96 Python packages & applications Specific applications that could build on scikit-image Imaging techniques; microscopy, tomography, ... Fields: cell biology, astronomy, ... Requirements: stable API, good docs

Slide 84

Slide 84 text

Where Do We Come From? What Are We? Where Are We Going?

Slide 85

Slide 85 text

Where Do We Come From? What Are We? Where Are We Going? What is great with our tools? What can we learn about them to create new tools? Do we need other tools?

Slide 86

Slide 86 text

We’ve got great tools Can we make them even more ”self-learning”? Consider contributing to the ecosystem: you don’t have to be a programming genius. Which tools are we lacking yet? For which problems are we at the pre-numpy / pre-pandas stage? User interfaces? Data retrieval and cleaning?