Slide 1

Slide 1 text

Introduction to scikit-image Emmanuelle Gouillart joint Unit CNRS/Saint-Gobain SVI and the scikit-image team @EGouillart

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Looking at materials from the inside 3-D movie at 1000◦ C! Can we save energy when melting glass?

Slide 6

Slide 6 text

Getting quantitative data out of images

Slide 7

Slide 7 text

Complex workflows Several steps: pre-processing, region extraction, post-processing several possible algorithms for each step, parameter selection need for intermediate visalizations A lot of trial and error!

Slide 8

Slide 8 text

Some problems I want to solve Scientific / development problems Finding image processing pipelines for extracting scientific information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image

Slide 9

Slide 9 text

Some problems I want to solve Scientific / development problems Finding image processing pipelines for extracting scientific information Running these operations on very large datasets (> 1 Tb) in reasonable time Removing bugs in scikit-image Social and learning problems Working with people with little experience in programming and image processing Helping people to learn programming and image processing How can we have more developers and users of scikit-image?

Slide 10

Slide 10 text

What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays)

Slide 11

Slide 11 text

What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays) for 2D & 3D images simple API & gentle learning curve

Slide 12

Slide 12 text

What is image processing? Manipulations of images to Transform them for other purposes: color balance, enhancement, filtering, inpainting, ... Extract information from them: classification, measurements... a) flltering b) feature extraction c) segmentation d) measures non-local means denoising total variation denoising blob detection super-pixels skeleton & local diameter particle properties segmentation from markers ridge detection

Slide 13

Slide 13 text

Traditional image processing: no future?

Slide 14

Slide 14 text

Traditional image processing: no future? Is deep learning the universal solution? Image classification From Keras blog Image segmentation FCN, from Daniil Pakhomov’s blog

Slide 15

Slide 15 text

Still room for other tools! Non-natural images

Slide 16

Slide 16 text

Still room for other tools! Non-natural images

Slide 17

Slide 17 text

Still room for other tools! Non-natural images Avoid sledgehammer approach simple operations: image resiz- ing, histogram equalization, ... pre-processing for machine learning

Slide 18

Slide 18 text

First steps with scikit-image >>> from skimage import data , measure >>> im = data.binary_blobs (128, n_dim =3, ← volume_fraction =0.2) >>> im.shape # 3-D array (128 , 128, 128) >>> labels = measure.label(im) >>> type(im), type(labels) # images are numpy arrays (, ) >>> labels [0, 0, 0] # pixels are array elements 0 x Visualization with Mayavi Images are NumPy arrays Pixels are array elements Simple API relying on functions new = function(image) Most functions work for 2D & 3D Optional parameters: keyword arguments

Slide 19

Slide 19 text

The Scientific Python ecosystem 1 75 ... 32 ... 10 3 ... 2 8 23 ... 36 ... 1 13 ... 0 12 43 ... 6 ... 66 ... 97 1 8 5 ... 73 ... 78 98 ... 9 4 10 ... 55 ... 10 3 ... 9 1 41 ... 98 ... 38 75 ... 6 ... visualization Development environment machine learning image properties object properties image file(s) array

Slide 20

Slide 20 text

Scipy lecture notes Train a lot of people: need tools that scale Several weeks of tutorials! Beginners: the core of Scientific Python Advanced: learn more tricks Packages: specific applications and packages Developed and used for Euroscipy conferences Curated and enriched over the years

Slide 21

Slide 21 text

Manipulating images as numerical (numpy) arrays Pixels are arrays elements import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x

Slide 22

Slide 22 text

Manipulating images as numerical (numpy) arrays Pixels are arrays elements import numpy as np image = np.ones ((5, 5)) image [0, 0] = 0 image [2, :] = 0 x >>> coffee.shape (400, 600, 3) >>> red channel = coffee[..., 0] >>> image 3d = np.ones((100, 100, 100))

Slide 23

Slide 23 text

NumPy-native: images as NumPy arrays NumPy arrays as arguments and outputs >>> from skimage import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) >>> camera_array .dtype dtype(’uint8 ’) >>> filtered_array = filters.gaussian(camera_array , ← sigma =5) >>> type( filtered_array ) >>> import matplotlib.pyplot as plt >>> plt.imshow(filtered_array , cmap=’gray ’) x

Slide 24

Slide 24 text

How we simplified the API Before 2013 >>> from skimage import io , filters >>> camera_array = io.imread(’camera_image .png’) >>> type(camera_array ) Image ... >> camera.max() Image (255 , dtype=uint8) x

Slide 25

Slide 25 text

Versatile use for 2D, 2D-RGB, 3D... >>> from skimage import measure >>> labels_2d = measure.label(image_2d) >>> labels_3d = measure.label(image_3d) x

Slide 26

Slide 26 text

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio =1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x

Slide 27

Slide 27 text

Versatile use for 2D, 2D-RGB, 3D... def quickshift(image , ratio =1.0, kernel_size =5, ← max_dist =10, sigma =0, random_seed =42): """ Segments image using quickshift clustering in← Color -(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ... x One filter = one function Use keyword argument for parameter tuning

Slide 28

Slide 28 text

API of scikit-image skimage filters restoration segmentation ... denoise_bilateral input array + optional parameters output (array) submodule module function variables

Slide 29

Slide 29 text

Consistent names

Slide 30

Slide 30 text

Consistent names filters . median ( image , selem=None , ← out=None , . . . ) morphology . binary_erosion ( image , ← selem=None , out=None ) restoration . denoise_bilateral (← image , win_size=None , . . . )

Slide 31

Slide 31 text

Consistent names filters . median ( image , selem=None , ← out=None , . . . ) morphology . binary_erosion ( image , ← selem=None , out=None ) restoration . denoise_bilateral (← image , win_size=None , . . . )

Slide 32

Slide 32 text

Filtering: transforming image data skimage.filter, skimage.exposure, skimage.restoration

Slide 33

Slide 33 text

Geometrical transformations skimage.transform scale, zoom, rotate, swirl, warp, ...

Slide 34

Slide 34 text

Extracting features skimage.feature, skimage.filters

Slide 35

Slide 35 text

Feature extraction followed by classification Combining scikit-image and scikit-learn Daisy features Random Forest classifier Back to image processing: Gaussian filtering, thresholding and mathematical morphology.

Slide 36

Slide 36 text

Measures on images skimage.measure

Slide 37

Slide 37 text

Segmentation: labelling regions skimage.segmentation

Slide 38

Slide 38 text

Datasheet Package statistics http://scikit-image.org/ Release 0.14 (1 - 2 release per year) > 200 contributors, 5-10 maintainers 20000 unique visitors / month Among 1000 best ranked packages on PyPi

Slide 39

Slide 39 text

Documentation: learning image processing and scikit-image

Slide 40

Slide 40 text

Docstrings now and then docstring in 2008 Definition: np.diff(a, n=1, axis =-1) Docstring: Calculate the n-th order discrete difference along ← given axis. x

Slide 41

Slide 41 text

Docstrings now and then Definition : np.diff(a, n=1, axis =-1) Docstring: Calculate the n-th order discrete difference along given axis. The first order difference is given by ‘‘out[n] = a[n+1] - a[n]‘‘ along the given axis , higher order differences are calculated by using ‘diff ‘ recursively . Parameters ---------- a : array_like Input array n : int , optional The number of times values are differenced. axis : int , optional The axis along which the difference is taken , default is the last axis. Returns ------- diff : ndarray The ‘n‘ order differences . The shape of the output is the same as ‘a‘ except along ‘axis ‘ where the dimension is smaller by ‘n‘. See Also -------- gradient , ediff1d , cumsum Examples -------- >>> x = np.array ([1, 2, 4, 7, 0]) >>> np.diff(x) much better now! Parameters and their type Suggestion of other functions Simple example

Slide 42

Slide 42 text

NumPy documentation standard https://github.com/numpy/numpy/blob/master/doc/example.py def foo(var1 , var2 , long_var_name =’hi’) : r"""A one -line summary that does not use variable names or the function name. Several sentences providing an extended description . Refer to variables using back -ticks , e.g. ‘var ‘. Parameters ---------- var1 : array_like Array_like means all those objects -- lists , nested lists , etc. -- that can be converted to an array. We can also refer to variables like ‘var1 ‘. var2 : int The type above can either refer to an actual Python type (e.g. ‘‘int ‘‘), or describe the type of the variable in more detail , e.g. ‘‘(N,) ndarray ‘‘ or ‘‘array_like ‘‘. Long_variable_name : {’hi ’, ’ho ’}, optional Choices in brackets , default first when optional. Returns ------- type Explanation of anonymous return value of type ‘‘type ‘‘. describe : type Explanation of return value named ‘describe ‘. out : type Explanation of ‘out ‘. Other Parameters ---------------- only_seldom_used_keywords : type Explanation

Slide 43

Slide 43 text

Documentation at a glance: galleries of examples

Slide 44

Slide 44 text

Documentation at a glance: galleries of examples

Slide 45

Slide 45 text

Getting started: finding documentation

Slide 46

Slide 46 text

Umbrella project: sphinx-gallery

Slide 47

Slide 47 text

Auto documenting your API with links to examples

Slide 48

Slide 48 text

Auto documenting your API with links to examples

Slide 49

Slide 49 text

Picture denoising

Slide 50

Slide 50 text

Picture denoising

Slide 51

Slide 51 text

Denoising tomography images In-situ imaging of phase separation in silicate melts From basic (generic) to advanced (specific) filters

Slide 52

Slide 52 text

Denoising tomography images Histogram of pixel values From basic (generic) to advanced (specific) filters bilateral = restoration.denoise_bilateral(dat) bilateral = restoration.denoise_bilateral(dat, sigma_range← =2.5, sigma_spatial=2) tv = restoration.denoise_tv_chambolle(dat, weight=0.5)

Slide 53

Slide 53 text

Learning by yourself

Slide 54

Slide 54 text

Towards a more interactive documentation? https://mybinder.org/v2/gh/scikit-image/skimage-tutorials/master

Slide 55

Slide 55 text

Towards a more interactive documentation?

Slide 56

Slide 56 text

Towards a more interactive documentation?

Slide 57

Slide 57 text

Boosted learning: contributing to open-source Learning more about programming Learning more about image processing

Slide 58

Slide 58 text

Development process: datasheet Package statistics http://scikit-image.org/ Release 0.14 (1 - 2 release per year) Among 500 best ranked packages on PyPi (> 106 downloads per year) 20000 unique visitors / month

Slide 59

Slide 59 text

The people

Slide 60

Slide 60 text

The people 2 4 6 8 10 12 Individual Committer 0.0 0.2 0.4 0.6 0.8 1.0 Commit rate Normalized commit rates since 2014-06 matplotlib numpy pandas scikit-image scikit-learn scipy A quite healthy curve... we can do better! Fernando Perez & Aaron Meurer, Gist 5843625

Slide 61

Slide 61 text

Infrastructure: how to build your code

Slide 62

Slide 62 text

More interaction for faster discovery: widgets

Slide 63

Slide 63 text

More interaction for faster discovery: web applications made easy https://dash.plot.ly/

Slide 64

Slide 64 text

Keeping interaction easy for large data from joblib import Memory memory = Memory ( cachedir=’ . / c a c h e d i r ’ , verbose=0) @memory . cache def mem_label ( x ) : r e t u r n measure . label ( x ) @memory . cache def mem_threshold_otsu ( x ) : r e t u r n filters . threshold_otsu ( x ) [ . . . ] val = mem_threshold_otsu ( dat ) objects = dat > val median_dat = mem_median_filter ( dat , 3) val2 = mem_threshold_otsu ( median_dat [ objects ] ) liquid = median_dat > val2 segmentation_result = np . copy ( objects ) . astype ( np . uint8 ) segmentation_result [ liquid ] = 2 aggregates = mem_binary_fill_holes ( objects ) aggregates_ds = np . copy ( aggregates [ : : 4 , : : 4 , : : 4 ] ) cores = mem_binary_erosion ( aggregates_ds , np . ones ((10 , 10 ,← 10) ) )

Slide 65

Slide 65 text

Massive data processing and parallelization Competitive environment: some other tools use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable

Slide 66

Slide 66 text

Massive data processing and parallelization Competitive environment: some other tools use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory use memory mapping of different file formats (raw binary with NumPy, hdf5 with pytables). Divide into blocks: use util.view as blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable

Slide 67

Slide 67 text

joblib: easy simple parallel computing + lazy re-evaluation import numpy as np from sklearn . externals . joblib import Parallel , delayed def apply_parallel ( func , data , ∗args , chunk =100, overlap =10, n_jobs=4, ∗∗kwargs ) : ””” Apply a f u n c t i o n i n p a r a l l e l to o v e r l a p p i n g chunks of an a r r a y . j o b l i b i s used f o r p a r a l l e l p r o c e s s i n g . [ . . . ] Examples − − − − − − − − > > > from skimage import data , f i l t e r s > > > c o i n s = data . c o i n s () > > > r e s = a p p l y p a r a l l e l ( f i l t e r s . gaussian , coins , 2) ””” sh0 = data . shape [ 0 ] nb_chunks = sh0 // chunk end_chunk = sh0 % chunk arg_list = [ data [ max (0 , i∗chunk − overlap ) : min (( i+1)∗chunk + overlap , sh0 ) ] f o r i i n range (0 , nb_chunks ) ] i f end_chunk > 0 : arg_list . append ( data[−end_chunk − overlap : ] ) res_list = Parallel ( n_jobs=n_jobs ) ( delayed ( func ) ( sub_im , ∗args , ∗∗kwargs ) f o r sub_im i n arg_list ) output_dtype = res_list [ 0 ] . dtype out_data = np . empty ( data . shape , dtype=output_dtype ) f o r i i n range (1 , nb_chunks ) : out_data [ i∗chunk : ( i+1)∗chunk ] = res_list [ i ] [ overlap : overlap+chunk ] out_data [ : chunk ] = res_list [0][: − overlap ] i f end_chunk > 0 : out_data[−end_chunk : ] = res_list [ −1][ overlap : ] r e t u r n out_data

Slide 68

Slide 68 text

Experimental chunking and parallelization

Slide 69

Slide 69 text

Other tools Version control: git/github Debugger: %profile Profiler: lineprofiler package with @profile decorator

Slide 70

Slide 70 text

A platform to build an ecosystem upon Tool for users, platform for other tools $ apt-cache rdepends python-matplotlib ... 96 Python packages & applications Specific applications that could build on scikit-image Imaging techniques; microscopy, tomography, ... Fields: cell biology, astronomy, ... Requirements: stable API, good docs

Slide 71

Slide 71 text

dash-canvas demo Package: http://github.com/plotly/dash-canvas Gallery: http://dash-playground.plotly.host/canvas-gallery/