v3
@_inesmontani
NEW LIBRARY PATTERNS &
Behind the Scenes:
Design Concepts
EXPLAINED
Slide 3
Slide 3 text
global function
registry system
Slide 4
Slide 4 text
global function
registry system
programmable
user-facing APIs
Slide 5
Slide 5 text
global function
registry system
programmable
user-facing APIs
“bottom-up”
configuration
system
Slide 6
Slide 6 text
global function
registry system
programmable
user-facing APIs
type-based
data validation
“bottom-up”
configuration
system
Slide 7
Slide 7 text
global function
registry system
programmable
user-facing APIs
type-based
data validation
“bottom-up”
configuration
system
type hints &
static analysis for
model definitions
Slide 9
Slide 9 text
advanced
workflows for
modern NLP &
deep learning
SCENARIO #1
Slide 10
Slide 10 text
advanced
workflows for
modern NLP &
deep learning
SCENARIO #1
ease of use with
pre-configured
building blocks &
good defaults
SCENARIO #2
Slide 11
Slide 11 text
Machine Learning
is complex
AND THAT’S OKAY
Slide 12
Slide 12 text
WE NEED BETTER
Developer
Experience
NOT JUST
Abstractions
Slide 13
Slide 13 text
custom registered
functions and code
config overrides
(data paths)
spacy.io/usage/training
training
config
Slide 14
Slide 14 text
config.cfg
customize settings,
hyperparameters
and architectures
Slide 15
Slide 15 text
docs.python.org/3/library/configparser.html
Slide 16
Slide 16 text
docs.python.org/3/library/configparser.html
+ any JSON values
+ more flexible variable
interpolation
Slide 17
Slide 17 text
config.cfg
registered functions
resolved bottom-up
variable interpolation
structured sections
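A minimal sketch of the "configparser + any JSON values + variable interpolation" idea, using only the Python standard library. This is not spaCy's own config parser: the section names, keys and values are hypothetical, and the `${paths:train}` reference syntax is the one the standard library's ExtendedInterpolation understands (spaCy's documented config files write references with a dot, e.g. `${paths.train}`).

```python
import json
from configparser import ConfigParser, ExtendedInterpolation

# Hypothetical config text: structured sections, JSON values and one
# interpolated variable. None of these names come from the slides.
CONFIG_TEXT = """
[paths]
train = "corpus/train.spacy"

[training]
dropout = 0.1
max_steps = 20000
train_path = ${paths:train}

[training.optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
"""


def load_config(text):
    """Parse INI-style text and decode every value as JSON."""
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    parser.read_string(text)
    # Interpolation happens when a value is read; json.loads then turns
    # numbers and quoted strings into real Python objects.
    return {
        section: {key: json.loads(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


config = load_config(CONFIG_TEXT)
```

Decoding values as JSON is what lets a plain-text file carry floats, integers, booleans, lists and strings without a separate type declaration for each setting.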
Slide 18
Slide 18 text
“classic”
approach
Slide 19
Slide 19 text
modular
approach
Slide 20
Slide 20 text
config.cfg
Slide 21
Slide 21 text
config.cfg
resolved bottom-up
Slide 22
Slide 22 text
config.cfg
resolved bottom-up
Slide 23
Slide 23 text
Pseudocode
“top-down”
configuration
Slide 24
Slide 24 text
Pseudocode
“top-down”
configuration
Slide 25
Slide 25 text
Pseudocode
“top-down”
configuration
Slide 26
Slide 26 text
Pseudocode
“top-down”
configuration
Slide 27
Slide 27 text
Pseudocode
“top-down”
configuration
Slide 28
Slide 28 text
Pseudocode
“bottom-up”
configuration
Slide 29
Slide 29 text
Pseudocode
“bottom-up”
configuration
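The slide's own pseudocode did not survive in this transcript, so the following is only one possible sketch of "bottom-up" resolution, not the talk's code. It assumes a config is a nested dict in which a block with an "@function" key names a registered function. The REGISTRY dict, the "@function" key and both example functions are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical global registry mapping string names to functions.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that stores a function in the registry under a name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("linear_schedule.v1")
def linear_schedule(start: float, stop: float, steps: int):
    step_size = (stop - start) / (steps - 1)
    return [start + i * step_size for i in range(steps)]


@register("optimizer.v1")
def make_optimizer(learn_rate, beta: float):
    return {"learn_rate": learn_rate, "beta": beta}


def resolve(block):
    """Resolve the innermost blocks first, then call the enclosing function."""
    if not isinstance(block, dict):
        return block
    # Children are resolved before the parent: this is the bottom-up part.
    resolved = {key: resolve(value) for key, value in block.items()}
    if "@function" in resolved:
        func = REGISTRY[resolved.pop("@function")]
        return func(**resolved)
    return resolved


config = {
    "optimizer": {
        "@function": "optimizer.v1",
        "beta": 0.9,
        "learn_rate": {
            "@function": "linear_schedule.v1",
            "start": 0.0,
            "stop": 1.0,
            "steps": 3,
        },
    }
}
resolved = resolve(config)
```

In this sketch the optimizer function never learns how its learn_rate was produced. It only receives the finished value, which is the contrast with a "top-down" setup where the parent object has to pass settings down to everything it creates.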
Slide 30
Slide 30 text
config.cfg
Slide 31
Slide 31 text
config.cfg
Slide 32
Slide 32 text
serialized
model
save
Slide 33
Slide 33 text
serialized
model
save load
Slide 34
Slide 34 text
serialized
model
save load
custom code
& settings
Slide 35
Slide 35 text
serialized
model
save load
custom code
& settings
How should I
reconstruct this
object?
Slide 36
Slide 36 text
serialized
model
save load
custom code
& settings
How should I
reconstruct this
object?
define how to
create custom
objects
Slide 37
Slide 37 text
github.com/explosion/catalogue
Function registry
Slide 38
Slide 38 text
github.com/explosion/catalogue
Function registry
Slide 39
Slide 39 text
github.com/explosion/catalogue
Function registry
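A rough sketch of how a function registry can answer "How should I reconstruct this object?" when loading a serialized model. It is written in plain Python and does not use or reproduce the catalogue API. The Registry class, the "whitespace.v1" name and the save/load helpers are all hypothetical.

```python
import json


class Registry:
    """Tiny stand-in for a function registry (not the catalogue library)."""

    def __init__(self):
        self._funcs = {}

    def register(self, name):
        def decorator(func):
            self._funcs[name] = func
            return func
        return decorator

    def get(self, name):
        if name not in self._funcs:
            available = ", ".join(sorted(self._funcs))
            raise KeyError(f"Unknown function '{name}'. Available: {available}")
        return self._funcs[name]


tokenizers = Registry()


@tokenizers.register("whitespace.v1")
def create_whitespace_tokenizer(lowercase: bool = False):
    """Factory: defines how to create the custom object from its settings."""
    def tokenize(text):
        tokens = text.split()
        return [token.lower() for token in tokens] if lowercase else tokens
    return tokenize


def save(name, settings):
    # Only the registered name and the settings are stored, not the code.
    return json.dumps({"factory": name, "settings": settings})


def load(serialized):
    data = json.loads(serialized)
    factory = tokenizers.get(data["factory"])
    return factory(**data["settings"])


serialized = save("whitespace.v1", {"lowercase": True})
tokenizer = load(serialized)
tokens = tokenizer("Hello World")
```

The serialized form stays small and readable because it records a name and settings. The custom code itself has to be importable and registered before load is called.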
Slide 40
Slide 40 text
we always need to
know how an object
expects to be created
spacy.io/usage/processing-pipelines
Bugs & Mistakes
HAPPEN – WE JUST NEED TO
catch them
Slide 44
Slide 44 text
github.com/samuelcolvin/pydantic
data validation and
settings management
using Python type hints
Slide 46
Slide 46 text
define
data model
Slide 47
Slide 47 text
define
data model
validate
against data
model
Slide 48
Slide 48 text
define
data model
validate
against data
model
...
FilePath
HttpUrl
int bool
StrictStr
PositiveInt
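To illustrate "define data model" and "validate against data model" without depending on pydantic, here is a standard-library stand-in built on dataclasses and type hints. It is far simpler than pydantic: it only checks exact types, does no coercion, and has nothing like PositiveInt or FilePath. The TrainingSettings model and its fields are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, get_type_hints


@dataclass
class TrainingSettings:
    """Hypothetical data model: field names and types are illustrative."""
    max_steps: int
    dropout: float
    use_gpu: bool


def validate(model_cls, data: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for data that does not fit the model."""
    hints = get_type_hints(model_cls)
    errors = []
    for field in fields(model_cls):
        expected = hints[field.name]
        if field.name not in data:
            errors.append(f"{field.name}: field required")
        # Exact type check, so True is not silently accepted as an int.
        elif type(data[field.name]) is not expected:
            got = type(data[field.name]).__name__
            errors.append(f"{field.name}: expected {expected.__name__}, got {got}")
    for key in data:
        if key not in hints:
            errors.append(f"{key}: extra field not permitted")
    return errors


errors = validate(TrainingSettings, {"max_steps": "100", "dropout": 0.1})
no_errors = validate(
    TrainingSettings, {"max_steps": 100, "dropout": 0.1, "use_gpu": False}
)
```

Collecting every problem into a list, instead of raising on the first one, lets a user fix the whole config in one pass.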
Slide 50
Slide 50 text
catch errors
in config
Slide 51
Slide 51 text
Config schema
Slide 52
Slide 52 text
Config schema
config.cfg
Slide 53
Slide 53 text
Config schema
config.cfg
Slide 55
Slide 55 text
1. inspect
Slide 56
Slide 56 text
1. inspect
2. generate data model
Slide 57
Slide 57 text
1. inspect
2. generate data model
3. validate
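The three steps above (inspect, generate data model, validate) can be sketched with the standard library's inspect module. This is an assumption about the general shape of the technique, not the library's actual implementation: the make_optimizer function and the tuple-based "schema" are hypothetical.

```python
import inspect
from typing import Any, Dict, List


def make_optimizer(learn_rate: float, beta: float = 0.9, use_averages: bool = True):
    """Hypothetical registered function whose arguments come from the config."""
    return {"learn_rate": learn_rate, "beta": beta, "use_averages": use_averages}


def build_schema(func):
    """Steps 1 and 2: inspect the signature and derive (type, required) per argument."""
    schema = {}
    for name, param in inspect.signature(func).parameters.items():
        required = param.default is inspect.Parameter.empty
        schema[name] = (param.annotation, required)
    return schema


def validate_args(func, args: Dict[str, Any]) -> List[str]:
    """Step 3: check config values against the generated schema."""
    schema = build_schema(func)
    errors = []
    for name, (expected, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"{name}: field required")
        elif expected is not inspect.Parameter.empty and not isinstance(args[name], expected):
            got = type(args[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {got}")
    for name in args:
        if name not in schema:
            errors.append(f"{name}: extra field not permitted")
    return errors


errors = validate_args(make_optimizer, {"learn_rate": "fast", "betta": 0.5})
```

Because the schema is derived from the function signature, adding a type hint to a registered function is enough to get its config block validated, including misspelled argument names such as "betta" here.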
Slide 58
Slide 58 text
base_config.cfg
Slide 59
Slide 59 text
base_config.cfg
partial
config
Slide 60
Slide 60 text
base_config.cfg
partial
config
show
visual diff
config.cfg
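A small sketch of the flow on this slide: a partial config is filled in from a base config, and the result is shown as a diff. It uses plain dicts and difflib from the standard library. The fill_config and to_lines helpers and all settings are hypothetical, and the diff is printed as plain text, not as the colored visual diff the slide refers to.

```python
import difflib


def fill_config(base, partial):
    """Fill missing settings in a partial config with defaults from the base."""
    filled = {}
    for section, defaults in base.items():
        # Values from the partial config win over the base defaults.
        filled[section] = {**defaults, **partial.get(section, {})}
    for section, values in partial.items():
        filled.setdefault(section, dict(values))
    return filled


def to_lines(config):
    """Render a config dict as sorted INI-style lines so diffs are stable."""
    lines = []
    for section in sorted(config):
        lines.append(f"[{section}]")
        for key in sorted(config[section]):
            lines.append(f"{key} = {config[section][key]!r}")
    return lines


base = {"training": {"dropout": 0.1, "max_steps": 20000}}
partial = {"training": {"dropout": 0.2}}

filled = fill_config(base, partial)
diff = list(
    difflib.unified_diff(
        to_lines(partial),
        to_lines(filled),
        fromfile="partial config",
        tofile="config.cfg",
        lineterm="",
    )
)
print("\n".join(diff))
```

Showing the diff makes the filled-in defaults visible, so nothing ends up in the final config.cfg without the user having seen it.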
Slide 75
Slide 75 text
FASTER
Debugging &
HIGHER
Productivity
Slide 76
Slide 76 text
Model
thinc.ai
Slide 77
Slide 77 text
Model
custom array types
Floats2d
thinc.ai
Slide 78
Slide 78 text
Model
custom array types
Floats2d Ints1d
...
Padded Ragged
thinc.ai
Slide 80
Slide 80 text
expected
return types
Slide 81
Slide 81 text
Y: Floats3d
Incompatible return value type
(got "Tuple[Floats3d, Callable[[Any], Any]]",
expected
return types
Slide 82
Slide 82 text
Y: Floats3d
Incompatible return value type
(got "Tuple[Floats3d, Callable[[Any], Any]]",
expected
return types
static analysis: catch
errors as you type
Slide 83
Slide 83 text
Pseudocode
typed methods for
transformations
Slide 84
Slide 84 text
Model
thinc.ai
Slide 85
Slide 85 text
Model
Model[InputT,
OutputT]
generic types
thinc.ai
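A simplified illustration of the Model[InputT, OutputT] idea using typing.Generic from the standard library. This is not Thinc's Model class: the constructor, the predict method and the chain helper below are hypothetical stand-ins that only show how generic input and output types let a static type checker follow data through composed layers.

```python
from typing import Callable, Generic, List, TypeVar

InT = TypeVar("InT")
MidT = TypeVar("MidT")
OutT = TypeVar("OutT")


class Model(Generic[InT, OutT]):
    """Hypothetical model wrapper that is generic over input and output types."""

    def __init__(self, name: str, forward: Callable[[InT], OutT]) -> None:
        self.name = name
        self._forward = forward

    def predict(self, X: InT) -> OutT:
        return self._forward(X)


def chain(first: Model[InT, MidT], second: Model[MidT, OutT]) -> Model[InT, OutT]:
    """Compose two models; the output type of first must match the input of second."""
    return Model(
        f"{first.name}>>{second.name}",
        lambda X: second.predict(first.predict(X)),
    )


def _tokenize(text: str) -> List[str]:
    return text.split()


def _count(tokens: List[str]) -> int:
    return len(tokens)


tokenize: Model[str, List[str]] = Model("tokenize", _tokenize)
count: Model[List[str], int] = Model("count", _count)

pipeline = chain(tokenize, count)  # inferred as Model[str, int]
n_tokens = pipeline.predict("static analysis catches errors")

# chain(count, tokenize) still runs until predict is called, but a static
# type checker such as mypy should reject it, because an int output does
# not match the str input that the second model expects.
```

The type parameters cost nothing at runtime. Their value is that a mismatch between two layers can show up in the editor, before any training run starts.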
Slide 87
Slide 87 text
mypy.ini
optional mypy plugin
for more checks
Slide 88
Slide 88 text
Relu: Relu
Layer outputs type (thinc.types.Floats2d) but
the next layer expects (thinc.types.Ragged) as
an input
mypy.ini
optional mypy plugin
for more checks
Slide 89
Slide 89 text
Relu: Relu
Layer outputs type (thinc.types.Floats2d) but
the next layer expects (thinc.types.Ragged) as
an input
static analysis: catch
errors as you type
mypy.ini
optional mypy plugin
for more checks
extensive
documentation
user-focused
error handling
& validation
consistent
naming
avoid redundant
shortcuts & competing
abstractions
smooth path from
prototype to
production
Developer
Productivity
Slide 96
Slide 96 text
extensive
documentation
user-focused
error handling
& validation
consistent
naming
avoid redundant
shortcuts & competing
abstractions
smooth path from
prototype to
production
provide building blocks
to program with, not just
abstractions
Developer
Productivity
Slide 97
Slide 97 text
extensive
documentation
user-focused
error handling
& validation
consistent
naming
avoid redundant
shortcuts & competing
abstractions
smooth path from
prototype to
production
provide building blocks
to program with, not just
abstractions
Developer
Productivity
Slide 98
Slide 98 text
CLOSING THE GAP BETWEEN
Prototype &
Production
Slide 99
Slide 99 text
NLP
Slide 100
Slide 100 text
NLP
flexible
tools you know
and understand
Slide 101
Slide 101 text
spacy.io/usage/v3 @spacy_io @_inesmontani
Slide 102
Slide 102 text
spacy.io/usage/v3 @spacy_io @_inesmontani
install spaCy v3
from pip or conda
Slide 103
Slide 103 text
spacy.io/usage/v3 @spacy_io @_inesmontani
documentation
and quickstart
install spaCy v3
from pip or conda
Slide 104
Slide 104 text
spacy.io/usage/v3 @spacy_io @_inesmontani
documentation
and quickstart
install spaCy v3
from pip or conda
thank you!