Slide 1

Slide 1 text

Advances in Multilingual Stemming on CPAN
Nick Patch
@nickpatch
Shutterstock

Slide 2

Slide 2 text

hacking → hack
hacker → hack
hacked → hack
hack → hack

Slide 10

Slide 10 text

gurgled → gurgl

Slide 12

Slide 12 text

gurgled → gurgl
gurgling → gurgl

Slide 14

Slide 14 text

gurgled → gurgl
gurgling → gurgl
gurgle → gurgl
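
The mappings shown so far can be reproduced with a very small suffix stripper. This is only a sketch for illustration: `toy_stem` is a hypothetical helper, not part of any CPAN module, and its suffix list is an assumption chosen to fit these examples. Real stemmers apply far more rules and may stem these words differently. Note that a stem such as "gurgl" does not have to be a dictionary word; it only has to be the same for related forms.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer, for illustration only: lowercases the word and
# strips one common English suffix, keeping at least three leading letters.
sub toy_stem {
    my ($word) = @_;
    (my $stem = lc $word) =~ s/(?<=.{3})(?:ing|ed|er|e)\z//;
    return $stem;
}

# Print each sample word next to its stem.
for my $word (qw( hacking hacker hacked hack gurgled gurgling gurgle )) {
    printf "%s -> %s\n", $word, toy_stem($word);
}
```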

Slide 16

Slide 16 text

stem("hacker")

Slide 17

Slide 17 text

stem("hacker") eq stem("hacking")

Slide 18

Slide 18 text

indexer(stem("hacker"))

Slide 19

Slide 19 text

indexer(stem("hacker"))
lookup(stem("hacking"))
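
A minimal sketch of the idea on this slide: stem terms the same way at index time and at query time, so that a search for "hacking" finds a document containing "hacker". Everything here is an assumption made for illustration: `stem` is a toy stand-in for a real stemmer, and `indexer` and `lookup` are hypothetical helpers with made-up signatures (document id plus text, and a single query term).

```perl
use strict;
use warnings;

# Toy stand-in for a real stemmer: strips a few English suffixes.
sub stem {
    my ($word) = @_;
    (my $stem = lc $word) =~ s/(?<=.{3})(?:ing|ed|er)\z//;
    return $stem;
}

my %index;    # stem => { document id => 1 }

# Index every word of a document under its stem.
sub indexer {
    my ($doc_id, $text) = @_;
    $index{ stem($_) }{$doc_id} = 1 for $text =~ /\w+/g;
    return;
}

# Stem the query term the same way and return the matching document ids.
sub lookup {
    my ($term) = @_;
    return sort keys %{ $index{ stem($term) } || {} };
}

indexer(1, 'The hacker fixed the bug');
indexer(2, 'Gardening tips');

my @hits = lookup('hacking');
print "documents matching 'hacking': @hits\n";
```

Because both sides go through the same `stem` call, the index never needs to store every inflected form of a word.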

Slide 20

Slide 20 text

Lingua::Stem::Any

Slide 21

Slide 21 text

Lingua::Stem::Any
bg cs da de en eo es fa fi fr gl hu io it nl no pt ro ru sv tr

Slide 22

Slide 22 text

use Lingua::Stem::Any;
$stemmer = Lingua::Stem::Any->new(
    language => $language
);
$stem = $stemmer->stem($word);

Slide 23

Slide 23 text

Attributes
language
source
cache
exceptions
casefold
normalize

Slide 24

Slide 24 text

Methods
stem($word)
stem(@words)
stem_in_place(\@words)
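
A short sketch of how the list and in-place forms might be used together, assuming Lingua::Stem::Any is installed from CPAN and that English (`en`) is the chosen language. The sample words are made up, and no output is shown because the stems depend on the underlying stemmer.

```perl
use strict;
use warnings;
use Lingua::Stem::Any;

# Assumption: English, using whichever source the module picks by default.
my $stemmer = Lingua::Stem::Any->new(language => 'en');

my @words = qw( hacking hacked gurgling );

# List form: returns one stem per input word and leaves @words untouched.
my @stems = $stemmer->stem(@words);

# In-place form: replaces each element of @words with its stem.
$stemmer->stem_in_place(\@words);

print "$_\n" for @stems;
```

The in-place form avoids building a second array, which can matter when stemming long token lists.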

Slide 25

Slide 25 text

Methods
languages
languages($source)
sources
sources($lang)
clear_cache

Slide 26

Slide 26 text

Lingua::Stem::UniNE::CS
Czech
Image by NuclearVacuum on Wikimedia Commons / CC BY-SA 3.0

Slide 27

Slide 27 text

Lingua::Stem::UniNE::CS
Czech
Lingua::Stem::UniNE::BG
Bulgarian
Image by NuclearVacuum on Wikimedia Commons / CC BY-SA 3.0

Slide 28

Slide 28 text

Lingua::Stem::UniNE::FA
Persian
Image by Mani1 on Wikimedia Commons / public domain

Slide 29

Slide 29 text

Lingua::Stem::Patch::EO
Esperanto
Image by Ionut Cojocaru on Wikimedia Commons / CC BY 3.0

Slide 30

Slide 30 text

Lingua::Stem::Patch::IO
Ido
Image by Ionut Cojocaru on Wikimedia Commons / CC BY 3.0

Slide 31

Slide 31 text

Lingua::Stem::TLH ?!
Klingon?!
Image by NASA and ESA / public domain

Slide 32

Slide 32 text

TODO
pl Polish
ar Arabic
bn Bengali
hi Hindi
mr Marathi

Slide 33

Slide 33 text

Nick Patch @nickpatch Shutterstock