Slide 1

Slide 1 text

Elasticsearch & a bit of maths

Symfony Day 2016 (Roma)

Slide 2

Slide 2 text

Matteo Dora

github.com/mattbit

Slide 3

Slide 3 text

WARNING contains maths!

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Yes • Elasticsearch & Symfony • Score query • Geolocation • Rating

Slide 6

Slide 6 text

No • Full text, TF/IDF • Vector space model • Analyzers • Schrödinger equation, cat, etc.

Slide 7

Slide 7 text

Relevance sorting?

Wine, producer, vintage, rating, price:
Roero Arneis, Pescaja, 2015, 92⁄100, 12 €
Mandrarossa Fiano, Settesoli, 2015, 86⁄100, 5 €
Tavernello Chardonnay, Caviro, –, 74⁄100, 3 €
Mirum, La Monacesca, 2013, 96⁄100, 21 €
Terlaner I Gran Cuvée, Cantina Terlan, 2013, 97⁄100, 180 €

Slide 8

Slide 8 text

NAIVE SORTING

By price…

Slide 9

Slide 9 text

Wine, producer, vintage, rating, price (sorted by price, cheapest first):
Tavernello Chardonnay, Caviro, –, 74⁄100, 3 €
Mandrarossa Fiano, Settesoli, 2015, 86⁄100, 5 €
Roero Arneis, Pescaja, 2015, 92⁄100, 12 €
Mirum, La Monacesca, 2013, 96⁄100, 21 €
Terlaner I Gran Cuvée, Cantina Terlan, 2013, 97⁄100, 180 €

Slide 11

Slide 11 text

Not exactly my first choice…

Slide 12

Slide 12 text

NAIVE SORTING

By critics' rating…

Slide 13

Slide 13 text

Wine, producer, vintage, rating, price (sorted by rating, highest first):
Terlaner I Gran Cuvée, Cantina Terlan, 2013, 97⁄100, 180 €
Mirum, La Monacesca, 2013, 96⁄100, 21 €
Roero Arneis, Pescaja, 2015, 92⁄100, 12 €
Mandrarossa Fiano, Settesoli, 2015, 86⁄100, 5 €
Tavernello Chardonnay, Caviro, –, 74⁄100, 3 €

Slide 15

Slide 15 text

I'd have to think about it…

Slide 16

Slide 16 text

Relevance • Grape variety • Vintage • Alcohol • Acidity • Sweetness • Astringency • Aging • Reviews • DOC, DOCG • …

Slide 17

Slide 17 text

???

Slide 18

Slide 18 text

“Friends, Romans, countrymen, lend me your ears!”
Shakespeare, Julius Caesar

Slide 19

Slide 19 text

Elasticsearch

$ brew install elasticsearch

Slide 20

Slide 20 text

Elasticsearch

$ apt-get install elasticsearch

Slide 21

Slide 21 text

FOSElasticaBundle

$ composer require friendsofsymfony/elastica-bundle

Slide 22

Slide 22 text

app/AppKernel.php

…
new FOS\ElasticaBundle\FOSElasticaBundle(),
…

Slide 23

Slide 23 text

app/config/elasticsearch.yml

# include in config.yml!
fos_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        wines:
            types:
                …

Slide 24

Slide 24 text

app/config/elasticsearch.yml

wine:
    mappings:
        name: { type: string }
        producer:
            type: object
            properties:
                name: { type: string }
                location: { type: geo_point }
        price: { type: float }
    persistence:
        …

Slide 25

Slide 25 text

app/config/elasticsearch.yml

wine:
    mappings:
        …
    persistence:
        driver: orm
        model: AppBundle\Entity\Wine

Slide 26

Slide 26 text

Populate!

$ bin/console fos:elastica:populate

Slide 27

Slide 27 text

http://localhost:9200/wines/wine/1

{
  "_index": "wines",
  "_type": "wine",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "Roero Arneis",
    "producer": "Pescaja",
    "price": 12.00,
    "rating": 92
  }
}

Slide 28

Slide 28 text

Done!

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

???

Slide 31

Slide 31 text

“What’s past is prologue.”
Shakespeare, The Tempest

Slide 32

Slide 32 text

Score function

Assign a score to each feature, then combine them into a single overall score.

Slide 33

Slide 33 text

Score function • score from 0 to 1: $s_i \in [0, 1]$ • multiply them: $\prod_i s_i$ • $s_i = f_i(x)$
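The multiplication rule on this slide can be sketched in a few lines of PHP. This is only an illustration: the helper name `combineScores` and the sample values are assumptions, not part of the talk.

```php
<?php
// Hypothetical helper: combine per-feature scores, each in [0, 1],
// into one overall score by multiplying them together.
function combineScores(array $scores): float
{
    $total = 1.0;
    foreach ($scores as $score) {
        if ($score < 0.0 || $score > 1.0) {
            throw new InvalidArgumentException('Each score must lie in [0, 1].');
        }
        $total *= $score;
    }
    return $total;
}

// Illustrative values only: a good price score and a mediocre rating score.
$total = combineScores(['price' => 0.9, 'rating' => 0.5]);
printf("%.2f\n", $total);
```

One consequence of multiplying is that a single feature scoring 0 drives the overall score to 0, however good the other features are.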

Slide 34

Slide 34 text

Exempli gratia

Slide 35

Slide 35 text

[Plot: score (0 to 1) as a function of price (0 to 150 €)]

Slide 36

Slide 36 text

[Plot: score (0 to 1) as a function of rating (0 to 100)]

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

$f(x) = e^{-\frac{x^2}{2\sigma^2}}$

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Elasticsearch!

Slide 42

Slide 42 text

function_score query

Slide 43

Slide 43 text

Decay functions • Linear (linear) • Exponential (exp) • Gaussian (gauss)
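As a rough sketch, the three decay shapes can be written in plain PHP following the formulas given in the Elasticsearch function_score documentation. The function names are hypothetical, and the default `decay` of 0.5 is Elasticsearch's documented default rather than something stated in the talk.

```php
<?php
// Distance from the origin; anything within the offset counts as zero.
function decayDistance(float $value, float $origin, float $offset): float
{
    return max(0.0, abs($value - $origin) - $offset);
}

// Linear decay: a straight line that reaches $decay at distance $scale.
function linearDecay(float $value, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $s = $scale / (1.0 - $decay);
    return max(0.0, ($s - decayDistance($value, $origin, $offset)) / $s);
}

// Exponential decay: exp(lambda * distance), with lambda = ln(decay) / scale.
function expDecay(float $value, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $lambda = log($decay) / $scale;
    return exp($lambda * decayDistance($value, $origin, $offset));
}

// Gaussian decay: sigma^2 is chosen so the score equals $decay at distance $scale.
function gaussDecay(float $value, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $sigmaSquared = -($scale ** 2) / (2.0 * log($decay));
    return exp(-(decayDistance($value, $origin, $offset) ** 2) / (2.0 * $sigmaSquared));
}
```

All three return 1 inside the offset and exactly `decay` at a distance of `scale` beyond it; they differ only in how quickly the score falls off in between and afterwards.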

Slide 44

Slide 44 text

[Plot: linear, exponential and gaussian decay curves]

Slide 45

Slide 45 text

function_score query

{
  "query": {
    "function_score": {
      "functions": [ … ]
    }
  }
}

Slide 46

Slide 46 text

function_score query

[
  {
    "gauss": {
      "price": { "origin": 0, "offset": 10, "scale": 20 }
    }
  },
  {
    "gauss": {
      "rating": {
        "origin": 100,
        …

Slide 47

Slide 47 text

And Symfony?

Slide 48

Slide 48 text

WineController::listAction

// use Elastica\Query\FunctionScore;

$query = new FunctionScore();
$query->addDecayFunction(
    FunctionScore::DECAY_GAUSS,
    "price", // field name
    0,       // origin
    20,      // scale
    10       // offset
);

Slide 49

Slide 49 text

WineController::listAction

$query->addDecayFunction(
    FunctionScore::DECAY_GAUSS,
    "rating", // field name
    100,      // origin
    10,       // scale
    10        // offset
);

Slide 50

Slide 50 text

WineController::listAction

$finder = $this->get('fos_elastica.finder.wines.wine');
$wines = $finder->find($query);

Slide 51

Slide 51 text

and…

Slide 52

Slide 52 text

Relevance sorting!

Wine, producer, vintage, rating, price, score:
Roero Arneis, Pescaja, 2015, 92⁄100, 12 €, 0.993
Mandrarossa Fiano, Settesoli, 2015, 86⁄100, 5 €, 0.895
Mirum, La Monacesca, 2013, 96⁄100, 21 €, 0.810
Tavernello Chardonnay, Caviro, –, 74⁄100, 3 €, 0.169
Terlaner I Gran Cuvée, Cantina Terlan, 2013, 97⁄100, 180 €, 0.000
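These scores can be approximately reproduced outside Elasticsearch. The sketch below assumes the two gaussian decays shown earlier (price: origin 0, scale 20, offset 10; rating: origin 100, scale 10, offset 10), the default `decay` of 0.5, and that the two function scores are multiplied together. The helper `gaussDecay` is a hypothetical stand-in for what Elasticsearch computes internally.

```php
<?php
// Gaussian decay as documented for Elasticsearch's function_score query.
function gaussDecay(float $value, float $origin, float $scale, float $offset, float $decay = 0.5): float
{
    $distance = max(0.0, abs($value - $origin) - $offset);
    $sigmaSquared = -($scale ** 2) / (2.0 * log($decay));
    return exp(-($distance ** 2) / (2.0 * $sigmaSquared));
}

// Prices (€) and critics' ratings (out of 100) taken from the slides.
$wines = [
    'Roero Arneis'          => ['price' => 12.0,  'rating' => 92.0],
    'Mandrarossa Fiano'     => ['price' => 5.0,   'rating' => 86.0],
    'Tavernello Chardonnay' => ['price' => 3.0,   'rating' => 74.0],
    'Mirum'                 => ['price' => 21.0,  'rating' => 96.0],
    'Terlaner I Gran Cuvée' => ['price' => 180.0, 'rating' => 97.0],
];

$scores = [];
foreach ($wines as $name => $wine) {
    // Multiply the price score by the rating score.
    $scores[$name] = gaussDecay($wine['price'], 0.0, 20.0, 10.0)
        * gaussDecay($wine['rating'], 100.0, 10.0, 10.0);
}

arsort($scores); // highest score first, keys preserved
foreach ($scores as $name => $score) {
    printf("%-22s %.3f\n", $name, $score);
}
```

Under these assumptions the ordering matches the slide and the values agree to within rounding, which suggests the slide figures were truncated rather than rounded.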

Slide 53

Slide 53 text

WineController::listAction

// use Elastica\Query\FunctionScore;
// use Elastica\Query\Term;

$query = new FunctionScore();
$query->addDecayFunction(…);
$query->addWeightFunction(
    2, // weight value
    new Term(["grape" => "verdicchio"])
);

Slide 54

Slide 54 text

Wine, producer, vintage, rating, price, score:
Mirum, La Monacesca, 2013, 96⁄100, 21 €, 1.621
Roero Arneis, Pescaja, 2015, 92⁄100, 12 €, 0.993
Mandrarossa Fiano, Settesoli, 2015, 86⁄100, 5 €, 0.895
Tavernello Chardonnay, Caviro, –, 74⁄100, 3 €, 0.169
Terlaner I Gran Cuvée, Cantina Terlan, 2013, 97⁄100, 180 €, 0.000
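A quick check of the new top score, assuming the weight function is multiplied into the score like the decay functions (multiply is the default score mode of function_score) and that Mirum is the only wine matching the verdicchio filter. Its unrounded gaussian score of about 0.8108 is doubled, while the other wines keep their previous scores:

$$
0.8108 \times 2 \approx 1.6217
$$

This agrees with the 1.621 shown for Mirum.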

Slide 55

Slide 55 text

Geolocation INTERMEZZO

Slide 56

Slide 56 text

Winebar.php

public function getLocation()
{
    return "{$this->lat}, {$this->lon}";
}

Slide 57

Slide 57 text

elasticsearch.yml

winebar:
    mappings:
        location: { type: geo_point }
        […]

Slide 58

Slide 58 text

WinebarController::listAction

// use Elastica\Query\FunctionScore;

$query = new FunctionScore();
$query->addDecayFunction(
    FunctionScore::DECAY_GAUSS,
    "location",             // field name
    "41.849872, 12.574170", // origin
    "5km",                  // scale
    "1km"                   // offset
);
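To get a feel for what this query does, here is a rough standalone sketch: a haversine distance feeding the same gaussian decay, with a 5 km scale and a 1 km offset. The helper names and the wine bar's position are hypothetical, the Earth radius is an approximation, and Elasticsearch's own distance calculation may differ slightly.

```php
<?php
// Great-circle distance in kilometres (haversine formula, spherical Earth).
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius, an approximation
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Gaussian decay on a distance, with Elasticsearch's default decay of 0.5.
function gaussDecay(float $distance, float $scale, float $offset, float $decay = 0.5): float
{
    $d = max(0.0, $distance - $offset);
    $sigmaSquared = -($scale ** 2) / (2.0 * log($decay));
    return exp(-($d ** 2) / (2.0 * $sigmaSquared));
}

// Origin taken from the slide.
$originLat = 41.849872;
$originLon = 12.574170;

// Hypothetical wine bar placed 6 km due north of the origin.
$barLat = $originLat + rad2deg(6.0 / 6371.0);
$barLon = $originLon;

$distanceKm = haversineKm($originLat, $originLon, $barLat, $barLon);
$score = gaussDecay($distanceKm, 5.0, 1.0);
printf("%.2f km -> score %.3f\n", $distanceKm, $score);
```

A bar within 1 km of the origin scores 1, and one at 6 km (offset plus scale) scores about 0.5, which is how the scale and offset parameters should be read.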

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

“And now for something completely different.”
Monty Python

Slide 61

Slide 61 text

User reviews • ★★★☆☆, ★★★★★ • (★★★☆☆ + ★★★★★) ÷ 2 = ★★★★☆ • Right? [rhetorical question]

Slide 62

Slide 62 text

Example

Slide 63

Slide 63 text

Roero Arneis 2015, Marchesi di Barolo: eight reviews (six ★★★★★, two ★★★★☆), average 4.75
Roero Arneis 2015, Pescaja: one review (★★★★★), average 5.00

Slide 64

Slide 64 text

Ideas?

Slide 65

Slide 65 text

Stars × number of reviews

Slide 66

Slide 66 text

Roero Arneis 2015, Cantina Canci: one ★★★★★ review, 5 × 1 = 5
Roero Arneis 2015, Pescaja: six ★☆☆☆☆ reviews, 1 × 6 = 6

Slide 67

Slide 67 text

Other ideas?

Slide 68

Slide 68 text

Bayesian inference

Estimate the average rating from the data available.

Slide 69

Slide 69 text

Credible interval

[Figure: a 0 to 5 star scale with a credible interval from 3.7 to 4.2]

Slide 70

Slide 70 text

[Figure: values 3, 1, 30 and 10]

Slide 71

Slide 71 text

[Plot over the 0 to 5 star axis]

Slide 74

Slide 74 text

How do we do it?

Slide 75

Slide 75 text

“Ranking items with star ratings” by Evan Miller
evanmiller.org/ranking-items-with-star-ratings.html

Slide 76

Slide 76 text

WARNING take cover! ∑xi

Slide 77

Slide 77 text

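$$
S(n_1, \dots, n_K) = \underbrace{\sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K}}_{\text{mean}} \pm \underbrace{z_{\alpha/2} \sqrt{\left( \sum_{k=1}^{K} s_k^2 \frac{n_k + 1}{N + K} - \underbrace{\left( \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} \right)^2}_{\text{mean}^2} \right) / (N + K + 1)}}_{\text{credible interval}}
$$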

Slide 78

Slide 78 text

Stats.php

$votes = [
    // vote value => vote count
    1 => 0,
    2 => 1,
    3 => 0,
    4 => 2,
    5 => 6,
];

Slide 79

Slide 79 text

Stats.php

$N = array_sum($votes); // total number of votes
$K = count($votes);     // number of star levels
$z = 1.65;              // 90% credibility
$M = 0;
$A = 0;

Slide 80

Slide 80 text

Stats.php

foreach ($votes as $value => $count) {
    $M += $value*($count + 1)/($N + $K);
    $A += ($value**2)*($count + 1)/($N + $K);
}

$intervalWidth = 2*$z*sqrt(
    ($A - $M**2)/($N + $K + 1)
);

$lowerBound = $M - $intervalWidth/2;
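The Stats.php snippets can be gathered into one function and tried on the earlier example. This is a sketch under assumptions: the function name `ratingLowerBound` is hypothetical, and the two vote histograms are illustrative (eight reviews averaging 4.75 against a single five-star review).

```php
<?php
// Lower bound of the credible interval for the average star rating.
// $votes maps each star value to its vote count, e.g. [1 => 0, ..., 5 => 6].
function ratingLowerBound(array $votes, float $z = 1.65): float
{
    $N = array_sum($votes); // total number of votes
    $K = count($votes);     // number of star levels
    $M = 0.0;               // estimated mean
    $A = 0.0;               // estimated mean of the squares

    foreach ($votes as $value => $count) {
        $weight = ($count + 1) / ($N + $K);
        $M += $value * $weight;
        $A += ($value ** 2) * $weight;
    }

    return $M - $z * sqrt(($A - $M ** 2) / ($N + $K + 1));
}

// Illustrative histograms: many good reviews versus one perfect review.
$manyReviews = [1 => 0, 2 => 0, 3 => 0, 4 => 2, 5 => 6];
$oneReview   = [1 => 0, 2 => 0, 3 => 0, 4 => 0, 5 => 1];

printf("many reviews: %.2f\n", ratingLowerBound($manyReviews));
printf("one review:   %.2f\n", ratingLowerBound($oneReview));
```

Sorting by this lower bound instead of the plain average places the wine with eight reviews above the wine with one five-star review, because a single vote leaves the estimate very uncertain.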

Slide 81

Slide 81 text

Bonus

Slide 82

Slide 82 text

“By medicine life may be prolong’d, yet death will seize the doctor too.”
Shakespeare, Cymbeline

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

Fabio Giannese, Andrés Vasquez, Massimo Chiarillo, Eugenio Canciello, Oreste di Modugno

Thank you.