Elasticsearch & a bit of maths

Matteo
October 28, 2016

Elasticsearch is a powerful tool with a great Symfony integration. This talk will explain how to deal with Elasticsearch and FOSElasticaBundle when the sorting becomes hard. A bit of maths and some statistics tricks will help you to sort things out, with some examples about geolocation and users’ reviews—and you’ll understand why sorting things by average rating is not always a good idea.

[http://2016.symfonyday.it]

Transcript

  1. No
      • Full text, TF/IDF
      • Vector space model
      • Analyzers
      • Schrödinger equation, cat, etc.
  2. Relevance sorting?
      Roero Arneis Pescaja 2015 92/100 12 €
      Mandrarossa Fiano Settesoli 2015 86/100 5 €
      Tavernello Chardonnay Caviro – 74/100 3 €
      Mirum La Monacesca 2013 96/100 21 €
      Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 €
  3. Tavernello Chardonnay Caviro – 74/100 3 €
      Mandrarossa Fiano Settesoli 2015 86/100 5 €
      Roero Arneis Pescaja 2015 92/100 12 €
      Mirum La Monacesca 2013 96/100 21 €
      Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 €
  4. Tavernello Chardonnay Caviro – 74/100 3 €
      Mandrarossa Fiano Settesoli 2015 86/100 5 €
      Roero Arneis Pescaja 2015 92/100 12 €
      Mirum La Monacesca 2013 96/100 21 €
      Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 €
  5. Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 €
      Mirum La Monacesca 2013 96/100 21 €
      Roero Arneis Pescaja 2015 92/100 12 €
      Mandrarossa Fiano Settesoli 2015 86/100 5 €
      Tavernello Chardonnay Caviro – 74/100 3 €
  6. Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 €
      Mirum La Monacesca 2013 96/100 21 €
      Roero Arneis Pescaja 2015 92/100 12 €
      Mandrarossa Fiano Settesoli 2015 86/100 5 €
      Tavernello Chardonnay Caviro – 74/100 3 €
  7. Relevance
      • Grape variety
      • Vintage
      • Alcohol
      • Acidity
      • Sweetness
      • Astringency
      • Ageing
      • Reviews
      • DOC, DOCG
      • …
  8. ???

  9. app/config/elasticsearch.yml
      wine:
          mappings:
              name: { type: string }
              producer:
                  type: object
                  properties:
                      name: { type: string }
                      location: { type: geo_point }
              price: { type: float }
          persistence: …
  10. http://localhost:9200/wines/wine/1
      {
          "_index": "wines",
          "_type": "wine",
          "_id": "1",
          "_version": 1,
          "found": true,
          "_source": {
              "name": "Roero Arneis",
              "producer": "Pescaja",
              "price": 12.00,
              "rating": 92
          },
  11. ???

  12. Score function
      • a score from 0 to 1: $s_i \in [0, 1]$
      • multiply the scores: $\prod_i s_i$
      • $s_i = f_i(x)$
  13. function_score query
      [
          {
              "gauss": {
                  "price": { "origin": 0, "offset": 10, "scale": 20 }
              }
          },
          {
              "gauss": {
                  "rating": { "origin": 100,
  14. Relevance sorting!
      Roero Arneis Pescaja 2015 92/100 12 € 0.993
      Mandrarossa Fiano Settesoli 2015 86/100 5 € 0.895
      Mirum La Monacesca 2013 96/100 21 € 0.810
      Tavernello Chardonnay Caviro – 74/100 3 € 0.169
      Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 € 0.000
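The scores on slide 14 can be checked by hand. The sketch below, written in Python purely for illustration, implements the Gaussian decay as documented for Elasticsearch's function_score query and multiplies the two factors. The price parameters are the ones on slide 13. The rating parameters are cut off in the transcript, so the offset and scale of 10 used here are an assumption, chosen only because they reproduce the listed scores when truncated to three decimals. The function names are hypothetical.

```python
import math


def gauss_decay(value, origin, offset, scale, decay=0.5):
    """Gaussian decay: 1.0 within `offset` of `origin`, `decay` at offset + scale."""
    distance = max(0.0, abs(value - origin) - offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))


def wine_score(price, rating):
    """Product of the two decay factors (multiplying is the default combination)."""
    # Price parameters as shown on slide 13.
    price_factor = gauss_decay(price, origin=0, offset=10, scale=20)
    # ASSUMPTION: the slide is truncated after "origin": 100;
    # offset=10 and scale=10 are guesses, not taken from the talk.
    rating_factor = gauss_decay(rating, origin=100, offset=10, scale=10)
    return price_factor * rating_factor


def truncate3(x):
    """Cut (not round) to three decimals, which is how the slide values appear."""
    return math.floor(x * 1000) / 1000


wines = {
    "Roero Arneis": (12, 92),
    "Mandrarossa Fiano": (5, 86),
    "Mirum": (21, 96),
    "Tavernello Chardonnay": (3, 74),
    "Terlaner I": (180, 97),
}

for name, (price, rating) in wines.items():
    print(f"{name}: {truncate3(wine_score(price, rating)):.3f}")
```

Under these assumptions the cheap, well-rated wines stay close to 1, while the 180 € bottle drops to effectively zero because its price lies far beyond the offset and scale of the price decay.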
  15. WineController::listAction
      // use Elastica\Query\FunctionScore;
      // use Elastica\Query\Term;

      $query = new FunctionScore();
      $query->addDecayFunction(…);
      $query->addWeightFunction(
          2, // weight value
          new Term(["grape" => "verdicchio"])
      );
  16. Mirum La Monacesca 2013 96/100 21 € 1.621
      Roero Arneis Pescaja 2015 92/100 12 € 0.993
      Mandrarossa Fiano Settesoli 2015 86/100 5 € 0.895
      Tavernello Chardonnay Caviro – 74/100 3 € 0.169
      Terlaner I Gran Cuveé Cantina Terlan 2013 97/100 180 € 0.000
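Mirum's jump from 0.810 on slide 14 to 1.621 on slide 16 is consistent with the weight simply multiplying the decay score, which is Elasticsearch's default way of combining functions. As a worked check, assume that Mirum is the only wine matching the grape term and that its rating factor is 1. With the price parameters from slide 13 and the default decay of 0.5, the Gaussian decay reduces to a power of 0.5:

```latex
\[
\underbrace{0.5^{\left(\frac{21 - 10}{20}\right)^{2}}}_{\text{price decay}} \approx 0.8108,
\qquad
\underbrace{2}_{\text{weight}} \times 0.8108 \approx 1.6217
\]
```

Cut to three decimals, this gives the 1.621 shown for Mirum, while the other four scores are unchanged.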
  17. Roero Arneis 2015 Marchesi di Barolo
      Pescaja Roero Arneis 2015
      ★★★★★ ★★★★☆ ★★★★★ ★★★★★ ★★★★★ ★★★★☆ ★★★★★ ★★★★★ ★★★★★
      4.75 5.00
  18. Roero Arneis 2015 Cantina Canci
      Pescaja Roero Arneis 2015
      ★★★★★
      ★ˑˑˑˑ ★ˑˑˑˑ ★ˑˑˑˑ ★ˑˑˑˑ ★ˑˑˑˑ ★ˑˑˑˑ
      5 × 1 = 5
      1 × 6 = 6
  19. [Plot on an axis from 0 to 5]
  20. [Plot on an axis from 0 to 5]
  21. [Plot on an axis from 0 to 5]
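  22. Mean and credibility interval
      \[
      S(n_1, \dots, n_K) =
      \underbrace{\sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K}}_{\text{mean}}
      \pm
      \underbrace{z_{\alpha/2} \sqrt{\left( \sum_{k=1}^{K} s_k^2 \frac{n_k + 1}{N + K} - \underbrace{\left( \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} \right)^{2}}_{\text{mean}^2} \right) \Big/ (N + K + 1)}}_{\text{credibility interval}}
      \]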
  23. Stats.php
      $votes = [
          // vote value => vote count
          1 => 0,
          2 => 1,
          3 => 0,
          4 => 2,
          5 => 6,
      ];
  24. Stats.php
      $N = array_sum($votes); // total number of votes
      $K = count($votes);     // number of possible star values
      $z = 1.65;              // 90% credibility
      $M = 0;
      $A = 0;
  25. Stats.php
      foreach ($votes as $value => $count) {
          $M += $value*($count + 1)/($N + $K);
          $A += ($value**2)*($count + 1)/($N + $K);
      }
      $intervalWidth = 2*$z*sqrt(
          ($A - $M**2)/($N + $K + 1)
      );
      $lowerBound = $M - $intervalWidth/2;
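To see why the lower bound sorts better than the plain average, the Stats.php computation can be applied to the two averages from slide 17. The sketch below is a Python port, written for illustration only. The two vote sets are an assumption: they read the 4.75 average as six 5-star plus two 4-star reviews and the 5.00 average as a single 5-star review. The function and variable names are hypothetical.

```python
import math


def credibility_lower_bound(votes, z=1.65):
    """Lower end of the credibility interval, following the Stats.php slides.

    `votes` maps each star value to its vote count; every star value must be
    present, with a count of 0 if nobody chose it.
    """
    n_total = sum(votes.values())  # $N: total number of votes
    k = len(votes)                 # $K: number of possible star values
    # Each count gets +1, which pulls small samples towards the middle.
    mean = sum(value * (count + 1) / (n_total + k)
               for value, count in votes.items())
    mean_of_squares = sum(value ** 2 * (count + 1) / (n_total + k)
                          for value, count in votes.items())
    half_width = z * math.sqrt((mean_of_squares - mean ** 2) / (n_total + k + 1))
    return mean - half_width


# ASSUMPTION: vote sets reconstructed from the averages on slide 17.
eight_reviews = {1: 0, 2: 0, 3: 0, 4: 2, 5: 6}  # plain average 4.75
one_review = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1}     # plain average 5.00

print(round(credibility_lower_bound(eight_reviews), 2))
print(round(credibility_lower_bound(one_review), 2))
```

Sorting by plain average would put the wine with a single 5-star review first. The lower bound reverses that order, because one review leaves a much wider interval around a mean that is pulled further towards the middle.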
  26. “By medicine life may be prolong’d, yet death will seize the doctor too.”
      (Shakespeare, Cymbeline)