Elasticsearch - more than meets the eye

Presentation of Elasticsearch use cases beyond "full-text" search, given at Pytong 2 in Marseille: http://pytong.org.

Jérémy Lecour

September 20, 2014

Transcript

  1. 2.

    Database: distributed, document-oriented, with or without a schema, non-relational

    Multi-layered: Lucene, distribution system, query API, REST interface in JSON

    elasticsearch
  2. 3.

    {
      "indices.create": {
        "documentation": "http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index/",
        "methods": ["PUT", "POST"],
        "url": {
          "path": "/{index}",
          "paths": ["/{index}"],
          "parts": {
            "index": {
              "type": "string",
              "required": true,
              "description": "The name of the index"
            }
          },
          "params": {
            "timeout": {
              "type": "time",
              "description": "Explicit operation timeout"
            }
          }
        },
        "body": {
          "description": "The configuration for the index (`settings` and `mappings`)"
        }
      }
    }

    elasticsearch/elasticsearch/rest-api-spec
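A machine-readable spec like this one lets a client derive its requests instead of hard-coding them. The sketch below is only an illustration of that idea, not how any official client is generated: `build_request` is a hypothetical helper, and `SPEC` is a trimmed copy of the entry above.

```python
from urllib.parse import urlencode

# Trimmed copy of the "indices.create" entry shown above.
SPEC = {
    "indices.create": {
        "methods": ["PUT", "POST"],
        "url": {
            "path": "/{index}",
            "parts": {"index": {"type": "string", "required": True}},
            "params": {"timeout": {"type": "time"}},
        },
    }
}


def build_request(spec, name, parts, params=None):
    """Return (method, url) for one API entry of the spec (hypothetical helper)."""
    api = spec[name]
    url = api["url"]
    # Refuse to build a request when a required path part is missing.
    for part, meta in url["parts"].items():
        if meta.get("required") and part not in parts:
            raise ValueError("missing required part: " + part)
    # Only accept query parameters that the spec declares.
    params = params or {}
    unknown = set(params) - set(url["params"])
    if unknown:
        raise ValueError("unknown params: " + ", ".join(sorted(unknown)))
    path = url["path"].format(**parts)
    query = "?" + urlencode(params) if params else ""
    # The first listed method is used as the default one.
    return api["methods"][0], path + query


method, url = build_request(SPEC, "indices.create", {"index": "logs"}, {"timeout": "30s"})
```

Validating parts and parameters against the spec before sending anything is the main benefit here: a typo in a parameter name fails locally rather than on the cluster.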
  3. 4.

    elasticsearch/elasticsearch-py

    Features:
    • conversion of native Python types to/from JSON
    • persistent connections
    • load balancing
    • thread safety
    • extensible architecture
    • configurable cluster management
    • …
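To give an idea of what converting native Python types to and from JSON involves, here is a minimal sketch using only the standard library. It is not the serializer shipped with elasticsearch-py; `to_json_default` and `dumps` are hypothetical names, and the document values are illustrative.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_json_default(value):
    """Fallback used by json.dumps for types JSON does not know about."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError("cannot serialize %r" % (value,))


def dumps(document):
    """Encode a document, converting dates and decimals on the way."""
    return json.dumps(document, default=to_json_default, sort_keys=True)


# Illustrative document mixing plain and non-JSON-native Python types.
doc = {"hotel_id": 5853, "checkin": date(2014, 9, 27), "price": Decimal("29.9")}
encoded = dumps(doc)
decoded = json.loads(encoded)
```

Note that the round trip is lossy in this sketch: the date comes back as a string, so restoring Python types on the way back needs a mapping or a convention on field names.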
  4. 5.

    "Full-text" search
    • indexing and search
    • spelling correction
    • highlighting of results
    • advanced analysis/tokenization

    elasticsearch
    1. index an external record or document
    2. search in Elasticsearch
    3. retrieve the original from its location

    … but what else?
  5. 7.

    Data mining

    When:
    • awk, grep, cut, … fall short
    • Excel is a sick seagull
    • R is too scientific
    • you don't know what to look for
    • you don't know how to look for it
  6. 8.

    90.31.45.64 - - [14/Sep/2014:00:59:58 +0200] "GET /goto?cid=hotelhotel&display_zone=hotel.best_price&result_amount=29.9&result_currency=EUR&result_status=available&url=http%3A%2F%2Fapi.example.com%2Fredirect%3Fcheckin_date%3D2014-09-27%26client%3Dhotelhotel%26currency%3DEUR%26hotel %3D5853%26language%3DFR%26nights%3D1%26partner%3Dbooking%26people%3D2%26room_code%3D7968901_82039220_0_2 HTTP/1.1" 200

    2904 "http://example.com/details?hotel_id=5853" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 54.247.188.179 - - [14/Sep/2014:00:59:58 +0200] "HEAD / HTTP/1.0" 200 616 "-" "NewRelicPinger/1.0 (6536)" 66.249.69.19 - - [14/Sep/2014:00:59:56 +0200] "GET /toulouse/hotel-toulouse-blagnac-az.html HTTP/1.1" 200 19935 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" 105.235.128.128 - - [14/Sep/2014:00:59:59 +0200] "GET /details?hotel_id=722506 HTTP/1.1" 200 13874 "-" "Mozilla/5.0 (Linux; U; Android 4.0.4; fr-fr; Q7A+ Build/IMM76I) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" 66.249.69.35 - - [14/Sep/2014:01:00:01 +0200] "GET /torres-23/ HTTP/1.1" 500 664 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 31.170.8.141 - - [14/Sep/2014:01:00:01 +0200] "GET /server-status-64942?auto HTTP/1.1" 200 574 "-" "munin/2.0.6-4+deb7u2 (libwww-perl/6.04)" 31.170.8.141 - - [14/Sep/2014:01:00:04 +0200] "GET /server-status-64942?auto HTTP/1.1" 200 579 "-" "munin/2.0.6-4+deb7u2 (libwww-perl/6.04)" 24.58.100.150 - - [14/Sep/2014:01:00:07 +0200] "GET /bellano/?sort=distance HTTP/1.1" 200 22269 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" 24.58.100.67 - - [14/Sep/2014:01:00:08 +0200] "GET /serang/ HTTP/1.1" 200 20749 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" 66.249.69.9 - - [14/Sep/2014:01:00:09 +0200] "GET /nantes/ HTTP/1.1" 200 29757 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 88.167.200.78 - - [14/Sep/2014:01:00:09 +0200] "GET /details?id=316191 HTTP/1.1" 200 18284 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 193.201.45.74 - - [14/Sep/2014:01:00:10 +0200] "GET /redon/ HTTP/1.0" 200 104889 
"http://example.com/redon/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/524.36 (KHTML, like Gecko) Chrome/30.0.1599.12785 YaBrowser/13.12.1599.12785 Safari/524.36" 31.170.8.141 - - [14/Sep/2014:01:00:14 +0200] "GET /server-status-64942?auto HTTP/1.1" 200 584 "-" "munin/2.0.6-4+deb7u2 (libwww-perl/6.04)" 88.167.200.78 - - [14/Sep/2014:01:00:14 +0200] "GET /hotels/316191/photos HTTP/1.1" 200 1682 "http://example.com/details?id=316191" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 88.167.200.78 - - [14/Sep/2014:01:00:14 +0200] "GET /hotels/316191/descriptions.js HTTP/1.1" 200 1981 "http://example.com/details?id=316191" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 66.249.69.19 - - [14/Sep/2014:01:00:14 +0200] "GET /kamnik/ HTTP/1.1" 200 20761 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 88.167.200.78 - - [14/Sep/2014:01:00:14 +0200] "GET /users/activities.js HTTP/1.1" 200 1678 "http://example.com/details?id=316191" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 65.54.247.240 - - [14/Sep/2014:01:00:14 +0200] "GET /comparateur-prix-hotel?page=3&search_id=a0dfea0d7714-20140705 HTTP/1.1" 200 15160 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X; en-us) AppleWebKit/534+ (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9524.53 MsnBot- Media /1.0b" 88.167.200.78 - - [14/Sep/2014:01:00:14 +0200] "GET /hotels/316191/availabilities.js?utf8=%E2%9C%93&reserver%5Bhotel_id%5D=316191&reserver%5Bcheckin_date%5D=27%2F09%2F2014&reserver%5Bnights%5D=1&reserver%5Bpeople%5D=2 HTTP/1.1" 200 4848 "http://example.com/details?id=316191" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 65.54.247.240 - - [14/Sep/2014:01:00:17 +0200] "GET /users/activities.js HTTP/1.1" 204 625 
"http://example.com/comparateur-prix-hotel?page=3&search_id=a0dfea0d7714-20140705" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X; en-us) AppleWebKit/534+ (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9524.53 MsnBot-Media /1.0b" 66.249.93.51 - - [14/Sep/2014:01:00:17 +0200] "GET /charleville-mezieres.kml HTTP/1.1" 200 5423 "-" "Kml-Google; (+http://code.google.com/apis/kml), gzip" 65.54.247.240 - - [14/Sep/2014:01:00:18 +0200] "GET /users/autologin.js HTTP/1.1" 200 892 "http://example.com/comparateur-prix-hotel?page=3&search_id=a0dfea0d7714-20140705" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X; en-us) AppleWebKit/534+ (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9524.53 MsnBot-Media /1.0b" 90.31.45.64 - - [14/Sep/2014:01:00:19 +0200] "GET /users/autologin.js HTTP/1.1" 304 337 "http://example.com/lourdes/hotel-lourdes-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 66.249.69.9 - - [14/Sep/2014:01:00:19 +0200] "GET /coruche/ HTTP/1.1" 200 20561 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.84.115 - - [14/Sep/2014:01:00:18 +0200] "GET /comparateur-prix-hotel.kml?search%5Bsearchable_term%5D=&search%5Bsearchable_type%5D= HTTP/1.1" 200 619741 "-" "Kml-Google; (+http://code.google.com/apis/kml), gzip" 88.167.200.78 - - [14/Sep/2014:01:00:23 +0200] "GET /favicon.ico HTTP/1.1" 200 1790 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 88.167.200.78 - - [14/Sep/2014:01:00:26 +0200] "GET /users/autologin.js HTTP/1.1" 200 950 "http://example.com/details?id=316191" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 69.171.45.115 - - [14/Sep/2014:01:00:26 +0200] "GET /details?hotel_id=316191 HTTP/1.1" 206 18309 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 69.171.45.118 - - [14/Sep/2014:01:00:27 +0200] "GET /details?hotel_id=316191&fb_locale=fr_FR HTTP/1.1" 206 18309 "-" 
"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 66.249.69.35 - - [14/Sep/2014:01:00:28 +0200] "GET /zory/ HTTP/1.1" 200 20130 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 92.221.141.12 - - [14/Sep/2014:01:00:28 +0200] "GET /hotels/265596/availabilities.js?utf8=%E2%9C%93&reserver%5Bhotel_id%5D=265596&reserver%5Bcheckin_date%5D=14%2F09%2F2014&reserver%5Bnights%5D=4&reserver%5Bpeople%5D=1 HTTP/1.1" 200 2805 "http://example.com/details?hotel_id=265596" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0" 66.249.69.35 - - [14/Sep/2014:01:00:33 +0200] "GET /la-crau/ HTTP/1.1" 200 20038 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 86.202.70.135 - - [14/Sep/2014:01:00:33 +0200] "GET /users/autologin.js HTTP/1.1" 304 337 "http://example.com/clermont-ferrand/hotel-Clermont-ferrand-pas-cher-az.html" "Mozilla/5.0 (Linux; Android 4.4.2; D2303 Build/18.3.C.0.37) AppleWebKit/524.36 (KHTML, like Gecko) Chrome/24.0.2062.117 Mobile Safari/524.36" 93.0.232.94 - - [14/Sep/2014:01:00:34 +0200] "GET / HTTP/1.1" 200 84935 "-" "Echoping/6.0.2" 157.55.39.234 - - [14/Sep/2014:01:00:34 +0200] "GET /melbourne-fl/ HTTP/1.1" 200 22876 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 93.0.232.94 - - [14/Sep/2014:01:00:36 +0200] "GET / HTTP/1.1" 200 84934 "-" "Echoping/6.0.2" 81.251.101.164 - - [14/Sep/2014:01:00:39 +0200] "GET /disneyland-paris/hotel-disneyland-paris-pas-cher-az.html HTTP/1.1" 200 98790 "http://www.google.fr/imgres?imgurl=http%3A%2F%2Fs3.example.com%2Fmedias%2F000%2F000%2F521%2Fdisney_pas_cher-big.jpg%253F1352727975&imgrefurl=http%3A%2F 
%2Fexample.com%2Fdisneyland-paris%2Fhotel-disneyland-paris-pas-cher-az.html&h=250&w=630&tbnid=qF3Y0wPeyeW7tM%3A&zoom=1&docid=RGXzA6TWMY8jPM&ei=jMwUVPvCD86UatGdgiA&tbm=isch&iact=rc&uact=3&dur=2234&page=1&start=0&ndsp=12&ved=0CD0QrQMwCQ" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/ 524.36 (KHTML, like Gecko) Chrome/24.0.2062.120 Safari/524.36" 81.251.101.164 - - [14/Sep/2014:01:00:41 +0200] "GET /disneyland-paris/hotel-disneyland-paris-pas-cher-az.html HTTP/1.1" 200 98680 "http://www.google.fr/imgres?imgurl=http%3A%2F%2Fs3.example.com%2Fmedias%2F000%2F000%2F521%2Fdisney_pas_cher-big.jpg%253F1352727975&imgrefurl=http%3A%2F %2Fexample.com%2Fdisneyland-paris%2Fhotel-disneyland-paris-pas-cher-az.html&h=250&w=630&tbnid=qF3Y0wPeyeW7tM%3A&zoom=1&docid=RGXzA6TWMY8jPM&ei=jMwUVPvCD86UatGdgiA&tbm=isch&iact=rc&uact=3&dur=2234&page=1&start=0&ndsp=12&ved=0CD0QrQMwCQ" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/ 524.36 (KHTML, like Gecko) Chrome/24.0.2062.120 Safari/524.36" 81.251.101.164 - - [14/Sep/2014:01:00:41 +0200] "GET /date_images/1401451200-normal-10.png HTTP/1.1" 200 1349 "http://example.com/disneyland-paris/hotel-disneyland-paris-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/524.36 (KHTML, like Gecko) Chrome/24.0.2062.120 Safari/524.36" 66.249.69.9 - - [14/Sep/2014:01:00:42 +0200] "GET /sevenoaks/ HTTP/1.1" 200 23205 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 81.251.101.164 - - [14/Sep/2014:01:00:43 +0200] "GET /users/activities.js HTTP/1.1" 200 1537 "http://example.com/disneyland-paris/hotel-disneyland-paris-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/524.36 (KHTML, like Gecko) Chrome/24.0.2062.120 Safari/524.36" 90.31.45.64 - - [14/Sep/2014:01:00:44 +0200] "GET /details?hotel_id=334620 HTTP/1.1" 200 19392 
"http://example.com/lourdes/hotel-lourdes-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 41.111.59.99 - - [14/Sep/2014:01:00:45 +0200] "GET /marrakech/hotel-marrakech-pas-cher-az.html HTTP/1.1" 200 22218 "http://www.google.dz/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CDoQFjAA&url=http%3A%2F%2Fexample.com%2Fmarrakech%2Fhotel-marrakech-pas-cher- az.html&ei=k8wUVNugDayO7QaP44HwBQ&usg=AFQjCNH5uZDUHAHCcLJEcsNaaWzuidFuwA&bvm=bv.75097201,d.ZGU" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0" 90.31.45.64 - - [14/Sep/2014:01:00:46 +0200] "GET /hotels/334620/descriptions.js HTTP/1.1" 200 1477 "http://example.com/details?hotel_id=334620" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 90.31.45.64 - - [14/Sep/2014:01:00:46 +0200] "GET /hotels/334620/photos HTTP/1.1" 200 1353 "http://example.com/details?hotel_id=334620" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 90.31.45.64 - - [14/Sep/2014:01:00:46 +0200] "GET /users/activities.js HTTP/1.1" 200 1916 "http://example.com/details?hotel_id=334620" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 90.31.45.64 - - [14/Sep/2014:01:00:46 +0200] "GET /hotels/334620/availabilities.js?utf8=%E2%9C%93&reserver%5Bhotel_id%5D=334620&reserver%5Bcheckin_date%5D=27%2F09%2F2014&reserver%5Bnights%5D=1&reserver%5Bpeople%5D=2 HTTP/1.1" 200 2796 "http://example.com/details?hotel_id=334620" "Mozilla/ 5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 90.31.45.64 - - [14/Sep/2014:01:00:49 +0200] "GET /users/autologin.js HTTP/1.1" 304 336 "http://example.com/details?hotel_id=334620" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" 66.249.84.115 - - [14/Sep/2014:01:00:49 +0200] "GET /comparateur-prix-hotel.kml?search%5Bdestination_id%5D=34085 HTTP/1.1" 200 11090 "-" "Kml-Google; (+http://code.google.com/apis/kml), gzip" 41.111.59.99 - - [14/Sep/2014:01:00:50 +0200] "GET 
/date_images/1337673600-normal-10.png HTTP/1.1" 200 1011 "http://example.com/marrakech/hotel-marrakech-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0" 66.220.158.112 - - [14/Sep/2014:01:00:50 +0200] "GET /details?hotel_id=334620 HTTP/1.1" 206 19520 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 41.111.59.99 - - [14/Sep/2014:01:00:50 +0200] "GET /users/activities.js HTTP/1.1" 200 1667 "http://example.com/marrakech/hotel-marrakech-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0" 66.220.158.112 - - [14/Sep/2014:01:00:50 +0200] "GET /details?hotel_id=334620&fb_locale=fr_FR HTTP/1.1" 206 19519 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 41.111.59.99 - - [14/Sep/2014:01:00:57 +0200] "GET /favicon.ico HTTP/1.1" 200 1790 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0" 66.249.69.9 - - [14/Sep/2014:01:00:57 +0200] "GET /hathersage/ HTTP/1.1" 200 21014 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 41.111.59.99 - - [14/Sep/2014:01:00:57 +0200] "GET /users/autologin.js HTTP/1.1" 200 950 "http://example.com/marrakech/hotel-marrakech-pas-cher-az.html" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"

    Web server logs
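Before logs like these can be explored in Elasticsearch, each line has to become a structured document. The sketch below assumes the lines follow the Apache "combined" log format, which is what the sample looks like; `parse_line` is a hypothetical helper and the `@timestamp` field name is borrowed from the Logstash convention, not from the talk.

```python
import re
from datetime import datetime

# One capture group per field of the (assumed) Apache combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Turn one log line into a dict ready to be indexed, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Numeric fields are converted so they can be aggregated later.
    doc["status"] = int(doc["status"])
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    # The log timestamp becomes an ISO 8601 string with its UTC offset.
    parsed = datetime.strptime(doc.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()
    return doc


# One of the entries from the sample above.
line = ('54.247.188.179 - - [14/Sep/2014:00:59:58 +0200] "HEAD / HTTP/1.0" '
        '200 616 "-" "NewRelicPinger/1.0 (6536)"')
doc = parse_line(line)
```

Returning `None` for lines that do not match keeps the loader simple: malformed entries can be counted or logged instead of stopping the import.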
  7. 9.

    {
      "hits" : {
        "total" : 7319546,
        "hits" : [ ]
      },
      "facets" : {
        "0" : {
          "_type" : "date_histogram",
          "entries" : [
            { "time" : 1410807480000, "count" : 353 },
            { "time" : 1410807510000, "count" : 5385 },
            { "time" : 1410807540000, "count" : 2113 },
            { "time" : 1410807570000, "count" : 734 },
            { "time" : 1410807600000, "count" : 157 },
            { "time" : 1410807630000, "count" : 52 },
            { "time" : 1410807660000, "count" : 267 },
            { "time" : 1410807690000, "count" : 50 },
            { "time" : 1410807720000, "count" : 107 },
            { "time" : 1410807750000, "count" : 3055 },
            { "time" : 1410807780000, "count" : 2668 },
            { "time" : 1410807810000, "count" : 2197 },
            { "time" : 1410807840000, "count" : 1110 }
          ]
        }
      }
    }

    Facets and aggregations
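The `time` keys in this response are milliseconds since the epoch, spaced 30 000 ms apart, so each entry counts the events of one 30-second bucket. The sketch below reproduces that bucketing client-side on a handful of made-up timestamps, purely to show what the server computes; `date_histogram` and `label` are hypothetical helpers, and the real work happens inside Elasticsearch.

```python
from collections import Counter
from datetime import datetime, timezone

INTERVAL_MS = 30 * 1000  # spacing between the "time" keys in the response above


def date_histogram(timestamps_ms, interval_ms=INTERVAL_MS):
    """Count events per fixed-width bucket, keyed by bucket start (ms since epoch)."""
    counts = Counter((ts // interval_ms) * interval_ms for ts in timestamps_ms)
    return [{"time": key, "count": counts[key]} for key in sorted(counts)]


def label(entry):
    """Render a bucket start as a readable UTC timestamp."""
    moment = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Made-up event timestamps falling into two consecutive buckets.
events = [1410807480000, 1410807495123, 1410807510000, 1410807539999]
entries = date_histogram(events)
labels = [label(entry) for entry in entries]
```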
  8. 11.

    In practice:
    1. inject the data into Elasticsearch (without much design)
    2. plug in Kibana and a simple histogram
    3. observe the trends and refine the graphs
    4. re-index the data with a more precise schema
    5. rinse
    6. repeat
  9. 12.

    Ephemeral indexing and just-in-time search
    1. shared index: generic data
    2. copy for a specific search
    3. availability/price data added
    4. a lot of data created in a short time
    5. immediately available (< 1 s)
    6. intensive searching for a few minutes
    7. data nearly useless 24 hours later
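Creating a lot of documents in a short time is typically done through the `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. The talk does not say how the data was loaded, so the sketch below is an assumption; `bulk_body`, the index name and the document fields are hypothetical, and the `_type` field reflects Elasticsearch versions of that era.

```python
import json


def bulk_body(index, doc_type, documents):
    """Serialize (id, source) pairs as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc_id, source in documents:
        # Action line: tells Elasticsearch where the next source line goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical per-search index holding availability/price data.
offers = [
    (1, {"hotel_id": 5853, "price": 29.9, "currency": "EUR"}),
    (2, {"hotel_id": 316191, "price": 54.0, "currency": "EUR"}),
]
body = bulk_body("search-demo", "offer", offers)
```

With one index per search, discarding the data a day later can be a single index deletion rather than a per-document cleanup.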
  10. 13.
  11. 15.

    Conclusion
    • very fast tools that are light on resources
    • many features, but no bloat
    • robust and easy to pick up
    • great openness, both technical and in spirit