Slide 1

Slide 1 text

#elasticsearch

Slide 2

Slide 2 text

www.elastic.co Copyright Elastic 2015 Copying, publishing and/or distributing without written permission is strictly prohibited

Who?

    $ curl http://localhost:9200/talk/speaker/dpilato
    {
      "nom" : "David Pilato",
      "jobs" : [
        { "boite" : "SRA Europe (SSII)", "mission" : "bon à tout faire", "date" : "1995" },
        { "boite" : "SFR", "mission" : "touche à tout", "date" : "1997" },
        { "boite" : "e-Brands / Vivendi", "mission" : "chef de projets", "date" : "2000" },
        { "boite" : "DGDDI (douane)", "mission" : "mouton à 5 pattes", "date" : "2005" },
        { "boite" : "IDEO Technologies", "mission" : "directeur technique", "date" : "2012" },
        { "boite" : "elastic", "mission" : "développeur", "date" : "2013" }
      ],
      "passions" : [ "famille", "job", "deejay" ],
      "blog" : "http://dev.david.pilato.fr/",
      "twitter" : [ "@dadoonet", "@elasticfr", "@scrutmydocs" ],
      "email" : "[email protected]"
    }

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

elastic.co

• Founded in 2012 by the authors of Elasticsearch
• Training
• Development support
• Production support
• Marvel
• Shield
• Watcher
• Found by elastic

Slide 5

Slide 5 text

Old school search

Find me a document from December 2011 about France that contains "produit" and "david".

In SQL:

    SELECT doc.*, pays.*
    FROM doc, pays
    WHERE doc.pays_code = pays.code
      AND doc.date_doc > to_date('2011-12', 'yyyy-mm')
      AND doc.date_doc < to_date('2012-01', 'yyyy-mm')
      AND lower(pays.libelle) = 'france'
      AND lower(doc.commentaire) LIKE '%produit%'
      AND lower(doc.commentaire) LIKE '%david%';

Slide 6

Slide 6 text

Graphical User Interface

Slide 7

Slide 7 text

Search engine?

• Document indexing engine
• Engine for searching the indexes
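The two roles above (indexing documents, then searching the index) can be illustrated with a toy inverted index. This is only a minimal sketch in plain Python, not how Lucene or Elasticsearch is implemented; the sample documents and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each lowercased word to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, *words):
    """Searching: return the ids of the documents that contain every word."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample documents, not taken from the slides.
docs = {
    1: "produit vendu par David",
    2: "produit en stock",
    3: "David en France",
}
index = build_index(docs)
print(sorted(search(index, "produit", "david")))  # prints [1]
```

The search never scans the document texts: it only intersects the sets stored in the index, which is what makes the LIKE '%...%' approach of the SQL example unnecessary.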

Slide 8

Slide 8 text

elasticsearch

• Document-oriented NoSQL
• Apache Lucene
• HTTP / REST / JSON
• Distributed, scalable, cloud ready
• Apache 2 License
• Simple: start in 5 minutes 30 seconds
• Efficient: just start new nodes!
• Powerful: results in a few ms!
• Complete: built-in + plugins

Slide 9

Slide 9 text

Think document!

• Document: an object representing the data (in the NoSQL sense). Thinking "search" means forgetting the RDBMS and thinking "documents".

    {
      "text": "Bienvenue à la conférence #elasticsearch pour #JUG",
      "created_at": "2012-04-06T20:45:36.000Z",
      "source": "Twitter for iPad",
      "truncated": false,
      "retweet_count": 0,
      "hashtag": [
        { "text": "elasticsearch", "start": 27, "end": 40 },
        { "text": "JUG", "start": 47, "end": 55 }
      ],
      "user": {
        "id": 51172224,
        "name": "David Pilato",
        "screen_name": "dadoonet",
        "location": "France",
        "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
      }
    }

• Type: groups documents of the same type
• Index: logical storage space for documents whose types are functionally related

Slide 10

Slide 10 text

Index

    { "_index":"twitter", "_type":"tweet", "_id":"1" }

    $ curl -XPUT localhost:9200/twitter/tweet/1 -d '
    {
      "text": "Bienvenue à la conférence #elasticsearch pour #JUG",
      "created_at": "2012-04-06T20:45:36.000Z",
      "source": "Twitter for iPad",
      "truncated": false,
      "retweet_count": 0,
      "hashtag": [
        { "text": "elasticsearch", "start": 27, "end": 40 },
        { "text": "JUG", "start": 47, "end": 55 }
      ],
      "user": {
        "id": 51172224,
        "name": "David Pilato",
        "screen_name": "dadoonet",
        "location": "France",
        "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
      }
    }'

Slide 11

Slide 11 text

search

    $ curl 'localhost:9200/twitter/tweet/_search?q=elasticsearch'
    {
      "took" : 24,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.227,
        "hits" : [ {
          "_index" : "twitter",
          "_type" : "tweet",
          "_id" : "1",
          "_score" : 0.227,
          "_source" : {
            "text": "Bienvenue à la conférence #elasticsearch pour #JUG",
            "created_at": "2012-04-06T20:45:36.000Z",
            "source": "Twitter for iPad",
            […]
          }
        } ]
      }
    }

Legend: "total" is the number of documents; "_index", "_type" and "_id" are the coordinates; "_score" is the relevance; "_source" is the source document.

Slide 12

Slide 12 text

advanced search (Query DSL)

    $ curl localhost:9200/twitter/tweet/_search -d '{
      "query" : {
        "bool" : {
          "must" : {
            "term" : { "user" : "kimchy" }
          },
          "must_not" : {
            "range" : { "age" : { "from" : 10, "to" : 20 } }
          },
          "should" : [
            { "term" : { "tag" : "wow" } },
            { "match" : { "tag" : "elasticsearch is cool" } }
          ]
        }
      }
    }'

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

The power of aggregations (aka facets)

Make sense of your (BIG) data!

(And in near real time, please!)

Compute

Slide 16

Slide 16 text

Tweets

    ID  Username       Date        Hashtag
    1   dadoonet       2012-04-18  1
    2   talk           2012-04-18  5
    3   elasticsearch  2012-04-18  2
    4   dadoonet       2012-04-18  2
    5   talk           2012-04-18  6
    6   elasticsearch  2012-04-19  3
    7   dadoonet       2012-04-19  3
    8   talk           2012-04-19  7
    9   elasticsearch  2012-04-20  4

Slide 17

Slide 17 text

Terms

    ID  Username       Date        Hashtag
    1   dadoonet       2012-04-18  1
    2   talk           2012-04-18  5
    3   elasticsearch  2012-04-18  2
    4   dadoonet       2012-04-18  2
    5   talk           2012-04-18  6
    6   elasticsearch  2012-04-19  3
    7   dadoonet       2012-04-19  3
    8   talk           2012-04-19  7
    9   elasticsearch  2012-04-20  4

    Username       Count
    dadoonet       3
    talk           3
    elasticsearch  3

Slide 18

Slide 18 text

Terms

Request:

    "aggregations" : {
      "users" : {
        "terms" : { "field" : "username" }
      }
    }

Response:

    "aggregations" : {
      "users" : {
        "buckets" : [
          { "key" : "dadoonet", "doc_count" : 3 },
          { "key" : "talk", "doc_count" : 3 },
          { "key" : "elasticsearch", "doc_count" : 3 }
        ]
      }
    }
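Conceptually, a terms aggregation counts documents per distinct value of a field. The following is a minimal sketch of that idea in plain Python, applied to the Username column of the nine tweets; the function name `terms_aggregation` is hypothetical and the code says nothing about how Elasticsearch computes the result across shards.

```python
from collections import Counter

# Username column of the nine tweets in the table, in row order.
usernames = ["dadoonet", "talk", "elasticsearch"] * 3
tweets = [{"username": name} for name in usernames]


def terms_aggregation(docs, field):
    """Count documents per distinct value of `field`, most frequent first."""
    counts = Counter(doc[field] for doc in docs)
    return {
        "buckets": [
            {"key": key, "doc_count": count}
            for key, count in counts.most_common()
        ]
    }


result = terms_aggregation(tweets, "username")
for bucket in result["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

Each of the three usernames ends up in its own bucket with a count of 3, matching the response on the slide.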

Slide 19

Slide 19 text

Date Histogram

    Date        Hashtag
    2012-04-18  1
    2012-04-18  5
    2012-04-18  2
    2012-04-18  2
    2012-04-18  6
    2012-04-19  3
    2012-04-19  3
    2012-04-19  7
    2012-04-20  4

Per month

    Date     Count
    2012-04  9

Per day

    Date        Count
    2012-04-18  5
    2012-04-19  3
    2012-04-20  1
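The per-month and per-day counts above come from truncating each date to the chosen interval and counting documents per truncated value. Here is a minimal sketch of that idea in plain Python, assuming only the two intervals shown on the slide; the function name `date_histogram` is hypothetical and time zones are ignored.

```python
from collections import Counter
from datetime import date

# Date column of the nine tweets in the table.
dates = ["2012-04-18"] * 5 + ["2012-04-19"] * 3 + ["2012-04-20"]

# Output format used to truncate a date to each supported interval.
FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m"}


def date_histogram(values, interval):
    """Count values per interval and return (key, count) pairs sorted by key."""
    fmt = FORMATS[interval]
    counts = Counter(date.fromisoformat(value).strftime(fmt) for value in values)
    return sorted(counts.items())


print(date_histogram(dates, "month"))  # prints [('2012-04', 9)]
print(date_histogram(dates, "day"))
```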

Slide 20

Slide 20 text

Date Histogram

Request:

    "aggregations" : {
      "perday" : {
        "date_histogram" : {
          "field" : "date",
          "interval" : "day",
          "format" : "yyyy-MM-dd"
        }
      }
    }

Response:

    "aggregations" : {
      "perday" : {
        "buckets" : [
          { "key_as_string": "2012-04-18", "key": 1334700000000, "doc_count": 5 },
          { "key_as_string": "2012-04-19", "key": 1334786400000, "doc_count": 3 },
          { "key_as_string": "2012-04-20", "key": 1334872800000, "doc_count": 1 }
        ]
      }
    }

Slide 21

Slide 21 text

Range + Stats

Hashtag values: 1, 5, 2, 2, 6, 3, 3, 7, 4

    Hashtag      Count  Min  Max  Avg   Total
    x < 3        3      1    2    1.67  5
    3 <= x < 5   3      3    4    3.33  10
    x >= 5       3      5    7    6     18

Slide 22

Slide 22 text

Range + Stats

Request:

    "aggregations" : {
      "hashtags" : {
        "range" : {
          "field" : "hashtag",
          "ranges" : [
            { "to" : 3 },
            { "from" : 3, "to" : 5 },
            { "from" : 5 }
          ]
        },
        "aggregations" : {
          "hashtag_stats" : {
            "stats" : { "field" : "hashtag" }
          }
        }
      }
    }

Response:

    "aggregations" : {
      "hashtags" : {
        "buckets" : [
          { "to": 3, "doc_count": 3,
            "hashtag_stats" : { "min": 1, "max": 2, "sum": 5, "avg": 1.667 } },
          { "from": 3, "to": 5, "doc_count": 3,
            "hashtag_stats" : { "min": 3, "max": 4, "sum": 10, "avg": 3.333 } },
          { "from": 5, "doc_count": 3,
            "hashtag_stats" : { "min": 5, "max": 7, "sum": 18, "avg": 6 } }
        ]
      }
    }
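A range aggregation with a nested stats aggregation first puts each value in a bucket, then computes statistics inside each bucket. The sketch below reproduces the slide's numbers in plain Python, assuming "from" is inclusive and "to" is exclusive as in the "3 <= x < 5" row; the function name `range_stats` is hypothetical.

```python
# Hashtag column of the nine tweets in the table.
hashtags = [1, 5, 2, 2, 6, 3, 3, 7, 4]


def range_stats(values, ranges):
    """Bucket values by (from, to) pairs, then compute stats per bucket.

    `None` means the bound is open; `from` is inclusive, `to` is exclusive.
    """
    buckets = []
    for low, high in ranges:
        inside = [
            value for value in values
            if (low is None or value >= low) and (high is None or value < high)
        ]
        bucket = {"from": low, "to": high, "doc_count": len(inside)}
        if inside:  # stats are only defined for non-empty buckets
            bucket.update(
                min=min(inside),
                max=max(inside),
                sum=sum(inside),
                avg=sum(inside) / len(inside),
            )
        buckets.append(bucket)
    return buckets


buckets = range_stats(hashtags, [(None, 3), (3, 5), (5, None)])
for bucket in buckets:
    print(bucket)
```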

Slide 23

Slide 23 text

E-commerce site

Range, Terms, Terms, Range

Slide 24

Slide 24 text

Real-time analytics

Terms, Date histogram

Slide 25

Slide 25 text

Geographic facets

Slide 26

Slide 26 text

Back to our form

Full-text search

Slide 27

Slide 27 text

Back to our form

Slide 28

Slide 28 text

http://onemilliontweetmap.com/

Make sense of your (BIG) data

Demo time!

Slide 29

Slide 29 text

Percolation

aka Inverted Search

Slide 30

Slide 30 text

Index a search request

    $ curl -XPOST localhost:9200/twitter/.percolator/dadoonet -d '{
      "query" : {
        "term" : { "user.screen_name" : "dadoonet" }
      }
    }'

    $ curl -XPOST localhost:9200/twitter/.percolator/elasticsearch -d '{
      "query" : {
        "match" : { "hashtag.text" : "elasticsearch" }
      }
    }'

    $ curl -XPOST localhost:9200/twitter/.percolator/mycomplexquery -d '{
      "query" : {
        "bool" : {
          "must" : {
            "term" : { "user" : "kimchy" }
          },
          "must_not" : {
            "range" : { "age" : { "from" : 10, "to" : 20 } }
          }
        }
      }
    }'

Slide 31

Slide 31 text

Execute a document

    $ curl localhost:9200/twitter/tweet/_percolate -d '{
      "doc": {
        "text": "Bienvenue à la conférence #elasticsearch pour #JUG",
        "created_at": "2012-04-06T20:45:36.000Z",
        "source": "Twitter for iPad",
        "truncated": false,
        "retweet_count": 0,
        "hashtag": [
          { "text": "elasticsearch", "start": 27, "end": 40 },
          { "text": "JUG", "start": 47, "end": 55 }
        ],
        "user": {
          "id": 51172224,
          "name": "David Pilato",
          "screen_name": "dadoonet",
          "location": "France",
          "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
        }
      }
    }'

    {
      "took" : 19,
      "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
      "total" : 2,
      "matches" : [
        { "_index" : "twitter", "_id" : "dadoonet" },
        { "_index" : "twitter", "_id" : "elasticsearch" }
      ]
    }
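Percolation inverts the usual flow: queries are stored first, then each incoming document is run against them to find which queries it matches. The following is a minimal sketch of that idea in plain Python, with the three registered queries approximated as hand-written predicates; it is an illustration only and does not use or mimic the Elasticsearch percolator internals. The function name `percolate` and the trimmed document are assumptions.

```python
# Registered queries, keyed by the ids used in the slides.
# Each predicate is a rough stand-in for the corresponding Query DSL body.
registered_queries = {
    "dadoonet": lambda doc: (
        isinstance(doc.get("user"), dict)
        and doc["user"].get("screen_name") == "dadoonet"
    ),
    "elasticsearch": lambda doc: any(
        tag.get("text") == "elasticsearch" for tag in doc.get("hashtag", [])
    ),
    "mycomplexquery": lambda doc: (
        doc.get("user") == "kimchy"
        and not ("age" in doc and 10 <= doc["age"] <= 20)
    ),
}


def percolate(doc, queries):
    """Return the ids of the registered queries that the document matches."""
    return sorted(query_id for query_id, matches in queries.items() if matches(doc))


# Trimmed version of the tweet from the slides.
tweet = {
    "text": "Bienvenue à la conférence #elasticsearch pour #JUG",
    "hashtag": [{"text": "elasticsearch"}, {"text": "JUG"}],
    "user": {"name": "David Pilato", "screen_name": "dadoonet"},
}
print(percolate(tweet, registered_queries))  # prints ['dadoonet', 'elasticsearch']
```

As in the response on the slide, the tweet matches two of the three registered queries.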

Slide 32

Slide 32 text

Analysis and Mapping

Should we index everything?

Slide 33

Slide 33 text

Standard analyzer

    $ curl -XPOST 'localhost:9200/test/_analyze?analyzer=standard&pretty=1' -d 'The quick brown fox jumps over the lazy Dog'
    {
      "tokens" : [
        { "token" : "quick", "start_offset": 4, "end_offset": 9, "type": "<ALPHANUM>", "position": 2 },
        { "token" : "brown", "start_offset": 10, "end_offset": 15, "type": "<ALPHANUM>", "position": 3 },
        { "token" : "fox", "start_offset": 16, "end_offset": 19, "type": "<ALPHANUM>", "position": 4 },
        { "token" : "jumps", "start_offset": 20, "end_offset": 25, "type": "<ALPHANUM>", "position": 5 },
        { "token" : "over", "start_offset": 26, "end_offset": 30, "type": "<ALPHANUM>", "position": 6 },
        { "token" : "lazy", "start_offset": 35, "end_offset": 39, "type": "<ALPHANUM>", "position": 8 },
        { "token" : "dog", "start_offset": 40, "end_offset": 43, "type": "<ALPHANUM>", "position": 9 }
      ]
    }

Slide 34

Slide 34 text

Whitespace analyzer

    $ curl -XPOST 'localhost:9200/test/_analyze?analyzer=whitespace&pretty=1' -d 'The quick brown fox jumps over the lazy Dog'
    {
      "tokens" : [
        { "token" : "The", ... },
        { "token" : "quick", ... },
        { "token" : "brown", ... },
        { "token" : "fox", ... },
        { "token" : "jumps", ... },
        { "token" : "over", ... },
        { "token" : "the", ... },
        { "token" : "lazy", ... },
        { "token" : "Dog", ... }
      ]
    }

Slide 35

Slide 35 text

Analyzer?

Slide 36

Slide 36 text

Tokenizer / Token filters

• whitespace: "the dog!" -> "the", "dog!"
• standard: "the dog!" -> "the", "dog"
• asciifolding: éléphant -> elephant
• stemmer french: elephants -> "eleph", prenez -> "prendre"
• stopword french: (le, la, un, une, être, avoir, …)
• ngram or edge ngram: eleph -> ["el","ele","elep","eleph"]
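Several of the steps listed above are easy to approximate in a few lines, which helps to see what each one contributes. The sketch below uses plain Python stand-ins, not the Lucene implementations: the regular expression only roughly imitates the standard tokenizer, the folding relies on Unicode decomposition, and all function names are hypothetical. Stemming and stopword removal are left out because they need language-specific rules.

```python
import re
import unicodedata


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to tokens."""
    return text.split()


def standard_like_tokenize(text):
    """Rough stand-in for the standard tokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2):
    """Return the prefixes of the token, from min_gram characters up to its full length."""
    return [token[:size] for size in range(min_gram, len(token) + 1)]


print(whitespace_tokenize("the dog!"))     # prints ['the', 'dog!']
print(standard_like_tokenize("the dog!"))  # prints ['the', 'dog']
print(ascii_fold("éléphant"))              # prints elephant
print(edge_ngrams("eleph"))                # prints ['el', 'ele', 'elep', 'eleph']
```

Chaining such functions in a fixed order (tokenizer first, then each filter on every token) is the general shape of an analyzer.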

Slide 37

Slide 37 text

Custom analyzer

    "analysis":{
      "analyzer":{
        "francais":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase", "stop_francais", "fr_stemmer", "asciifolding", "elision"]
        }
      },
      "filter":{
        "stop_francais":{
          "type":"stop",
          "stopwords":["_french_", "twitter"]
        },
        "fr_stemmer" : {
          "type" : "stemmer",
          "name" : "french"
        },
        "elision" : {
          "type" : "elision",
          "articles" : ["l", "m", "t", "qu", "n", "s", "j", "d"]
        }
      }
    }

Slide 38

Slide 38 text

Define your mapping!

    "type1" : {
      "properties" : {
        "text1" : {
          "type" : "string",
          "analyzer" : "francais"
        },
        "text2" : {
          "type" : "string",
          "index_analyzer" : "ngram",
          "search_analyzer" : "simple"
        },
        "text3" : {
          "type" : "string",
          "analyzer" : "francais",
          "fields" : {
            "ngram" : {
              "type" : "string",
              "analyzer" : "ngram"
            },
            "facet" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }
      }
    }

Slide 39

Slide 39 text

Users and community

Slide 40

Slide 40 text

Users

Slide 41

Slide 41 text

FR users

Slide 42

Slide 42 text

elasticfr @elasticfr discuss.elastic.co