Slide 1

Slide 1 text

Comparing elasticsearch and Solr
Genta Kaneyama (兼山 元太) @penguinana_
Monday, November 26, 12

Slide 2

Slide 2 text

Self-introduction
• Genta Kaneyama (兼山 元太) @penguinana_
• Recipe search team @ http://cookpad.com/
• Solr 4.0

Slide 3

Slide 3 text

While we were considering a Solr version upgrade...

Slide 4

Slide 4 text

Shouldn't we look into Elasticsearch as well?

Slide 5

Slide 5 text

• Lucene-based
• HTTP API
• Distributed search: OK
• Japanese: OK

Slide 6

Slide 6 text

• Lucene-based
• HTTP API
• Distributed search: OK
• Japanese: OK
Déjà vu

Slide 7

Slide 7 text

http://solr-vs-elasticsearch.com/

Slide 8

Slide 8 text

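Impressions
• No gaps in functionality
• The API is developer-friendly
• Easy to learn
• Useful even outside large-scale distributed search
• If we weren't already using Solr, I'd want to run this in production!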

Slide 9

Slide 9 text

I'll walk through the basics using a sample

Slide 10

Slide 10 text

http://blog.livedoor.jp/techblog/archives/65836960.html

Slide 11

Slide 11 text

livedoor Gourmet (livedoorグルメ)
• Restaurant data (214,000 restaurants)
• Restaurant name, cuisine served, address, latitude/longitude, access count, distance from the nearest station, etc...

Slide 12

Slide 12 text

livedoor Gourmet (livedoorグルメ)
• Review data (205,000 reviews)
• Overall rating (5-point scale)
• Atmosphere, price, service, taste
• Review comments

Slide 13

Slide 13 text

https://github.com/penguinco/ld_gourmet_search

Slide 14

Slide 14 text

Using Elasticsearch
• Index one document, then search for it
• Define how Japanese text is handled
• Define the schema
• Import the data
• Search
• Tune scoring and so on

Slide 15

Slide 15 text

PUT
curl -XPUT http://localhost:9200/twitter/tweet/1 -d '
{
  "user": "kimchy",
  "post_date": "2012-11-26T20:12:00",
  "message": "Trying out elasticsearch",
  "score": 5
}
'
(URL path segments: index = twitter, type = tweet, id = 1)

Slide 16

Slide 16 text

PUT
curl -XPUT http://localhost:9200/twitter/user/kimchy -d '
{
  "name" : "Shay Banon"
}
'
(URL path segments: index = twitter, type = user, id = kimchy)

Slide 17

Slide 17 text

GET
curl -XGET http://localhost:9200/twitter/tweet/1
{
  "user": "kimchy",
  "post_date": "2012-11-26T20:12:00",
  "message": "Trying out elasticsearch",
  "score": 5
}
(URL path segments: index = twitter, type = tweet, id = 1)

Slide 18

Slide 18 text

SEARCH
curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
  "query" : {
    "term" : { "user": "kimchy" }
  }
}'
(URL path segments: index = twitter, type = tweet; no id is given for a search)
{
  "user": "kimchy",
  "post_date": "2012-11-26T20:12:00",
  "message": "Trying out elasticsearch",
  "score": 5
}

Slide 19

Slide 19 text

REST API
• Add and delete documents
• Add and delete settings
• Everything can be done through the HTTP API
• Schema-free
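The PUT and search calls from the preceding slides can also be issued from application code. The following is a minimal sketch using only Ruby's standard library; it builds the requests without sending them. The `BASE` URL and the helper names `build_index_request` and `build_search_request` are illustrative assumptions, not part of the talk. The search uses POST here because Ruby's `Net::HTTP` is not designed to send a body with GET; Elasticsearch accepts POST on `_search`.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: a local single-node Elasticsearch, as in the curl examples.
BASE = "http://localhost:9200"

# Build (but do not send) a PUT request that indexes one document
# under /index/type/id.
def build_index_request(index, type, id, doc)
  uri = URI("#{BASE}/#{index}/#{type}/#{id}")
  req = Net::HTTP::Put.new(uri)
  req["Content-Type"] = "application/json"
  req.body = JSON.generate(doc)
  req
end

# Build (but do not send) a term-query search against /index/type/_search.
def build_search_request(index, type, field, value)
  uri = URI("#{BASE}/#{index}/#{type}/_search")
  req = Net::HTTP::Post.new(uri)
  req["Content-Type"] = "application/json"
  req.body = JSON.generate({ "query" => { "term" => { field => value } } })
  req
end

put = build_index_request("twitter", "tweet", 1,
                          { "user" => "kimchy", "message" => "Trying out elasticsearch" })
search = build_search_request("twitter", "tweet", "user", "kimchy")

puts "#{put.method} #{put.path}"
puts "#{search.method} #{search.path} #{search.body}"
```

To actually send a request, something like `Net::HTTP.start("localhost", 9200) { |http| http.request(put) }` would be needed, with a running node.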

Slide 20

Slide 20 text

Japanese
$ curl -XGET 'localhost:9200/_analyze?pretty' -d '神泉'
{
  "tokens" : [ {
    "token" : "神",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "",
    "position" : 1
  }, {
    "token" : "泉",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "",
    "position" : 2
  } ]
}

Slide 21

Slide 21 text

Japanese
Handled by changing the analyzer
kuromoji is available!
http://www.hirotakaster.com/archives/2012/11/elasticsearch-kuromoji-plugin.php

Slide 22

Slide 22 text

kuromoji
$ cd elasticsearch
$ bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.0.0
$ git clone git://github.com/elasticsearch/elasticsearch-analysis-kuromoji.git
$ cd elasticsearch-analysis-kuromoji/
$ mvn clean package
$ cp target/elasticsearch-analysis-kuromoji-1.2.0-SNAPSHOT.jar ../plugins/analysis-kuromoji/elasticsearch-analysis-kuromoji-1.0.0.jar
# restart elasticsearch

Slide 23

Slide 23 text

add analyzer
$ curl -XPUT 'localhost:9200/test/' -d '
{
  "index":{
    "analysis":{
      "tokenizer" : {
        "kuromoji" : {
          "type":"kuromoji_tokenizer",
          "mode":"search"
        }
      },
      "analyzer" : {
        "kuromoji_analyzer" : {
          "type" : "custom",
          "tokenizer" : "kuromoji_tokenizer"
        }
      }
    }
  }
}
'

Slide 24

Slide 24 text

kuromoji
$ curl -XGET 'localhost:9200/test/_analyze?analyzer=kuromoji_analyzer&pretty' -d '神泉'
{
  "tokens" : [ {
    "token" : "神泉",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}

Slide 25

Slide 25 text

_analyze
$ curl -XGET 'localhost:9200/test/_analyze?analyzer=kuromoji_analyzer&pretty' -d '関西国際空港'
{
  "tokens" : [
    {"token" : "関西", ...},
    {"token" : "関西国際空港", ...},
    {"token" : "国際", ...},
    {"token" : "空港", ...}
  ]
}

Slide 26

Slide 26 text

Making kuromoji the default
• Declare an analyzer under the name "default"
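As a rough illustration of declaring an analyzer named `default`, the sketch below builds an index-settings body in the same shape as the "add analyzer" slide. The method name `default_kuromoji_settings` is hypothetical, and the exact settings the speaker used are not shown in the deck, so treat this as one plausible layout rather than the original configuration.

```ruby
require "json"

# Settings body that registers a kuromoji-based analyzer under the
# reserved name "default", so fields without an explicit analyzer use it.
# "kuromoji_tokenizer" is the tokenizer type provided by the kuromoji plugin.
def default_kuromoji_settings
  {
    "index" => {
      "analysis" => {
        "analyzer" => {
          "default" => {
            "type" => "custom",
            "tokenizer" => "kuromoji_tokenizer"
          }
        }
      }
    }
  }
end

# This JSON would be the -d payload of: curl -XPUT 'localhost:9200/test/'
puts JSON.pretty_generate(default_kuromoji_settings)
```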

Slide 27

Slide 27 text

Synonyms
• As with Solr, synonyms can be written in a file
• The WordNet format can be used as well
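A synonym file is wired in through a token filter of type `synonym`. The sketch below builds such an analysis block; the filter name `my_synonyms`, the file path, and the helper names are illustrative assumptions, and the combination with `kuromoji_tokenizer` is a guess at a typical setup rather than something shown in the talk.

```ruby
require "json"

# Token filter definition pointing at a synonym file.
# Passing wordnet: true selects the WordNet file format instead of
# the Solr-style format.
def synonym_filter(path, wordnet: false)
  filter = { "type" => "synonym", "synonyms_path" => path }
  filter["format"] = "wordnet" if wordnet
  filter
end

# Analysis block that applies the synonym filter after tokenization.
# "my_synonyms" is a hypothetical filter name.
def analysis_with_synonyms(path, wordnet: false)
  {
    "analysis" => {
      "filter" => { "my_synonyms" => synonym_filter(path, wordnet: wordnet) },
      "analyzer" => {
        "default" => {
          "type" => "custom",
          "tokenizer" => "kuromoji_tokenizer",
          "filter" => ["my_synonyms"]
        }
      }
    }
  }
end

# The path is an assumed location relative to the Elasticsearch config directory.
puts JSON.pretty_generate(analysis_with_synonyms("analysis/synonym.txt"))
```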

Slide 28

Slide 28 text

analyzer

Slide 29

Slide 29 text

Concerns about Japanese are settled, to a reasonable degree!

Slide 30

Slide 30 text

Schema definition
• Schema-free!
• The JSON types are adopted as field types
• Types can also be defined explicitly (mapping)
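To make "the JSON types are adopted" concrete, here is a simplified approximation of how a field type could be guessed from a JSON value. This is not Elasticsearch's actual implementation: the function `guess_field_type` and its date pattern are assumptions for illustration, and the real dynamic-mapping rules are more detailed.

```ruby
require "json"

# Very rough stand-in for dynamic mapping: map a parsed JSON value to a
# field type name. Strings that look like ISO 8601 timestamps become "date".
ISO_DATE = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/

def guess_field_type(value)
  case value
  when true, false then "boolean"
  when Integer     then "long"
  when Float       then "double"
  when Hash        then "object"
  when Array       then value.empty? ? nil : guess_field_type(value.first)
  when String      then value.match?(ISO_DATE) ? "date" : "string"
  end
end

doc = JSON.parse(<<~JSON)
  {
    "user": "kimchy",
    "post_date": "2012-11-26T20:12:00",
    "message": "Trying out elasticsearch",
    "score": 5
  }
JSON

types = doc.transform_values { |v| guess_field_type(v) }
puts types
```

When a guess like this is not what you want, an explicit mapping overrides it.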

Slide 31

Slide 31 text

mapping example
$ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{
  "tweet" : {
    "properties" : {
      "message" : {"type" : "string", "store" : "yes"}
    }
  }
}
'

Slide 32

Slide 32 text

Differences from Solr
• Simpler than Solr's DynamicField
• type
• Designed on the assumption that one core holds several kinds of docs, which is convenient

Slide 33

Slide 33 text

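import (ruby)
require 'csv'
require 'tire'

ratings = []
# headers + symbol converter are needed so that row[:id] etc. resolve by column name
CSV.foreach("ratings.csv", headers: true, header_converters: :symbol) do |row|
  ratings << {
    :id => row[:id].to_i,
    :restaurant_id => row[:restaurant_id].to_i,
    :body => row[:body],
    :type => 'rating'
  }
end

# Send all collected documents to the livedoor_gourmet index
Tire.index 'livedoor_gourmet' do
  import ratings
end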
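An import like the one above ultimately sends many documents per request. The sketch below shows the newline-delimited payload of the `_bulk` endpoint, built with the standard library only. That the import helper uses the bulk API internally is an assumption here, and `bulk_body` is a hypothetical helper name.

```ruby
require "json"

# Build a _bulk payload: for each document, one action line followed by
# one source line, each terminated by a newline.
def bulk_body(index, type, docs)
  lines = docs.flat_map do |doc|
    action = { "index" => { "_index" => index, "_type" => type, "_id" => doc[:id] } }
    [JSON.generate(action), JSON.generate(doc)]
  end
  lines.join("\n") + "\n"
end

# Sample rows are made up for illustration.
ratings = [
  { :id => 1, :restaurant_id => 10, :body => "Great ramen" },
  { :id => 2, :restaurant_id => 11, :body => "Friendly service" }
]

body = bulk_body("livedoor_gourmet", "rating", ratings)
puts body
```

The resulting string would be POSTed to `/_bulk`; the trailing newline is required by that endpoint.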

Slide 34

Slide 34 text

Search
curl -X GET 'http://localhost:9200/livedoor_gourmet/restaurant/_search?pretty' -d '
{
  "query":{
    "query_string":{
      "query":"ラーメン"
    }
  },
  "sort":[{"access_count":"desc"}],
  "filter":{
    "term":{"closed":"0"}
  }
}
'

Slide 35

Slide 35 text

Differences from Solr
• The DSL is quite different
• filter, facet, grouping and highlight are also supported
• Scoring can be defined in a scripting language

Slide 36

Slide 36 text

Scoring
• Sorting by page views (PV) worked well
• Real-world problems quite often turn out like this

Slide 37

Slide 37 text

Scoring
• Well worth a look if you are interested
• Can be defined in a scripting language
• google: elasticsearch guide scoring
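As a sketch of script-defined scoring, the body below wraps a `query_string` query in a `custom_score` query, the script-scoring query available in Elasticsearch releases of this period (later versions replaced it with `function_score`). Multiplying the relevance score by `access_count` is an illustrative choice, not the formula from the talk, and `pv_boosted_query` is a hypothetical helper name.

```ruby
require "json"

# Search body whose score is the text relevance multiplied by a numeric
# popularity field. The field name defaults to access_count, the field
# used for sorting on the search slide.
def pv_boosted_query(keyword, pv_field = "access_count")
  {
    "query" => {
      "custom_score" => {
        "query"  => { "query_string" => { "query" => keyword } },
        "script" => "_score * doc['#{pv_field}'].value"
      }
    }
  }
end

puts JSON.pretty_generate(pv_boosted_query("ラーメン"))
```

Compared with a plain sort on `access_count`, this keeps text relevance in the ranking instead of discarding it.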

Slide 38

Slide 38 text

Impressions
• No gaps in functionality
• The API is developer-friendly
• Easy to learn
• Useful even outside large-scale distributed search

Slide 39

Slide 39 text

API

Slide 40

Slide 40 text

config
Can be done with curl alone
→ The definitions can live in the application

Slide 41

Slide 41 text

Adding a core
Can be done with curl alone
→ A single developer can complete it end to end

Slide 42

Slide 42 text

Easy to learn
• Most operations can be completed with curl
• Much of the knowledge is shared with Solr
• Lucene queries can be used
• The query DSL is a bit of a hurdle…

Slide 43

Slide 43 text

Distributed search

Slide 44

Slide 44 text

Distributed search
• number_of_shards
• number_of_replicas
• replication
  • async/sync
• write consistency (one, quorum, all)
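The settings listed above can be sketched as follows: a create-index body carrying `number_of_shards` and `number_of_replicas`, plus a small calculation of how many shard copies must be available under each write consistency level. The calculation assumes the usual majority rule for `quorum`; Elasticsearch's exact handling of small replica counts may differ, and both helper names are hypothetical.

```ruby
require "json"

# Create-index body with shard and replica counts.
def index_settings(shards, replicas)
  { "settings" => { "number_of_shards" => shards, "number_of_replicas" => replicas } }
end

# Number of shard copies (primary + replicas) that must be available for a
# write to proceed, under a simple majority interpretation of "quorum".
def required_copies(level, replicas)
  total = replicas + 1 # one primary plus its replicas
  case level
  when :one    then 1
  when :quorum then total / 2 + 1
  when :all    then total
  else raise ArgumentError, "unknown consistency level: #{level}"
  end
end

puts JSON.generate(index_settings(5, 2))
[:one, :quorum, :all].each do |level|
  puts "#{level}: #{required_copies(level, 2)} of 3 copies"
end
```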

Slide 45

Slide 45 text

multi-tenant
• open/close index
• write I/O throttling
• merge policy control
• shard allocation
• number_of_replicas per index

Slide 46

Slide 46 text

plugin

Slide 47

Slide 47 text

plugin
$ bin/plugin -install Aconex/elasticsearch-head

Slide 48

Slide 48 text

Performance
• Plenty of case studies can be found
  • foursquare, soundcloud, bugsense, etc.
• There is no query cache
  • Cache with nginx, varnish, or similar

Slide 49

Slide 49 text

Summary
• If you need distributed search, use elasticsearch
• There are many advantages even without distributed search
• It may well get more opportunities to be used from here on

Slide 50

Slide 50 text

see also...
• http://www.elasticsearch.org/
• http://www.elasticsearch.org/guide/
• http://solr-vs-elasticsearch.com/
• github.com/elasticsearch
• http://blog.sematext.com/
• #elasticsearch