Slide 1

Yokozuna
Riak + Solr
October 10th, 2012
1 Monday, October 15, 12

Slide 2

Me
• Ryan Zezeski
• @rzezeski
• rzezeski@basho.com

Slide 3

What?

Slide 4

Tight Integration
• Solr bundled with Riak, turn-key, zero config to start
• supervise the Solr process: start/stop/restart
• present the canonical Solr query interface
• use Solr clients to query Riak

Slide 5

Erlang Application
• made up of process and library modules
• has a supervision tree
• sits alongside Riak KV

Slide 6

Intermediary
• converts KV data into Solr docs
• translates Solr queries into distributed Solr queries
• constantly communicates with KV to verify object/index convergence

Slide 7

This Guy
(image: http://images4.wikia.nocookie.net/__cb20110220003703/prowrestling/images/a/a5/Yokozuna16.jpg)

Slide 8

Sumo Wrestling Term
“Horizontal rope. The top rank in sumo, usually translated Grand Champion. The name comes from the rope a yokozuna wears for the dohyō-iri.”
http://en.wikipedia.org/wiki/Glossary_of_sumo_terms

Slide 9

Why?

Slide 10

Riak Search Lessons Learned
• pretends to be Lucene/Solr
• lacks analyzer/language/feature support
• bad performance/resource usage for certain queries
• Basho is not in the business of search

Slide 11

Solr is Great
• great analyzer/language support
• features: ranking, faceting, highlighting, geo, etc.
• rests upon a solid foundation, Lucene
• active development, built by people who innovate on search

Slide 12

Improve Retrieval in Riak
• Riak is good at storing data; make it better at finding it
• Map/Reduce can be too general and resource hungry
• 2i is very limited and has resource issues
• query a non-trivial amount of data efficiently

Slide 13

Use Strengths of Both
• Riak: HA, distributed, scale out/in
• Solr: efficient index, features people want, known entity (vs. Riak Search, which is home grown)
• make Solr HA, distributed, and scale with Riak
• make Riak searchable with Solr (both?)

Slide 14

Features

Slide 15

IF SOLR HAS IT, YOKOZUNA HAS IT*
* http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

Slide 16

TIGHTLY INTEGRATED, TURN-KEY, LOOKS LIKE A SINGLE SYSTEM

Slide 17

WRITE IT LIKE RIAK, QUERY IT LIKE SOLR

Slide 18

EXTRACT FIELDS BASED ON CONTENT TYPE

Slide 19

ANTI-ENTROPY

Slide 20

ACTIVE ANTI-ENTROPY

Slide 21

How Does it Work?

Slide 22

Riak as Normal
[Diagram: Key/Value data on the Riak ring]
Images: http://wiki.basho.com/attachments/riak-ring.png, http://s3.amazonaws.com/wernervogels/public/sosp/sosp-figure3-small.png

Slide 23

One Solr Instance per Node
[Diagram: inside a Riak node, Yokozuna starts and monitors a Jetty/Solr process, receives KV hook msgs, and sends Index and Query requests to the Solr cores (e.g. "people"). Today one bucket maps to one index core (1:1); the goal is M:1, bucket:index.]

Slide 24

All Partitions, One Solr
[Diagram: KV partitions 1, 4 and 7 on a Riak node all index into that node's single Solr instance.]
Doc for key ryan, from partition 7 (special fields prefixed _yz_):
  id        ryan_7
  _yz_pn    7
  _yz_fpn   7
  _yz_node  dev1
  _yz_rk    ryan
  value_t   “...”
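A minimal bash sketch of how one replica of key `ryan` could be turned into the Solr doc on this slide, assuming `id` is simply `<key>_<partition>` (inferred from `ryan_7`). The function name `yz_doc_json` and the JSON layout are hypothetical illustrations, not Yokozuna's code, and values are assumed to be single-line text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the Solr doc for one replica of a KV object
# using the special fields shown on the slide. The id format
# "<key>_<partition>" is inferred from the "ryan_7" example.
yz_doc_json() {
  local key=$1 pn=$2 fpn=$3 node=$4 value=$5
  # Minimal JSON escaping for single-line values: backslashes, then quotes.
  value=$(printf '%s' "$value" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"id":"%s_%s","_yz_pn":"%s","_yz_fpn":"%s","_yz_node":"%s","_yz_rk":"%s","value_t":"%s"}\n' \
    "$key" "$pn" "$pn" "$fpn" "$node" "$key" "$value"
}

# Example: the replica of "ryan" held by partition 7 on node dev1.
# yz_doc_json ryan 7 7 dev1 'I heard there is Natty Boh at RICON'
```

Each replica gets its own doc (hence the partition in `id`), which is what lets every partition on a node share one Solr instance.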

Slide 25

Extraction on Media Type
[Diagram: a riak object (Key, metadata, Value) whose metadata has content-type “text/xml”; its Value, an XML document containing “Ryan Zezeski ...”, is passed to yz_xml_extractor(Value).]
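Choosing an extractor by media type can be sketched as a lookup on the object's content-type. This bash sketch (bash 4+) is hypothetical. It maps only the two extractor names that appear in these slides, and the `none` fallback stands in for whatever Yokozuna actually does with unknown types.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick an extractor module from a content-type.
# Only extractor names that appear on the slides are mapped; the "none"
# fallback is illustrative, not Yokozuna's behaviour.
extractor_for() {
  local ctype=${1%%;*}           # drop parameters such as "; charset=utf-8"
  ctype=${ctype//[[:space:]]/}   # strip stray whitespace
  ctype=${ctype,,}               # type/subtype are case-insensitive
  case $ctype in
    text/xml)   echo yz_xml_extractor ;;
    text/plain) echo yz_text_extractor ;;
    *)          echo none ;;
  esac
}

# Example: extractor_for 'text/plain; charset=utf-8'   prints yz_text_extractor
```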

Slide 26

Anti-Entropy
[Diagram: read repair, handoff and put each end in “obj modified!”]

Slide 27

Active Anti-Entropy
[Diagram: the Entropy Mgr runs an Exchange between the KV Tree and the YZ Tree]
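At its core, an exchange finds keys whose KV-side and YZ-side hashes disagree. The bash sketch below (bash 4+) simplifies this. It assumes each side can be dumped as `key hash` lines, which is a hypothetical format, and compares the two dumps flat. Riak's AAE trees are hash (Merkle) trees, so a real exchange can skip whole ranges whose hashes already match instead of walking every key. The function name `divergent_keys` is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical, flattened sketch of an AAE exchange: compare "key hash"
# dumps from the KV side and the Yokozuna side and print every key that
# needs repair. Real AAE compares hash trees level by level instead.
divergent_keys() {
  local kv_file=$1 yz_file=$2 key hash
  declare -A kv=() yz=()
  while read -r key hash; do kv[$key]=$hash; done < "$kv_file"
  while read -r key hash; do yz[$key]=$hash; done < "$yz_file"
  {
    for key in "${!kv[@]}"; do
      # Missing from the index, or indexed from a stale object version.
      [[ "${yz[$key]-}" != "${kv[$key]}" ]] && echo "$key"
    done
    for key in "${!yz[@]}"; do
      # Indexed, but no longer present in KV (e.g. a missed delete).
      [[ -z "${kv[$key]+set}" ]] && echo "$key"
    done
  } | sort
}
```

For example, if KV holds `ryan a1` and `bob b2` while the index holds `ryan a1`, `bob zz` and `carol c3`, the function prints `bob` (stale) and `carol` (orphaned).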

Slide 28

Query -> Dist. Query
search/people?q=zezeski
  -> yz_cover:plan("people")
  -> solr/people/select?shards=...&fq=...&q=zezeski
[Diagram: on Node A, Riak hands the rewritten query to Solr, which runs a Solr distributed search across the Solr instances on Node B and Node C]
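The rewrite from `search/people?q=zezeski` into a Solr distributed query can be sketched as string assembly. In this hypothetical bash helper, the shard list and the filter query are plain inputs standing in for whatever `yz_cover:plan` decides (the slide elides the real `fq`). Shard entries use Solr's `host:port/solr/<core>` form, the host names and port are placeholders, and the helper does no URL encoding.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assemble a Solr distributed-search URL.
# Args: index, user query, filter query, then one host:port per shard.
# The filter is passed through opaquely; nothing is URL-encoded here.
dist_query_url() {
  local index=$1 q=$2 fq=$3
  shift 3
  local shards="" node
  for node in "$@"; do
    # Comma-separate entries; each shard points at the same core name.
    shards+="${shards:+,}${node}/solr/${index}"
  done
  printf 'solr/%s/select?shards=%s&fq=%s&q=%s\n' "$index" "$shards" "$fq" "$q"
}

# Example (placeholder hosts and filter):
# dist_query_url people zezeski FILTER nodeB:8983 nodeC:8983
```

Because the coverage plan only supplies inputs to the helper, the user still writes a plain Solr query and never sees the shard list.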

Slide 29

Getting Started

Slide 30

Build From Source
https://github.com/rzezeski/yokozuna#getting-started

Slide 31

EC2 AMI
ami-8d9c20e4
• based on ami-6df93504, x86_64, Amazon Linux, instance storage
• Yokozuna ready to go, ~ec2-user/riak/rel
• modify node name in vm.args
• set ulimit -n
• open up port 8098
do this _before_ you start the node
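The node-name step can be scripted. This hypothetical helper assumes a standard release layout (`rel/riak/etc/vm.args`) under the AMI's `~ec2-user/riak/rel` and a single `-name` line in that file. The example address and the `ulimit` value are placeholders. Opening port 8098 is usually done in the instance's EC2 security group, not from a shell on the instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the "-name" line of a Riak vm.args file.
# Assumes the name contains no "/" or "&" (special in sed replacements).
set_node_name() {
  local vm_args=$1 name=$2
  # "-i.bak" keeps a backup and works with both GNU and BSD sed.
  sed -i.bak "s/^-name .*/-name ${name}/" "$vm_args"
}

# Example, before starting the node (path and values are placeholders):
#   set_node_name ~ec2-user/riak/rel/riak/etc/vm.args riak@10.0.0.12
#   ulimit -n 4096
```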

Slide 32

Start Riak
$ ./rel/riak/bin/riak start
you should see new beam.smp and java processes

Slide 33

Attach to Riak
$ ./rel/riak/bin/riak start
$ ./rel/riak/bin/riak attach

Slide 34

Create Index
$ ./rel/riak/bin/riak start
$ ./rel/riak/bin/riak attach
> yz_index:create("people").
ok
new Solr core created with default schema

Slide 35

Install Hook
$ ./rel/riak/bin/riak attach
> yz_index:create("people").
ok
> yz_kv:install_hook(<<"people">>).
ok
obj modified hook installed on bucket

Slide 36

Check Bucket
> (Ctrl-D)
$ curl http://localhost:8098/riak/people | jsonpp | grep -C2 obj_modified

Slide 37

Write Data
$ curl -X PUT \
    -H 'content-type: text/plain' \
    http://localhost:8098/riak/people/ryan \
    -d 'I heard there is Natty Boh at RICON'
the entire value will be extracted under the ‘text’ field

Slide 38

Query Data
$ curl 'http://localhost:8098/search/people?q=natty&wt=json' | jsonpp
‘wt’ is passed through to Solr

Slide 39

Response
the key ‘ryan’ was returned
magic

Slide 40

Query Highlighting
search/people?wt=json&omitHeader=true&hl=true&hl.fl=text&fl=_yz_rk&q=natty
all these params are passed through to Solr untouched

Slide 41

Highlighting Response
where’s the cream filling?

Slide 42

Check the Schema
fields must be stored to be highlighted

Slide 43

Overwrite Extractor Def
> yz_extractor:register("text/plain", {yz_text_extractor, [{field_name, value_t}]}, [overwrite]).
now all text/plain data will be written under the ‘value_t’ field

Slide 44

Write Data 2
$ curl -X DELETE http://localhost:8098/riak/people/ryan
$ curl -X PUT \
    -H 'content-type: text/plain' \
    http://localhost:8098/riak/people/ryan \
    -d 'I heard there is Natty Boh at RICON'

Slide 45

Query Highlighting 2
search/people?wt=json&omitHeader=true&hl=true&hl.fl=value_t&fl=_yz_rk&q=value_t:natty
notice that hl.fl changed to value_t, as did q

Slide 46

Highlighting Response 2
Yeaaaaaa buddy!

Slide 47

TODO

Slide 48

HEADERS/METADATA TO FIELDS

Slide 49

SUPPORT SOLR UPDATE XML MESSAGES
http://wiki.apache.org/solr/UpdateXmlMessages

Slide 50

PROTOTYPE/BENCHMARK

Slide 51

GOOD MIGRATION STORY FROM RIAK SEARCH TO YOKOZUNA

Slide 52

INCLUDE WITH RIAK PROPER, VERSION 1.?

Slide 53

ONE MORE THING?

Slide 54

POWERED BY YOKOZUNA/SOLR

Slide 55

KTHXBAI
• @jrecursive (mecha)
• @dizzyd (best “boss” ever)
• @argv0 (wat dat smell like)
• @DstroyAllModels (shot of whiskey plz)
• @jon_meredith (CHOCOLATE CAKE)
• W. Hilton
• Apache Solr (thx for all the code)
• ElasticSearch & Datastax (for the inspiration)
• Mom & Dad
• @pharkmillups (dat hustle)
• @tsantero (you’re gonna owe me 2 bills)
• @coderoshi (riakdocs)