Introducing Yokozuna - Riak + Solr

Basho Technologies
October 10, 2012

Transcript

  1. Yokozuna: Riak + Solr, October 10th, 2012 (slide footer: Monday, October 15, 12)
  2. Me
     • Ryan Zezeski
     • @rzezeski
     • rzezeski@basho.com
  3. What?
  4. Tight Integration
     • Solr bundled with Riak, turn-key, zero config to start
     • supervise Solr process, start/stop/restart
     • present canonical Solr query interface
     • use Solr clients to query Riak
  5. Erlang Application
     • made up of process and library modules
     • has a supervision tree
     • sits alongside of Riak KV
  6. Intermediary
     • converts KV data into Solr docs
     • translates Solr queries to distributed Solr queries
     • constantly communicates with KV to verify object/index convergence
  7. This Guy
     http://images4.wikia.nocookie.net/__cb20110220003703/prowrestling/images/a/a5/Yokozuna16.jpg
  8. Sumo Wrestling Term
     “Horizontal rope. The top rank in sumo, usually translated Grand Champion. The name comes from the rope a yokozuna wears for the dohyō-iri.”
     http://en.wikipedia.org/wiki/Glossary_of_sumo_terms
  9. Why?
  10. Riak Search Lessons Learned
     • pretends to be Lucene/Solr
     • lack of analyzer/language/feature support
     • bad performance/resource usage for certain queries
     • Basho is not in the business of search
  11. Solr is Great
     • great analyzer/language support
     • features: ranking, faceting, highlighting, geo, etc.
     • rests upon solid foundation, Lucene
     • active development, built by people that innovate on search
  12. Improve Retrieval in Riak
     • Riak is good at storing data, make it better at finding it
     • Map/Reduce can be too general and resource hungry
     • 2i is very limited, resource issues
     • query non-trivial amount of data efficiently
  13. Use Strengths of Both
     • Riak - HA, distributed, scale out/in
     • Solr - efficient index, features people want, known entity (vs. Riak Search which is home grown)
     • make Solr HA, distributed, and scale with Riak
     • make Riak searchable with Solr (both?)
  14. Features
  15. IF SOLR HAS IT, YOKOZUNA HAS IT *
     * http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
  16. TIGHTLY INTEGRATED, TURN-KEY, LOOKS LIKE SINGLE SYSTEM
  17. WRITE IT LIKE RIAK, QUERY IT LIKE SOLR
  18. EXTRACT FIELDS BASED ON CONTENT TYPE
  19. ANTI-ENTROPY
  20. ACTIVE ANTI-ENTROPY
  21. How Does it Work?
  22. Riak as Normal
     • [Diagram: Key, Value]
     • http://wiki.basho.com/attachments/riak-ring.png
     • http://s3.amazonaws.com/wernervogels/public/sosp/sosp-figure3-small.png
  23. One Solr Instance per Node
     • [Diagram labels: Riak, KV, Hook, Yokozuna, Solr Proc, Jetty/Solr, Start/Monitor, Index, Query, msgs, cores, people]
     • [Diagram labels: bucket, index, core, with cardinalities 1, 1, 1, M]
     • goal of M:1, bucket:index
  24. All Partitions, One Solr
     • [Diagram: Riak with KV partitions 1, 4, and 7, and a single Solr]
     • object: key ryan, Value
     • Doc: id ryan_7, _yz_pn 7, _yz_fpn 7, _yz_node dev1, _yz_rk ryan, value_t “...”
     • label: special fields
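A rough sketch of how the Doc on this slide could be assembled, in Python for illustration only (Yokozuna itself is Erlang). The field names and the `ryan_7` id come from the slide; the function `make_doc`, its parameters, and the reading of `_yz_pn` as partition number, `_yz_fpn` as first partition number, and `_yz_rk` as Riak key are assumptions.

```python
def make_doc(key, partition, first_partition, node, fields):
    """Build a Solr document (as a dict) for one Riak object.

    Hypothetical helper: only the field names and the "<key>_<partition>"
    id shape are taken from the slide.
    """
    doc = {
        "id": f"{key}_{partition}",   # e.g. "ryan_7" on the slide
        "_yz_pn": partition,          # assumed: partition number
        "_yz_fpn": first_partition,   # assumed: first partition number
        "_yz_node": node,             # node that owns the partition
        "_yz_rk": key,                # assumed: Riak key
    }
    doc.update(fields)                # extracted fields, e.g. value_t
    return doc


doc = make_doc("ryan", 7, 7, "dev1", {"value_t": "..."})
```

Keeping the partition in the id means the same key indexed from different partitions yields distinct Solr documents, which is what lets one Solr instance hold every partition on the node.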
  25. Extraction on Media Type
     • riak object: Key, Value, metadata (content-type “text/xml”)
     • yz_xml_extractor(Value) produces:
       <doc> <person_name_s>Ryan Zezeski</person_name_s> <person_bio_t>...</person_bio_t> ... </doc>
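To make the extraction step concrete, here is a minimal Python sketch of dispatching on content type. It is not the `yz_xml_extractor` code: the flattening rule (join nested element names with `_`), the `EXTRACTORS` table, and the sample XML are assumptions chosen so the output matches the field names on the slide. The text/plain rule (whole value under a `text` field) is the default described later in the deck.

```python
import xml.etree.ElementTree as ET


def xml_extract(value, sep="_"):
    """Flatten XML into (field_name, text) pairs, one per non-empty leaf element."""
    fields = []

    def walk(elem, path):
        children = list(elem)
        if children:
            for child in children:
                walk(child, path + [elem.tag])
        elif elem.text and elem.text.strip():
            # Leaf element: the field name is the path of tags down to it.
            fields.append((sep.join(path + [elem.tag]), elem.text.strip()))

    walk(ET.fromstring(value), [])
    return fields


def text_extract(value):
    """Index the entire value under a single 'text' field."""
    return [("text", value)]


# Hypothetical registry keyed by content type.
EXTRACTORS = {"text/xml": xml_extract, "text/plain": text_extract}


def extract(content_type, value):
    return EXTRACTORS[content_type](value)


# Made-up sample input; the bio text is a placeholder.
sample = "<person><name_s>Ryan Zezeski</name_s><bio_t>placeholder bio</bio_t></person>"
fields = extract("text/xml", sample)
```

The `_s` and `_t` suffixes are assumed to line up with dynamic field patterns in the Solr schema, so the extractor can name fields without the schema listing each one.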
  26. Anti-Entropy
     • [Diagram: read repair, handoff, put, each leading to “obj modified!”]
  27. Active Anti-Entropy
     • [Diagram: Entropy Mgr, Exchange, KV Tree, YZ Tree]
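The exchange between the KV Tree and the YZ Tree can be illustrated with a deliberately simplified Python sketch. It swaps the hash trees for flat key-to-hash dictionaries, so it shows only the outcome of an exchange (which keys need repair), not how the trees narrow down differences. `obj_hash`, `exchange`, and the sample keys are all hypothetical.

```python
import hashlib


def obj_hash(value: bytes) -> str:
    """Hash an object's value; stands in for whatever the real trees store."""
    return hashlib.sha1(value).hexdigest()


def exchange(kv_tree, yz_tree):
    """Compare key->hash maps for KV and the index.

    Returns (to_reindex, to_delete):
      to_reindex: keys missing from the index or indexed with a stale hash
      to_delete:  keys in the index that KV no longer has
    """
    to_reindex = sorted(k for k, h in kv_tree.items() if yz_tree.get(k) != h)
    to_delete = sorted(k for k in yz_tree if k not in kv_tree)
    return to_reindex, to_delete


kv = {"ryan": obj_hash(b"v2"), "eric": obj_hash(b"v1")}
yz = {"ryan": obj_hash(b"v1"), "old": obj_hash(b"x")}
to_reindex, to_delete = exchange(kv, yz)
```

A tree of hashes lets two sides agree that large key ranges match by comparing a handful of hashes, which a flat comparison like this one cannot do.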
  28. Query -> Dist. Query
     • incoming query: search/people?q=zezeski
     • yz_cover:plan("people")
     • resulting query: solr/people/select?shards=...&fq=...&q=zezeski
     • [Diagram labels: Riak, Solr, distributed search, Node A, NodeB Solr, NodeB Solr, NodeC Solr]
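The rewrite from a plain query to a distributed one might look roughly like the Python sketch below. The slide elides the actual `shards` and `fq` values, so everything filled in here is an assumption: the plan shape (node name, host, partitions), the Solr port, and the filter that pairs `_yz_node` with `_yz_pn`. The comma-separated `host:port/path` form of `shards` is standard Solr distributed search syntax.

```python
from urllib.parse import urlencode


def distributed_query_url(index, q, plan, solr_port=8983):
    """Turn a coverage plan into a Solr distributed-search URL.

    plan: list of (node_name, host, [partition, ...]); hypothetical shape.
    solr_port: assumed value, not taken from the talk.
    """
    # One shard entry per node in the plan.
    shards = ",".join(f"{host}:{solr_port}/solr/{index}" for _, host, _ in plan)

    # Restrict each node to the partitions the plan selected for it,
    # so replicas of the same object are not counted more than once.
    clauses = []
    for node, _, partitions in plan:
        pns = " OR ".join(f"_yz_pn:{p}" for p in partitions)
        clauses.append(f"(_yz_node:{node} AND ({pns}))")
    fq = " OR ".join(clauses)

    return f"solr/{index}/select?" + urlencode({"shards": shards, "fq": fq, "q": q})


plan = [("dev1", "10.0.0.1", [1, 4]), ("dev2", "10.0.0.2", [7])]
url = distributed_query_url("people", "zezeski", plan)
```

The user's `q` passes through unchanged; only `shards` and `fq` are added, which is why the client never has to know how the cluster is laid out.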
  29. Getting Started
  30. Build From Source
     https://github.com/rzezeski/yokozuna#getting-started
  31. EC2 AMI
     • ami-8d9c20e4
     • based on ami-6df93504, x86_64, Amazon Linux, instance storage
     • Yokozuna ready to go, ~ec2-user/riak/rel
     • modify node name in vm.args
     • set ulimit -n
     • open up port 8098
     • do this _before_ you start the node
  32. Start Riak
     $ ./rel/riak/bin/riak start
     should see new beam.smp and java processes
  33. Attach to Riak
     $ ./rel/riak/bin/riak start
     $ ./rel/riak/bin/riak attach
  34. Create Index
     $ ./rel/riak/bin/riak start
     $ ./rel/riak/bin/riak attach
     > yz_index:create("people").
     ok
     new solr core created with default schema
  35. Install Hook
     $ ./rel/riak/bin/riak attach
     > yz_index:create("people").
     ok
     > yz_kv:install_hook(<<"people">>).
     ok
     obj modified hook installed on bucket
  36. Check Bucket
     > (Ctrl-D)
     $ curl http://localhost:8098/riak/people | jsonpp | grep -C2 obj_modified
  37. Write Data
     $ curl -X PUT \
         -H 'content-type: text/plain' \
         http://localhost:8098/riak/people/ryan \
         -d 'I heard there is Natty Boh at RICON'
     entire value will be extracted under ‘text’ field
  38. Query Data
     $ curl 'http://localhost:8098/search/people?q=natty&wt=json' | jsonpp
     ‘wt’ is passed through to Solr
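For anyone scripting the write-then-query round trip instead of using curl, the two requests can be sketched in Python with only the standard library. The URLs, header, and body mirror the slides; the helper names `put_request` and `search_url` are made up for illustration, and nothing is sent until `urlopen` is called against a running node.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8098"  # Riak HTTP port used on the slides


def put_request(bucket, key, value, content_type="text/plain"):
    """Build (but do not send) the PUT that stores a value in Riak."""
    return Request(
        f"{BASE}/riak/{bucket}/{key}",
        data=value.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="PUT",
    )


def search_url(index, **params):
    """Build the search URL; the slides say params such as wt pass through to Solr."""
    return f"{BASE}/search/{index}?{urlencode(params)}"


req = put_request("people", "ryan", "I heard there is Natty Boh at RICON")
url = search_url("people", q="natty", wt="json")

# Against a running node (not executed here):
#   from urllib.request import urlopen
#   urlopen(req)
#   print(urlopen(url).read().decode())
```

The content type on the PUT matters because it selects the extractor, and so decides which Solr fields the value lands in.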
  39. Response
     • the key ‘ryan’ was returned
     • magic
  40. Query Highlighting
     search/people?wt=json&omitHeader=true&hl=true&hl.fl=text&fl=_yz_rk&q=natty
     all these Solr params are passed through to Solr untouched
  41. Highlighting Response
     where’s the cream filling?
  42. Check the Schema
     must store fields to highlight
  43. Overwrite Extractor Def
     > yz_extractor:register("text/plain", {yz_text_extractor, [{field_name, value_t}]}, [overwrite]).
     now all text/plain data will be written under ‘value_t’ field
  44. Write Data 2
     $ curl -X DELETE http://localhost:8098/riak/people/ryan
     $ curl -X PUT \
         -H 'content-type: text/plain' \
         http://localhost:8098/riak/people/ryan \
         -d 'I heard there is Natty Boh at RICON'
  45. Query Highlighting 2
     search/people?wt=json&omitHeader=true&hl=true&hl.fl=value_t&fl=_yz_rk&q=value_t:natty
     notice the hl.fl changed to value_t, as did q
  46. Highlighting Response 2
     Yeaaaaaa buddy!
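The transcript drops the response bodies, so the Python sketch below uses a made-up response in Solr's usual JSON shape (a `highlighting` object keyed by document id, then by field) to show what the second highlighting query is after. The parameter set is copied from the slide; `highlight_params`, `snippets`, the `ryan_7` id, and the `<em>` markup in the sample are assumptions.

```python
def highlight_params(field, term):
    """Solr params for a highlighted search on one field (values from the slide)."""
    return {
        "wt": "json",
        "omitHeader": "true",
        "hl": "true",
        "hl.fl": field,           # field to highlight; it must be stored
        "fl": "_yz_rk",           # return only the Riak key
        "q": f"{field}:{term}",   # query the same field explicitly
    }


def snippets(response, field):
    """Map each document id to its highlighted fragments for `field`."""
    return {
        doc_id: per_field.get(field, [])
        for doc_id, per_field in response.get("highlighting", {}).items()
    }


params = highlight_params("value_t", "natty")

# Hypothetical response body, not captured from a real node.
sample_response = {
    "response": {"numFound": 1, "start": 0, "docs": [{"_yz_rk": "ryan"}]},
    "highlighting": {"ryan_7": {"value_t": ["I heard there is <em>Natty</em> Boh at RICON"]}},
}
found = snippets(sample_response, "value_t")
```

When the field named in `hl.fl` is not stored, the per-document entry comes back with no fragments, which would explain the empty first attempt against the `text` field.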

  47. TODO
  48. HEADERS/METADATA TO FIELDS
  49. SUPPORT SOLR UPDATE XML MESSAGES
     http://wiki.apache.org/solr/UpdateXmlMessages
  50. PROTOTYPE/BENCHMARK
  51. GOOD MIGRATION STORY FROM RIAK SEARCH TO YOKOZUNA
  52. INCLUDE WITH RIAK PROPER, VERSION 1.?
  53. ONE MORE THING?
  54. POWERED BY YOKOZUNA/SOLR
  55. KTHXBAI
     @jrecursive (mecha)
     @dizzyd (best “boss” ever)
     @argv0 (wat dat smell like)
     @DstroyAllModels (shot of whiskey plz)
     @jon_meredith (CHOCOLATE CAKE)
     W. Hilton
     Apache Solr (thx for all the code)
     ElasticSearch & Datastax (for the inspiration)
     Mom & Dad
     @pharkmillups (dat hustle)
     @tsantero (you’re gonna owe me 2 bills)
     @coderoshi (riakdocs)