Introducing Yokozuna - Riak + Solr

Basho Technologies
October 10, 2012


Transcript

  1. 4.

    Tight Integration
    • Solr bundled with Riak, turn-key, zero config to start
    • supervise Solr process: start/stop/restart
    • present canonical Solr query interface
    • use Solr clients to query Riak
    Monday, October 15, 12
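The slide says Yokozuna supervises the Solr process. Yokozuna does that with Erlang/OTP supervision; the Python below is only a minimal sketch of the restart-on-exit idea, assuming an unconditional restart policy (similar in spirit to an OTP `permanent` child) and a hypothetical `max_restarts` cap. It is not Yokozuna's code.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Start cmd and restart it every time it exits, up to max_restarts times.

    Returns the exit code of every run. A real supervisor would also accept
    stop requests and back off between restarts; this sketch does neither.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):  # the first start plus each restart
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())  # block until the child exits
    return exit_codes


if __name__ == "__main__":
    # Stand-in for the Jetty/Solr process: a child that exits with status 1.
    crashing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing_child, max_restarts=2))  # [1, 1, 1]
```

Restarting on every exit keeps the sketch small; an OTP supervisor additionally gives up when restarts exceed a configured intensity within a time window.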
  2. 5.

    Erlang Application
    • made up of process and library modules
    • has a supervision tree
    • sits alongside Riak KV
  3. 6.

    Intermediary
    • converts KV data into Solr docs
    • translates Solr queries to distributed Solr queries
    • constantly communicates with KV to verify object/index convergence
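One way to picture "verify object/index convergence" is a diff between what KV stores and what the index has recorded. The sketch below is a simplification, not how Yokozuna implements it: it assumes the index remembers a SHA-1 digest of each value it indexed (hypothetical bookkeeping) and sorts keys into missing, stale, and orphaned.

```python
import hashlib


def digest(value: bytes) -> str:
    """Content fingerprint used to detect stale index entries."""
    return hashlib.sha1(value).hexdigest()


def diff_kv_and_index(kv_objects, index_entries):
    """Compare KV contents against index bookkeeping.

    kv_objects:    {key: value_bytes} as stored in Riak KV
    index_entries: {key: digest of the value that was indexed}
    Returns three sets of keys:
      missing  - in KV but never indexed (needs indexing)
      stale    - indexed, but the value changed since (needs re-indexing)
      orphaned - indexed, but no longer in KV (needs removal from the index)
    """
    missing = set(kv_objects) - set(index_entries)
    orphaned = set(index_entries) - set(kv_objects)
    stale = {
        key
        for key, value in kv_objects.items()
        if key in index_entries and index_entries[key] != digest(value)
    }
    return missing, stale, orphaned


if __name__ == "__main__":
    kv = {"ryan": b"I heard there is Natty Boh at RICON", "mark": b"hello"}
    index = {"ryan": digest(b"an older value"), "gone": digest(b"deleted")}
    print(diff_kv_and_index(kv, index))  # ({'mark'}, {'ryan'}, {'gone'})
```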
  4. 8.

    Sumo Wrestling Term
    “Horizontal rope. The top rank in sumo, usually translated Grand Champion. The name comes from the rope a yokozuna wears for the dohyō-iri.”
    http://en.wikipedia.org/wiki/Glossary_of_sumo_terms
  5. 10.

    Riak Search Lessons Learned
    • pretends to be Lucene/Solr
    • lack of analyzer/language/feature support
    • bad performance/resource usage for certain queries
    • Basho is not in the business of search
  6. 11.

    Solr is Great
    • great analyzer/language support
    • features: ranking, faceting, highlighting, geo, etc.
    • rests upon a solid foundation, Lucene
    • active development, built by people who innovate on search
  7. 12.

    Improve Retrieval in Riak
    • Riak is good at storing data, make it better at finding it
    • Map/Reduce can be too general and resource hungry
    • 2i is very limited, resource issues
    • query non-trivial amount of data efficiently
  8. 13.

    Use Strengths of Both
    • Riak: HA, distributed, scale out/in
    • Solr: efficient index, features people want, known entity (vs. Riak Search, which is home grown)
    • make Solr HA, distributed, and scale with Riak
    • make Riak searchable with Solr (both?)
  9. 23.

    One Solr Instance per Node
    [Diagram: a Riak node in which Yokozuna starts/monitors a Jetty/Solr process; a KV hook sends index messages to Yokozuna; query and index requests go to Solr cores, with bucket "people" mapped to an index core. Goal: M:1, bucket:index.]
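The diagram's "goal of M:1, bucket:index" means many buckets could feed a single index. A minimal sketch of that routing rule, assuming a hypothetical `make_router` helper and a default 1:1 mapping in which a bucket feeds the index of the same name (as in the demo later in the deck, where index `people` serves bucket `people`):

```python
def make_router(bucket_to_index=None):
    """Return route(bucket) -> name of the index (Solr core) it writes to.

    Buckets without an explicit entry fall back to an index with the same
    name (1:1). Explicit entries let several buckets share one index (M:1).
    """
    mapping = dict(bucket_to_index or {})

    def route(bucket: str) -> str:
        return mapping.get(bucket, bucket)

    return route


if __name__ == "__main__":
    one_to_one = make_router()
    many_to_one = make_router({"people": "directory", "staff": "directory"})
    print(one_to_one("people"))                         # people
    print(many_to_one("people"), many_to_one("staff"))  # directory directory
```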
  10. 24.

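    All Partitions, One Solr
    [Diagram: KV partitions 1, 4 and 7 on one Riak node all index into the same Solr instance. The Riak object with key "ryan" and its value, stored in partition 7, becomes a Solr doc of special fields plus the value:]
      id        ryan_7
      _yz_pn    7
      _yz_fpn   7
      _yz_node  dev1
      _yz_rk    ryan
      value_t   “...”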
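Since every partition on a node writes into the same Solr instance, the doc id presumably has to tell replicas apart, which is what the slide's `ryan_7` (key plus partition) does. The sketch below rebuilds the slide's doc. The meanings given to the `_yz_*` fields are our reading of their names (partition number, first partition number, node, Riak key), not a documented contract, and `to_solr_doc` is a hypothetical helper.

```python
def to_solr_doc(key, partition, first_partition, node, fields):
    """Build the Solr doc for one replica of a Riak object.

    `fields` holds the value-derived fields (e.g. value_t). The special
    fields are applied last, so a value field can never overwrite them.
    """
    doc = dict(fields)
    doc.update({
        "id": f"{key}_{partition}",  # unique per (key, partition)
        "_yz_pn": partition,          # assumed: partition holding this replica
        "_yz_fpn": first_partition,   # assumed: "first partition number"
        "_yz_node": node,             # assumed: node that indexed the replica
        "_yz_rk": key,                # assumed: the Riak key
    })
    return doc


if __name__ == "__main__":
    print(to_solr_doc("ryan", 7, 7, "dev1", {"value_t": "..."}))
```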
  11. 25.

    Extraction on Media Type
    [Diagram: a Riak object (Key, metadata, Value) whose metadata carries content-type “text/xml”, so its Value is handed to yz_xml_extractor(Value). XML shown on the slide:]
      <doc>
        <person_name_s>Ryan Zezeski</person_name_s>
        <person_bio_t>...</person_bio_t>
        ...
      </doc>
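The slide shows the extractor being chosen from the object's content-type metadata. A minimal Python sketch of that dispatch, assuming a hypothetical `EXTRACTORS` table and a deliberately simple XML rule (each leaf element becomes a field named after its tag); the real `yz_xml_extractor` may name fields differently.

```python
import xml.etree.ElementTree as ET


def extract_xml(value: bytes) -> dict:
    """Turn each leaf element with text into a {tag: text} field.

    Repeated tags keep only the last occurrence in this sketch.
    """
    root = ET.fromstring(value)
    fields = {}
    for elem in root.iter():  # iter() walks the root and all descendants
        if len(elem) == 0 and elem.text and elem.text.strip():
            fields[elem.tag] = elem.text.strip()
    return fields


EXTRACTORS = {"text/xml": extract_xml}  # media type -> extractor function


def extract(content_type: str, value: bytes) -> dict:
    """Pick the extractor from the object's content-type metadata."""
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        raise ValueError(f"no extractor registered for {content_type!r}")
    return extractor(value)


if __name__ == "__main__":
    value = (b"<doc><person_name_s>Ryan Zezeski</person_name_s>"
             b"<person_bio_t>...</person_bio_t></doc>")
    print(extract("text/xml", value))
    # {'person_name_s': 'Ryan Zezeski', 'person_bio_t': '...'}
```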
  12. 31.

    EC2 AMI ami-8d9c20e4
    • based on ami-6df93504: x86_64, Amazon Linux, instance storage
    • Yokozuna ready to go in ~ec2-user/riak/rel
    • modify node name in vm.args
    • set ulimit -n
    • open up port 8098
    Do this _before_ you start the node.
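Changing the node name is a one-line edit to vm.args. A small sketch of that edit, assuming the file contains exactly one `-name <node>@<host>` line as a stock Riak vm.args does; `set_node_name` is a hypothetical helper and the example address is made up.

```python
import re

NAME_LINE = re.compile(r"^-name\s+\S+", re.MULTILINE)


def set_node_name(vm_args: str, node_name: str) -> str:
    """Return vm.args text with its -name line pointing at node_name.

    Raises ValueError unless exactly one -name line is present.
    """
    # A function replacement avoids treating backslashes in node_name as escapes.
    new_text, count = NAME_LINE.subn(lambda _m: f"-name {node_name}", vm_args)
    if count != 1:
        raise ValueError(f"expected one -name line, found {count}")
    return new_text


if __name__ == "__main__":
    original = "## Name of the riak node\n-name riak@127.0.0.1\n-setcookie riak\n"
    print(set_node_name(original, "riak@10.0.0.12"), end="")
```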
  13. 32.
  14. 34.

    Create Index
      $ ./rel/riak/bin/riak start
      $ ./rel/riak/bin/riak attach
      > yz_index:create("people").
      ok
    A new Solr core is created with the default schema.
  15. 36.

    Check Bucket
      > (Ctrl-D)
      $ curl http://localhost:8098/riak/people | jsonpp | grep -C2 obj_modified
  16. 37.

    Write Data
      $ curl -X PUT \
          -H 'content-type: text/plain' \
          http://localhost:8098/riak/people/ryan \
          -d 'I heard there is Natty Boh at RICON'
    The entire value will be extracted under the 'text' field.
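With the default text/plain definition, the extracted doc therefore has a single field, `text`, holding the whole value. A minimal sketch of such an extractor, assuming a hypothetical `field_name` parameter that mirrors the `{field_name, ...}` option used on the "Overwrite Extractor Def" slide:

```python
def text_extractor(value: bytes, field_name: str = "text") -> dict:
    """Extract the entire value, decoded as UTF-8, under one field."""
    return {field_name: value.decode("utf-8")}


if __name__ == "__main__":
    body = b"I heard there is Natty Boh at RICON"
    print(text_extractor(body))
    # {'text': 'I heard there is Natty Boh at RICON'}
    print(text_extractor(body, field_name="value_t"))
    # {'value_t': 'I heard there is Natty Boh at RICON'}
```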
  17. 43.

    Overwrite Extractor Def
      > yz_extractor:register("text/plain", {yz_text_extractor, [{field_name, value_t}]}, [overwrite]).
    Now all text/plain data will be written under the 'value_t' field.
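The `[overwrite]` option suggests that registering a media type that already has a definition is refused unless you ask for replacement; that reading is an assumption. A Python sketch of a registry with those semantics, using hypothetical `ExtractorRegistry` and `AlreadyRegistered` names and storing each definition as (extractor name, options), loosely like the Erlang tuple above:

```python
class AlreadyRegistered(Exception):
    """Raised when a media type already has a definition and overwrite is off."""


class ExtractorRegistry:
    def __init__(self):
        self._defs = {}  # media type -> (extractor name, options dict)

    def register(self, mime_type, definition, overwrite=False):
        if mime_type in self._defs and not overwrite:
            raise AlreadyRegistered(mime_type)
        self._defs[mime_type] = definition

    def lookup(self, mime_type):
        return self._defs.get(mime_type)


if __name__ == "__main__":
    reg = ExtractorRegistry()
    reg.register("text/plain", ("yz_text_extractor", {"field_name": "text"}))
    # Without overwrite=True this second call would raise AlreadyRegistered.
    reg.register("text/plain", ("yz_text_extractor", {"field_name": "value_t"}),
                 overwrite=True)
    print(reg.lookup("text/plain"))
    # ('yz_text_extractor', {'field_name': 'value_t'})
```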
  18. 44.

    Write Data 2
      $ curl -X DELETE http://localhost:8098/riak/people/ryan
      $ curl -X PUT \
          -H 'content-type: text/plain' \
          http://localhost:8098/riak/people/ryan \
          -d 'I heard there is Natty Boh at RICON'
  19. 55.

    KTHXBAI
    • @jrecursive (mecha)
    • @dizzyd (best “boss” ever)
    • @argv0 (wat dat smell like)
    • @DstroyAllModels (shot of whiskey plz)
    • @jon_meredith (CHOCOLATE CAKE)
    • W. Hilton
    • Apache Solr (thx for all the code)
    • ElasticSearch & Datastax (for the inspiration)
    • Mom & Dad
    • @pharkmillups (dat hustle)
    • @tsantero (you’re gonna owe me 2 bills)
    • @coderoshi (riakdocs)