Introducing Yokozuna - Riak + Solr

Basho Technologies
October 10, 2012

Transcript

  1. Yokozuna
    Riak + Solr
    October 10th, 2012
    Monday, October 15, 12

  2. Me
    • Ryan Zezeski
    • @rzezeski
    • [email protected]

  3. What?

  4. Tight Integration
    • Solr bundled with Riak, turn-key, zero config to start
    • supervise Solr process, start/stop/restart
    • present canonical Solr query interface
    • use Solr clients to query Riak

  5. Erlang Application
    • made up of process and library modules
    • has a supervision tree
    • sits alongside Riak KV

  6. Intermediary
    • converts KV data into Solr docs
    • translates Solr queries to distributed Solr queries
    • constantly communicates with KV to verify object/index convergence

  7. This Guy
    http://images4.wikia.nocookie.net/__cb20110220003703/prowrestling/images/a/a5/Yokozuna16.jpg

  8. Sumo Wrestling Term
    “Horizontal rope. The top rank in sumo,
    usually translated Grand Champion. The
    name comes from the rope a yokozuna
    wears for the dohyō-iri.”
    http://en.wikipedia.org/wiki/Glossary_of_sumo_terms

  9. Why?

  10. Riak Search: Lessons Learned
    • pretends to be Lucene/Solr
    • lack of analyzer/language/feature support
    • bad performance/resource usage for certain queries
    • Basho is not in the business of search

  11. Solr is Great
    • great analyzer/language support
    • features: ranking, faceting, highlighting, geo, etc.
    • rests upon solid foundation, Lucene
    • active development, built by people that innovate on search

  12. Improve Retrieval in Riak
    • Riak is good at storing data, make it better at finding it
    • Map/Reduce can be too general and resource hungry
    • 2i is very limited, resource issues
    • query non-trivial amount of data efficiently

  13. Use Strengths of Both
    • Riak - HA, distributed, scale out/in
    • Solr - efficient index, features people want, known entity (vs. Riak Search which is home grown)
    • make Solr HA, distributed, and scale with Riak
    • make Riak searchable with Solr
    (bolth?)

  14. Features

  15. IF SOLR HAS IT
    YOKOZUNA HAS IT *
    * http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

  16. TIGHTLY INTEGRATED
    TURN-KEY
    LOOKS LIKE SINGLE
    SYSTEM

  17. WRITE IT LIKE RIAK
    QUERY IT LIKE SOLR

  18. EXTRACT FIELDS BASED
    ON CONTENT TYPE

  19. ANTI-ENTROPY

  20. ACTIVE ANTI-ENTROPY

  21. How Does it Work?

  22. Riak as Normal
    http://wiki.basho.com/attachments/riak-ring.png http://s3.amazonaws.com/wernervogels/public/sosp/sosp-figure3-small.png
    Key Value

  23. One Solr Instance per Node
    [Diagram labels: Riak (KV, Hook, Yokozuna), Solr Proc (Jetty/Solr), arrows Query, Index, Start/Monitor; cores: people, msgs]
    bucket : index : core is 1 : 1 : 1
    goal of M:1, bucket:index

  24. All Partitions, One Solr
    [Diagram: Riak partitions KV 1, KV 4 and KV 7 all index into one Solr. Key “ryan” and its Value become a Solr Doc with special fields:]
    id        ryan_7
    _yz_pn    7
    _yz_fpn   7
    _yz_node  dev1
    _yz_rk    ryan
    value_t   “...”
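
The document layout above can be sketched in Python. This is a minimal illustration, not Yokozuna's Erlang code: the function name `make_solr_doc` is hypothetical, and the readings of `_yz_pn` (partition number), `_yz_fpn` (first partition number), `_yz_node` (node name) and `_yz_rk` (Riak key) are assumptions inferred from the field names and values on the slide.

```python
def make_solr_doc(key, value, partition, first_partition, node):
    """Build a Solr document (as a dict) for one replica of a Riak object.

    Field meanings are assumptions inferred from the slide, not a spec.
    """
    return {
        "id": f"{key}_{partition}",   # unique per key and partition
        "_yz_pn": partition,          # assumed: partition number
        "_yz_fpn": first_partition,   # assumed: first partition number
        "_yz_node": node,             # assumed: node holding the replica
        "_yz_rk": key,                # assumed: Riak key
        "value_t": value,             # extracted text of the object value
    }


doc = make_solr_doc("ryan", "...", 7, 7, "dev1")
print(doc["id"])  # ryan_7
```

Because the id combines key and partition, each replica of the same Riak key would get its own Solr document, which is what lets one Solr instance hold every partition a node owns.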

  25. Extraction on Media Type
    [Diagram: riak object (metadata, Key, Value) with content-type “text/xml”; the Value is passed to yz_xml_extractor(Value)]
    Value sample:
    Ryan Zezeski
    ...
    ...
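
Dispatching on media type can be sketched as a small registry in Python. This is an illustration only: the names `EXTRACTORS`, `extract_text` and `extract_xml` are hypothetical, the XML sample is made up, and the underscore-joined field names are an assumption rather than the documented behavior of `yz_xml_extractor`.

```python
import xml.etree.ElementTree as ET


def extract_text(value, field_name="text"):
    """Put the entire value under a single field."""
    return [(field_name, value.decode("utf-8"))]


def extract_xml(value):
    """Flatten XML into (field, text) pairs, joining element names with '_'."""
    fields = []

    def walk(elem, path):
        path = path + [elem.tag]
        text = (elem.text or "").strip()
        if text:
            fields.append(("_".join(path), text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(value), [])
    return fields


# Hypothetical registry: media type -> extractor function.
EXTRACTORS = {"text/plain": extract_text, "text/xml": extract_xml}


def extract(content_type, value):
    """Pick an extractor from the object's content type and run it."""
    media_type = content_type.split(";")[0].strip().lower()
    return EXTRACTORS[media_type](value)


print(extract("text/xml", b"<person><name>Ryan Zezeski</name></person>"))
# [('person_name', 'Ryan Zezeski')]
```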

  26. Anti-Entropy
    [Diagram: read repair, handoff and put each lead to “obj modified!”]

  27. Active Anti-Entropy
    [Diagram: the Entropy Mgr runs an Exchange between the KV Tree and the YZ Tree]
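
The idea of an exchange between two trees can be sketched in Python. This is a deliberately simplified two-level comparison, not Riak's actual hash tree implementation: the names `build_tree`, `segment_hash` and `exchange`, the segment count, and the use of SHA-1 are all assumptions made for illustration.

```python
import hashlib

SEGMENTS = 16  # assumed number of segments, chosen for illustration


def _h(data):
    return hashlib.sha1(data).hexdigest()


def build_tree(objects):
    """Group key -> value-hash entries into segments by hash of the key."""
    segments = [dict() for _ in range(SEGMENTS)]
    for key, value in objects.items():
        idx = int(_h(key.encode("utf-8")), 16) % SEGMENTS
        segments[idx][key] = _h(value)
    return segments


def segment_hash(segment):
    """Hash a whole segment so equal segments can be skipped cheaply."""
    joined = "".join(f"{k}:{v};" for k, v in sorted(segment.items()))
    return _h(joined.encode("utf-8"))


def exchange(kv_tree, yz_tree):
    """Return the keys whose hashes differ or that exist on one side only."""
    divergent = set()
    for kv_seg, yz_seg in zip(kv_tree, yz_tree):
        if segment_hash(kv_seg) == segment_hash(yz_seg):
            continue  # segment agrees, no need to compare keys
        for key in kv_seg.keys() | yz_seg.keys():
            if kv_seg.get(key) != yz_seg.get(key):
                divergent.add(key)
    return sorted(divergent)


kv = build_tree({"ryan": b"v2", "bob": b"x", "ann": b"a"})
yz = build_tree({"ryan": b"v1", "bob": b"x", "old": b"z"})
print(exchange(kv, yz))  # ['ann', 'old', 'ryan']
```

Each divergent key would then be re-indexed or deleted on the Solr side so that the index converges with KV.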

  28. Query -> Dist. Query
    search/people?q=zezeski
    solr/people/select?shards=...&fq=...&q=zezeski
    yz_cover:plan(“people”)
    [Diagram: Node A (Riak, Solr) runs a distributed search across Solr on Node B and Node C]
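
The translation from a plain query to a distributed one can be sketched in Python. The slide elides the real `shards` and `fq` values, so everything concrete here is an assumption: the shape of the coverage plan, the `host:port/solr/<index>` shard form, the Solr port 8983, and the filter built from `_yz_node` and `_yz_pn`. The function name `dist_query_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def dist_query_url(index, query, plan, local="localhost:8983"):
    """Build a Solr distributed-search URL from a coverage plan.

    plan: list of (host_port, node_name, partitions); this shape is assumed.
    """
    shards = ",".join(f"{host}/solr/{index}" for host, _, _ in plan)
    fq = " OR ".join(
        "(_yz_node:{} AND ({}))".format(
            node, " OR ".join(f"_yz_pn:{p}" for p in parts)
        )
        for _, node, parts in plan
    )
    params = urlencode({"shards": shards, "fq": fq, "q": query})
    return f"http://{local}/solr/{index}/select?{params}"


plan = [("10.0.0.1:8983", "dev1", [1, 4]), ("10.0.0.2:8983", "dev2", [7])]
url = dist_query_url("people", "zezeski", plan)
print(parse_qs(urlparse(url).query)["shards"][0])
# 10.0.0.1:8983/solr/people,10.0.0.2:8983/solr/people
```

The filter restricts each node to the partitions the plan assigned to it, so overlapping replicas are not counted twice in the results.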

  29. Getting
    Started

  30. Build From Source
    https://github.com/rzezeski/yokozuna#getting-started

  31. EC2 AMI
    ami-8d9c20e4
    • based on ami-6df93504, x86_64, Amazon Linux, instance storage
    • Yokozuna ready to go, ~ec2-user/riak/rel
    • modify node name in vm.args
    • set ulimit -n
    • open up port 8098
    do this _before_ you start the node

  32. Start Riak
    $ ./rel/riak/bin/riak start
    should see new beam.smp and java processes

  33. Attach to Riak
    $ ./rel/riak/bin/riak start
    $ ./rel/riak/bin/riak attach

  34. Create Index
    $ ./rel/riak/bin/riak start
    $ ./rel/riak/bin/riak attach
    > yz_index:create("people").
    ok
    new solr core created with default schema

  35. Install Hook
    $ ./rel/riak/bin/riak attach
    > yz_index:create("people").
    ok
    > yz_kv:install_hook(<<"people">>).
    ok
    obj modified hook installed on bucket

  36. Check Bucket
    > (Ctrl-D)
    $ curl http://localhost:8098/riak/people | jsonpp | grep -C2 obj_modified

  37. Write Data
    $ curl -X PUT \
      -H 'content-type: text/plain' \
      http://localhost:8098/riak/people/ryan \
      -d 'I heard there is Natty Boh at RICON'
    entire value will be extracted under ‘text’ field
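
The same write can be sketched with Python's standard library instead of curl. The helper name `build_put` is hypothetical; the URL, header and body are the ones from the slide. The request is only built here, not sent.

```python
import urllib.request


def build_put(bucket, key, value, content_type="text/plain",
              base="http://localhost:8098"):
    """Return a PUT request for one Riak object (not yet sent)."""
    return urllib.request.Request(
        url=f"{base}/riak/{bucket}/{key}",
        data=value.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="PUT",
    )


req = build_put("people", "ryan", "I heard there is Natty Boh at RICON")
print(req.get_method(), req.full_url)
# PUT http://localhost:8098/riak/people/ryan
# To send it to a running node: urllib.request.urlopen(req)
```

The content type matters because it decides which extractor handles the value.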

  38. Query Data
    $ curl 'http://localhost:8098/search/people?q=natty&wt=json' | jsonpp
    ‘wt’ is passed through to Solr
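
Since parameters are passed through to Solr, a client only has to append them to the search path. A minimal Python sketch, where the helper name `search_url` is hypothetical and the host and port are the ones from the slide:

```python
from urllib.parse import urlencode


def search_url(index, base="http://localhost:8098", **solr_params):
    """Build a search URL; every keyword argument becomes a query parameter."""
    return f"{base}/search/{index}?{urlencode(solr_params)}"


print(search_url("people", q="natty", wt="json"))
# http://localhost:8098/search/people?q=natty&wt=json
```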

  39. Response
    the key ‘ryan’ was returned
    magic

  40. Query Highlighting
    search/people?wt=json&omitHeader=true&hl=true&hl.fl=text&fl=_yz_rk&q=natty
    all these Solr params are passed through to Solr untouched

  41. Highlighting Response
    where’s the cream filling?

  42. Check the Schema
    must store fields to highlight

  43. Overwrite Extractor Def
    > yz_extractor:register("text/plain",
        {yz_text_extractor, [{field_name, value_t}]},
        [overwrite]).
    now all text/plain data will be written under ‘value_t’ field

  44. Write Data 2
    $ curl -X DELETE http://localhost:8098/riak/people/ryan
    $ curl -X PUT \
      -H 'content-type: text/plain' \
      http://localhost:8098/riak/people/ryan \
      -d 'I heard there is Natty Boh at RICON'

  45. Query Highlighting 2
    search/people?wt=json&omitHeader=true&hl=true&hl.fl=value_t&fl=_yz_rk&q=value_t:natty
    notice the hl.fl changed to value_t, as did q
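
How the field name flows into both `hl.fl` and `q` can be sketched in Python. The helper name `highlight_query` is hypothetical; the parameters are the ones on the slide. Note that `urlencode` percent-encodes the colon in `q`, which decodes back to the same value.

```python
from urllib.parse import parse_qs, urlencode


def highlight_query(index, field, term):
    """Build a highlighting query that searches and highlights one field."""
    params = {
        "wt": "json",
        "omitHeader": "true",
        "hl": "true",
        "hl.fl": field,           # field to highlight (must be stored)
        "fl": "_yz_rk",           # return only the Riak key
        "q": f"{field}:{term}",   # search the same field
    }
    return f"search/{index}?{urlencode(params)}"


query = highlight_query("people", "value_t", "natty")
print(query)
# search/people?wt=json&omitHeader=true&hl=true&hl.fl=value_t&fl=_yz_rk&q=value_t%3Anatty
```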

  46. Highlighting Response 2
    Yeaaaaaa buddy!

  47. TODO

  48. HEADERS/METADATA
    TO FIELDS

  49. SUPPORT SOLR UPDATE
    XML MESSAGES
    http://wiki.apache.org/solr/UpdateXmlMessages

  50. PROTOTYPE/
    BENCHMARK

  51. GOOD MIGRATION
    STORY FROM RIAK
    SEARCH TO YOKOZUNA

  52. INCLUDE WITH RIAK
    PROPER, VERSION 1.?

  53. ONE MORE
    THING?

  54. POWERED BY YOKOZUNA/SOLR

  55. KTHXBAI
    @jrecursive (mecha)
    @dizzyd (best “boss” ever)
    @argv0 (wat dat smell like)
    @DstroyAllModels (shot of whiskey plz)
    @jon_meredith (CHOCOLATE CAKE)
    W. Hilton
    Apache Solr (thx for all the code)
    ElasticSearch & Datastax (for the inspiration)
    Mom & Dad
    @pharkmillups (dat hustle)
    @tsantero (you’re gonna owe me 2 bills)
    @coderoshi (riakdocs)