
Introducing Yokozuna - Riak + Solr

Basho Technologies
October 10, 2012


Transcript

  1. Yokozuna
    Riak + Solr
    October 10th, 2012
    Monday, October 15, 12

  2. Me
    • Ryan Zezeski
    • @rzezeski
    • [email protected]

  3. What?

  4. Tight Integration
    • Solr bundled with Riak, turn-key, zero config to start
    • supervise Solr process, start/stop/restart
    • present canonical Solr query interface
    • use Solr clients to query Riak

  5. Erlang Application
    • made up of process and library modules
    • has a supervision tree
    • sits alongside Riak KV

  6. Intermediary
    • converts KV data into Solr docs
    • translates Solr queries to distributed Solr queries
    • constantly communicates with KV to verify object/index convergence

  7. This Guy
    http://images4.wikia.nocookie.net/__cb20110220003703/prowrestling/images/a/a5/Yokozuna16.jpg

  8. Sumo Wrestling Term
    “Horizontal rope. The top rank in sumo,
    usually translated Grand Champion. The
    name comes from the rope a yokozuna
    wears for the dohyō-iri.”
    http://en.wikipedia.org/wiki/Glossary_of_sumo_terms

  9. Why?

  10. Riak Search: Lessons Learned
    • pretends to be Lucene/Solr
    • lack of analyzer/language/feature support
    • bad performance/resource usage for certain queries
    • Basho is not in the business of search

  11. Solr is Great
    • great analyzer/language support
    • features: ranking, faceting, highlighting, geo, etc.
    • rests upon solid foundation, Lucene
    • active development, built by people that innovate on search

  12. Improve Retrieval in Riak
    • Riak is good at storing data, make it better at finding it
    • Map/Reduce can be too general and resource hungry
    • 2i is very limited, resource issues
    • query non-trivial amount of data efficiently

  13. Use Strengths of Both
    • Riak - HA, distributed, scale out/in
    • Solr - efficient index, features people want, known entity (vs. Riak Search which is home grown)
    • make Solr HA, distributed, and scale with Riak
    • make Riak searchable with Solr
    (bolth?)

  14. Features

  15. IF SOLR HAS IT
    YOKOZUNA HAS IT *
    * http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

  16. TIGHTLY INTEGRATED
    TURN-KEY
    LOOKS LIKE SINGLE
    SYSTEM

  17. WRITE IT LIKE RIAK
    QUERY IT LIKE SOLR

  18. EXTRACT FIELDS BASED
    ON CONTENT TYPE

  19. ANTI-ENTROPY

  20. ACTIVE ANTI-ENTROPY

  21. How Does it Work?

  22. Riak as Normal
    http://wiki.basho.com/attachments/riak-ring.png
    http://s3.amazonaws.com/wernervogels/public/sosp/sosp-figure3-small.png
    Key Value

  23. One Solr Instance per Node
    [Diagram labels: Riak (KV, Hook, Yokozuna); Solr Proc (Jetty/Solr); arrows Query, Index, Start/Monitor; cores people, msgs]
    bucket : index : core = 1 : 1 : 1
    goal of M:1, bucket:index

  24. All Partitions, One Solr
    [Diagram labels: Riak (KV 1, KV 4, KV 7); key ryan, Value; Solr Doc; special fields]
    Doc fields:
      id        ryan_7
      _yz_pn    7
      _yz_fpn   7
      _yz_node  dev1
      _yz_rk    ryan
      value_t   “...”
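
The field list on this slide can be read as a mapping from one replica of a KV object to one Solr document. Below is a minimal sketch in Python, assuming the `id` is the Riak key joined to the partition number (as `ryan_7` suggests) and that `_yz_fpn` is a second partition number stored with the doc. `make_doc` and its parameters are hypothetical names for illustration, not Yokozuna's API.

```python
def make_doc(key, partition, first_partition, node, value):
    """Build a Solr-style document (a dict) for one replica of a KV object.

    The field names come from the slide; how Yokozuna computes them
    is an assumption made for this sketch.
    """
    return {
        "id": f"{key}_{partition}",   # assumed: key + "_" + partition number
        "_yz_pn": partition,          # partition number
        "_yz_fpn": first_partition,   # assumed meaning: first partition number
        "_yz_node": node,             # node that holds this replica
        "_yz_rk": key,                # Riak key
        "value_t": value,             # extracted text value
    }


doc = make_doc("ryan", 7, 7, "dev1", "...")
```

Keeping the Riak key in its own `_yz_rk` field is what lets a query return keys directly (`fl=_yz_rk` in the later slides).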

  25. Extraction on Media Type
    content-type “text/xml”
    riak object yz_xml_extractor(Value)

    Ryan Zezeski
    ...
    ...

    metadata
    Key
    Value
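
One way to picture extraction by media type is a lookup table from content type to extractor function. The sketch below is written in Python under stated assumptions: the sample XML document, the `_`-joined field names, and the names `EXTRACTORS` and `extract_fields` are all hypothetical, since the slide does not show how `yz_xml_extractor` names its fields.

```python
import xml.etree.ElementTree as ET


def text_extractor(value):
    """Put the entire value under a single 'text' field."""
    return [("text", value.decode("utf-8"))]


def xml_extractor(value):
    """Flatten element text into (field, text) pairs.

    Field names join nested tag names with '_' (an assumed convention).
    """
    fields = []

    def walk(elem, path):
        name = f"{path}_{elem.tag}" if path else elem.tag
        text = (elem.text or "").strip()
        if text:
            fields.append((name, text))
        for child in elem:
            walk(child, name)

    walk(ET.fromstring(value), "")
    return fields


# Hypothetical registry: content type -> extractor function.
EXTRACTORS = {"text/plain": text_extractor, "text/xml": xml_extractor}


def extract_fields(content_type, value):
    """Pick an extractor from the object's content type, ignoring parameters."""
    media_type = content_type.split(";")[0].strip().lower()
    return EXTRACTORS[media_type](value)


xml_fields = extract_fields(
    "text/xml", b"<person><name>Ryan Zezeski</name></person>"
)
plain_fields = extract_fields("text/plain; charset=utf-8", b"hello")
```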

  26. Anti-Entropy
    [Diagram labels: read repair, handoff, put; obj modified!]

  27. Active Anti-Entropy
    [Diagram labels: KV Tree, YZ Tree, Exchange, Entropy Mgr]
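
The idea of exchanging a KV tree with a YZ tree can be illustrated with a much-simplified hash comparison. The Python sketch below uses a two-level structure (segment hashes over per-key hashes) rather than Riak's actual hash trees. `build_tree`, `exchange`, and the segment count are hypothetical, and the sketch covers only how divergent keys are found, not how they are repaired.

```python
import hashlib


def h(data):
    """Hex SHA-1 digest of a bytes value."""
    return hashlib.sha1(data).hexdigest()


def build_tree(objects, segments=4):
    """Hash every value, group keys into segments, then hash each segment."""
    buckets = {i: {} for i in range(segments)}
    for key, value in objects.items():
        seg = int(h(key.encode("utf-8")), 16) % segments
        buckets[seg][key] = h(value)
    seg_hashes = {
        i: h(repr(sorted(b.items())).encode("utf-8"))
        for i, b in buckets.items()
    }
    return {"segments": seg_hashes, "buckets": buckets}


def exchange(kv_tree, yz_tree):
    """Return keys that differ, descending only into mismatched segments."""
    divergent = set()
    for seg, seg_hash in kv_tree["segments"].items():
        if yz_tree["segments"][seg] == seg_hash:
            continue  # whole segment agrees, skip its keys
        kv_b, yz_b = kv_tree["buckets"][seg], yz_tree["buckets"][seg]
        for key in kv_b.keys() | yz_b.keys():
            if kv_b.get(key) != yz_b.get(key):
                divergent.add(key)
    return divergent


kv_objects = {"ryan": b"new value", "bob": b"same"}
indexed = {"ryan": b"old value", "bob": b"same"}
divergent = exchange(build_tree(kv_objects), build_tree(indexed))
```

Comparing segment hashes first means matching segments are skipped without looking at their keys, which is the point of exchanging trees instead of full key lists.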

  28. Query -> Dist. Query
    search/people?q=zezeski
    solr/people/select?shards=...&fq=...&q=zezeski
    yz_cover:plan("people")
    [Diagram labels: Riak, Solr, distributed search; Node A; NodeB, Solr; NodeB, Solr; NodeC, Solr]
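
The rewrite from a client query to a distributed Solr query can be sketched as a function that adds `shards` and `fq` to the caller's parameters. This Python sketch makes several assumptions: the shape of the coverage plan, the host and port values, and the use of `_yz_node` and `_yz_pn` in the filter are guesses for illustration, since the slide elides both `shards=...` and `fq=...`. Only the comma-separated `host:port/path` form of `shards` is standard Solr.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def distributed_query(index, params, plan):
    """Rewrite client params into a distributed Solr select query string.

    plan: list of (node_name, host_port, partitions) tuples, a hypothetical
    stand-in for whatever yz_cover:plan/1 returns.
    """
    shards = ",".join(f"{host_port}/solr/{index}" for _, host_port, _ in plan)
    # Assumed filter: restrict each node to the partitions it covers.
    fq = " OR ".join(
        "(_yz_node:{} AND ({}))".format(
            node, " OR ".join(f"_yz_pn:{p}" for p in partitions)
        )
        for node, _, partitions in plan
    )
    query = {"shards": shards, "fq": fq, **params}
    return f"solr/{index}/select?{urlencode(query)}"


# Hypothetical two-node plan; hosts and ports are placeholders.
plan = [("dev1", "host-a:8983", [1, 4]), ("dev2", "host-b:8983", [7])]
url = distributed_query("people", {"q": "zezeski"}, plan)
parsed = parse_qs(urlsplit(url).query)
```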

  29. Getting Started

  30. Build From Source
    https://github.com/rzezeski/yokozuna#getting-started

  31. EC2 AMI
    ami-8d9c20e4
    • based on ami-6df93504, x86_64, Amazon Linux, instance storage
    • Yokozuna ready to go, ~ec2-user/riak/rel
    • modify node name in vm.args
    • set ulimit -n
    • open up port 8098
    do this _before_ you start the node

  32. Start Riak
    $ ./rel/riak/bin/riak start
    should see new beam.smp and java processes

  33. Attach to Riak
    $ ./rel/riak/bin/riak start
    $ ./rel/riak/bin/riak attach

  34. Create Index
    $ ./rel/riak/bin/riak start
    $ ./rel/riak/bin/riak attach
    > yz_index:create("people").
    ok
    new solr core created with default schema

  35. Install Hook
    $ ./rel/riak/bin/riak attach
    > yz_index:create("people").
    ok
    > yz_kv:install_hook(<<"people">>).
    ok
    obj modified hook installed on bucket

  36. Check Bucket
    > (Ctrl-D)
    $ curl http://localhost:8098/riak/people | jsonpp | grep -C2 obj_modified

  37. Write Data
    $ curl -X PUT \
    -H 'content-type: text/plain' \
    http://localhost:8098/riak/people/ryan \
    -d 'I heard there is Natty Boh at RICON'
    entire value will be extracted under ‘text’ field
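
For readers scripting the same write, here is a rough Python equivalent of the curl command using only the standard library. The helper name `build_put` is hypothetical. The URL layout and header are taken from the slide, and the request is only constructed, not sent, so the sketch runs without a Riak node.

```python
from urllib.request import Request


def build_put(base_url, bucket, key, body, content_type="text/plain"):
    """Build (but do not send) a PUT that stores `body` under bucket/key."""
    url = f"{base_url}/riak/{bucket}/{key}"
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="PUT",
    )


req = build_put(
    "http://localhost:8098",
    "people",
    "ryan",
    "I heard there is Natty Boh at RICON",
)
# Against a running node, urllib.request.urlopen(req) would send the request.
```

The content type matters here because it is what selects the extractor, and so the field the text is indexed under.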

  38. Query Data
    $ curl 'http://localhost:8098/search/people?q=natty&wt=json' | jsonpp
    ‘wt’ is passed through to Solr

  39. Response
    the key ‘ryan’ was
    returned
    magic

  40. Query Highlighting
    search/people?wt=json&omitHeader=true&hl=true&hl.fl=text&fl=_yz_rk&q=natty
    all these Solr params are passed through to Solr untouched
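
Because the parameters pass straight through, a client only needs to URL-encode ordinary Solr parameters onto the `search/<index>` path. A small Python sketch follows, where `search_url` is a hypothetical helper and the host and port are the ones used in the earlier curl examples.

```python
from urllib.parse import urlencode


def search_url(base_url, index, solr_params):
    """Build a search URL for an index from plain Solr parameters."""
    return f"{base_url}/search/{index}?{urlencode(solr_params)}"


url = search_url(
    "http://localhost:8098",
    "people",
    {
        "wt": "json",
        "omitHeader": "true",
        "hl": "true",
        "hl.fl": "text",
        "fl": "_yz_rk",
        "q": "natty",
    },
)
```

A dict is used rather than keyword arguments because Solr parameter names such as `hl.fl` are not valid Python identifiers.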

  41. Highlighting Response
    where’s the cream
    filling?

  42. Check the Schema
    must store fields to highlight

  43. Overwrite Extractor Def
    > yz_extractor:register("text/plain",
    {yz_text_extractor, [{field_name,
    value_t}]}, [overwrite]).
    now all text/plain data
    will be written under
    ‘value_t’ field
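
The effect of registering with `overwrite` can be modeled as a small registry keyed by content type. The Python sketch below is an analogy only: the return values, the `options` dict, and every function name are hypothetical and do not describe what `yz_extractor:register/3` actually returns.

```python
_registry = {}


def text_extractor(value, field_name="text"):
    """Put the entire value under one field, named by `field_name`."""
    return [(field_name, value.decode("utf-8"))]


def register(content_type, extractor, options=None, overwrite=False):
    """Map a content type to (extractor, options). Returns True if stored."""
    if content_type in _registry and not overwrite:
        return False  # keep the existing definition
    _registry[content_type] = (extractor, options or {})
    return True


def extract(content_type, value):
    """Run the registered extractor with its stored options."""
    extractor, options = _registry[content_type]
    return extractor(value, **options)


register("text/plain", text_extractor)
before = extract("text/plain", b"natty boh")

# Without overwrite the existing definition stays in place.
ignored = register("text/plain", text_extractor, {"field_name": "value_t"})

register("text/plain", text_extractor, {"field_name": "value_t"}, overwrite=True)
after = extract("text/plain", b"natty boh")
```

Only data written after the new definition lands under `value_t`, which is why the next slide deletes and rewrites the object.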

  44. Write Data 2
    $ curl -X DELETE http://localhost:8098/riak/people/ryan
    $ curl -X PUT \
    -H 'content-type: text/plain' \
    http://localhost:8098/riak/people/ryan \
    -d 'I heard there is Natty Boh at RICON'

  45. Query Highlighting 2
    search/people?wt=json&omitHeader=true&hl=true&hl.fl=value_t&fl=_yz_rk&q=value_t:natty
    notice the hl.fl changed to value_t, as did q

  46. Highlighting Response 2
    Yeaaaaaa buddy!

  47. TODO

  48. HEADERS/METADATA
    TO FIELDS

  49. SUPPORT SOLR UPDATE
    XML MESSAGES
    http://wiki.apache.org/solr/UpdateXmlMessages

  50. PROTOTYPE/
    BENCHMARK

  51. GOOD MIGRATION
    STORY FROM RIAK
    SEARCH TO YOKOZUNA

  52. INCLUDE WITH RIAK
    PROPER, VERSION 1.?

  53. ONE MORE
    THING?

  54. POWERED BY YOKOZUNA/SOLR

  55. KTHXBAI
    @jrecursive (mecha)
    @dizzyd (best “boss” ever)
    @argv0 (wat dat smell like)
    @DstroyAllModels (shot of whiskey plz)
    @jon_meredith (CHOCOLATE CAKE)
    W. Hilton
    Apache Solr (thx for all the code)
    ElasticSearch & Datastax (for the inspiration)
    Mom & Dad
    @pharkmillups (dat hustle)
    @tsantero (you’re gonna owe me 2 bills)
    @coderoshi (riakdocs)