
Riak, a Distributed Database, and Riak CS, an Object Store

ksauzz
August 06, 2013

Open Source Conference 2013 @ Kyoto

Transcript

  1. 2013-08-03
    Open Source Conference 2013 @ Kyoto
    Distributed Database & Object Storage

    Basho Japan KK

    Software Engineer

    Kazuhiro Suzuki



  2. ©2013 BASHO TECHNOLOGIES INC. ALL RIGHTS RESERVED.

    Agenda

    •  Basho
    •  Riak (distributed database)
    •  Riak CS (object storage)
    •  Use cases



  3. Basho Technologies, Inc.

    •  Founded: January 2008
    •  Headquarters: Cambridge, Massachusetts
    •  Employees: approx. 130
    •  Japanese subsidiary established September 2012



  4. [Slide without text]



  5. Design principles

    •  Ease of operation
    •  High availability
    •  Horizontal scalability



  6. Key / Value / Bucket

    •  Key/value pairs are stored in buckets
    •  A value can be any binary data (JSON, XML, Msgpack, etc…)

    [Diagram: a bucket containing three KEY / VALUE pairs]



  7. Masterless

    •  A cluster is made up of multiple nodes
    •  All nodes are peers; there is no master and no single point of failure
    •  Every node is equivalent: each one handles requests and holds data

    [Diagram: five nodes]



  8. Data replication

    •  The key space of 160-bit integers forms the Ring
    •  The Ring is divided into equally sized partitions
    •  Partitions are assigned to the nodes in the cluster

    [Diagram: ring partitions assigned to node 0, node 1, node 2, node 3]



  9. Data replication

    •  The key space of 160-bit integers forms the Ring
    •  The Ring is divided into equally sized partitions
    •  Partitions are assigned to the nodes in the cluster
    •  The hash of bucket / key determines the partition where an object is stored

    [Diagram: hash("bucket/key") mapped onto the ring of node 0, node 1, node 2, node 3]



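  10. Data replication

    •  The key space of 160-bit integers forms the Ring
    •  The Ring is divided into equally sized partitions
    •  Partitions are assigned to the nodes in the cluster
    •  The hash of bucket / key determines the partition where an object is stored
    •  Replicas are stored in the consecutive partitions that follow

    [Diagram: hash("bucket/key") mapped onto the ring of node 0, node 1, node 2, node 3, with replicas in consecutive partitions]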
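
The ring described on slides 8 to 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition count (64), the replica count (3), the round-robin assignment of partitions to nodes, and the function names are all hypothetical, and real Riak differs in its details. SHA-1 is used because it produces a 160-bit value, which matches the size of the key space on the slide.

```python
import hashlib

RING_SIZE = 2 ** 160      # 160-bit integer key space (the "Ring")
NUM_PARTITIONS = 64       # assumed partition count for this sketch
N_VAL = 3                 # assumed number of replicas per object
NODES = ["node 0", "node 1", "node 2", "node 3"]


def partition_owner(partition):
    """Assign partitions to nodes round-robin (illustrative scheme only)."""
    return NODES[partition % len(NODES)]


def partition_for(bucket, key):
    """Hash "bucket/key" onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")      # 0 <= position < 2**160
    return position // (RING_SIZE // NUM_PARTITIONS)


def preference_list(bucket, key):
    """Return the N_VAL consecutive partitions, with owners, that hold replicas."""
    first = partition_for(bucket, key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]
    return [(p, partition_owner(p)) for p in partitions]


if __name__ == "__main__":
    for partition, node in preference_list("people", "alice"):
        print(partition, node)
```

Because the partitions are handed out round-robin in this sketch, three consecutive partitions always land on three different nodes, which is what makes the replicas useful when a node fails.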



  11. When a temporary failure occurs

    •  A temporary node failure occurs (node 2)
    •  PUT, GET, and DELETE requests go to a fallback node (node 0)

    [Diagram: hash("bucket/key") on the ring of node 0, node 1, node 2, node 3]



  12. When the temporarily failed node recovers

    •  A temporary node failure occurs (node 2)
    •  PUT, GET, and DELETE requests go to a fallback node (node 0)
    •  The failed node (node 2) recovers
    •  "Handoff" moves the data from the fallback node (node 0) to the recovered node (node 2)
    •  Normal operation resumes

    [Diagram: hash("bucket/key") on the ring of node 0, node 1, node 2, node 3]
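
The failure and recovery sequence on slides 11 and 12 can be modeled with a toy simulation. It is a deliberately simplified sketch: it tracks a single replica, and the `Node` class and the `put`, `get`, and `handoff` functions are hypothetical names, not Riak's API. Real Riak also involves virtual nodes and quorums that are not shown here.

```python
class Node:
    """A toy node: its own data plus data held on behalf of failed nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}     # objects this node owns
        self.hinted = {}   # {failed_node_name: {key: value}}


def put(primary, fallback, key, value):
    """Write to the primary if it is up, otherwise to the fallback node."""
    if primary.up:
        primary.data[key] = value
    else:
        fallback.hinted.setdefault(primary.name, {})[key] = value


def get(primary, fallback, key):
    """Read from the primary if it is up, otherwise from the fallback node."""
    if primary.up:
        return primary.data.get(key)
    return fallback.hinted.get(primary.name, {}).get(key)


def handoff(fallback, recovered):
    """Move everything held for the recovered node back to it."""
    recovered.data.update(fallback.hinted.pop(recovered.name, {}))


if __name__ == "__main__":
    node0, node2 = Node("node 0"), Node("node 2")
    node2.up = False                                  # temporary failure
    put(node2, node0, "people/alice", "v1")           # served by the fallback
    print(get(node2, node0, "people/alice"))          # still readable
    node2.up = True                                   # node 2 recovers
    handoff(node0, node2)                             # data moves back
    print(get(node2, node0, "people/alice"))
```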



  13. Data access / API



  14. Interfaces

    •  Protocols: HTTP, Protocol Buffers
    •  Clients: Java, Ruby, Python, PHP, Node.js, Haskell, etc…



  15. Access by bucket/key

    GET /buckets/people/keys/alice

    PUT /buckets/people/keys/alice

    DELETE /buckets/people/keys/alice

    [Diagram: BUCKET/KEY → VALUE]
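
As a rough illustration of the three requests on slide 15, the sketch below builds them with Python's standard library. The base URL is an assumption (a node on localhost listening on port 8098, Riak's default HTTP port), and `object_url` and `build_request` are hypothetical helper names. The requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8098"   # assumed: a local node on the default HTTP port


def object_url(bucket, key):
    """URL of one object, following the paths shown on the slide."""
    return f"{BASE_URL}/buckets/{bucket}/keys/{key}"


def build_request(method, bucket, key, value=None):
    """Build (but do not send) a GET, PUT, or DELETE request for one object."""
    data = None
    headers = {}
    if value is not None:
        # The value is sent as JSON here; any binary payload would also work.
        data = json.dumps(value).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        object_url(bucket, key), data=data, headers=headers, method=method
    )


if __name__ == "__main__":
    put_req = build_request("PUT", "people", "alice", {"name": "alice", "age": 32})
    get_req = build_request("GET", "people", "alice")
    del_req = build_request("DELETE", "people", "alice")
    for req in (put_req, get_req, del_req):
        print(req.get_method(), req.full_url)
```

To actually send one of these against a running node, pass it to `urllib.request.urlopen`.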



  16. Secondary indexes (2i)

    •  Binary and integer index types are available
    •  Queries by exact match or by range

    KEY: 14
    VALUE: { "name": "alice", "age": 32 }

    INDEX: age_int: 32 → KEY: 14
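
The exact-match and range queries on slide 16 map onto Riak's HTTP interface roughly as sketched below. The base URL and the helper names are assumptions made for illustration; the `x-riak-index-` header prefix and the `/index/` path are given as I understand the HTTP API and should be checked against the documentation for the version in use. Only headers and URLs are built; nothing is sent.

```python
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8098"   # assumed: a local node on the default HTTP port


def index_header(field, value):
    """Header that attaches an index entry to an object when it is PUT."""
    return {f"x-riak-index-{field}": str(value)}


def exact_match_url(bucket, field, value):
    """URL that lists the keys whose index entry equals the given value."""
    return f"{BASE_URL}/buckets/{bucket}/index/{field}/{quote(str(value))}"


def range_url(bucket, field, low, high):
    """URL that lists the keys whose index entry lies between low and high."""
    return (
        f"{BASE_URL}/buckets/{bucket}/index/{field}/"
        f"{quote(str(low))}/{quote(str(high))}"
    )


if __name__ == "__main__":
    # Integer indexes use the _int suffix, binary indexes the _bin suffix.
    print(index_header("age_int", 32))
    print(exact_match_url("people", "age_int", 32))
    print(range_url("people", "age_int", 20, 40))
```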



  17. MapReduce

    •  Querying data, distributed filtering, analysis, and aggregation
    •  Jobs can be written in Erlang or JavaScript
    •  Erlang is faster
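
To make the filter-then-aggregate idea on slide 17 concrete, here is a local Python illustration of a map phase and a reduce phase. It is not Riak's MapReduce API (real jobs are written in Erlang or JavaScript and run inside the cluster); the sample objects and the function names are hypothetical.

```python
from functools import reduce

# Hypothetical sample objects standing in for values stored in a bucket.
PEOPLE = [
    {"name": "alice", "age": 32},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 41},
]


def map_phase(value):
    """Runs once per object: filter, then emit zero or more results."""
    if value["age"] >= 30:
        return [value["age"]]
    return []


def reduce_phase(acc, ages):
    """Combine emitted results into a running (count, total) pair."""
    count, total = acc
    return (count + len(ages), total + sum(ages))


def run_job(objects):
    """Apply the map phase to every object, then fold the results."""
    mapped = [map_phase(obj) for obj in objects]   # in a cluster, this part is distributed
    count, total = reduce(reduce_phase, mapped, (0, 0))
    return {"count": count, "average_age": total / count if count else None}


if __name__ == "__main__":
    print(run_job(PEOPLE))
```

The map phase is the part that benefits from distribution, since each node can run it against the objects it already holds and ship only the emitted results to the reduce phase.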



  18. Full-text search (Yokozuna, beta)

    •  Riak + Solr
    •  Japanese language support
    •  Release planned for Riak 2.0



  19. Demo
    1.  Start with a single-node cluster
    2.  Add 10 key/value pairs
    3.  Add 4 nodes to the cluster (join)
    4.  kill -9 two nodes, including the first one
    5.  Fetch all keys



  20. Summary

    •  Ease of operation
    •  High availability
    •  Horizontal scalability



  21. [Slide without text]



  22.
    •  Object storage
    •  Implemented on top of Riak
    •  AWS S3-compatible API



  23. Object storage?



  24. Object storage?

    •  Stores large files efficiently
    •  Media data (images, video, audio)
    •  Backup data



  25. Riak CS Architecture

    [Diagram labels: S3 REST API; Stanchion; object operations; bucket operations; user operations and reports; one manifest and six blocks]

    ✓  High availability
    ✓  Distributed placement
    ✓  Replication
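
The manifest and block labels in the architecture diagram suggest that a large object is stored as many small pieces plus one record that describes them. The sketch below illustrates that idea with a plain dictionary standing in for the key/value store. The block size, the key naming, the manifest fields, and all function names are assumptions for illustration, not Riak CS's actual internal format.

```python
import hashlib
import uuid

BLOCK_SIZE = 1024 * 1024   # assumed block size (1 MiB) for this sketch


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the object into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_object(kv, bucket, key, data, block_size=BLOCK_SIZE):
    """Store each block under its own key, then a manifest that lists them."""
    object_id = uuid.uuid4().hex
    block_keys = []
    for seq, block in enumerate(split_into_blocks(data, block_size)):
        block_key = f"{object_id}:{seq}"
        kv[("blocks", block_key)] = block
        block_keys.append(block_key)
    kv[("manifests", f"{bucket}/{key}")] = {
        "content_length": len(data),
        "content_md5": hashlib.md5(data).hexdigest(),
        "block_keys": block_keys,
    }


def fetch_object(kv, bucket, key):
    """Read the manifest, then reassemble the object from its blocks."""
    manifest = kv[("manifests", f"{bucket}/{key}")]
    return b"".join(kv[("blocks", k)] for k in manifest["block_keys"])


if __name__ == "__main__":
    store = {}
    payload = bytes(range(256)) * 10            # 2560 bytes of sample data
    store_object(store, "photos", "cat.jpg", payload, block_size=1000)
    print(len(store) - 1, "blocks")             # everything except the manifest
    print(fetch_object(store, "photos", "cat.jpg") == payload)
```

Since every block is an ordinary key/value pair, the blocks inherit the distribution and replication of the underlying store, which lines up with the three check marks on the slide.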



  26. API and interfaces

    •  Conforms to the AWS S3 REST API
    •  Common S3 libraries and tools can be used
    •  REST GET, PUT, and DELETE operations
    •  S3-style ACLs and bucket policies
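
S3 libraries and tools authenticate each REST request by signing it with the user's secret key. The sketch below shows how such a signature could be computed by hand, assuming the endpoint accepts AWS Signature Version 2 (HMAC-SHA1 over a canonical string). That assumption, the credentials, and the helper names are illustrative only, and the sketch omits `x-amz-` headers and query-string subresources.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_v2(secret_key, method, resource, date, content_md5="", content_type=""):
    """Compute a Signature Version 2 value for a request with no x-amz- headers."""
    string_to_sign = "\n".join([method, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def auth_headers(access_key, secret_key, method, bucket, key):
    """Return the Date and Authorization headers for one object request."""
    date = formatdate(usegmt=True)       # e.g. "Sat, 03 Aug 2013 01:23:45 GMT"
    resource = f"/{bucket}/{key}"
    signature = sign_v2(secret_key, method, resource, date)
    return {"Date": date, "Authorization": f"AWS {access_key}:{signature}"}


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    headers = auth_headers("ACCESS-KEY", "SECRET-KEY", "GET", "photos", "cat.jpg")
    for name, value in headers.items():
        print(f"{name}: {value}")
```

In practice an existing S3 client does this signing automatically once it is pointed at the storage endpoint, which is the point of the compatibility claim on the slide.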



  27. Demo

    DragonDisk

    http://www.dragondisk.com
    s3cmd

    http://s3tools.org/s3cmd



  28. Summary

    •  Implemented on top of Riak
    •  Easy-to-use object storage
    •  AWS S3-compatible API



  29. Enterprise (commercial edition)

    Replication between data centers



  30. Replication between data centers

    One-way or two-way data synchronization between multiple DCs

    Purposes
    •  Keep the service running even through a major disaster
    •  Data locality
    •  Active backup
    •  Build a verification environment with a production cluster and a staging cluster

    [Diagram labels: Client; Update; Primary Cluster (DC#1); Secondary Cluster (DC#2); Secondary Cluster (DC#3)]



  31. Use Cases / Case Studies



  32. Major customers
    More customer information is available at http://basho.com/riak-users/.



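  33. Storing product information

    •  Improved scalability and user-experience performance
    •  Riak was chosen for the online product catalog and ratings used by bestbuy.com and the retail stores.
    •  Nodes are added in time for holiday shopping (Christmas sales and the like).
    •  The SLA for rendering the Bestbuy.com home page is within 1 second.
    •  For the 2013 holiday season, the number of SKUs is expected to grow from 26 million to 500 million.
    •  The Riak cluster is built across multiple Availability Zones on Amazon AWS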



  34. Online advertising
    •  Replaced MySQL → Cassandra → Riak because of the need for DC replication and scalability
    •  Stores user activity data and trafficking data
    •  Trafficking data was migrated from MySQL (data replication across multiple DCs was required)
    •  User data was migrated from Cassandra (past releases had not preserved backward compatibility, so it could not be trusted)
    •  Data replication across 5 DCs
    •  Handled 4 trillion pieces of ad data in 2011.
    [Screenshot: excerpts from Basho's case-study pages for OpenX and Velti, partly cropped]

    OpenX uses Riak for user and trafficking data behind its data services API. They selected Riak due to its highly available, low-latency, redundant architecture. OpenX also uses Riak's multi-datacenter replication across several data centers, providing up-to-date data throughout its global infrastructure. For more details about how OpenX uses Riak, check out the video of Anthony Molinaro, OpenX engineer, speaking at RICON2012, Basho's 2012 developer conference.

    [Velti excerpt, as of 2009: text cropped]



  35. Menu system
    •  Initially used Amazon S3; achieved lower round-trip latency for the on-demand menu
    •  Chose Riak for fast reads and writes
    •  Stores the data needed to aggregate the Video On Demand menu
    •  Stores user-related information that will likely be used in future marketing campaigns
    •  Processes 25 million clicks from remote controls every day
    •  Riak clusters built in 3 DCs for the on-demand menu.
    •  4 clusters built for marketing



  36. Healthcare information management: Danish Health Services

    •  Doctors and medical systems access Electronic Health Records (prescription information) from a variety of devices
    •  Serves all 5.5 million Danish citizens



  37. Riak CS use cases

    •  Public cloud storage
    •  S3-compatible storage other than AWS
    •  Cloud drive (general content storage)
    •  Backup-as-a-Service
    •  Archive storage
    •  Storage for employees and internal departments



  38. Yahoo! JAPAN
    •  Yahoo! JAPAN provides a platform for internet shopping sites
    •  Image data for the shopping sites is stored in Riak CS
    •  Objects stored: 200,000 (as of the end of 2012)
    •  Requests: 450 req/sec
    •  Response time: 10 ms to 80 ms
    •  Build time: 1 day.
    •  Provides an S3-compatible cloud storage service
    •  Data replication between 2 DCs



  39. Questions?

    •  Twitter: @BashoJapan
    •  ML: [email protected]
    •  [email protected]
    •  Basho/TED @ exhibition booth

  40. [Slide without text]