Beyond NoSQL - Distributed Databases in Production

Alex
December 10, 2013
This webinar, hosted by Matt Aslett (Research Director at 451 Research) and Bobby Patrick (EVP and CMO at Basho Technologies), covers the history of NoSQL and the current NoSQL landscape, then dives into Basho's Riak. It also includes a case study from Riak user Tapjoy about how it uses Riak as the cornerstone of its data management strategy. Finally, it wraps up with a look at what's to come with Riak 2.0.


Transcript

  1. © 2013 by The 451 Group. All rights reserved

      Beyond NoSQL - Distributed Databases in Production
      Matthew Aslett, Research Director, Data Management and Analytics, 451 Research
      Bobby Patrick, EVP and CMO, Basho Technologies
      Wes Jossey, Systems Engineer, Tapjoy
  2. In a nutshell
      • A Brief History of NoSQL
      • NoSQL Drivers and Adoption Trends
      • The NoSQL Database Landscape
      • Scalability and Distributed Architecture
      • Basho’s Riak
      • Tapjoy Case Study
      • Riak 2.0 and Beyond
  3. Matthew Aslett
      Research Director, Data Management and Analytics
      • [email protected]
      • www.twitter.com/maslett
      • Responsible for data management and analytics research agenda
      • Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
      • With 451 Research since 2007
  4. Bobby Patrick
      Executive Vice President and CMO
      • [email protected]
      • @bpatrick001
      • Responsible for company’s business operations, strategic partnerships and product strategy
      • Prior to joining Basho, Bobby was CMO for GXS and Digex. He also previously worked for Accenture and the FBI.
  5. Wes Jossey
      Systems Engineer
      • @dustywes
      • Tapjoy is a leading mobile advertising and monetization platform
      • Has run the Systems Team for the past four years
  6. Company Overview
      • One company with 3 operating divisions
      • Syndicated research, advisory, professional services, datacenter certification, and events
      • Global focus
      • 270+ staff
      • 1,500+ client organizations: enterprises, vendors, service providers, and investment firms
      • Organic growth and growth through acquisition
  7. Unique combination of research, analysis & data
      • Emerging tech market segment focus
      • Daily qualitative & quantitative insight
      • Analyst advisory & Go-to-market support
      • Global events
  8. A brief history of NoSQL
      [Timeline figure, 2006 to 2013. Events shown:]
      • Google publishes BigTable research; Amazon publishes Dynamo research
      • Cassandra open sourced; Voldemort open sourced; HBase started; Accumulo started
      • First “NOSQL” meet-up
      • 451 blog coverage; 451 Research coverage; “NoSQL, NewSQL and Beyond”
      • MongoDB founded; DataStax founded; Basho founded; Couchbase founded; Sqrrl founded; Oracle NoSQL
  9. What has driven the development and adoption of NoSQL?
      • NoSQL, NewSQL and Beyond
          • Assessing the drivers behind the development and adoption of NoSQL and NewSQL databases, as well as data grid/caching technologies
          • Released April 2011
          • Role of open source in driving innovation
  10. SPRAINED RELATIONAL DATABASES
      Photo credit: Foxtongue on Flickr, http://www.flickr.com/photos/foxtongue/4844016087/
  11. Database SPRAIN
      • The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
      • There are workarounds – such as DIY sharding – but manual, homegrown efforts can result in database administrators being stretched beyond their normal capacity in terms of managing complexity.
      • Scalability
      • Performance
      • Relaxed consistency
      • Agility
      • Intricacy
      • Necessity
      → Increased willingness to look towards emerging alternatives
  12. Necessity is the mother of NoSQL
      • Hadoop and NoSQL innovation did not come from existing relational database and storage suppliers
      • It came from Google, Amazon, Facebook, Yahoo, LinkedIn and open source communities…
      • This has significantly altered the relationship between customer and vendor, and changed the database landscape enormously
      • And also generated a new breed of database vendors and database products
      “We couldn’t bet the company on other companies building the answer for us.” – Werner Vogels, Amazon CTO
  13. A brief history of NoSQL
      [Timeline figure, 2006 to 2013. Events shown:]
      • Google publishes BigTable research; Amazon publishes Dynamo research
      • Cassandra open sourced; Voldemort open sourced; HBase started; Accumulo started
      • First “NOSQL” meet-up
      • 451 blog coverage; 451 Research coverage; “NoSQL, NewSQL and Beyond”
      • 451 NoSQL market size estimates; updated 451 NoSQL market size estimates
      • MongoDB founded; DataStax founded; Basho founded; Couchbase founded; Sqrrl founded; Oracle NoSQL
  14. NoSQL databases
      [Chart: NoSQL database market revenue and forecast ($M). Source: Next-Generation Operational Databases 2012-2016]
  15. The NoSQL database landscape
      [Diagram: four categories arranged along a DATA MODEL COMPLEXITY axis]
      • Wide-column stores: Data is mapped by a row key, column key and time stamp.
      • Key Value Stores: Store keys and associated values.
      • Graph databases: Store data and the relationships between data.
      • Document stores: Store all data related to a specific key as a single document.
  16. The NoSQL database landscape
      [Diagram: four categories arranged along a DATA MODEL COMPLEXITY axis]
      • Wide-column stores: Data is mapped by a row key, column key and time stamp.
      • Key Value Stores: Store keys and associated values.
      • Graph databases: Store data and the relationships between data.
      • Document stores: Store all data related to a specific key as a single document.
      • Multi-model databases: Support a combination of the various individual NoSQL data models.
          • While Riak is a key-value store, the value can be a JSON, XML or HTML document
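
The four data models on this slide can be pictured with plain Python structures. This is only an illustrative sketch: every key, field name and value below is made up, and no real database client is involved. The last lines show the slide's point about Riak, a key-value store whose value happens to be a JSON document.

```python
import json

# Key-value store: one opaque value per key.
key_value = {"session:42": b"\x00\x01opaque-bytes"}

# Document store: all data related to a key held as a single document.
documents = {"user:42": {"name": "Ada", "emails": ["ada@example.com"]}}

# Wide-column store: a value addressed by (row key, column key, timestamp).
wide_column = {("user:42", "profile:name", 1386633600): "Ada"}

# Graph database: the data plus explicit relationships between data.
nodes = {"ada": {"kind": "user"}, "riak": {"kind": "database"}}
edges = [("ada", "USES", "riak")]

# A key-value store holding a JSON document as its value.
kv_with_json = {"user:42": json.dumps(documents["user:42"]).encode("utf-8")}
decoded = json.loads(kv_with_json["user:42"])
```

The key-value store itself never looks inside the bytes; only the application that calls `json.loads` treats the value as a document.
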
  17. NoSQL databases are not created equal
      [Comparison chart: Apache Cassandra, Basho Riak, MongoDB and Neo4j compared across Scalability, Performance, Relaxed consistency, Agility, Intricacy and Necessity]
      • Very high level view
      • The devil is in the details, and different databases have different strengths and design goals
      • But scalability is one of the core differentiating factors, and it is likely to come in for increasing attention as the industry shifts towards distributed architecture, e.g. Basho Riak and Apache Cassandra are distributed databases
      • Another factor is operational simplicity, where we would see Basho Riak and MongoDB scoring better than others
  18. Scalability and distributed architecture
      • Interactive applications mean the pace of user growth and multiplicity of data types are too great for the relational model to absorb efficiently.
      • Additionally, enterprise architectures have shifted from a scale-up to a scale-out approach to make use of distributed hardware.
          • Greater scalability demands
          • Predictable performance problems
      • New requirements:
          • Proliferation of cloud
          • Geo-distributed data
  19. Scalability and distributed architecture
      [Diagram: one database serving three groups of users]
      • While companies could previously rely on building out their data presence in a single region
  20. Scalability and distributed architecture
      [Diagram: three databases, each serving three groups of users]
      • While companies could previously rely on building out their data presence in a single region, and then replicate it as they grew
  21. Scalability and distributed architecture
      [Diagram: six databases, each serving three groups of users]
      • Increasingly they have to be prepared for instant data availability, globally, from day one, with true globally distributed processing
  22. Scalability and distributed architecture
      • Traditional relational databases were never designed to cope with modern application requirements
          • Geographic distribution
          • Multiple data types
      • The traditional relational database has been stretched to breaking point, encouraging users to look at alternatives
      • NoSQL arose out of the failure of incumbent database providers to respond to emerging application and architecture requirements
      • NoSQL databases are now being adopted by mainstream enterprises to fulfill their requirements for next-generation data management
  23. About Basho
      • Founded in 2008
      • 130 employees
      • Creators of Riak, the open-source, distributed database
      • Basho offices in Cambridge, London, Washington DC, San Francisco, and Tokyo
      • Basho customers include 1/3 of the Fortune 50
      • Sponsor of RICON, the industry’s exclusive distributed system conferences
      • Over 200 Riak meetups worldwide in 2013
  24. Why Distributed Data is so Important
      • Everything works at small scale: scale out, up and down, predictably and linearly
      • What happens when something goes wrong: survive server, network, or data center failures
      • The customer experience matters: data locality enables data operations close to end-users
      Business impacts using distributed data: Sales, Customers, Developers, Operations
  25. Riak Core Concepts
      Riak is masterless
      • Multiple nodes operate to form a cluster
      • Every node in a Riak cluster is created equal
      • Data is replicated automatically to n nodes
      • Replication properties are tunable
      Riak uses consistent hashing to form a logical ring
      • Allows for organization of data storage and replication across logical partitions
      • Physical nodes “claim” virtual nodes throughout “the ring”
      • Every node can take application requests
      • Every node is aware of the physical location of data in a cluster
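
The ring described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not Riak's implementation: the `Ring` class, the round-robin claim, the 64-partition ring and the `n_val=3` replica count are assumptions chosen for the example. A bucket/key pair is hashed onto the ring, the hash selects a partition (virtual node), and the owners of the next `n_val` partitions hold the replicas.

```python
import hashlib

RING_SIZE = 2 ** 160  # size of the SHA-1 output space, used here as the ring


def key_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to an integer position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    """Hypothetical ring: equal-sized partitions claimed by physical nodes."""

    def __init__(self, nodes, num_partitions=64):
        self.num_partitions = num_partitions
        self.partition_size = RING_SIZE // num_partitions
        # Physical nodes "claim" virtual nodes (partitions) round-robin.
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, bucket, key):
        """Index of the partition whose hash range contains this key."""
        return key_position(bucket, key) // self.partition_size

    def preference_list(self, bucket, key, n_val=3):
        """Owners of the n_val consecutive partitions starting at the key's."""
        first = self.partition_for(bucket, key)
        return [
            self.owners[(first + i) % self.num_partitions]
            for i in range(n_val)
        ]


ring = Ring(["node1", "node2", "node3", "node4", "node5"])
print(ring.preference_list("users", "alice"))
```

Because every node can compute the same hash and holds the same ownership table, any node can accept a request and forward it to the replicas, which is what makes a masterless design workable.
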
  26. Riak Core Concepts
      Riak scales linearly and predictably
      • Adding 2x nodes increases throughput 2x
      • Single Riak command can join a new node to a cluster
      Data distribution is automatic and behind-the-scenes
      • Adding additional servers automatically hands off data and increases throughput and storage capacity with little operational overhead
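
To see why a join involves little data movement, the sketch below rebalances partition ownership when a fifth node joins a four-node cluster. The `join_node` function and its take-from-the-busiest-node rule are hypothetical and are not Riak's actual claim algorithm; the point is only that the new node receives its fair share of partitions while the rest stay where they are.

```python
from collections import Counter


def join_node(owners, new_node):
    """Give new_node an even share of partitions, taking them one at a time
    from whichever existing node currently owns the most."""
    owners = list(owners)  # work on a copy, leave the caller's list intact
    existing = set(owners)
    target = len(owners) // (len(existing) + 1)
    for _ in range(target):
        counts = Counter(o for o in owners if o != new_node)
        donor = max(counts, key=lambda node: (counts[node], node))
        owners[owners.index(donor)] = new_node  # this partition is handed off
    return owners


before = [f"node{i % 4 + 1}" for i in range(64)]  # 4 nodes, 16 partitions each
after = join_node(before, "node5")
moved = sum(1 for old, new in zip(before, after) if old != new)
print(moved, Counter(after))
```

Only the partitions given to the new node change owner; every other partition, and the data in it, stays put.
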
  27. Riak Multi-Datacenter (MDC) Replication
      Riak automatically replicates between clusters
      • Configurable number of remote replicas
      • Options for real-time sync and full sync
      • Spanning tree support for cascading replication
      Geo-Data Locality allows localized data processing
      • Reduced latency to end-users
      • Allows sub-5ms responses
      • Active-Active ensures continuous user experience
  28. Riak is Built to Survive Failures
      Riak is designed to survive failure scenarios
      • Eventual consistency
      • Gossip protocol
      • Hinted handoff
      Riak has built-in self-healing properties
      • Read Repair
      • Active Anti-Entropy
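
Read repair, one of the self-healing properties listed above, can be illustrated as follows. This is a toy model under stated assumptions: each replica is a plain dictionary, and a single integer `version` stands in for the vector clock a real system would use to compare versions. The names `Versioned` and `read_with_repair` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    version: int  # simplified stand-in for a vector clock


def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, and overwrite
    any replica that is missing the key or holds an older version."""
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda v: v.version)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.version < newest.version:
            replica[key] = newest  # repair the stale or missing copy
    return newest.value


up_to_date = {"cart": Versioned("book,pen", 2)}
stale = {"cart": Versioned("book", 1)}
empty = {}
print(read_with_repair([up_to_date, stale, empty], "cart"))
```

After the read, all three replicas hold version 2, so the act of reading has repaired the divergence without any operator involvement.
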
  29. Riak Services + Extras
      • Key-Value Store
      • HTTP API
      • Class Libraries
      • Full-Text Search
      • Secondary Indexes
      • Riak CS: S3 and Swift API
      • Riak CS: Multi-Tenancy
  30. Built for Commodity Hardware and Operations Friendliness
      • 80% Reduced Total Cost of Ownership vs. Traditional Databases
      • 1/5 operations personnel required vs. Apache Cassandra
  31. Popular Use Cases
      OPERATIONAL DATA
      • User Activity, Session Data, Location Data, Content Management and Distribution, Customer APIs, Sensor Data, Machine-to-Machine, “Internet of Things”
      CRITICAL DATA
      • Point-of-Sale, Flash Retail, Ad Networks, Product Catalog, Recommendations Engine, Mobile Payments, Health Monitoring, Health Records, Supply Chain Visibility
      CLOUD STORAGE
      • S3-Compatible Public Storage, Swift-Compatible Cloud Storage, Private Cloud Storage, Private Dropbox, Cloud Computing Drive, Media Streaming, Media Archival
      Riak known statistics
      • Largest Known Cluster: 400
      • Fastest Known Ops/Sec: 550,000
      • Lowest Known Latency: 1ms
  32. Proven in Production
      Sectors: Health Care, Media, Utilities, Social, Web, Gaming
      • “Using Riak to insure high data availability helps avoid health risks, and in the worst case, patient death”
      • “We selected Riak because Cassandra simply could not keep up”
      • “To enable rapid iteration at scale, Riot moved to Riak to support millions of concurrent players at any moment”
      • “Riak is used to manage thousands of users and millions of meters that create billions of data points”
      • “A massive data explosion is at the center of our growth strategy. Riak is a critical component of our new IT platform”
      • “By far our biggest problem with Riak is that we don’t have problems with Riak”
  33. About Tapjoy
      • Mobile performance-based advertising platform
      • Reaches 435MM mobile users each month
      • Available on over 1 billion devices around the world
      • Billions of requests per day across the Tapjoy platform
      • Global traffic means there is never a “low” period
  34. Riak at Tapjoy
      • 75,000 writes/sec
          • Average write latency <2ms
      • 80,000 reads/sec
          • Average read latency <750 microseconds
      • Three separate production clusters
      • Fully redundant across multiple data centers
  35. 2.0

  36. Riak 2.0
      • Riak Data Types: Simplify application development without sacrificing Riak’s availability and partition tolerance characteristics
      • Strong Consistency: Flexibility to choose between high availability and strong consistency on a per-bucket basis
      • Riak Search 2.0: Full-text search integration with Apache Solr
      • Security: Authentication and Authorization provided via client APIs
      • Simplified Configuration Management: Improves operational simplicity by changing how, and where, configuration information is stored in a transparent format
      • Bucket Types: Operators can define a group of buckets that share the same properties and only store information about each Bucket Type instead of individual buckets
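
The slide does not say how Riak Data Types keep availability while still converging. They are generally described as convergent replicated data types (CRDTs), and the textbook example of that idea is a grow-only counter, sketched below. The `GCounter` class is an illustration of the concept only and is not Riak's client API.

```python
class GCounter:
    """Grow-only counter: each actor (replica) tracks its own increments."""

    def __init__(self):
        self.counts = {}  # actor name -> increments recorded by that actor

    def increment(self, actor, amount=1):
        if amount < 1:
            raise ValueError("a grow-only counter can only increase")
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def value(self):
        """Total across all actors."""
        return sum(self.counts.values())

    def merge(self, other):
        """Combine two replicas by taking the per-actor maximum."""
        merged = GCounter()
        for actor in self.counts.keys() | other.counts.keys():
            merged.counts[actor] = max(
                self.counts.get(actor, 0), other.counts.get(actor, 0)
            )
        return merged


replica_a, replica_b = GCounter(), GCounter()
replica_a.increment("node1", 2)  # written while the replicas are partitioned
replica_b.increment("node2", 3)
print(replica_a.merge(replica_b).value())
```

Because the merge takes a per-actor maximum, it gives the same result in either order and when repeated, so replicas that accepted writes independently still converge without the application writing any conflict-resolution code.
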
  37. Riak 2.0
      • Default Change for Sibling Resolution: New clusters will hand off siblings to applications by default, versus vector clock-based Last Write Wins
      • More Efficient Use of Physical Memory: Local databases can dynamically change their cache size as the cluster fluctuates under load, improving LevelDB’s use of RAM
      • Reduced Replicas for Multiple Data Centers: Better maintain a balance between storage overhead and availability
      Technical Preview Available Today on docs.basho.com or GitHub
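
The difference between the two sibling-handling defaults can be shown with a small example. Everything here is hypothetical: siblings are modelled as `(timestamp, cart)` pairs and the two resolver functions are invented for illustration. With Last Write Wins the store keeps one sibling and silently drops the other; when siblings are handed to the application, it can apply a domain-specific merge such as a set union.

```python
def resolve_last_write_wins(siblings):
    """Keep only the sibling with the latest timestamp."""
    return max(siblings, key=lambda sibling: sibling[0])[1]


def resolve_by_union(siblings):
    """Application-level merge: keep every item seen in any sibling."""
    merged = set()
    for _, cart in siblings:
        merged |= cart
    return merged


# Two clients updated the same shopping cart concurrently.
siblings = [(1, {"book"}), (2, {"pen"})]

print(resolve_last_write_wins(siblings))
print(resolve_by_union(siblings))
```

Here Last Write Wins loses the book, while the union keeps both items. Which merge is correct depends on the data, which is why the resolution is left to the application.
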
  38. More Information
      • Everything Riak: docs.basho.com
      • Resources: basho.com/resources/
      • Hangouts: youtube.com/BashoTechnologies
      • About Distributed Systems: ricon.io
      • Online Books: littleriakbook.com
      • Sign Up for a free Riak Tech Talk: basho.com
  39. Questions? Comments?
      @bpatrick001
      @maslett