Upgrade to Pro — share decks privately, control downloads, hide ads and more …

April 2014: Franz Bettag "Through the database ...

Munich NoSQL
April 24, 2014
37

April 2014: Franz Bettag "Through the database jungle"

Munich NoSQL

April 24, 2014
Tweet

Transcript

  1. Who is this guy? Franz Bettag 28 years old (soon

    29) Nuremberg, Germany Scala, Liftweb and Netty Connoisseur FreeBSD, Redis, PostgreSQL and Cassandra aficionado Director of R&D and Network Operations at wasted.io Ltd.
  2. Seriously, what does he do.. Lead development on projects for

    the company Client consulting in special cases BGP, OSPF, etc, short - the Routing guy Molesting Open Source projects
  3. Not wasting IO anymore 17 years of coding (10 years

    commercial) 16 years on FreeBSD and NetBSD 8 years on PostgreSQL 6 years on Scala, Liftweb and Akka 3 years on Cassandra (since 0.6)
  4. What am i going to talk about? Common issues and

    general solutions Databases Combining Databases Special cases
  5. Common Issues and Solutions Coders choose a DB they felt

    most comfortable with Suggest and prove your suggestion IT Staff is outdated Update them, also prove your suggestion Fear of Change/Never change a running System Take extensive testing into account Screwy code/code duplication Takes more money, their fault. explain that! Old Language/Project Yeah, it’s gonna cost you.
  6. Once picked, the most common mistakes Only test a few

    corner cases Worked with 2 concurrent users (developers) Worked with 5 real concurrent users testing it Worked with 10000 Rows Use a *SINGLE* database to do all the work
  7. Poor choices Using Full-Text Index in your RDBMS It’s 2014,

    we have SOLR and ElasticSearch ! Using RDBMS as cache for other RDBMS-generated data There is stuff like memcached and Redis ! Using RDBMS for bulk inserts with client polling Check your technology, or develop better one
  8. Why SOLR or Elastic Search? Boosting Queries Geospatial Queries Stemming

    Words Language Support Performance Streaming Results (Elastic Search) And many more You’ve got Apache Lucene under the Hood!
  9. Why Redis or Memcached? Caching Data which is slow to

    query Ease database query load Data needed for limited time (TTL) Database has big tomb-stoning/vacuuming penalty No way to store it in the running application Not designed for sharding Need data easily accessible outside of your app
  10. Which database? Well we like Apache Cassandra with Apache SOLR

    and Apache Mahout and Apache Pig ! Especially Datastax Enterprise Have you seen their startup program? AWESOME!
  11. So what’s up with PostgreSQL INET/CIDR DataType Query for incl/excl.

    of INET in CIDR Range Do a variety of stunts on them
  12. PostgreSQL combined with Big Data IP Based Analysis (netflow.io) !

    Cassandra stores each NetFlow and Counters PostgreSQL contains the keys to query for an IP Range ! Indexed Scan across a table containing ca. 2k Ranges 0.2ms ! Querying Cassandra for current total of 127 IPs 3ms
  13. Cassandra and SOLR (DSE) Network Monitoring (netflow.io) ! Cassandra stores

    the alerts generated by Snort SOLR Indexes on those Events (severity, type, etc) PostgreSQL knows the key for the IP Ranges ! SOLR search for all Events regarding Hosts X, Y and Z 16ms
  14. SOLR and other Databases SOLR is great for searching, except

    IP Ranges ;) ! ORMs usually have callbacks onSave - write a new version of the data to SOLR ! Query SOLR Write to the Database
  15. It’s not all rainbows SOLR needs a Schema (can be

    tricky the first few times) ! Structuring your data right IP ADDRESSES ARE NOT INTEGERS! IPv6 OMFG ! If there is nothing out there?
  16. So we had this client... Heavy notification-system using RDBMS Publisher-Follower-System

    including a Web-Based Live-Video-Chat system more than 4 million users multiple publishers with 40k+ followers 200k inserts/sec at peak times used MySQL, Memcache, PHP Clients polling every 5 seconds for notifications what do you do?
  17. What was the setup? He had TWO 48 core, 64GB

    MySQL One read-only, one write-only (master) both using well about 60-80% of all CPUs constantly ! So many hard-disks died So many hours went by for MySQL restores So much data was being held just temporarily Bulk deletion took out MySQL a couple of times
  18. What we did Reduced the load on the write-only host

    to 10-12% ! Reduced the load on the read-only host to 20-30% ! Introduced two major changes SOLR for Full-Text Search (not the Live-Chat) Designed/Wrote thruput.io (our own product)
  19. Scales up to 250k concurrents on 4 CPUs/2GB Browser (down

    to IE6) incl. JavaScript API (tiny) Native API (OSX/IOS - drag and drop) Scala API (JAR) Delivery using curl/wget Apple Push and Google Cloud Messaging builtin Geo-targetted messaging Uses Cassandra for persistency Specifically
  20. What we learned today Diversify! Use Applications for what they’re

    made for Don’t be afraid of change Evaluate existing setups constantly Sometimes you have to write your own stuff. :)