DataLayer AustinRB

Regina Imhoff

June 05, 2017

Transcript

  1. ABOUT DATALAYER
    ➤ Was held May 17, 2017 at Alamo Drafthouse South Lamar
    ➤ 2nd ever DataLayer
    ➤ Hosted by Compose
      ➤ Database-as-a-Service platform
      ➤ Formerly MongoHQ
    ➤ https://datalayer.com
  2. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE
    ➤ Monitoring your systems is largely useless
      ➤ it sucks
    ➤ Observability is what you need
      ➤ build a system that you can understand
    ➤ Our systems have become so complex that the old approach of 1000 dashboards no longer cuts it
  3. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE
    ➤ All we have are unreliable symptoms and reports
    ➤ We need a way to ask questions
    ➤ As soon as we know the question we usually know the answer too
    ➤ Modern web apps exist between the nodes
    ➤ Apps and microservices are so complex that we no longer know the QUESTION
    ➤ “Complexity is exploding everywhere, but our tools are designed for a predictable world”
  4. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE
    ➤ Problem: photos are loading slowly
    ➤ LAMP stack edition
      ➤ App tier capacity exceeded
      ➤ Connections to DB are slower than normal
      ➤ ==> Connection timeouts and latency rise
    ➤ Microservices edition
      ➤ On one of our 50 microservices, one node is running on degraded hardware
      ➤ Photos are loading fine, but Canadian users running a French language pack on one iPhone version are hitting a firmware condition that leaves them unable to save to the local cache
      ➤ Our newest SDK makes additional DB queries if the developer has enabled an optional feature
  5. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE
    ➤ “There are no more easy problems in the future, there are only hard problems (Duh…you fixed the easy ones)”
    ➤ Observability means being open-ended and exploratory
    ➤ Debug by asking questions, not by using muscle memory
    ➤ Stop debugging with your eyeballs
    ➤ Debug with data!
    ➤ Debugging is a social act
      ➤ Solving problems is mentally taxing
      ➤ Sharing information and solutions is not taxing
  6. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS
    ➤ What is a container?
      ➤ Stand-alone executable package of software with everything you need to run it: code, runtime, tools, libraries, settings, etc.
    ➤ Sounds like a Virtual Machine…
      ➤ A VM has a complete guest operating system; containers leave out login and other unnecessary parts of the OS
      ➤ Containers can pack a lot more apps into a single server than VMs can
      ➤ A VM is better for running unknown code: it can only mess up the VM’s OS, not your OS
  7. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS
    ➤ Benefits of Containers:
      ➤ Packaging
        ➤ Eliminates problems caused by different devs having different versions installed
        ➤ Container images are immutable
      ➤ Performance
        ➤ Containers are faster because you aren’t running an entirely separate OS
      ➤ Efficiency
        ➤ Containers are small
        ➤ You can run lots of them if you want to!
  8. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS
    ➤ What is Kubernetes?
      ➤ Open source orchestration system for Docker containers
      ➤ Automates deployment, scaling and management of containerized applications
      ➤ If a node goes down, Kubernetes will roll out a new instance
      ➤ Observe, Analyze, Act
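The "Observe, Analyze, Act" cycle on this slide can be illustrated with a toy reconciliation loop. This is only a sketch in plain Python, not Kubernetes code: the `Cluster` class, the `web-N` instance names, and the `reconcile` function are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: a desired count and a set of running instances."""
    desired_replicas: int
    running: set = field(default_factory=set)
    _counter: int = 0

    def start_instance(self) -> str:
        # Hypothetical "start a new instance" action.
        self._counter += 1
        name = f"web-{self._counter}"
        self.running.add(name)
        return name


def reconcile(cluster: Cluster) -> list:
    """One pass of the loop: observe, analyze, act."""
    observed = len(cluster.running)                # observe the current state
    missing = cluster.desired_replicas - observed  # analyze the gap to the desired state
    started = []
    for _ in range(max(missing, 0)):               # act: start replacements
        started.append(cluster.start_instance())
    return started
```

Running `reconcile` repeatedly is the whole idea: when an instance disappears from `running`, the next pass notices the gap and starts a replacement.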
  9. ANTONIO CHAVEZ - WHY WE LEFT MONGODB
    ➤ Queue: optimizing conversion funnels and activating product advocates
      ➤ Kickstarter campaigns & gamification of getting more users
    ➤ The clients wanted to use MongoDB (heard it was good!)
    ➤ Ruby on Rails with MongoDB, Redis, and Heroku
    ➤ In MongoDB, the document size is limited to 16MB
      ➤ Translated to ~100 users per campaign
      ➤ Campaigns could have 100k+ users, so this wasn’t going to work!
  10. ANTONIO CHAVEZ - WHY WE LEFT MONGODB
    ➤ They tried lots of different ways to make MongoDB work
      ➤ Went from 1 master list to 3 lists
      ➤ Tried MapReduce
        ➤ ~30s for 3000 documents
      ➤ Introduced Sidekiq to help with background processes
        ➤ 1 HOUR for 329,000 documents
    ➤ Decided that MongoDB was good for simple operations, but not for their app
    ➤ They couldn’t think of anything more to try, and his gut told him MySQL was the best option
  11. ANTONIO CHAVEZ - WHY WE LEFT MONGODB
    ➤ Wrote a NodeJS tool to migrate the data from MongoDB to MySQL
    ➤ Average response time:
      ➤ Heroku + MongoDB = 150ms
      ➤ Amazon Aurora + MySQL = 50ms
    ➤ Query time:
      ➤ MongoDB = 1 hour
      ➤ MySQL = 40 seconds
      ➤ Without any other query improvements
    ➤ All of this cost them a year of developer time
      ➤ 2015 was just putting out fires
  12. JONAS HELFER - JOINS ACROSS DATABASES WITH GRAPHQL
    ➤ GraphQL: an application-layer query language; the server interprets a query string and returns the data in the shape the query specifies
    ➤ A replacement for REST
    ➤ Easier to consume
      ➤ Get exactly the data you ask for
      ➤ Not more or less data than you asked for!
    ➤ Easier to produce
      ➤ Write resolvers, e.g. one that runs a query to return user #1’s SQL record
      ➤ Resolver: a function that resolves a field to its value
    ➤ Version free
      ➤ The client decides what data it wants, so you can support many different client versions (instead of versioning endpoints)
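To make the resolver idea concrete, here is a minimal sketch that does not use any GraphQL library: it skips query parsing and takes the selected field names as a list. The `users` table, the `POSTS_BY_USER` store (standing in for a second database), and all function names are assumptions made up for this example.

```python
import sqlite3

# Hypothetical SQL database holding user records.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

# Hypothetical second data store (imagine a document database) keyed by user id.
POSTS_BY_USER = {1: ["Hello, DataLayer"]}


def resolve_user(user_id):
    """Root resolver: fetch one user's SQL record."""
    return db.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()


# One resolver per field: each maps the parent record to that field's value.
# The "posts" resolver reads from the other store, which is the "join".
USER_FIELD_RESOLVERS = {
    "id": lambda row: row["id"],
    "name": lambda row: row["name"],
    "email": lambda row: row["email"],
    "posts": lambda row: POSTS_BY_USER.get(row["id"], []),
}


def execute_user_query(user_id, selected_fields):
    """Return only the fields the client asked for, nothing more."""
    row = resolve_user(user_id)
    if row is None:
        return None
    return {name: USER_FIELD_RESOLVERS[name](row) for name in selected_fields}
```

A client that asks for `["name", "posts"]` gets exactly those two keys back, with `posts` coming from a different store than `name`; the client never sees that split.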
  13. JOSHUA DRAKE - POSTGRESQL
    ➤ PostgreSQL is the best (???)
    ➤ 40+ years old (Note: Wikipedia says 20 years)
    ➤ Postgres NoSQL = unstructured + relational database
      ➤ still ACID (Atomicity, Consistency, Isolation, Durability)
    ➤ Relational database
    ➤ Can set up master-slave replication
      ➤ Copying database information to a second system to create high availability and redundancy
    ➤ Cool new features
      ➤ Dates and times auto-magically adjust between the timezone in which the data was entered and the timezone in which it is queried!
      ➤ Connect to more stuff with 1 (Postgres) API
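The timezone behaviour mentioned on this slide most likely refers to PostgreSQL's `timestamp with time zone` type, which stores an absolute instant and renders it in the querying session's timezone. The sketch below mimics that idea in plain Python rather than calling Postgres; the fixed-offset zones and the `store`/`render` helpers are assumptions chosen to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets, used here only to avoid depending on a timezone database.
AUSTIN = timezone(timedelta(hours=-5), "CDT")
LONDON = timezone(timedelta(hours=1), "BST")


def store(local_dt: datetime) -> datetime:
    """Normalize an entered timestamp to UTC, as the stored value."""
    return local_dt.astimezone(timezone.utc)


def render(stored: datetime, session_tz: timezone) -> datetime:
    """Convert the stored instant into the reader's timezone at query time."""
    return stored.astimezone(session_tz)


entered = datetime(2017, 5, 17, 9, 0, tzinfo=AUSTIN)
seen_in_london = render(store(entered), LONDON)
```

Here `seen_in_london.isoformat()` is `2017-05-17T15:00:00+01:00`: the same instant that was entered as 09:00 in Austin.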
  14. LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ
    ➤ RabbitMQ is an open source message queue
    ➤ Queues help your application by
      ➤ Introducing coupling points
      ➤ Processing tasks asynchronously
      ➤ Enabling parts of the system to scale appropriately
    ➤ An event in one system causes 100+ other actions
  15. LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ
    ➤ Implement retries
      ➤ Identify if a message should be retried
      ➤ Create a new message with the same data
      ➤ Add retry count
      ➤ Ack the original message
      ➤ Reject after #x attempts
    ➤ Use a dead letter exchange if you can’t process the message!
    ➤ Republish the message to another exchange when:
      ➤ A message is rejected
      ➤ The time-to-live (TTL) for the message expires
      ➤ The queue length limit is exceeded
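The retry steps above can be sketched as follows. This is a broker-free illustration, not RabbitMQ client code: `FakeChannel` is an in-memory stand-in, and the `x-retry-count` header name, the `MAX_ATTEMPTS` value, and the `TransientError` type are assumptions introduced for the example.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # the "#x" from the slide; the value is an assumption


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


@dataclass
class Message:
    body: str
    headers: dict = field(default_factory=dict)


@dataclass
class FakeChannel:
    """In-memory stand-in for a broker channel; not a real client API."""
    published: list = field(default_factory=list)
    acked: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def publish(self, message):
        self.published.append(message)

    def ack(self, message):
        self.acked.append(message)

    def reject(self, message):
        # With a real broker, a rejected message could be routed
        # to a dead letter exchange.
        self.rejected.append(message)


def handle(channel, message, process):
    try:
        process(message.body)
    except TransientError:
        attempts = message.headers.get("x-retry-count", 0) + 1
        if attempts >= MAX_ATTEMPTS:
            channel.reject(message)  # give up after MAX_ATTEMPTS tries
            return "rejected"
        # New message with the same data plus an incremented retry count.
        retry = Message(message.body, {**message.headers, "x-retry-count": attempts})
        channel.publish(retry)
        channel.ack(message)  # ack the original so it is not redelivered
        return "retried"
    channel.ack(message)
    return "done"
```

Publishing the copy before acking the original means a crash between the two steps can duplicate a message rather than lose it, which is usually the safer failure for retries.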
  16. AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER
    ➤ Spanner: Google’s globally distributed NewSQL database
    ➤ NewSQL? A modern relational database that combines the scalable performance of NoSQL with the ACID guarantees of a traditional database
    ➤ Scales horizontally
    ➤ Distributed database
    ➤ Fully managed
    ➤ Traditional SQL transactions
    ➤ Currently works with Python, Java, NodeJS, and Go
      ➤ More languages to come!
  17. AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER
    ➤ No Foreign Keys! (but you have primary keys)
    ➤ Instead you interleave definitions
      ➤ Table rows will be physically co-located
      ➤ Increases leverage for Spanner to distribute data
    ➤ It’s a bad idea to create non-interleaved indexes on columns whose values are monotonically increasing or decreasing (even if they aren’t primary key columns)
    ➤ Note: Spanner will be slow if you interleave EVERYTHING, so only interleave with discretion
    ➤ No downtime for schema migration
    ➤ No free tier :(
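One way to picture why interleaved rows end up co-located: a child table's primary key starts with its parent's primary key, so when rows are laid out in key order each parent is followed directly by its children. The snippet below is a conceptual model in plain Python, not Spanner's storage engine; the singer and album tables and their keys are hypothetical.

```python
# Hypothetical parent rows keyed by (singer_id,) and child rows keyed by
# (singer_id, album_id); the child key is prefixed with the parent key.
singers = {(1,): "Singer A", (2,): "Singer B"}
albums = {(1, 10): "Album A1", (2, 20): "Album B1", (1, 11): "Album A2"}


def interleaved_layout(parents, children):
    """Merge both tables and order rows by key, parent first, then its children."""
    rows = {**parents, **children}
    return sorted(rows.items())


layout = interleaved_layout(singers, albums)
# layout:
#   ((1,), 'Singer A'), ((1, 10), 'Album A1'), ((1, 11), 'Album A2'),
#   ((2,), 'Singer B'), ((2, 20), 'Album B1')
```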
  18. REGINA’S NOTE RE: SPANNER
    ➤ A really good overview is at this link:
    ➤ https://opencredo.com/google-spanner-first-look/
  19. EMILE BAIZEL - BUILDING A FINTECH BOT
    ➤ Built with MongoDB and Elasticsearch
    ➤ Users can text simple commands to a bot to get balance, move money from checking to savings, etc.
    ➤ What they needed for this bot:
      ➤ Zero False Positives
      ➤ “I don’t know” is better than a wrong answer
      ➤ Only get 1 chance to respond
  20. EMILE BAIZEL - BUILDING A FINTECH BOT
    1. Gather and map responses in MongoDB
      ➤ map the message to the intended action
    2. Index message in Elasticsearch
      ➤ Stemming: accounting -> account
      ➤ Fuzzy query: acount -> account
    3. Weigh frequencies
      ➤ High frequency: I, a, the
      ➤ Low frequency: balance, account, checking
    4. Test it
      ➤ Test especially for false positives
    5. Ship it
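The matching steps above can be sketched in a few lines of standard-library Python. This is an illustration only: it uses `difflib` and a crude suffix stripper in place of Elasticsearch's stemming and fuzzy queries, a fixed stop-word set in place of real frequency weighting, and the example messages, action names, and threshold are all invented.

```python
import difflib

# Step 1 (hypothetical data): example messages mapped to intended actions.
EXAMPLES = {
    "what is my checking account balance": "get_balance",
    "move money from checking to savings": "transfer",
}
# Step 3: high-frequency words carry no signal, so they are dropped.
STOP_WORDS = {"i", "a", "the", "my", "is", "what", "to", "from"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 5:
            return word[: -len(suffix)]
    return word


# Step 2: index the low-frequency, stemmed keywords per action.
INDEX = {
    action: {stem(w) for w in text.split()} - STOP_WORDS
    for text, action in EXAMPLES.items()
}
VOCAB = sorted(set().union(*INDEX.values()))


def normalize(text):
    tokens = set()
    for raw in text.lower().split():
        word = stem(raw.strip("?.!,"))
        if word in STOP_WORDS:
            continue
        # Fuzzy match: map near-misses such as "acount" onto known keywords.
        close = difflib.get_close_matches(word, VOCAB, n=1, cutoff=0.8)
        tokens.add(close[0] if close else word)
    return tokens


def answer(text, threshold=0.6):
    tokens = normalize(text)
    scores = {a: len(tokens & kw) / len(kw) for a, kw in INDEX.items()}
    best = max(scores, key=scores.get)
    # Prefer "I don't know" over a possibly wrong action (zero false positives).
    return best if scores[best] >= threshold else "I don't know"
```

For example, `answer("What is my acount balance?")` returns `"get_balance"`, while `answer("tell me a joke")` returns `"I don't know"`. Raising `threshold` trades more "I don't know" replies for fewer false positives, which matches the priorities on the previous slide.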
  21. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS
    ➤ Former community manager at RethinkDB
    ➤ Business model:
      ➤ Open Core
      ➤ SaaS + Open SaaS
      ➤ Professional Services
  22. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS
    ➤ Open Core
      ➤ Core features are open source
      ➤ Upgraded features are paid
      ➤ Pro: financially sustainable
      ➤ Con: seen as corporate/not really open source
      ➤ Examples:
        ➤ GitLab
        ➤ MySQL
        ➤ SugarCRM
  23. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS
    ➤ Open SaaS
      ➤ Hosted and maintained by service provider
      ➤ Roadmap for product is done by the community
      ➤ Examples:
        ➤ MongoDB
        ➤ EnterpriseDB
  24. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS
    ➤ Professional Services
      ➤ Training and services are paid
      ➤ Example:
        ➤ Cloudera
  25. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS
    ➤ What went wrong at RethinkDB
      ➤ Lost funding
      ➤ Lost IP
        ➤ Had to purchase IP back
        ➤ Joined Linux Foundation
      ➤ Focused on building community and not on building a business model (and sticking with it)
      ➤ Hired Christina (community manager) when they should have been hiring marketing/sales/sales engineers
  26. JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM
    ➤ watchful.io
    ➤ Stream-Filter-Drain (SFD)
      ➤ Alternative to Extract-Transform-Load (ETL) data pipelines
      ➤ Not just finding a needle in a haystack; there are only needles in the stack
    ➤ Make a real-time stream processing tool
      ➤ Work with data at massive scale
      ➤ Keep it fast!
  27. JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM
    ➤ Stream
      ➤ Data is fast before it is big
      ➤ Event processor
    ➤ Filter
      ➤ Grammar restricted RegEx
      ➤ Content based filtration
    ➤ Drain
      ➤ Pull matches by key tag
      ➤ Goes to data consumers
        ➤ Application, microservices, Spark, etc.
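The three stages can be sketched as a small generator pipeline. This is a toy illustration of the stream, filter, and drain shape, not watchful.io's implementation; the filter tags, the regular expressions, and the function names are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical filters: key tag -> regular expression over the event text.
FILTERS = {
    "errors": re.compile(r"\bERROR\b"),
    "payments": re.compile(r"\bpayment_id=\d+"),
}


def stream(lines):
    """Stream: yield events one at a time as they arrive."""
    for line in lines:
        yield line.rstrip("\n")


def filter_events(events, filters):
    """Filter: tag each event with every filter whose pattern matches its content."""
    for event in events:
        for tag, pattern in filters.items():
            if pattern.search(event):
                yield tag, event


def drain(tagged_events):
    """Drain: group matches by key tag so each consumer pulls only its own."""
    by_tag = defaultdict(list)
    for tag, event in tagged_events:
        by_tag[tag].append(event)
    return dict(by_tag)
```

Because every stage is a generator until the final `drain`, events flow through one at a time and unmatched events are discarded without ever being stored.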
  28. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS
    ➤ Apps now are 12-factor monolithic beasts
      ➤ Hard to maintain
      ➤ Bit rot
    ➤ Microservices
      ➤ Part of a cluster of services
      ➤ Does 1 thing
      ➤ Coordinates with other services
  29. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS
    ➤ Serverless
    ➤ AWS Lambda
    ➤ There is no cloud, just somebody else’s computer
    ➤ It isn’t:
    ➤ Small services
    ➤ Fine grained billing
    ➤ Cost as a constraint
    ➤ Distributed
    ➤ “Stateless”
  30. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS
    ➤ Serverless Upside
      ➤ Smaller operational burden
      ➤ Scalable
      ➤ Easy to expand within the service
    ➤ Serverless providers
      ➤ OpenWhisk
      ➤ Lambda
      ➤ Google Cloud Functions
  31. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS
    ➤ Example: Redis
      ➤ Fast
      ➤ Well supported
      ➤ Well documented
      ➤ Fast
      ➤ Fixes the “stateless” problem
      ➤ Language agnostic
    ➤ Can lower costs by billing by milliseconds rather than by hour/minute
    ➤ You don’t have to do everything yourself!!
  32. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH
    ➤ What are data streams?
      ➤ Transfer of data at a steady, high rate
  33. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH
    ➤ Uses for data streams
      ➤ Streams/firehoses from IoT
      ➤ Monitoring systems
      ➤ Analytics
      ➤ Ecommerce (search, price monitoring, etc.)
      ➤ Fraud detection (payments, cyber security)
  34. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH
    ➤ Typical approach has limitations
      ➤ Middleware logic that connects DB with realtime protocols
      ➤ Can’t deal with complex realtime scenarios
      ➤ Doesn’t scale well
    ➤ Alternative: Streaming Database System
      ➤ Take the best parts of a DB system and add realtime protocols
      ➤ Middleware layer is optional!
  35. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH
    ➤ Elasticsearch
      ➤ Distributed text search
      ➤ Based on Lucene
      ➤ Scales to many nodes easily
    ➤ Percolation
      ➤ Indexes queries and filters documents against the indexed queries to know which queries they match
      ➤ Allows reverse search
      ➤ Matches when new documents are added
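Percolation inverts the usual search direction: queries are stored first, and each incoming document is checked against them. The class below is a minimal in-memory sketch of that idea, not the Elasticsearch API; `Percolator`, its methods, and the "all terms must appear" matching rule are assumptions made for illustration.

```python
class Percolator:
    """Toy reverse search: store queries, then match new documents against them."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        # "Index" a query: here a query is just a set of terms that must all appear.
        self._queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document):
        # Return the ids of every stored query that the new document satisfies.
        words = set(document.lower().split())
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= words
        )


percolator = Percolator()
percolator.register("laptop-deals", ["laptop", "sale"])
percolator.register("fraud-watch", ["chargeback"])
```

With these two queries registered, `percolator.percolate("Laptop sale ends today")` returns `["laptop-deals"]`, so a subscriber to that query can be notified the moment the document arrives.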
  36. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH
    ➤ Stream Pros
      ➤ Performant -> Nginx!
      ➤ Can work anywhere -> Docker!
      ➤ All the good parts of existing data layers