DataLayer AustinRB

Regina Imhoff

June 05, 2017

Transcript

  1. ABOUT DATALAYER ➤ Was held May 17, 2017 at Alamo Drafthouse South Lamar ➤ 2nd ever DataLayer ➤ Hosted by Compose ➤ Database-as-a-Service platform ➤ Formerly MongoHQ ➤ https://datalayer.com
  2. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ Monitoring your systems is largely useless ➤ it sucks ➤ Observability is what you need ➤ build a system that you can understand ➤ Our systems have become so complex that the old system of 1000 dashboards can no longer cut it
  3. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ All we have are unreliable symptoms and reports ➤ We need a way to ask questions ➤ As soon as we know the question we usually know the answer too ➤ Modern web apps exist between the nodes ➤ Apps and microservices are so complex that we no longer know the QUESTION ➤ “Complexity is exploding everywhere, but our tools are designed for a predictable world”
  4. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ Problem: photos are loading slowly ➤ LAMP stack edition ➤ App tier capacity exceeded ➤ Connections to DB are slower than normal ➤ ==> Connection timeouts and latency rise ➤ Microservices edition ➤ On one of our 50 microservices, one node is running on degraded hardware ➤ Photos loading fine, but Canadian users running a French language pack on one iPhone version are hitting a firmware condition making them unable to save local cache ➤ Our newest SDK makes additional DB queries if the developer has enabled an optional feature
  5. CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ “There are no more easy problems in the future, there are only hard problems (Duh…you fixed the easy ones)” ➤ Observability means being open-ended and exploratory ➤ Debug by asking questions, not by using muscle memory ➤ Stop debugging with your eyeballs ➤ Debug with data! ➤ Debugging is a social act ➤ Solving problems is mentally taxing ➤ Sharing information and solutions is not taxing
  6. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ What is a container? ➤ Stand-alone executable package of software with everything you need to run it: code, runtime, tools, libraries, settings, etc. ➤ Sounds like a Virtual Machine… ➤ A VM has a complete guest operating system; containers don’t have login or other unnecessary parts of the OS ➤ Containers can pack a lot more apps into a single server than a VM can ➤ A VM is better for running unknown code - it can only mess up the VM OS, not your OS
  7. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ Benefits of Containers: ➤ Packaging ➤ Eliminate problems with different devs having different version #s ➤ Container images are immutable ➤ Performance ➤ Containers are faster because you aren’t running an entirely separate OS ➤ Efficiency ➤ Containers are small ➤ You can run lots of them if you want to!
  8. ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ What is Kubernetes? ➤ Open source orchestration system for Docker containers ➤ Automates deployment, scaling and management of containerized applications ➤ If a node goes down Kubernetes will roll a new instance ➤ Observe ➤ Analyze ➤ Act
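
The observe / analyze / act cycle on the Kubernetes slide can be illustrated with a toy reconciliation loop. This is a plain-Python sketch, not the Kubernetes API; `reconcile`, the replica count, and the `web-N` instance names are all hypothetical.

```python
from itertools import count

def reconcile(desired_replicas, running, new_names):
    """One pass of a toy observe -> analyze -> act loop (not the Kubernetes API)."""
    # Observe: how many instances are actually running?
    observed = len(running)
    # Analyze: compare the observed state with the desired state.
    missing = desired_replicas - observed
    # Act: start replacements until the desired count is met.
    for _ in range(max(0, missing)):
        running.add(next(new_names))
    return running

names = (f"web-{i}" for i in count(1))
running = reconcile(3, set(), names)    # initial rollout: web-1, web-2, web-3
running.discard("web-2")                # simulate losing an instance with its node
running = reconcile(3, running, names)  # the loop notices the gap and starts web-4
```

The loop never tracks what went wrong; it only compares desired and observed state, which is why a lost node and a fresh rollout are handled by the same code path.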
  9. ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ Queue: optimizing conversion funnels and activating product advocates ➤ Kickstarter campaigns & gamification of getting more users ➤ The clients wanted to use MongoDB (heard it was good!) ➤ Ruby on Rails with MongoDB, Redis, and Heroku ➤ In MongoDB, the document size is limited to 16MB ➤ Translated to ~100 users per campaign ➤ Campaigns could have 100k+ users, so this wasn’t going to work!
  10. ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ They tried lots of different ways to make MongoDB work ➤ Went from 1 master list to 3 lists ➤ Tried MapReduce ➤ ~30s for 3000 documents ➤ Introduced Sidekiq to help with background processes ➤ 1 HOUR for 329,000 documents ➤ Decided that MongoDB was good for simple operations, but not their app ➤ They ran out of things to try, and his gut told him MySQL was the best option
  11. ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ Wrote a NodeJS tool to migrate the MongoDB data to MySQL ➤ Average response time: ➤ Heroku + MongoDB = 150ms ➤ Amazon Aurora + MySQL = 50ms ➤ Query time: ➤ MongoDB = 1 hour ➤ MySQL = 40 seconds ➤ Without any other query improvements ➤ All of this cost them a year of developer time ➤ 2015 was just putting out fires
  12. JONAS HELFER - JOINS ACROSS DATABASES WITH GRAPHQL ➤ GraphQL: an application-layer query language; the server interprets a query string and returns the data in the specified format ➤ A replacement for REST ➤ Easier to consume ➤ Get exactly the data you ask for ➤ Not more or less data than you asked for! ➤ Easier to produce ➤ Write resolvers, e.g. one that runs a query to return user #1’s SQL record ➤ Resolver: a function that resolves a field to its value ➤ Version free ➤ The client decides what data it wants, so you can support many different client versions (instead of versioning endpoints)
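
The resolver idea from the GraphQL slide can be sketched without any GraphQL library: one function per field, and the response contains only the fields the client requested. This is a simplified illustration rather than real GraphQL parsing or execution; the two in-memory "databases", the field names, and `resolve_user` are hypothetical.

```python
# Hypothetical in-memory stand-ins for two separate databases.
USERS_SQL = {1: {"id": 1, "name": "Ada", "email": "ada@example.com"}}
POSTS_MONGO = [
    {"author_id": 1, "title": "Hello"},
    {"author_id": 1, "title": "Again"},
]

# One resolver per field: a function that resolves a field to its value.
RESOLVERS = {
    "name": lambda user: user["name"],
    "email": lambda user: user["email"],
    # This field is resolved from the second store: the cross-database "join".
    "posts": lambda user: [p["title"] for p in POSTS_MONGO if p["author_id"] == user["id"]],
}

def resolve_user(user_id, requested_fields):
    """Return only the fields the client asked for, no more and no less."""
    user = USERS_SQL[user_id]
    return {field: RESOLVERS[field](user) for field in requested_fields}

result = resolve_user(1, ["name", "posts"])
```

Because the client lists the fields it wants, an older client that never asks for `posts` keeps working when that field is added, which is the "version free" point on the slide.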
  13. JOSHUA DRAKE - POSTGRESQL ➤ PostgreSQL is the best (???) ➤ 40+ years old (Note: Wikipedia says 20 years) ➤ Postgres NoSQL = unstructured + relational database ➤ still ACID (Atomicity, Consistency, Isolation, and Durability) ➤ Relational database ➤ Can set up Master Slave replication ➤ Copying database information to a second system to create high availability and redundancy ➤ Cool new features ➤ Date and time auto-magically change based on timezone when the data was entered vs. timezone when queried! ➤ Connect to more stuff with 1 (Postgres) API
  14. LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ ➤ RabbitMQ is an open source message queue ➤ Queues help your application by ➤ Introducing coupling points ➤ Processing tasks asynchronously ➤ Enabling parts of the system to scale appropriately ➤ An event in one system causes 100+ other actions
  15. LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ ➤ Implement retries ➤ Identify if a message should be retried ➤ Create a new message with the same data ➤ Add retry count ➤ Ack the original message ➤ Reject after #x attempts ➤ Use a dead letter exchange if you can’t process the message! ➤ Republish the message to another exchange when: ➤ A message is rejected ➤ The time-to-live (TTL) for the message expires ➤ The queue length limit is exceeded
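
The retry steps above can be sketched with in-memory queues. This is a minimal simulation, not a RabbitMQ client: `handle`, the `retry_count` field, the `MAX_ATTEMPTS` value, and the list standing in for a dead letter exchange are assumptions made for illustration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # the "#x attempts" from the slide; the value is an assumption

def handle(message, work_queue, dead_letters, process):
    """Process one message; on failure, republish a copy or dead-letter it."""
    try:
        process(message["body"])
    except Exception:
        retries = message.get("retry_count", 0) + 1
        if retries >= MAX_ATTEMPTS:
            # In RabbitMQ this would be a reject without requeue, which routes
            # the message to the configured dead letter exchange.
            dead_letters.append(message)
        else:
            # New message with the same data plus an incremented retry count;
            # the original would then be acked.
            work_queue.append({"body": message["body"], "retry_count": retries})

def always_fails(body):
    raise ValueError("cannot process " + body)

work_queue = deque([{"body": "resize photo 42"}])
dead_letters = []
attempts = 0
while work_queue:
    attempts += 1
    handle(work_queue.popleft(), work_queue, dead_letters, always_fails)
```

Carrying the count inside the republished message keeps the consumer stateless: any worker can pick up the retry and still know how many attempts came before.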
  16. AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER ➤ Spanner: Google’s globally distributed NewSQL database ➤ NewSQL? Modern relational database that combines the scalable performance of NoSQL with the ACID guarantees of a relational database ➤ Scales horizontally ➤ Distributed database ➤ Fully managed ➤ Traditional SQL transactions ➤ Currently works with Python, Java, NodeJS, and Go ➤ More languages to come!
  17. AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER ➤ No Foreign Keys! (but you have primary keys) ➤ Instead you interleave definitions ➤ Table rows will be physically co-located ➤ Increases leverage for Spanner to distribute data ➤ It’s a bad idea to create non-interleaved indexes on columns whose values are monotonically increasing or decreasing (even if they aren’t primary key columns) ➤ Note: Spanner will be slow if you interleave EVERYTHING, so interleave with discretion ➤ No downtime for schema migration ➤ No free tier :(
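
Why interleaving co-locates rows can be shown with a small sketch. It assumes only that an interleaved child's primary key starts with the parent's primary key and that rows are stored in key order; the singer and album data are hypothetical, and plain tuples stand in for Spanner keys.

```python
# Parent rows keyed by (singer_id,); child rows keyed by (singer_id, album_id).
# The shared key prefix is what makes the child table "interleaved".
singers = [((2,), "singer: Bo"), ((1,), "singer: Ana")]
albums = [
    ((1, 20), "album: Second"),
    ((2, 10), "album: Debut"),
    ((1, 10), "album: First"),
]

# Storing all rows in primary-key order puts each parent next to its children.
storage_order = [label for _key, label in sorted(singers + albums)]
```

Here `storage_order` lists Ana followed by her two albums, then Bo followed by his, so fetching a singer together with their albums touches one contiguous range of keys.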
  18. REGINA’S NOTE RE: SPANNER ➤ A really good overview is at this link: ➤ https://opencredo.com/google-spanner-first-look/
  19. EMILE BAIZEL - BUILDING A FINTECH BOT ➤ Built with MongoDB and Elasticsearch ➤ Users can text simple commands to a bot to get balance, move money from checking to savings, etc. ➤ What they needed for this bot: ➤ Zero False Positives ➤ “I don’t know” is better than a wrong answer ➤ Only get 1 chance to respond
  20. EMILE BAIZEL - BUILDING A FINTECH BOT ➤ 1. Gather and map responses in MongoDB ➤ map the message to the intended action ➤ 2. Index Message in Elasticsearch ➤ Stemming: accounting -> account ➤ Fuzzy query: acount -> account ➤ 3. Weigh frequencies ➤ High frequency: I, a, the ➤ Low frequency: balance, account, checking ➤ 4. Test it ➤ Test especially for false positives ➤ 5. Ship it
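
The matching steps above (stemming, fuzzy matching, ignoring high-frequency words, and preferring "I don't know" to a wrong answer) can be sketched with the Python standard library. This is not how the bot or Elasticsearch implements them: the action names, the stop-word list, the crude `stem` function, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib

# Hypothetical mapping from keywords to actions (the slide keeps this in MongoDB).
ACTIONS = {"balance": "get_balance", "transfer": "move_money"}
# High-frequency words carry little meaning, so they are dropped before matching.
HIGH_FREQUENCY = {"i", "a", "the", "my", "what", "is", "to"}

def stem(word):
    """Very crude suffix stripping, e.g. accounting -> account."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def intent(message):
    """Return an action only when exactly one matches; otherwise None ("I don't know")."""
    words = [stem(w) for w in message.lower().split() if w not in HIGH_FREQUENCY]
    matches = set()
    for word in words:
        # Fuzzy matching tolerates small typos such as "balanse" for "balance".
        close = difflib.get_close_matches(word, list(ACTIONS), n=1, cutoff=0.8)
        if close:
            matches.add(ACTIONS[close[0]])
    return matches.pop() if len(matches) == 1 else None
```

Returning `None` whenever zero or several actions match is the conservative choice the slides call for: an ambiguous message such as "transfer the balance" gets no action rather than a guessed one.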
  21. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Former community manager at RethinkDB ➤ Business models: ➤ Open Core ➤ SaaS + Open SaaS ➤ Professional Services
  22. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Open Core ➤ Core features are open source ➤ Upgraded features are paid ➤ Pro: financially sustainable ➤ Con: seen as corporate/not really open source ➤ Examples: ➤ GitLab ➤ MySQL ➤ SugarCRM
  23. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Open SaaS ➤ Hosted and maintained by service provider ➤ Roadmap for product is done by the community ➤ Examples: ➤ MongoDB ➤ EnterpriseDB
  24. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Professional Services ➤ Training and services are paid ➤ Example: ➤ Cloudera
  25. CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ What went wrong at RethinkDB ➤ Lost funding ➤ Lost IP ➤ Had to purchase IP back ➤ Joined Linux Foundation ➤ Focused on building community and not on building a business model (and sticking with it) ➤ Hired Christina (community manager) when they should have been hiring marketing/sales/sales engineers
  26. JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM ➤ watchful.io ➤ Stream-Filter-Drain (SFD) ➤ Alternative to Extract-Transform-Load (ETL) data pipelines ➤ Not just finding a needle in a haystack: there are only needles in the stack ➤ Make a real time stream processing tool ➤ Work with data at massive scale ➤ Keep it fast!
  27. JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM ➤ Stream ➤ Data is fast before it is big ➤ Event processor ➤ Filter ➤ Grammar restricted RegEx ➤ Content based filtration ➤ Drain ➤ Pull matches by key tag ➤ Goes to data consumers ➤ Application, microservices, Spark, etc.
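
The three stages can be sketched as a chain of Python generators. This illustrates the stream / filter / drain shape only and says nothing about how watchful.io implements it; the filter tags, the patterns, and the sample events are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical filters: key tag -> pattern applied to each event's content.
FILTERS = {
    "errors": re.compile(r"\bERROR\b"),
    "payments": re.compile(r"\bpayment\b"),
}

def stream(lines):
    """Stream: yield events one at a time instead of loading a batch."""
    for line in lines:
        yield line.strip()

def filter_events(events):
    """Filter: tag each event with every filter whose pattern it matches."""
    for event in events:
        for tag, pattern in FILTERS.items():
            if pattern.search(event):
                yield tag, event

def drain(tagged_events):
    """Drain: group matches by key tag for downstream consumers."""
    by_tag = defaultdict(list)
    for tag, event in tagged_events:
        by_tag[tag].append(event)
    return dict(by_tag)

events = ["INFO login ok", "ERROR payment declined", "INFO payment accepted"]
drained = drain(filter_events(stream(events)))
```

Unlike an ETL job, nothing is transformed or loaded into a store along the way: events that match no filter are dropped, and a consumer pulls only the tag it cares about.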
  28. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Apps now are 12-factor monolithic beasts ➤ Hard to maintain ➤ Bit rot ➤ Microservices ➤ Part of a cluster of services ➤ Does 1 thing ➤ Coordinates with other services
  29. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Serverless ➤ AWS Lambda ➤ There is no cloud, just somebody else’s computer ➤ It isn’t: ➤ Small services ➤ Fine grained billing ➤ Cost as a constraint ➤ Distributed ➤ “Stateless”
  30. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Serverless Upside ➤ Smaller operational burden ➤ Scalable ➤ Easy to expand within the service ➤ Serverless providers ➤ OpenWhisk ➤ Lambda ➤ Google Cloud Functions
  31. JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Example: Redis ➤ Fast ➤ Well supported ➤ Well Documented ➤ Fast ➤ Fixes the “stateless” problem ➤ Language agnostic ➤ Can lower costs by billing by milliseconds rather than by hour/minute ➤ You don’t have to do everything yourself!!
  32. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ What are data streams? ➤ Transfer of data at a steady, high rate
  33. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Uses for data streams ➤ Streams/firehoses from IoT ➤ Monitoring systems ➤ Analytics ➤ Ecommerce (search, price monitoring, etc) ➤ Fraud detection (payments, cyber security)
  34. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Typical approach has limitations ➤ Middleware logic that connects DB with realtime protocols ➤ Can’t deal with complex realtime scenarios ➤ Doesn’t scale well ➤ Alternative: Streaming Database System ➤ Take the best parts of a DB system and add realtime protocols ➤ Middleware layer is optional!
  35. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Elasticsearch ➤ Distributed text search ➤ Based on Lucene ➤ Scales to many nodes easily ➤ Percolation ➤ Indexes queries and filters documents against the indexed queries to know which queries they match ➤ Allows reverse search ➤ Matches when new documents are added
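
Percolation, or reverse search, can be illustrated in a few lines: the queries are stored up front and each incoming document is checked against them. This is a plain-Python sketch of the idea, not the Elasticsearch API; the query names, their terms, and the simple all-terms-present matching rule are assumptions.

```python
# Registered ("indexed") queries: name -> set of terms that must all appear.
REGISTERED_QUERIES = {
    "price-drop-alerts": {"price", "drop"},
    "fraud-watch": {"payment", "declined"},
}

def percolate(document):
    """Reverse search: return the names of stored queries this document matches."""
    terms = set(document.lower().split())
    return sorted(
        name for name, required in REGISTERED_QUERIES.items() if required <= terms
    )

matched = percolate("Payment declined for order 1138")
```

In a streaming setup each new document is percolated as it arrives, so subscribers to a matching query can be notified without re-running their searches.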
  36. SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Stream Pros ➤ Performant -> Nginx! ➤ Can work anywhere -> Docker! ➤ All the good parts of existing data layers