DataLayer AustinRB - Speaker Deck

Slide 1

Slide 1 text

DataLayer 2017 Recap Regina Imhoff @StabbyMcDuck

Slide 2

Slide 2 text

ABOUT DATALAYER ➤ Was held May 17, 2017 at Alamo Drafthouse South Lamar ➤ 2nd ever DataLayer ➤ Hosted by Compose ➤ Database-as-a-Service platform ➤ Formerly MongoHQ ➤ https://datalayer.com

Slide 3

Slide 3 text

CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ Monitoring your systems is largely useless ➤ it sucks ➤ Observability is what you need ➤ build a system that you can understand ➤ Our systems have become so complex that the old system of 1000 dashboards can no longer cut it

Slide 4

Slide 4 text

CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE Stolen from Charity’s Slides:

Slide 5

Slide 5 text

CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ All we have are unreliable symptoms and reports ➤ We need a way to ask questions ➤ As soon as we know the question we usually know the answer too ➤ Modern web apps exist between the nodes ➤ Apps and microservices are so complex that we no longer know the QUESTION ➤ “Complexity is exploding everywhere, but our tools are designed for a predictable world”

Slide 6

Slide 6 text

CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ Problem: photos are loading slowly ➤ LAMP stack edition ➤ App tier capacity exceeded ➤ Connections to DB are slower than normal ➤ ==> Connection timeouts and latency rise ➤ Microservices edition ➤ On one of our 50 microservices, one node is running on degraded hardware ➤ Photos loading fine, but Canadian users running a French language pack on one iPhone version are hitting a firmware condition making them unable to save local cache ➤ Our newest SDK makes additional DB queries if the developer has enabled an optional feature

Slide 7

Slide 7 text

CHARITY MAJORS - OBSERVABILITY & THE GLORIOUS FUTURE ➤ “There are no more easy problems in the future, there are only hard problems (Duh…you fixed the easy ones)” ➤ Observability means being open-ended and exploratory ➤ Debug by asking questions, not by using muscle memory ➤ Stop debugging with your eyeballs ➤ Debug with data! ➤ Debugging is a social act ➤ Solving problems is mentally taxing ➤ Sharing information and solutions is not taxing

Slide 8

Slide 8 text

ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ What is a container? ➤ Stand-alone executable package of software with everything you need to run it: code, runtime, tools, libraries, settings, etc. ➤ Sounds like a Virtual Machine… ➤ VM has a complete guest operating system, containers don’t have log in or other unnecessary parts of OS ➤ Containers can pack a lot more apps into a single server than a VM can ➤ VM better for running unknown code - it can only mess up the VM OS, not your OS

Slide 9

Slide 9 text

ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ Benefits of Containers: ➤ Packaging ➤ Eliminate problems with different devs having different version #s ➤ Container images are immutable ➤ Performance ➤ Containers are faster because you aren’t running an entire different OS ➤ Efficiency ➤ Containers are small ➤ You can run lots of them if you want to!

Slide 10

Slide 10 text

ROSS KUKULINSKI - STATE OF STATE IN CONTAINERS ➤ What is Kubernetes? ➤ Open source orchestration system for Docker containers ➤ Automates deployment, scaling and management of containerized applications ➤ If a node goes down Kubernetes will roll a new instance ➤ Loop: Observe → Analyze → Act
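
The observe, analyze, act cycle on this slide can be sketched as a single reconcile pass. This is a toy model in plain Python, not the Kubernetes API: the `reconcile` function, the instance names, and the health flags are all assumptions made for illustration.

```python
def reconcile(desired_replicas, running):
    """One observe-analyze-act pass over a hypothetical in-memory cluster.

    `running` maps an instance name to True (healthy) or False (down).
    """
    # Observe: keep only the instances that still report healthy.
    healthy = [name for name, ok in running.items() if ok]
    # Analyze: compare the observed state against the desired state.
    missing = desired_replicas - len(healthy)
    # Act: start a replacement for every missing instance.
    replacements = [f"replacement-{i}" for i in range(max(missing, 0))]
    return healthy + replacements


state = {"pod-a": True, "pod-b": False, "pod-c": True}
print(reconcile(3, state))  # ['pod-a', 'pod-c', 'replacement-0']
```

A real orchestrator runs this loop continuously, so a failed node is noticed and replaced without anyone asking for it.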

Slide 11

Slide 11 text

ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ Queue: optimizing conversion funnels and activating product advocates ➤ Kickstarter campaigns & gamification of getting more users ➤ The clients wanted to use MongoDB (heard it was good!) ➤ Ruby on Rails with MongoDB, Redis, and Heroku ➤ In MongoDB, the document size is limited to 16MB ➤ Translated to ~100 users per campaign ➤ Campaigns could have 100k+ users, so this wasn’t going to work!

Slide 12

Slide 12 text

ANTONIO CHAVEZ - WHY WE LEFT MONGODB

Slide 13

Slide 13 text

ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ They tried lots of different ways to make MongoDB work ➤ Went from 1 master list to 3 lists ➤ Tried MapReduce ➤ ~30s for 3000 documents ➤ Introduced Sidekiq to help with background processes ➤ 1 HOUR for 329,000 documents ➤ Decided that MongoDB was good for simple operations, but not their app ➤ They couldn’t think of more to do and so his gut told him MySQL was the best option

Slide 14

Slide 14 text

ANTONIO CHAVEZ - WHY WE LEFT MONGODB ➤ Wrote a Node.js tool to migrate the data from MongoDB to MySQL ➤ Average response time: ➤ Heroku + MongoDB = 150ms ➤ Amazon Aurora + MySQL = 50ms ➤ Query time: ➤ MongoDB = 1 hour ➤ MySQL = 40 seconds ➤ Without any other query improvements ➤ All of this cost them a year of developer time ➤ 2015 was just putting out fires

Slide 15

Slide 15 text

JONAS HELFER - JOINS ACROSS DATABASES WITH GRAPHQL ➤ GraphQL: an application layer query language in which the server interprets a query string and returns the data in a specified format ➤ A replacement for REST ➤ Easier to consume ➤ Get exactly the data you ask for ➤ Not more or less data than you asked for! ➤ Easier to produce ➤ Write resolvers, e.g. one that runs a query to return user #1’s SQL record ➤ Resolver: a function that resolves a field to its value ➤ Version free ➤ The client decides what data it wants, so you can support many different client versions (instead of versioning endpoints)
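
The resolver idea can be sketched without any GraphQL library. In the plain-Python sketch below, each field has a function that resolves it to a value, and the caller gets back only the fields it asked for. The two data sources, the field names, and `resolve_user` are hypothetical stand-ins, not part of the talk or of any GraphQL implementation.

```python
# Hypothetical data sources standing in for two different databases.
SQL_USERS = {1: {"id": 1, "name": "Ada", "email": "ada@example.com"}}
MONGO_POSTS = [
    {"author_id": 1, "title": "Hello"},
    {"author_id": 2, "title": "Other"},
]

# One resolver per field: a function that resolves the field to its value.
RESOLVERS = {
    "name": lambda user: user["name"],
    "email": lambda user: user["email"],
    # This resolver "joins" across the second data source.
    "posts": lambda user: [
        post["title"] for post in MONGO_POSTS if post["author_id"] == user["id"]
    ],
}


def resolve_user(user_id, requested_fields):
    """Return exactly the requested fields for one user, no more and no less."""
    user = SQL_USERS[user_id]
    return {field: RESOLVERS[field](user) for field in requested_fields}


print(resolve_user(1, ["name", "posts"]))  # {'name': 'Ada', 'posts': ['Hello']}
```

Because the client names the fields, an older client that never asks for `posts` keeps working when that field is added, which is the "version free" point on the slide.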

Slide 16

Slide 16 text

JONAS HELFER - JOINS ACROSS DATABASES WITH GRAPHQL

Slide 17

Slide 17 text

JOSHUA DRAKE - POSTGRESQL ➤ PostgreSQL is the best (???) ➤ 40+ years old (Note: Wikipedia says 20 years) ➤ Postgres NoSQL = unstructured + relational database ➤ still ACID (Atomicity, Consistency, Isolation, and Durability) ➤ Relational database ➤ Can set up Master Slave replication ➤ Copying database information to a second system to create high availability and redundancy ➤ Cool new features ➤ Date and time auto-magically change based on timezone when the data was entered vs. timezone when queried! ➤ Connect to more stuff with 1 (Postgres) API
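
The timezone bullet most likely refers to PostgreSQL's `timestamp with time zone` type, which normalizes values to UTC on write and converts them to the session's time zone on read. The sketch below mimics that behavior in plain Python. The fixed offsets are an assumption for illustration (real time zones carry daylight-saving rules), and no database is involved.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for two session time zones (assumed for illustration).
CDT = timezone(timedelta(hours=-5), "CDT")
CEST = timezone(timedelta(hours=2), "CEST")

# Value entered by a client whose session is in Austin.
entered = datetime(2017, 5, 17, 9, 0, tzinfo=CDT)

# On write, the instant is normalized to UTC.
stored = entered.astimezone(timezone.utc)
print(stored.isoformat())  # 2017-05-17T14:00:00+00:00

# On read, the same instant is rendered in the reader's time zone.
print(stored.astimezone(CEST).isoformat())  # 2017-05-17T16:00:00+02:00
```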

Slide 18

Slide 18 text

LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ ➤ RabbitMQ is an open source message queue ➤ Queues help your application by ➤ Introducing coupling points ➤ Processing tasks asynchronously ➤ Enabling parts of the system to scale appropriately ➤ An event in one system causes 100+ other actions

Slide 19

Slide 19 text

LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ ➤ Implement retries ➤ Identify if a message should be retried ➤ Create a new message with the same data ➤ Add retry count ➤ Ack the original message ➤ Reject after #x attempts ➤ Use a dead letter exchange if you can’t process the message! ➤ Republish the message to another exchange when: ➤ A message is rejected ➤ The time-to-live (TTL) for the message expires ➤ The queue length limit is exceeded
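
The retry steps above can be sketched as a small decision function. It uses no RabbitMQ client: messages are plain dicts, and the function only decides what a consumer should do next. `MAX_ATTEMPTS`, the `x-retry-count` header name, and `plan_retry` are assumptions for illustration, not RabbitMQ built-ins.

```python
MAX_ATTEMPTS = 3                # hypothetical value for the "#x" on the slide
RETRY_HEADER = "x-retry-count"  # hypothetical header name, not a RabbitMQ built-in


def plan_retry(message, retryable=True):
    """Decide what to do after a consumer failed to process `message`."""
    headers = message.get("headers", {})
    attempts = headers.get(RETRY_HEADER, 0) + 1  # deliveries tried so far
    if not retryable or attempts >= MAX_ATTEMPTS:
        # Reject without requeueing; a dead letter exchange configured on
        # the queue would then receive the message.
        return {"action": "reject", "republish": None}
    # New message with the same data and an incremented retry count,
    # published before the original is acked.
    retry = {"body": message["body"], "headers": {**headers, RETRY_HEADER: attempts}}
    return {"action": "ack", "republish": retry}


first = plan_retry({"body": "resize photo 42"})
second = plan_retry(first["republish"])
third = plan_retry(second["republish"])
print(first["action"], second["action"], third["action"])  # ack ack reject
```

Carrying the count in the message itself keeps the consumer stateless: any worker that picks up the retry knows how many attempts came before it.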

Slide 20

Slide 20 text

LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ

Slide 21

Slide 21 text

LORNA JANE MITCHELL - HANDLING FAILURE IN RABBITMQ

Slide 22

Slide 22 text

AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER ➤ Spanner: Google’s globally distributed NewSQL database ➤ NewSQL? Modern relational database that combines the scalable performance of NoSQL with the ACID guarantees of a traditional database ➤ Scales horizontally ➤ Distributed database ➤ Fully managed ➤ Traditional SQL transactions ➤ Currently works with Python, Java, Node.js, and Go ➤ More languages to come!

Slide 23

Slide 23 text

AMY UNRUH - SCALING OUT SQL DATABASES WITH SPANNER ➤ No Foreign Keys! (but you have primary keys) ➤ Instead you interleave table definitions ➤ Table rows will be physically co-located ➤ Increases leverage for Spanner to distribute data ➤ It’s a bad idea to create non-interleaved indexes on columns whose values are monotonically increasing or decreasing (even if they aren’t primary key columns) ➤ Note: Spanner will be slow if you interleave EVERYTHING, so only interleave with discretion ➤ No downtime for schema migration ➤ No free tier :(
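
The co-location point can be pictured by sorting rows on their primary keys. In an interleaved layout a child row's key starts with its parent's key, so the children land directly after the parent. The sketch below is a plain-Python picture of that ordering, not Spanner itself, and the `Singers` and `Albums` tables and their keys are hypothetical.

```python
# Hypothetical parent rows keyed by (singer_id,) and child rows keyed by
# (singer_id, album_id): the child key starts with the parent key.
singers = [(1,), (2,)]
albums = [(1, 10), (2, 20), (1, 11)]

rows = [("Singers", key) for key in singers] + [("Albums", key) for key in albums]

# Sorting by key places each parent immediately before its children.
for table, key in sorted(rows, key=lambda row: row[1]):
    print(table, key)
# Singers (1,)
# Albums (1, 10)
# Albums (1, 11)
# Singers (2,)
# Albums (2, 20)
```

The same ordering shows why monotonically increasing values are risky as keys: every new row sorts to the end, so all writes hit the same spot.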

Slide 24

Slide 24 text

REGINA’S NOTE RE: SPANNER ➤ A really good overview is at this link: ➤ https://opencredo.com/google-spanner-first-look/

Slide 25

Slide 25 text

EMILE BAIZEL - BUILDING A FINTECH BOT ➤ Built with MongoDB and Elasticsearch ➤ Users can text simple commands to a bot to get balance, move money from checking to savings, etc. ➤ What they needed for this bot: ➤ Zero False Positives ➤ “I don’t know” is better than a wrong answer ➤ Only get 1 chance to respond

Slide 26

Slide 26 text

EMILE BAIZEL - BUILDING A FINTECH BOT 1. Gather and map responses in MongoDB ➤ map the message to the intended action 2. Index Message in Elasticsearch ➤ Stemming: accounting -> account ➤ Fuzzy query: acount -> account 3. Weigh frequencies ➤ High frequency: I, a, the ➤ Low frequency: balance, account, checking 4. Test it ➤ Test especially for false positives 5. Ship it
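
The five steps above can be sketched in plain Python. This does not use MongoDB or Elasticsearch: the phrase table stands in for the stored mappings, `stem` is a naive suffix stripper, `difflib` stands in for fuzzy queries, and the weights are a simple inverse document frequency. The phrases, the action names, and `THRESHOLD` are assumptions made for illustration.

```python
import difflib
import math
from collections import Counter

# Step 1 (hypothetical data): example messages mapped to intended actions.
PHRASES = {
    "show my account balance": "get_balance",
    "move my money from checking to savings": "transfer",
    "show my recent transactions": "list_transactions",
}
THRESHOLD = 2.0  # assumed cutoff; below it the bot answers "I don't know"


def stem(word):
    """Naive suffix stripping, e.g. accounting -> account."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


# Step 2: index the stemmed tokens of every phrase.
INDEX = {action: {stem(w) for w in phrase.split()} for phrase, action in PHRASES.items()}
DOC_FREQ = Counter(token for tokens in INDEX.values() for token in tokens)
VOCAB = sorted(DOC_FREQ)


def weight(token):
    """Step 3: frequent tokens ("my") weigh little, rare ones ("balance") a lot."""
    return math.log(len(INDEX) / DOC_FREQ[token])


def classify(message):
    """Return the matched action, or None when the bot should not guess."""
    tokens = set()
    for word in message.lower().split():
        # Fuzzy lookup, e.g. acount -> account.
        tokens.update(difflib.get_close_matches(stem(word), VOCAB, n=1, cutoff=0.8))
    scores = {
        action: sum(weight(t) for t in tokens & vocab) for action, vocab in INDEX.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= THRESHOLD else None


print(classify("acount balance"))  # get_balance
print(classify("show my"))         # None
```

Returning `None` for a weak match is the "zero false positives" rule in code: a message made only of common words never triggers an action.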

Slide 27

Slide 27 text

CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Former community manager at RethinkDB ➤ Business model: ➤ Open Core ➤ SaaS + Open SaaS ➤ Professional Services

Slide 28

Slide 28 text

CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Open Core ➤ Core features are open source ➤ Upgraded features are paid ➤ Pro: financially sustainable ➤ Con: seen as corporate/not really open source ➤ Examples: ➤ GitLab ➤ MySQL ➤ SugarCRM

Slide 29

Slide 29 text

CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Open SaaS ➤ Hosted and maintained by service provider ➤ Roadmap for product is done by the community ➤ Examples: ➤ MongoDB ➤ EnterpriseDB

Slide 30

Slide 30 text

CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ Professional Services ➤ Training and services are paid ➤ Example: ➤ Cloudera

Slide 31

Slide 31 text

CHRISTINA KEELAN - THE STATE OF OPEN SOURCE COMPANY MODELS ➤ What went wrong at RethinkDB ➤ Lost funding ➤ Lost IP ➤ Had to purchase IP back ➤ Joined Linux Foundation ➤ Focused on building community and not on building a business model (and sticking with it) ➤ Hired Christina (community manager) when they should have been hiring marketing/sales/sales engineers

Slide 32

Slide 32 text

JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM ➤ watchful.io ➤ Stream-Filter-Drain (SFD) ➤ Alternative to Extract-Transform-Load (ETL) data pipelines ➤ Not just finding a needle in a haystack: there are only needles in the stack ➤ Make a real time stream processing tool ➤ Work with data at massive scale ➤ Keep it fast!

Slide 33

Slide 33 text

JOHN SINGLETON - STREAM-FILTER-DRAIN: A NEW PARADIGM ➤ Stream ➤ Data is fast before it is big ➤ Event processor ➤ Filter ➤ Grammar restricted RegEx ➤ Content based filtration ➤ Drain ➤ Pull matches by key tag ➤ Goes to data consumers ➤ Application, microservices, Spark, etc.
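
The three stages can be sketched as chained Python generators. This is a toy model of the idea, not the watchful.io product: the tags and patterns are hypothetical, and Python's `re` module is a full regex engine rather than the grammar-restricted one the slide mentions.

```python
import re
from collections import defaultdict

# Hypothetical content-based filters: tag -> pattern.
FILTERS = {
    "error": re.compile(r"\bERROR\b"),
    "login": re.compile(r"\blogin\b"),
}


def stream(lines):
    """Stream: yield events one at a time instead of loading a batch."""
    for line in lines:
        yield line.rstrip("\n")


def filter_events(events):
    """Filter: tag each event with every pattern it matches, drop the rest."""
    for event in events:
        for tag, pattern in FILTERS.items():
            if pattern.search(event):
                yield tag, event


def drain(tagged):
    """Drain: group matches by tag so consumers can pull them by key."""
    by_tag = defaultdict(list)
    for tag, event in tagged:
        by_tag[tag].append(event)
    return dict(by_tag)


lines = ["ERROR disk full\n", "user login ok\n", "heartbeat\n", "ERROR during login\n"]
print(drain(filter_events(stream(lines))))
# {'error': ['ERROR disk full', 'ERROR during login'], 'login': ['user login ok', 'ERROR during login']}
```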

Slide 34

Slide 34 text

JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Apps now are 12-factor monolithic beasts ➤ Hard to maintain ➤ Bit rot ➤ Microservices ➤ Part of a cluster of services ➤ Does 1 thing ➤ Coordinates with other services

Slide 35

Slide 35 text

JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Serverless ➤ AWS Lambda ➤ There is no cloud, just somebody else’s computer ➤ It isn’t: ➤ Small services ➤ Fine grained billing ➤ Cost as a constraint ➤ Distributed ➤ “Stateless”

Slide 36

Slide 36 text

JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Serverless Upside ➤ Smaller operational burden ➤ Scalable ➤ Easy to expand within the service ➤ Serverless providers ➤ OpenWhisk ➤ Lambda ➤ Google Cloud Functions

Slide 37

Slide 37 text

JOSHUA B. SMITH - SPEEDING UP SLOW MONOLITHS ➤ Example: Redis ➤ Fast ➤ Well supported ➤ Well Documented ➤ Fast ➤ Fixes the “stateless” problem ➤ Language agnostic ➤ Can lower costs by billing by milliseconds rather than by hour/minute ➤ You don’t have to do everything yourself!!

Slide 38

Slide 38 text

SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ What are data streams? ➤ Transfer of data at steady high speed rate

Slide 39

Slide 39 text

SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Uses for data streams ➤ Streams/firehoses from IoT ➤ Monitoring systems ➤ Analytics ➤ Ecommerce (search, price monitoring, etc) ➤ Fraud detection (payments, cyber security)

Slide 40

Slide 40 text

SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Typical approach has limitations ➤ Middleware logic that connects DB with realtime protocols ➤ Can’t deal with complex realtime scenarios ➤ Doesn’t scale well ➤ Alternative: Streaming Database System ➤ Take the best parts of a DB system and add realtime protocols ➤ Middleware layer is optional!

Slide 41

Slide 41 text

SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Elasticsearch ➤ Distributed text search ➤ Based on Lucene ➤ Scales to many nodes easily ➤ Percolation ➤ Indexes queries and filters documents against the indexed queries to know which queries they match ➤ Allows reverse search ➤ Matches when new documents are added
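
Percolation reverses the usual search: queries are stored first, and each new document is checked against them. The sketch below shows the idea in plain Python with simple "all terms present" queries. It is not the Elasticsearch percolator API, and the `Percolator` class and query names are hypothetical.

```python
class Percolator:
    """Toy reverse search: store queries, then match incoming documents."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # "Index" the query: remember the terms a document must contain.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document):
        # Return the ids of every stored query the new document matches.
        words = set(document.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)


p = Percolator()
p.register("price-alert", ["laptop", "sale"])
p.register("fraud-watch", ["chargeback"])
print(p.percolate("Laptop on sale today"))  # ['price-alert']
```

This is what makes it fit streaming: subscribers register their query once, and every document that arrives later is routed to the queries it matches.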

Slide 42

Slide 42 text

SIDDHARTH KOTHARI - DATA STREAMS WITH ELASTICSEARCH ➤ Stream Pros ➤ Performant —> Nginx! ➤ Can work anywhere —> Docker! ➤ All the good parts of existing data layers