Slide 1

Slide 1 text

Heroku Postgres
The Tale of Conceiving and Building a Leading Cloud Database Service
Harold Giménez @hgimenez
Saturday, July 28, 12

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Heroku origins

Slide 4

Slide 4 text

focus on rails

Slide 5

Slide 5 text

rails apps need a database

Slide 6

Slide 6 text

web apps need a database

Slide 7

Slide 7 text

thankfully postgres was chosen

Slide 8

Slide 8 text

otherwise I wouldn’t be here

Slide 9

Slide 9 text

“let’s make a production grade postgres service”

Slide 10

Slide 10 text


Slide 11

Slide 11 text

(hopefully yours)

Slide 12

Slide 12 text

Heroku Postgres v.0.pre.alpha
• A sinatra app implementing the heroku addons API
• create servers
• install postgres service
• create databases for users - a “Resource”
• Sequel talks to postgres
• stem talks to AWS

Slide 13

Slide 13 text

Two main entities

Slide 14

Slide 14 text

Resource

{
  database: 'd4f9wdf02',
  port: 5432,
  username: 'uf0wjasdf',
  password: 'pf14fhjas',
  created_at: '2012-05-02',
  state: 'available'
}

Slide 15

Slide 15 text

Server

{
  elastic_ip: '192.168.0.1',
  instance_id: 'i-2efjoiads',
  ami: 'pg-prod',
  availability_zone: 'us-east-1',
  created_at: '2012-05-02',
  state: 'booting'
}

Slide 16

Slide 16 text

...and a thin admin web interface
erb templates in sinatra endpoint

Slide 17

Slide 17 text

We are just an add-on

Slide 18

Slide 18 text


Slide 19

Slide 19 text

we run on

Slide 20

Slide 20 text

the simplest thing that could possibly work, but no less

Slide 21

Slide 21 text

We’ve come a long way since then

Slide 22

Slide 22 text

Workflow and Monitoring

Slide 23

Slide 23 text

draw inspiration from gaming

Slide 24

Slide 24 text

monitoring

class Resource
  def feel
    observations.create(
      Feeler.new(self).current_environment
    )
  end
end

class Feeler
  def current_environment
    {
      service_available?: service_available?,
      open_connections: open_connections,
      row_count: row_count,
      table_count: table_count,
      seq_scans: seq_scans,
      index_scans: index_scans
    }
  end
end

Slide 25

Slide 25 text

workflow

class Resource
  include Stateful

  state :available do
    unless service_available?
      transition :unavailable
    end
  end
end

resource = Resource.new
resource.transition :available
resource.feel
resource.tick
puts resource.state # 'unavailable'

Slide 26

Slide 26 text

workflow

module Stateful
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    def state(name, &block)
      states[name] = block
    end

    def states; @states ||= {}; end
  end

  def tick
    self.instance_eval(
      &self.class.states[self.state.to_sym]
    )
  end

  def transition(state)
    # log and assign new state
  end
end
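
The slide leaves the body of transition as a comment, so the module cannot be run as shown. Below is a minimal runnable sketch of the same idea. The transition body, the state reader, the transitions history and the healthy flag are assumptions added for illustration; they are not taken from the talk.

```ruby
# Sketch only: the `transition` body, the `state` reader and the stubbed
# health flag are assumptions, not the talk's code.
module Stateful
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    # Register the block to run on every tick while in state `name`.
    def state(name, &block)
      states[name] = block
    end

    def states
      @states ||= {}
    end
  end

  attr_reader :state

  # Run the handler registered for the current state, if there is one.
  def tick
    handler = state && self.class.states[state.to_sym]
    instance_eval(&handler) if handler
  end

  # Assumed implementation: record the change, then assign the new state.
  def transition(new_state)
    transitions << [state, new_state]
    @state = new_state
  end

  def transitions
    @transitions ||= []
  end
end

class Resource
  include Stateful

  attr_accessor :healthy # hypothetical stand-in for a real health check

  def service_available?
    healthy
  end

  state :available do
    transition :unavailable unless service_available?
  end

  state :unavailable do
    transition :available if service_available?
  end
end

resource = Resource.new
resource.transition :available
resource.healthy = false
resource.tick
puts resource.state
```

Each state handler only decides whether to leave its state, so adding a new state means adding one more block rather than touching tick.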

Slide 27

Slide 27 text


Slide 28

Slide 28 text

resource.feel
resource.tick

Need to do this all the time

Slide 29

Slide 29 text

db1 db2 db3 db4 db5 db6 db7 db8 db9 ... dbn

db1.feel
db1.tick

Slide 30

Slide 30 text

db2 db3 db4 db5 db6 db7 db8 db9 ... dbn db1

db2.feel
db2.tick
enqueue(db1)
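
The two diagrams above describe a round-robin loop: take the database at the head of the queue, feel and tick it, then put it back at the tail. A rough sketch of that loop follows. It uses a plain in-memory Array as the queue, and the Db class and work_one method are hypothetical names introduced here, not part of the talk.

```ruby
# Hypothetical stand-in for a monitored database.
class Db
  attr_reader :name, :checks

  def initialize(name)
    @name = name
    @checks = 0
  end

  # A real implementation would record an observation here.
  def feel; end

  # A real implementation would run the current state's handler here.
  def tick
    @checks += 1
  end
end

# Take the database at the head of the queue, check it, and put it back
# at the tail so that every database is visited in turn.
def work_one(queue)
  db = queue.shift
  db.feel
  db.tick
  queue.push(db)
  db
end

queue = %w[db1 db2 db3].map { |name| Db.new(name) }
visited = 4.times.map { work_one(queue).name }
puts visited.join(" ")
```

An in-memory Array loses its contents when the process dies, which is one reason to keep the queue in durable storage instead.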

Slide 31

Slide 31 text

QueueClassic
http://github.com/ryandotsmith/queue_classic

Slide 32

Slide 32 text

Durability & Availability

Slide 33

Slide 33 text


Slide 34

Slide 34 text


Slide 35

Slide 35 text


Slide 36

Slide 36 text


Slide 37

Slide 37 text

Continuous Protection
• Write-Ahead Log segments shipped to durable storage every 60 seconds
• We can replay these logs on a new server to recover your data
• https://github.com/heroku/WAL-E

Slide 38

Slide 38 text

Need a more flexible object model

Slide 39

Slide 39 text

timeline

Slide 40

Slide 40 text

participant

Slide 41

Slide 41 text


Slide 42

Slide 42 text


Slide 43

Slide 43 text

resource

Slide 44

Slide 44 text

follower

Slide 45

Slide 45 text

fork

Slide 46

Slide 46 text

disaster

Slide 47

Slide 47 text


Slide 48

Slide 48 text

recovery

Slide 49

Slide 49 text

big project

Slide 50

Slide 50 text

lots of moving parts

Slide 51

Slide 51 text

long test suite

Slide 52

Slide 52 text

modularize and build APIs

Slide 53

Slide 53 text


Slide 54

Slide 54 text

gain in agility

Slide 55

Slide 55 text

composable services

Slide 56

Slide 56 text

independently scalable

Slide 57

Slide 57 text

Logging and Metrics

Slide 58

Slide 58 text

log generation

Slide 59

Slide 59 text


Slide 60

Slide 60 text

logs are event streams

Slide 61

Slide 61 text

how should you log?

Slide 62

Slide 62 text

post "/work" do
  puts "starting to do work"
  worker = Worker.new(params)
  begin
    worker.lift_things_up
    worker.put_them_down
  rescue WorkerError => e
    puts "Fail :( #{e.message}"
    halt 500 # stop here so the 200 below cannot overwrite the error status
  end
  puts "done doing work"
  status 200
end

Slide 63

Slide 63 text

$ heroku logs --tail
2012-07-28T02:43:35 [web.4] starting to do work
2012-07-28T02:43:35 [web.4] Fail :( invalid worker, nothing to do
2012-07-28T02:43:35 heroku[router] POST myapp.com/work dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643

Slide 64

Slide 64 text

bad logging
good logging

Slide 65

Slide 65 text

require "securerandom"

post "/work" do
  log(create_work: true, request_id: uuid) do
    worker = Worker.new(params.merge(uuid: uuid))
    begin
      worker.lift_things_up
      worker.put_them_down
    rescue WorkerError => e
      log_exception(e, create_work: true)
    end
  end
end

helpers do
  # Memoized so the log line and the worker share one id per request.
  def uuid
    @uuid ||= SecureRandom.uuid
  end
end

Slide 66

Slide 66 text

require 'scrolls'

module App
  module Logs
    extend self

    def log(data, &block)
      Scrolls.log(with_env(data), &block)
    end

    def log_exception(exception, data, &block)
      Scrolls.log_exception(with_env(data), &block)
    end

    # Tag every log line with the environment it came from.
    def with_env(hash)
      { environment: ENV['RACK_ENV'] }.merge(hash)
    end
  end
end
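
To make the block form of log concrete, here is a minimal key=value logger written from scratch. It is a stand-in and not the Scrolls API: the KVLog module, its render and log methods, and the optional out argument are all names invented for this sketch. It shows how wrapping a block can yield an at=start line and an at=finish line with an elapsed time.

```ruby
require "stringio"

# Minimal key=value logger; a stand-in, not the Scrolls API.
module KVLog
  module_function

  # Render a hash as "key=value" pairs; `true` values print as a bare key.
  def render(data)
    data.map { |key, value| value == true ? key.to_s : "#{key}=#{value}" }.join(" ")
  end

  # Without a block: emit one line. With a block: emit at=start and
  # at=finish lines around it, with the elapsed time in milliseconds.
  def log(data, out = $stdout)
    return out.puts(render(data)) unless block_given?

    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    out.puts render(data.merge(at: "start"))
    result = yield
    elapsed = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
    out.puts render(data.merge(at: "finish", elapsed: elapsed))
    result
  end
end

buffer = StringIO.new
KVLog.log({ create_work: true, request_id: "afe2-f0d" }, buffer) { :done }
puts buffer.string
```

Because every line carries the same request_id, the start and finish lines of one request can be matched up later even when many dynos interleave their output.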

Slide 67

Slide 67 text

$ heroku logs --tail
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=start
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=exception message=invalid worker, nothing to do
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=finish elapsed=53
2012-07-28T02:43:35 heroku[router] POST myapp.com/work dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643

Slide 68

Slide 68 text

log consumption

Slide 69

Slide 69 text

(this is the fun part)

Slide 70

Slide 70 text


Slide 71

Slide 71 text

select * from events;
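
Treating logs as a queryable table of events presupposes that each line can be turned back into structured data. The sketch below parses key=value payloads into hashes and runs a small query over them in memory. The parse_event method is a hypothetical helper, the second request id is made-up sample data, and the lines are assumed to have their timestamp and dyno prefix already stripped.

```ruby
# Parse "key=value" tokens into a hash; bare tokens become `true`.
# Assumes values contain no spaces, which free-text messages would break.
def parse_event(line)
  line.split(" ").each_with_object({}) do |token, event|
    key, value = token.split("=", 2)
    event[key.to_sym] = value.nil? ? true : value
  end
end

lines = [
  "create_work request_id=afe2-f0d at=start",
  "create_work request_id=afe2-f0d at=finish elapsed=53",
  "create_work request_id=b71c-9aa at=finish elapsed=21" # made-up sample
]

events = lines.map { |line| parse_event(line) }

# Roughly: select avg(elapsed) from events where at = 'finish'
finished = events.select { |event| event[:at] == "finish" }
average = finished.sum { |event| event[:elapsed].to_i } / finished.size.to_f
puts average
```

The same parsed events could just as well be inserted into a database table and queried with SQL, which is what the slide's query suggests.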

Slide 72

Slide 72 text


Slide 73

Slide 73 text

good logging
metrics
alerts

Slide 74

Slide 74 text

current tooling
• still using sequel and sinatra
• fog displaced stem
• backbone.js for web UIs
• fernet for auth tokens, valcro for simple validations, QueueClassic for job queues
• WAL-E for durability
• python, go and bash in some subsystems

Slide 75

Slide 75 text

lessons
• managing databases is hard
• start simple
• extract (and share) reusable code
• separate concerns into services
• use the right tool (framework, library, language)
• learn to love your event stream, metrics for everything

Slide 76

Slide 76 text

TRUNCATE TABLE talk;
Thank you!
@hgimenez @herokupostgres

Slide 77

Slide 77 text

select a.body
from answers a
inner join questions q on a.id = q.answer_id;