Slide 1

Slide 1 text

Reporting and the First Law of Holes

Slide 2

Slide 2 text

Sebastian von Conrad

Slide 3

Slide 3 text

Work President Organiser Citizen @ @ @ @ Envato Ruby Australia RubyConf AU 2015 Sweden (There will be some Swedish in this presentation.)

Slide 4

Slide 4 text

Story time!

Slide 5

Slide 5 text

This happened to me. Pretty much everywhere I’ve worked.

Slide 6

Slide 6 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 7

Slide 7 text

Sale Item User

Slide 8

Slide 8 text

Item Category

Slide 9

Slide 9 text

User Profile

Slide 10

Slide 10 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 11

Slide 11 text

Yesterday’s sales? Mission:

Slide 12

Slide 12 text

We can solve this with ActiveRecord! This is the point where we realise:

Slide 13

Slide 13 text

Sale.where(created_at: ).count # or, a named scope Sale.yesterday.count

Slide 14

Slide 14 text

Let’s automate!

Slide 15

Slide 15 text

Reporting admin panel!

Slide 16

Slide 16 text

We’re in a hole. It’s a small one, but still a hole.

Slide 17

Slide 17 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 18

Slide 18 text

Last month’s top countries? Mission:

Slide 19

Slide 19 text

Sale User Profile

Slide 20

Slide 20 text

sales = Sale.includes(user: :profile) by_country = sales.group_by { |s| s.user.country } by_country.map do |(country, records)| { country: country, sales: records.count } end

Slide 21

Slide 21 text

sales = Sale.includes(user: :profile) by_country = sales.group_by { |s| s.user.country } by_country.map do |(country, records)| { country: country, sales: records.count } end InEfficient

Slide 22

Slide 22 text

Sale.select('country, count(sales.id)') .join(:user).join(:profile) .where('sales.created_at BETWEEN ? AND ?', ) .group(:country)

Slide 23

Slide 23 text

Sale.select('country, count(sales.id)') .join(:user).join(:profile) .where('sales.created_at BETWEEN ? AND ?', ) .group(:country) Messy

Slide 24

Slide 24 text

We can solve this with SQL! This is the point where we realise:

Slide 25

Slide 25 text

SELECT p.country, count(s.id) AS num_sales FROM sales s INNER JOIN users u ON s.user_id = u.id INNER JOIN user_profiles p ON p.user_id = u.id WHERE s.created_at BETWEEN AND GROUP BY p.country;

Slide 26

Slide 26 text

Sweet! Let’s add it to the admin panel.

Slide 27

Slide 27 text

But we don’t have time to update the existing Ruby reports.

Slide 28

Slide 28 text

And so the hole gets a little bit deeper.

Slide 29

Slide 29 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 30

Slide 30 text

Which categories do well in which countries? Mission:

Slide 31

Slide 31 text

Sale Item User Category Profile

Slide 32

Slide 32 text

SELECT p.country, c.name, count(s.id) AS num_sales FROM sales s INNER JOIN items i ON s.item_id = i.id INNER JOIN categories c ON i.category_id = c.id INNER JOIN users u ON s.user_id = u.id INNER JOIN user_profiles p ON p.user_id = u.id WHERE s.created_at BETWEEN AND GROUP BY p.country, c.name;

Slide 33

Slide 33 text

We never liked writing SQL in the first place. This is the point where we realise:

Slide 34

Slide 34 text

SQL is not DRY. SQL is ugly. Ruby is DRY. Ruby is beautiful.

Slide 35

Slide 35 text

Ruby is great for defining custom DSLs. This is the point where we realise:

Slide 36

Slide 36 text

…like a custom DSL for writing SQL queries.

Slide 37

Slide 37 text

reports/generic_accounts do name 'Categories/countries' description 'Category sales per country' column_groups do [['Total', :sum], ['Average', :average], ['Count', :count]].each do |group_title, function| reports/column_group(title: group_title) do columns do categories.product(countries).each do |(category, country)| reports/column do title "#{category}/#{country}" aggregate_type function account :sales code :purchase filter [{ category: category, country: country }] end end end end end end end

Slide 38

Slide 38 text

reports/generic_accounts do name 'Categories/countries' description 'Category sales per country' column_groups do [['Total', :sum], ['Average', :average], ['Count', :count]].each do |group_title, function| reports/column_group(title: group_title) do columns do categories.product(countries).each do |(category, country)| reports/column do title "#{category}/#{country}" aggregate_type function account :sales code :purchase filter [{ category: category, country: country }] end end end end end end end Nightmare

Slide 39

Slide 39 text

Add to admin panel, just for the heck of it!

Slide 40

Slide 40 text

The hole is pretty big by now.

Slide 41

Slide 41 text

But we’re not done yet, are we?

Slide 42

Slide 42 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 43

Slide 43 text

Items featured (and not) on homepage? Mission:

Slide 44

Slide 44 text

We can do this, because we think about the future. Right?

Slide 45

Slide 45 text

+ ------- + ---------------- + -------------- + | item_id | feature_start_at | feature_end_at | | ------- | ---------------- | -------------- | | 34292 | 2015-02-01 | 2015-02-03 | | 64233 | 2015-02-02 | 2015-02-03 | | 77245 | 2015-02-05 | 2015-02-05 | | 212 | 2015-01-23 | 2015-02-08 | | 22196 | 2015-02-06 | 2015-02-08 | | ... | ... | ... | + ------- + ---------------- + -------------- +

Slide 46

Slide 46 text

No, we’re not.

Slide 47

Slide 47 text

+ ------- + ---------------- + -------------- + | id | price | category_id | ... | featured | | --- | ----- | ------------ | --- | -------- | | 209 | 40.00 | 112 | ... | false | | 210 | 34.00 | 2 | ... | false | | 211 | 78.00 | 61 | ... | false | | 212 | 53.00 | 54 | ... | true | | 213 | 12.00 | 14 | ... | false | | ... | ... | ... | ... | ... | + ------- + ---------------- + -------------- +

Slide 48

Slide 48 text

We don’t have historical data.

Slide 49

Slide 49 text

We can’t solve this problem. This is the point where we realise:

Slide 50

Slide 50 text

Not only are we in a hole, it can’t even do what we need it to.

Slide 51

Slide 51 text

So why are we in this hole?

Slide 52

Slide 52 text

The way we build applications does not cater well to reporting.

Slide 53

Slide 53 text

As a result, our database is good for the application, but bad for reporting.

Slide 54

Slide 54 text

join.join.join

Slide 55

Slide 55 text

Sub-optimal indexing.

Slide 56

Slide 56 text

Oh so brittle. Schema changes suck.

Slide 57

Slide 57 text

You have to understand the domain in order to understand the data.

Slide 58

Slide 58 text

Expensive reporting queries can hurt prod.

Slide 59

Slide 59 text

ORM maintains current state, not history.

Slide 60

Slide 60 text

(ORMs are bad.)

Slide 61

Slide 61 text

First law of holes: if you find yourself in one, stop digging. - Will Rogers, 1911

Slide 62

Slide 62 text

Step 1: What do you care about? Step 2: ??? Step 3: ??? Step 4: ??? Step 5: ???

Slide 63

Slide 63 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 64

Slide 64 text

Not a trick question!

Slide 65

Slide 65 text

Events!

Slide 66

Slide 66 text

What entities are? What do they do?

Slide 67

Slide 67 text

Signup. Sale. Subscription. Cancellation. Refund. Etc.

Slide 68

Slide 68 text

The events are key.

Slide 69

Slide 69 text

Step 1: What do you care about? Step 2: What do you know? Step 3: ??? Step 4: ??? Step 5: ???

Slide 70

Slide 70 text

When the event happens, what info do you have?

Slide 71

Slide 71 text

Collect everything.

Slide 72

Slide 72 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases?

Slide 73

Slide 73 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases? Seller. Who are they? Where are they? When did they sign up? Previous sales?

Slide 74

Slide 74 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases? Seller. Who are they? Where are they? When did they sign up? Previous sales? Transaction. Amount? Payment gateway? Coupon? Tax?

Slide 75

Slide 75 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases? Seller. Who are they? Where are they? When did they sign up? Previous sales? Transaction. Amount? Payment gateway? Coupon? Tax? Item. Category? Featured on homepage? Price?

Slide 76

Slide 76 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases? Seller. Who are they? Where are they? When did they sign up? Previous sales? Transaction. Amount? Payment gateway? Coupon? Tax? Item. Category? Featured on homepage? Price? Category. Number of items? Price range?

Slide 77

Slide 77 text

Buyer. Who are they? Where are they? When did they sign up? Previous purchases? Seller. Who are they? Where are they? When did they sign up? Previous sales? Transaction. Amount? Payment gateway? Coupon? Tax? Item. Category? Featured on homepage? Price? Category. Number of items? Price range? Not exhaustive

Slide 78

Slide 78 text

Step 1: What do you care about? Step 2: What do you know? Step 3: JSON.generate() Step 4: ??? Step 5: ???

Slide 79

Slide 79 text

(Doesn’t have to be JSON.)

Slide 80

Slide 80 text

Denormalised. Immutable. Descriptive.

Slide 81

Slide 81 text

job_id job_purchased_at service_id service_name service_price_in_cents service_turnaround_in_days service_revision_requests service_currently_featured service_created_at service_approved_at service_approved_by_id service_approved_by_envato_id service_approved_by_username service_approved_by_full_name service_approved_by_email_address service_category_id service_category_name service_category_top_level_name service_category_min_price_in_cents service_category_max_price_in_cents service_category_parent_id service_category_parent_name duration_between_approval_and_purchase_in_seconds service_enquiry_count service_enquiry_message_count custom_job buyer_login_method buyer_id buyer_envato_id buyer_username buyer_full_name buyer_email_address buyer_ip_address buyer_city buyer_country buyer_account_created_at provider_id provider_envato_id provider_username provider_full_name provider_email_address provider_city provider_country provider_account_created_at provider_became_provider_at payment_uuid payment_gateway_name payment_gateway_reference payment_discount_amount_in_cents payment_charged_amount_in_cents coupon_code coupon_expiry coupon_name coupon_discount_percentage coupon_discount_dollar_value_in_cents

Slide 82

Slide 82 text

Step 1: What do you care about? Step 2: What do you know? Step 3: JSON.generate() Step 4: Where do you want it? Step 5: ???

Slide 83

Slide 83 text

Easiest difficulty: Log table.

Slide 84

Slide 84 text

(PostgreSQL hstore.)

Slide 85

Slide 85 text

Moderate difficulty: Log file (+ Hadoop?).

Slide 86

Slide 86 text

Expert difficulty: Data warehouse.

Slide 87

Slide 87 text

Step 1: What do you care about? Step 2: What do you know? Step 3: JSON.generate() Step 4: Where do you want it? Step 5: GOTO 1

Slide 88

Slide 88 text

Rinse, repeat.

Slide 89

Slide 89 text

You can add more data, but never backfill. (Unless your job is on the line.)

Slide 90

Slide 90 text

Combine event logs for realtime data.

Slide 91

Slide 91 text

Step 1: What do you care about? Step 2: What do you know? Step 3: JSON.generate() Step 4: Where do you want it? Step 5: GOTO 1

Slide 92

Slide 92 text

Decouple reporting from application.

Slide 93

Slide 93 text

Application generates data, not reports.

Slide 94

Slide 94 text

Application must not be allowed to read the data.

Slide 95

Slide 95 text

But other things can!

Slide 96

Slide 96 text

Build reports in tools designed to build reports.

Slide 97

Slide 97 text

Example: Export to .csv, import in Excel.

Slide 98

Slide 98 text

Easier to maintain. Easier to scale.

Slide 99

Slide 99 text

Easier all the way. Seriously.

Slide 100

Slide 100 text

Immutability .

Slide 101

Slide 101 text

* Caveats.

Slide 102

Slide 102 text

Image credits: http://www.gifbooster.com/pin/2345/troll-face-stick-figure-dances_58

Slide 103

Slide 103 text

Image credits: http://forum.deviantart.com/devart/drawplz/1704047

Slide 104

Slide 104 text

Tack så mycket. (I told you there’d be a little bit of Swedish.)