Slide 1

Slide 1 text

IMPORTING AND SERVING MILLIONS OF RECORDS

Slide 2

Slide 2 text

TALE OF A PROTOTYPE

Slide 3

Slide 3 text

OXEANPEDIA

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

fishing sailing science oil gas exploration shipping deep sea mining governmental use ecological monitoring

Slide 6

Slide 6 text

WHERE'S THE DATA?

Slide 7

Slide 7 text

11 DIFFERENT SOURCES

Slide 8

Slide 8 text

500K - 1M RECORDS / MONTH / SOURCE

Slide 9

Slide 9 text

BIG DATA?

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

VVV

Slide 12

Slide 12 text

VOLUME. VELOCITY. VARIETY.

Slide 13

Slide 13 text

VOLUME FACEBOOK - 250 BILLION IMAGES

Slide 14

Slide 14 text

VELOCITY FACEBOOK - 100 MILLION IMAGES / DAY

Slide 15

Slide 15 text

VARIETY FACEBOOK - IMAGES, POSTS, PERSONAL MESSAGES, FRIENDSHIPS, LOCATIONS

Slide 16

Slide 16 text

EXTREMELY LARGE DATA SETS ANALYSED COMPUTATIONALLY TO REVEAL PATTERNS, TRENDS, AND ASSOCIATIONS

Slide 17

Slide 17 text

R&D DATABASE TYPE DECISION MAKING

Slide 18

Slide 18 text

POSTGRES VS AMAZON DYNAMODB VS GOOGLE BIG TABLE

Slide 19

Slide 19 text

POSTGRES

Slide 20

Slide 20 text

AMAZON DYNAMODB

Slide 21

Slide 21 text

GOOGLE BIG TABLE

Slide 22

Slide 22 text

PG FOR THE WIN

Slide 23

Slide 23 text

IMPORT

Slide 24

Slide 24 text

THE STUPID SIMPLE WAY: MEASUREMENT.CREATE()

Slide 25

Slide 25 text

ACTIVERECORDIMPORTER MEASUREMENT.IMPORT()

Slide 26

Slide 26 text

WWGD WHAT WOULD GEORGE DO?

Slide 27

Slide 27 text

HAMMERFIX POSTGRES COPY

Slide 28

Slide 28 text

COPY COMMAND COPY measurements FROM 'measurements.csv'

Slide 29

Slide 29 text

COPY COMMAND COPY measurements (temperature, latitude, longitude) FROM 'measurements.csv'

Slide 30

Slide 30 text

COPY IN RUBY db_conn = ActiveRecord::Base.connection.raw_connection db_conn.copy_data('COPY measurements FROM STDIN') do CSV.foreach('measurements.csv', headers: true) do |row| db_conn.put_copy_data(row.fields + [Time.zone.now]) end end

Slide 31

Slide 31 text

module Oxeanpedia module Argo class CopyData method_object :file_path CSV_FIELD_NAMES = [ 'ARGOS_ID', 'LATITUDE (degree_north)', 'LONGITUDE (degree_east)', 'PRES (decibar)', 'TEMP (degree_Celsius)' ] def call pgconn.copy_data(copy_statement, enco) do CSV.foreach(file_path, headers: true) do |row| pgconn.put_copy_data( ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path] ) end end end private def date_field(row) row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)'] end def pgconn @pgconn ||= ActiveRecord::Base.connection.raw_connection end def time @time ||= Time.zone.now end def enco PG::TextEncoder::CopyRow.new end def copy_statement <<-HEREDOC.strip_heredoc COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename) FROM STDIN HEREDOC end end end end

Slide 32

Slide 32 text

TEMP TABLE

Slide 33

Slide 33 text

WITH moved_rows AS ( DELETE FROM argo_measurements am WHERE filename = '#{filename}' AND latitude IS NOT NULL AND longitude IS NOT NULL AND temperature_in_celsius IS NOT NULL AND depth_in_meters IS NOT NULL RETURNING am.* ) INSERT INTO measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location) SELECT source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, current_timestamp, current_timestamp, ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location FROM moved_rows ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING

Slide 34

Slide 34 text

GEO WTFS

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

geometry ST_MakePoint(double precision x, double precision y);

Slide 37

Slide 37 text

ARCHITECTURE

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

REACT ON RAILS

Slide 40

Slide 40 text

CLIENT: WHERE'S THE DESKTOP APP?

Slide 41

Slide 41 text

SERVING THE DATA

Slide 42

Slide 42 text

INDEXING

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

7K ROWS PER REQUEST

Slide 45

Slide 45 text

COMPLETED 200 OK IN 2750MS (VIEWS: 2500MS | ACTIVERECORD: 250MS)

Slide 46

Slide 46 text

> PG string > PG Ruby row objects > ActiveRecord objects > ActiveRecordSerializer objects > JSON string as JSON API

Slide 47

Slide 47 text

PAGINATE?

Slide 48

Slide 48 text

CACHE?

Slide 49

Slide 49 text

JSON DIRECTLY FROM POSTGRES

Slide 50

Slide 50 text

SELECT json_build_object( 'data', json_agg( json_build_object( 'id', m.id, 'type', 'measurements', 'attributes', json_build_object( 'latitude', m.latitude, 'longitude', m.longitude, 'depth_in_meters', m.depth_in_meters, 'temperature_in_celsius', m.temperature_in_celsius, 'measured_on', m.measured_on ) ) ) ) as result FROM ..

Slide 51

Slide 51 text

COMPLETED 200 OK IN 302MS (VIEWS: 2MS | ACTIVERECORD: 300MS)

Slide 52

Slide 52 text

THAT'S ALL FOLKS!