Importing and serving millions of records

DamirSvrtan
February 01, 2018

Transcript

  1. IMPORTING AND SERVING MILLIONS OF RECORDS

  2. TALE OF A
    PROTOTYPE

  3. OXEANPEDIA

  4. (image-only slide)

  5. fishing
    sailing
    science
    oil gas exploration
    shipping
    deep sea mining
    governmental use
    ecological monitoring

  6. WHERE'S THE DATA?

  7. 11 DIFFERENT SOURCES

  8. 500K - 1M RECORDS /
    MONTH / SOURCE

  9. BIG DATA?

  10. (image-only slide)

  11. VVV

  12. VOLUME. VELOCITY. VARIETY.

  13. VOLUME
    FACEBOOK - 250 BILLION IMAGES

  14. VELOCITY
    FACEBOOK - 100 MILLION IMAGES / DAY

  15. VARIETY
    FACEBOOK - IMAGES, POSTS, PERSONAL MESSAGES, FRIENDSHIPS, LOCATIONS

  16. EXTREMELY LARGE DATA SETS ANALYSED
    COMPUTATIONALLY TO REVEAL PATTERNS,
    TRENDS, AND ASSOCIATIONS

  17. R&D
    DATABASE TYPE DECISION MAKING

  18. POSTGRES
    VS
    AMAZON DYNAMODB
    VS
    GOOGLE BIGTABLE

  19. POSTGRES

  20. AMAZON DYNAMODB

  21. GOOGLE BIGTABLE

  22. PG FOR THE WIN

  23. IMPORT

  24. THE STUPID SIMPLE WAY:
    MEASUREMENT.CREATE()

  25. ACTIVERECORDIMPORTER
    MEASUREMENT.IMPORT()
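
    One INSERT per row also runs validations and callbacks every time, while the
    activerecord-import gem batches everything into a single multi-row INSERT.
    A minimal sketch of the difference, assuming a hypothetical rows_from_csv
    helper that returns symbol-keyed attribute hashes:

    # One statement (plus validations and callbacks) per record:
    rows_from_csv.each { |attrs| Measurement.create!(attrs) }

    # activerecord-import: one bulk INSERT for the whole batch:
    columns = [:temperature, :latitude, :longitude]
    values  = rows_from_csv.map { |attrs| attrs.values_at(*columns) }
    Measurement.import(columns, values, validate: false)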

  26. WWGD
    WHAT WOULD GEORGE
    DO?

  27. HAMMERFIX
    POSTGRES COPY

  28. COPY COMMAND
    COPY measurements
    FROM 'measurements.csv'

  29. COPY COMMAND
    COPY measurements (temperature, latitude, longitude)
    FROM 'measurements.csv'

  30. COPY IN RUBY
    db_conn = ActiveRecord::Base.connection.raw_connection
    # CopyRow encoder turns Ruby arrays into COPY text-format rows
    encoder = PG::TextEncoder::CopyRow.new
    db_conn.copy_data('COPY measurements FROM STDIN', encoder) do
      CSV.foreach('measurements.csv', headers: true) do |row|
        db_conn.put_copy_data(row.fields + [Time.zone.now])
      end
    end

  31. module Oxeanpedia
        module Argo
          class CopyData
            method_object :file_path

            CSV_FIELD_NAMES = [
              'ARGOS_ID',
              'LATITUDE (degree_north)',
              'LONGITUDE (degree_east)',
              'PRES (decibar)',
              'TEMP (degree_Celsius)'
            ]

            def call
              pgconn.copy_data(copy_statement, enco) do
                CSV.foreach(file_path, headers: true) do |row|
                  pgconn.put_copy_data(
                    ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path]
                  )
                end
              end
            end

            private

            def date_field(row)
              row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)']
            end

            def pgconn
              @pgconn ||= ActiveRecord::Base.connection.raw_connection
            end

            def time
              @time ||= Time.zone.now
            end

            def enco
              PG::TextEncoder::CopyRow.new
            end

            def copy_statement
              <<-HEREDOC.strip_heredoc
                COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename)
                FROM STDIN
              HEREDOC
            end
          end
        end
      end

  32. TEMP TABLE
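
    The COPY on the previous slide loads the raw CSV rows into the
    argo_measurements staging table first; the next slide then moves the clean
    rows over to measurements. The slides don't show the staging table itself,
    so this migration is only a sketch of its shape (column types are assumptions):

    class CreateArgoMeasurements < ActiveRecord::Migration[5.1]
      def change
        # Staging table the Argo CSVs are COPY-ed into before the rows are
        # validated and moved into measurements.
        create_table :argo_measurements do |t|
          t.string   :source
          t.datetime :measured_on
          t.string   :source_id
          t.float    :latitude
          t.float    :longitude
          t.float    :depth_in_meters
          t.float    :temperature_in_celsius
          t.string   :filename
        end
      end
    end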

  33. WITH moved_rows AS (
    DELETE FROM argo_measurements am
    WHERE filename = '#{filename}'
    AND latitude IS NOT NULL
    AND longitude IS NOT NULL
    AND temperature_in_celsius IS NOT NULL
    AND depth_in_meters IS NOT NULL
    RETURNING am.*
    )
    INSERT INTO
    measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location)
    SELECT
    source,
    source_id,
    latitude,
    longitude,
    depth_in_meters,
    temperature_in_celsius,
    measured_on,
    current_timestamp,
    current_timestamp,
    ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location
    FROM moved_rows
    ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING
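
    ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING
    only kicks in if Postgres can match those columns to a unique index or
    constraint. That migration isn't on a slide; a sketch of what it has to
    look like:

    class AddDedupIndexToMeasurements < ActiveRecord::Migration[5.1]
      def change
        # A unique index over exactly the ON CONFLICT columns, so re-importing
        # a file silently skips measurements that are already there.
        add_index :measurements,
                  [:latitude, :longitude, :depth_in_meters, :measured_on],
                  unique: true,
                  name: 'index_measurements_on_dedup_columns'
      end
    end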

  34. GEO WTFS

  35. (image-only slide)

  36. geometry ST_MakePoint(double precision x,
    double precision y);
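
    The catch is the argument order: x is the longitude and y is the latitude,
    which is why the upsert on slide 33 builds ST_MakePoint(longitude, latitude)
    and tags it with SRID 4326 (plain WGS 84). A quick sanity check from the
    Rails console:

    # x comes first and is the longitude, y is the latitude
    ActiveRecord::Base.connection.select_value(
      "SELECT ST_AsText(ST_SetSRID(ST_MakePoint(16.0, 45.8), 4326))"
    )
    # => "POINT(16 45.8)"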

  37. ARCHITECTURE

  38. (image-only slide)

  39. REACT ON RAILS

  40. CLIENT: WHERE'S THE
    DESKTOP APP?

  41. SERVING THE
    DATA

  42. INDEXING
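
    The slides don't list the exact indexes, but two obvious candidates are the
    unique dedup index from the import step and a spatial index on the PostGIS
    location column, which is what map-style bounding-box queries hit. A sketch
    of the latter:

    class AddLocationIndexToMeasurements < ActiveRecord::Migration[5.1]
      def change
        # GiST index so spatial filters on location don't scan the whole table
        add_index :measurements, :location, using: :gist
      end
    end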

  43. (image-only slide)

  44. 7K ROWS PER
    REQUEST

  45. COMPLETED 200 OK IN 2750MS
    (VIEWS: 2500MS | ACTIVERECORD:
    250MS)

  46. > PG string
    > PG Ruby row objects
    > ActiveRecord objects
    > ActiveRecordSerializer objects
    > JSON string as JSON API

  47. PAGINATE?

  48. CACHE?

  49. JSON DIRECTLY FROM
    POSTGRES

  50. SELECT
        json_build_object(
          'data',
          json_agg(
            json_build_object(
              'id', m.id,
              'type', 'measurements',
              'attributes', json_build_object(
                'latitude', m.latitude,
                'longitude', m.longitude,
                'depth_in_meters', m.depth_in_meters,
                'temperature_in_celsius', m.temperature_in_celsius,
                'measured_on', m.measured_on
              )
            )
          )
        ) as result
      FROM ..
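
    Because the database already returns a complete JSON:API document as a
    single string, the controller can hand it straight to the client and skip
    ActiveRecord objects and serializers entirely. A minimal sketch, assuming
    the query above lives in a (hypothetical) measurements_json_sql helper:

    class MeasurementsController < ApplicationController
      def index
        # select_value returns the single "result" column as a Ruby String;
        # render json: passes a String through without re-serializing it.
        render json: ActiveRecord::Base.connection.select_value(measurements_json_sql)
      end
    end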

  51. COMPLETED 200 OK IN 302MS
    (VIEWS: 2MS | ACTIVERECORD: 300MS)

  52. THAT'S ALL FOLKS!
