Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Importing and serving millions of records

DamirSvrtan
February 01, 2018
110

Importing and serving millions of records

DamirSvrtan

February 01, 2018
Tweet

Transcript

  1. IMPORTING AND SERVING MILLIONS OF RECORDS

    View full-size slide

  2. TALE OF A
    PROTOTYPE

    View full-size slide

  3. fishing
    sailing
    science
    oil gas exploration
    shipping
    deep sea mining
    governmental use
    ecological monitoring

    View full-size slide

  4. WHERE'S THE DATA?

    View full-size slide

  5. 11 DIFFERENT SOURCES

    View full-size slide

  6. 500K - 1M RECORDS /
    MONTH / SOURCE

    View full-size slide

  7. VOLUME. VELOCITY. VARIETY.

    View full-size slide

  8. VOLUME
    FACEBOOK - 250 BILLION IMAGES

    View full-size slide

  9. VELOCITY
    FACEBOOK - 100 MILLION IMAGES / DAY

    View full-size slide

  10. VARIETY
    FACEBOOK - IMAGES, POSTS, PERSONAL MESSAGES, FRIENDSHIPS, LOCATIONS

    View full-size slide

  11. EXTREMELY LARGE DATA SETS ANALYSED
    COMPUTATIONALLY TO REVEAL PATTERNS,
    TRENDS, AND ASSOCIATIONS

    View full-size slide

  12. R&D
    DATABASE TYPE DECISION MAKING

    View full-size slide

  13. POSTGRES
    VS
    AMAZON DYNAMODB
    VS
    GOOGLE BIG TABLE

    View full-size slide

  14. AMAZON DYNAMODB

    View full-size slide

  15. GOOGLE BIG TABLE

    View full-size slide

  16. PG FOR THE WIN

    View full-size slide

  17. THE STUPID SIMPLE WAY:
    MEASUREMENT.CREATE()

    View full-size slide

  18. ACTIVERECORDIMPORTER
    MEASUREMENT.IMPORT()

    View full-size slide

  19. WWGD
    WHAT WOULD GEORGE
    DO?

    View full-size slide

  20. HAMMERFIX
    POSTGRES COPY

    View full-size slide

  21. COPY COMMAND
    COPY measurements
    FROM 'measurements.csv'

    View full-size slide

  22. COPY COMMAND
    COPY measurements (temperature, latitude, longitude)
    FROM 'measurements.csv'

    View full-size slide

  23. COPY IN RUBY
    db_conn = ActiveRecord::Base.connection.raw_connection
    db_conn.copy_data('COPY measurements FROM STDIN') do
    CSV.foreach('measurements.csv', headers: true) do |row|
    db_conn.put_copy_data(row.fields + [Time.zone.now])
    end
    end

    View full-size slide

  24. module Oxeanpedia
    module Argo
    class CopyData
    method_object :file_path
    CSV_FIELD_NAMES = [
    'ARGOS_ID',
    'LATITUDE (degree_north)',
    'LONGITUDE (degree_east)',
    'PRES (decibar)',
    'TEMP (degree_Celsius)'
    ]
    def call
    pgconn.copy_data(copy_statement, enco) do
    CSV.foreach(file_path, headers: true) do |row|
    pgconn.put_copy_data(
    ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path]
    )
    end
    end
    end
    private
    def date_field(row)
    row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)']
    end
    def pgconn
    @pgconn ||= ActiveRecord::Base.connection.raw_connection
    end
    def time
    @time ||= Time.zone.now
    end
    def enco
    PG::TextEncoder::CopyRow.new
    end
    def copy_statement
    <<-HEREDOC.strip_heredoc
    COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename)
    FROM STDIN
    HEREDOC
    end
    end
    end
    end

    View full-size slide

  25. WITH moved_rows AS (
    DELETE FROM argo_measurements am
    WHERE filename = '#{filename}'
    AND latitude IS NOT NULL
    AND longitude IS NOT NULL
    AND temperature_in_celsius IS NOT NULL
    AND depth_in_meters IS NOT NULL
    RETURNING am.*
    )
    INSERT INTO
    measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location)
    SELECT
    source,
    source_id,
    latitude,
    longitude,
    depth_in_meters,
    temperature_in_celsius,
    measured_on,
    current_timestamp,
    current_timestamp,
    ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location
    FROM moved_rows
    ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING

    View full-size slide

  26. geometry ST_MakePoint(double precision x,
    double precision y);

    View full-size slide

  27. ARCHITECTURE

    View full-size slide

  28. REACT ON RAILS

    View full-size slide

  29. CLIENT: WHERE'S THE
    DESKTOP APP?

    View full-size slide

  30. SERVING THE
    DATA

    View full-size slide

  31. 7K ROWS PER
    REQUEST

    View full-size slide

  32. COMPLETED 200 OK IN 2750MS
    (VIEWS: 2500MS | ACTIVERECORD:
    250MS)

    View full-size slide

  33. > PG string
    > PG Ruby row objects
    > ActiveRecord objects
    > ActiveRecordSerializer objects
    > JSON string as JSON API

    View full-size slide

  34. JSON DIRECTLY FROM
    POSTGRES

    View full-size slide

  35. SELECT
    json_build_object(
    'data',
    json_agg(
    json_build_object(
    'id', m.id,
    'type', 'measurements',
    'attributes', json_build_object(
    'latitude', m.latitude,
    'longitude', m.longitude,
    'depth_in_meters', m.depth_in_meters,
    'temperature_in_celsius', m.temperature_in_celsius,
    'measured_on', m.measured_on
    )
    )
    )
    ) as result
    FROM ..

    View full-size slide

  36. COMPLETED 200 OK IN 302MS
    (VIEWS: 2MS | ACTIVERECORD: 300MS)

    View full-size slide

  37. THAT'S ALL FOLKS!

    View full-size slide