Importing and serving millions of records

5f81e2d2889d7642fd84c8b24db7ee17?s=47 DamirSvrtan
February 01, 2018
32

Importing and serving millions of records

5f81e2d2889d7642fd84c8b24db7ee17?s=128

DamirSvrtan

February 01, 2018
Tweet

Transcript

  1. IMPORTING AND SERVING MILLIONS OF RECORDS

  2. TALE OF A PROTOTYPE

  3. OXEANPEDIA

  4. None
  5. fishing sailing science oil gas exploration shipping deep sea mining

    governmental use ecological monitoring
  6. WHERE'S THE DATA?

  7. 11 DIFFERENT SOURCES

  8. 500K - 1M RECORDS / MONTH / SOURCE

  9. BIG DATA?

  10. None
  11. VVV

  12. VOLUME. VELOCITY. VARIETY.

  13. VOLUME FACEBOOK - 250 BILLION IMAGES

  14. VELOCITY FACEBOOK - 100 MILLION IMAGES / DAY

  15. VARIETY FACEBOOK - IMAGES, POSTS, PERSONAL MESSAGES, FRIENDSHIPS, LOCATIONS

  16. EXTREMELY LARGE DATA SETS ANALYSED COMPUTATIONALLY TO REVEAL PATTERNS, TRENDS,

    AND ASSOCIATIONS
  17. R&D DATABASE TYPE DECISION MAKING

  18. POSTGRES VS AMAZON DYNAMODB VS GOOGLE BIG TABLE

  19. POSTGRES

  20. AMAZON DYNAMODB

  21. GOOGLE BIG TABLE

  22. PG FOR THE WIN

  23. IMPORT

  24. THE STUPID SIMPLE WAY: MEASUREMENT.CREATE()

  25. ACTIVERECORDIMPORTER MEASUREMENT.IMPORT()

  26. WWGD WHAT WOULD GEORGE DO?

  27. HAMMERFIX POSTGRES COPY

  28. COPY COMMAND COPY measurements FROM 'measurements.csv'

  29. COPY COMMAND COPY measurements (temperature, latitude, longitude) FROM 'measurements.csv'

  30. COPY IN RUBY db_conn = ActiveRecord::Base.connection.raw_connection db_conn.copy_data('COPY measurements FROM STDIN')

    do CSV.foreach('measurements.csv', headers: true) do |row| db_conn.put_copy_data(row.fields + [Time.zone.now]) end end
  31. module Oxeanpedia module Argo class CopyData method_object :file_path CSV_FIELD_NAMES =

    [ 'ARGOS_ID', 'LATITUDE (degree_north)', 'LONGITUDE (degree_east)', 'PRES (decibar)', 'TEMP (degree_Celsius)' ] def call pgconn.copy_data(copy_statement, enco) do CSV.foreach(file_path, headers: true) do |row| pgconn.put_copy_data( ['argo'] + [date_field(row)] + row.fields(*CSV_FIELD_NAMES) + [file_path] ) end end end private def date_field(row) row['DATE (YYYY/MM/DD HH:MI:SS)'] || row['DATE (YYYY-MM-DDTHH:MI:SSZ)'] end def pgconn @pgconn ||= ActiveRecord::Base.connection.raw_connection end def time @time ||= Time.zone.now end def enco PG::TextEncoder::CopyRow.new end def copy_statement <<-HEREDOC.strip_heredoc COPY argo_measurements (source, measured_on, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, filename) FROM STDIN HEREDOC end end end end
  32. TEMP TABLE

  33. WITH moved_rows AS ( DELETE FROM argo_measurements am WHERE filename

    = '#{filename}' AND latitude IS NOT NULL AND longitude IS NOT NULL AND temperature_in_celsius IS NOT NULL AND depth_in_meters IS NOT NULL RETURNING am.* ) INSERT INTO measurements (source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, created_at, updated_at, location) SELECT source, source_id, latitude, longitude, depth_in_meters, temperature_in_celsius, measured_on, current_timestamp, current_timestamp, ST_SetSRID(ST_MakePoint(longitude, latitude),4326) as location FROM moved_rows ON CONFLICT (latitude, longitude, depth_in_meters, measured_on) DO NOTHING
  34. GEO WTFS

  35. None
  36. geometry ST_MakePoint(double precision x, double precision y);

  37. ARCHITECTURE

  38. None
  39. REACT ON RAILS

  40. CLIENT: WHERE'S THE DESKTOP APP?

  41. SERVING THE DATA

  42. INDEXING

  43. None
  44. 7K ROWS PER REQUEST

  45. COMPLETED 200 OK IN 2750MS (VIEWS: 2500MS | ACTIVERECORD: 250MS)

  46. > PG string > PG Ruby row objects > ActiveRecord

    objects > ActiveRecordSerializer objects > JSON string as JSON API
  47. PAGINATE?

  48. CACHE?

  49. JSON DIRECTLY FROM POSTGRES

  50. SELECT json_build_object( 'data', json_agg( json_build_object( 'id', m.id, 'type', 'measurements', 'attributes',

    json_build_object( 'latitude', m.latitude, 'longitude', m.longitude, 'depth_in_meters', m.depth_in_meters, 'temperature_in_celsius', m.temperature_in_celsius, 'measured_on', m.measured_on ) ) ) ) as result FROM ..
  51. COMPLETED 200 OK IN 302MS (VIEWS: 2MS | ACTIVERECORD: 300MS)

  52. THAT'S ALL FOLKS!