Real time data driven applications (and SQL vs NoSQL databases)

lanzani
September 04, 2014

These are the slides of the talk I gave at NoSQL Matters in Dublin in September 2014.

I cover what real-time, data-driven applications are, present an app we built for one of GoDataDriven's customers, and discuss the challenges that arose and which database helped us reach the level of performance we wanted.




  1. Real time data driven applications and SQL vs NoSQL databases
    Giovanni Lanzani, Data Whisperer
    GoDataDriven, proudly part of the Xebia Group
  2. Feedback @gglanzani

  3. Real-time, data driven app?
    • No store and retrieve;
    • Store, {transform, enrich, analyse} and retrieve;
    • Real-time: retrieve is not a batch process;
    • App: something your mother could use:
      SELECT attendees FROM NoSQLMatters WHERE password = '1234';
  4. Get insight about event impact

  5. Get insight about event impact

  6. Get insight about event impact

  7. Get insight about event impact

  8. Get insight about event impact

  9. Challenges
    1. Big Data;
    2. Privacy;
    3. Some real-time analysis;
    4. Real-time retrieval.
  10. Is it Big Data?

  11. Is it Big Data?
    “Everybody talks about it, nobody knows how to do it, everyone thinks everyone else is doing it, so everyone claims they’re doing it…” (Dan Ariely)
  12. 2. Privacy

  13. 2. Privacy

  14. 3. (Some) real-time analysis

  15. 4. Real-Time Retrieval
    • Harder than it looks;
    • Large data;
    • Retrieval is by giving date, center location + radius.
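The retrieval pattern on slide 15 (a date, a center location and a radius) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the postcode centroids, the function names and the in-memory row layout are hypothetical and not taken from the talk, and the distance is computed with the standard haversine formula rather than whatever the original app used.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def postcodes_within(centroids, lat, lon, radius_km):
    """Return the postcodes whose centroid lies within radius_km of (lat, lon)."""
    return {
        pc
        for pc, (pc_lat, pc_lon) in centroids.items()
        if haversine_km(lat, lon, pc_lat, pc_lon) <= radius_km
    }


def retrieve(rows, centroids, dates, lat, lon, radius_km):
    """Keep rows on the requested dates whose postcode is near the center."""
    nearby = postcodes_within(centroids, lat, lon, radius_km)
    wanted = set(dates)
    return [r for r in rows if r["date"] in wanted and r["postcode"] in nearby]


# Hypothetical centroids: the coordinates are illustrative, not real postcode data.
CENTROIDS = {"1234AB": (52.370, 4.895), "5555ZB": (51.440, 5.480)}
ROWS = [
    {"date": "2013-01-01", "hour": 12, "postcode": "1234AB", "hits": 35},
    {"date": "2013-01-08", "hour": 12, "postcode": "1234AB", "hits": 45},
    {"date": "2013-01-01", "hour": 11, "postcode": "5555ZB", "hits": 2},
]

result = retrieve(ROWS, CENTROIDS, ["2013-01-01"], 52.373, 4.893, 10.0)
```

A linear scan like this is fine for a handful of rows; with hundreds of gigabytes the postcode lookup and the date filter have to be pushed into an indexed store, which is the problem the later slides address.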
  16. Architecture: AngularJS (front-end), REST/JSON, python app (back-end)

  17. JS-1

  18. JS-2

  19. Data Example
    date        hour  id_activity  postcode  hits  delta  sbi
    2013-01-01  12    1234         1234AB    35    22     1
    2013-01-08  12    1234         1234AB    45    35     1
    2013-01-01  11    2345         5555ZB    2     1      2
    2013-01-08  11    2345         5555ZB    55    2      2
  20. Who has my data?

  21. Who has my data?
    • First iteration was a (pre)-POC, less data (3GB vs 500GB);
    • Time constraints;
    • Oops: everything is a pandas df!
  22. Advantage of “everything is a df”
    Pro:
    • Fast!!
    • Use what you know
    • NO DBA’s!
    • We all love CSV’s!
  23. Advantage of “everything is a df”
    Pro:
    • Fast!!
    • Use what you know
    • NO DBA’s!
    • We all love CSV’s!
    Contra:
    • Doesn’t scale;
    • Huge startup time;
    • NO DBA’s!
    • We all hate CSV’s!
  24. AngularJS (front-end), REST/JSON, python app (back-end), Database: ?
    If you don’t
  25. Issues?!
    • With a radius of 10km, in Amsterdam, you get 10k postcodes. You need to do this in your SQL:
      SELECT * FROM datapoints
      WHERE date IN date_array
      AND postcode IN postcode_array;
    • Index on date and postcode, but single queries running more than 20 minutes.
  26. Postgres + PostGIS (2.x)
    PostGIS is a spatial database extender for PostgreSQL. Supports geographic objects allowing location queries:
      SELECT * FROM datapoints
      WHERE ST_DWithin(lon, lat, 1500)
      AND dates IN ('2013-02-30', '2013-02-31');
      -- every point within 1.5km
      -- from (lat, lon) on imaginary dates
  27. Other db’s?

  28. How we solved it
    1. Align data on disk by date;
    2. Use the temporary table trick:
      CREATE TEMPORARY TABLE tmp (postcodes STRING NOT NULL PRIMARY KEY);
      INSERT INTO tmp (postcodes) VALUES postcode_array;
      SELECT * FROM tmp
      JOIN datapoints d ON d.postcode = tmp.postcodes
      WHERE d.dt IN dates_array;
    3. Lose precision: 1234AB→1234
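The temporary table trick from slide 28 can be tried end to end with the sketch below. It uses SQLite from the Python standard library purely so the example is self-contained; that choice, the `fetch` helper, the index name and the reduced column set are assumptions for illustration, not the setup used in the talk. The idea shown is the one on the slide: load the long postcode list into a keyed temporary table and join against it, instead of sending a 10k-element `IN` list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datapoints (dt TEXT, hour INTEGER, postcode TEXT, hits INTEGER)"
)
# A few rows modelled on the slide's data example (reduced column set).
conn.executemany(
    "INSERT INTO datapoints VALUES (?, ?, ?, ?)",
    [
        ("2013-01-01", 12, "1234AB", 35),
        ("2013-01-08", 12, "1234AB", 45),
        ("2013-01-01", 11, "5555ZB", 2),
        ("2013-01-08", 11, "5555ZB", 55),
    ],
)
conn.execute("CREATE INDEX idx_dt_postcode ON datapoints (dt, postcode)")


def fetch(conn, postcodes, dates):
    """Join against a temporary table of postcodes instead of a huge IN list."""
    conn.execute("DROP TABLE IF EXISTS tmp")
    conn.execute("CREATE TEMPORARY TABLE tmp (postcodes TEXT NOT NULL PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO tmp (postcodes) VALUES (?)", [(p,) for p in set(postcodes)]
    )
    # The date list is short, so plain placeholders are fine for it.
    marks = ", ".join("?" for _ in dates)
    query = (
        "SELECT d.dt, d.hour, d.postcode, d.hits "
        "FROM tmp JOIN datapoints d ON d.postcode = tmp.postcodes "
        f"WHERE d.dt IN ({marks}) ORDER BY d.dt, d.postcode"
    )
    return conn.execute(query, list(dates)).fetchall()


rows = fetch(conn, ["1234AB"], ["2013-01-01", "2013-01-08"])
```

Whether the join actually beats the `IN` list depends on the database and its planner, so timings from this toy setup say nothing about the 500GB case described in the talk.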
  29. Take home messages
    1. Geospatial problems are hard and queries can be really slow;
    2. Not everybody has infinite resources: be smart and KISS!
    3. SQL or NoSQL? (Size, schema)
  30. We’re hiring / Questions? / Thank you!
    Giovanni Lanzani, Data Whisperer
    @gglanzani
    giovannilanzani@godatadriven.com
    GoDataDriven