Qrious about Insights -- Big Data in the Real World

Guy Kloss
February 07, 2017

Presentation for the Data Science Research Group Workshop on 7 February 2017 at AUT. The talk centres on the problems in Big Data analytics, the tools for overcoming them, and the way the company Qrious leverages these tools to build solutions.

Transcript

  1. Qrious about Insights: Big Data in the Real World

    AUT DSRG Workshop; Guy Kloss ([email protected]), Enterprise Architect, Qrious Limited; 7 February 2017
  2. The Problem Examples The Solution Tools of the Trade Boxing

    up a Solution Flotsam and Jetsam Outline 1 The Problem 2 Examples 3 The Solution 4 Tools of the Trade 5 Boxing up a Solution 6 Flotsam and Jetsam Guy Kloss | Big Data in the Real World 2/41
  3. Who/What is Qrious?

    We help New Zealand businesses and public sector organisations create value and solve their most pressing business problems by turning data into actionable insight.
  4. Who/What is Qrious?

    Backed by Spark; approx. 60 employees; offices in Auckland & Wellington; substantial investment across Data, Platform & People; built from the ground up (new generation technology and working principles); one of the largest Data Science teams in the country, with > 80% qualified to Masters & PhD level and over 60 years of combined experience; NZ's leading data analytics specialist by 2017
  5. Our Capabilities

    Advanced analytics; location insights; Big Data platforms; consulting services; BI & Warehousing
  6. Who am I?

    Chemical Engineer (Masters); Rocket Scientist (German Aerospace Centre); Computer Scientist (PhD); former lecturer (AUT); Lead Software Developer and Head Crypto Geek @ Mega; Enterprise Architect at Qrious; Dad, baseballer, diver, ... general geek!
  7. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  8. Data size

    Number of records; data volume
  9. An exponentially growing data world

    Primary Memory/Disk Capacity
  10. An exponentially growing data world

    Relative Speeds. Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap
  11. Size Does Matter!

    Access/processing beyond a single machine (RAM, disk, CPU); expensive data transfers at volume (latency, throughput)
  12. Storage Issues

    Storage, access, index, find; transfer, manage, prevent data loss
  13. Types of Data

    Structured; unstructured; graphs; free text; ...
  14. Correlating ... co-relating ... mashing ...

    Not a single-record problem, but an m:n problem
  15. Beyond Exponential

    Problems are between exponential and hyperexponential → Enabling data processing in an exponential world
  16. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  17. Number of Records

    > 1 trillion (10^9) records: Spark’s location-based data set; anonymised for privacy (on ingest); fully encrypted (at rest and in transport); continuous/stream ingestion; normalisation and segmentation on data set; correlating with external data set → Finding insights in this “hay mountain”
  18. Data Volume

    100s of TB to PB of “Data Lakes”; not just a backup/data grave; fully encrypted (at rest and in transport); includes data querying and processing capability → Capability to “store everything” (every thing and kind)
  19. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  20. Divide and Conquer

    Massively parallel processing: MPP; parallelise: Map-Reduce; pipelines: stream processing
  21. Leverage Data Locality

    Bring processing to the data
  22. The Right Tools

    Don’t re-invent the wheel; use existing high-performing tools where possible; use available high-productivity frameworks that make use of high-level languages; the right tool for the type of data; Use the Source, Luke! (Leverage open-source-based tooling with a community)
  23. The Right Data Organisation

    Row vs. columnar storage → For analytics, columnar format is often better
  24. In, Out, Cha-Cha-Cha

    Ingest data from (legacy, external) source systems → ETL (Extract, Transform, Load); make sure the rhythm fits (no missing “Out”)
  25. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  26. Hadoop

    Hadoop and distributions; processing tools for relational, streaming, batch, graph, text, search, ...; allocates cluster resources dynamically; data is distributed (with redundancy), so compute is allocated where the data is
  27. Hadoop Distributions

    Many Hadoop distributions: similar to Linux distributions; Cloudera partnership with Qrious: “Bronze” partner, with ambitions to become “Silver” partner and MSP (managed service provider)
  28. Basic Hadoop Tool Suite

    Example: Cloudera Hadoop Distribution
  29. MPP Databases

    DB for massively parallel processing (MPP); Greenplum database and forks (based on PostgreSQL)
  30. Generic and Specialised DBs

    Generic RDBMS (where useful); NoSQL; Graph DB; other columnar species
  31. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  32. Delivering a Suitable Solution

    Includes: system management; connectivity; application logic; services; yummy add-ons
  33. System Management

    Framework; Security (dedicated sub-networks with specific firewall rules, external firewalls, user and credentials management, log collector, other security tools ...); System access (VPN, remote desktop services)
  34. Connectivity

    API gateways; (reverse) proxies; SFTP
  35. Application Logic

    Platform-as-a-Service (PaaS); huge benefits of containerising application logic (using Docker) → Much reduced cadence for delivery; APIs, Micro-Services; orchestration of Big Data analysis
  36. Services

    Solutioning, build; analytics and development; operation and maintenance
  37. Bonus Points for ...

    Provenance (reproducibility, auditability, compliance); AI and ML; Blockchain (non-repudiation, trust, “smart contracts”, identity management, federation, ...)
  38. Outline

    1 The Problem; 2 Examples; 3 The Solution; 4 Tools of the Trade; 5 Boxing up a Solution; 6 Flotsam and Jetsam
  39. In the Qrious Pipeline

    Make Big Data a commodity: don’t buy, pay for what you need! → Big-Data-as-a-Service (BDPaaS); sliced, diced and configured to your needs; straight on bare metal, not on VMs (as with most cloud hosters)
  40. Maximising the Job Market

    What skills do you need? RDBMS? SAS? NoSQL DBs? Maybe Hadoop is a good answer?
  41. Questions?

    Parallelise! Guy Kloss [email protected]. Just a humble hair-dryer from the 30s: “One of the first machines used for permanent wave hairstyling back in the 1920’s and 1930’s.” Dark Roasted Blend: http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html
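
The divide-and-conquer idea from slide 20 (Map-Reduce) can be illustrated with a minimal Python sketch. It is not taken from the talk: the word-count task and the function names (`split_into_partitions`, `map_partition`, `merge_counts`, `word_count`) are assumptions chosen for illustration, and local threads merely stand in for the cluster nodes that a framework such as Hadoop would schedule.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Divide: deal the records round-robin into num_partitions slices."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map: count words in one partition, independently of all others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce: combine two partial word counts into one."""
    return left + right


def word_count(records, num_partitions=4):
    """Hypothetical driver: divide, map in parallel, then reduce."""
    partitions = split_into_partitions(records, num_partitions)
    # Threads are an illustrative stand-in for worker nodes; on a real
    # cluster each map task would run where its partition is stored.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    lines = ["big data in the real world", "Big insights from big data"]
    print(word_count(lines).most_common(3))
```

Because each map task touches only its own partition, the same structure scales from threads on one machine to tasks placed next to the data on many machines, which is the point of slides 20 and 21.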