Sharding Weather
 - Practical examples with big data and ElasticSearch

The key challenge of processing weather data is “simply” the big numbers: multiple numerical weather models delivering about 250 GB of data at least twice a day, a year of worldwide observation data, and satellite and radar images every 5 minutes. How do you store the data efficiently? And how do you query and process the weather parameters lightning fast? This talk gives you insights into one of the “oldest” big data domains in the industry.

You will learn how these problems were worked around in the old days and what new solutions based on sharding and NoSQL offer. We will present practical results from our in-depth PoCs. You will get to know the limits of Elastic Search, the options for sharding, and how to measure the different setups.

Timmo Freudl-Gierke

September 18, 2015

Transcript

  1. Sharding Weather
     Practical examples with big data and ElasticSearch

    Timmo Freudl-Gierke
    @timmo_8, September 2015
  2. Agenda

    • Value Chains of a private Weather Company
    • Weather in numbers
    • How to forecast weather
    • Sharding Weather
    • Elastic Search PoC Measurements
    • Lessons Learned
    • Wrap Up & Prospect
  3. 6

  4. 7

  5. 9

  6. 10

  7. 11

  8. 12

  9. John von Neumann

    One of the designers of the electronic computer and a participant in the production of the first numerical weather prediction (forecast).
  10. ECMWF Model

    Issues per Day             2
    Forecast Periods           57
    Resolution (horizontal)    N640 => 0.25° x 0.25° => ~16 km
    Resolution (Grid Points)   2,140,702
    Elevation Levels           10
    Attributes                 124
    Volume per day (GB)        1127.31
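
The daily volume on this slide can be approximated from the other figures. The sketch below assumes that every grid value is stored as a 4-byte number and that "GB" is read as GiB (2^30 bytes). Neither assumption is stated in the deck, but together they reproduce the quoted figure.

```python
# Figures taken from the ECMWF Model slide.
ISSUES_PER_DAY = 2
FORECAST_PERIODS = 57
GRID_POINTS = 2_140_702
ELEVATION_LEVELS = 10
ATTRIBUTES = 124

# Assumptions (not in the deck): 4 bytes per value, "GB" read as GiB.
BYTES_PER_VALUE = 4
GIB = 2**30


def daily_volume_gib() -> float:
    """Estimate the model output per day in GiB."""
    values_per_day = (
        ISSUES_PER_DAY
        * FORECAST_PERIODS
        * GRID_POINTS
        * ELEVATION_LEVELS
        * ATTRIBUTES
    )
    return values_per_day * BYTES_PER_VALUE / GIB


print(f"{daily_volume_gib():.2f} GiB per day")  # prints "1127.31 GiB per day"
```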
  11. Observations

    • WMO Weather Stations
      • 10k surface
      • 1k upper-air
      • 7k ships
      • 1k drifting buoys
      • 3k aircraft
    • MeteoGroup Measurement Network
      • 1.5k Stations
    • Radar
    • Satellite
  12. Why not just do it?

    • New Meteorological “Magic” necessary
    • “Modern” Technologies required
    • Turning everything upside down
  13. Sharding

    Sharding is the equivalent of "horizontal partitioning". When you shard a database, you create replicas of the schema, and then divide what data is stored in each shard based on a shard key.
    https://www.quora.com/Whats-the-difference-between-sharding-and-partition
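
The definition above can be illustrated with a minimal sketch: every record is assigned to one of a fixed number of shards by hashing its shard key. The function names, the record layout, and the choice of MD5 as hash are assumptions made for illustration, not something taken from the talk.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard number with a stable hash."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, key_field, num_shards):
    """Distribute records over num_shards lists based on one field."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        target = shard_for(str(record[key_field]), num_shards)
        shards[target].append(record)
    return shards


# Hypothetical records; the field names are illustrative only.
records = [
    {"parameter": "temperature", "value": 21.5},
    {"parameter": "temperature", "value": 19.0},
    {"parameter": "windspeed", "value": 4.2},
    {"parameter": "dewpoint", "value": 11.3},
]

shards = partition(records, key_field="parameter", num_shards=5)
for number, content in enumerate(shards):
    print(number, content)
```

Records with the same shard key always end up on the same shard, which is what makes the choice of key decisive for how many shards a query has to touch.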
  14. Sharding Key = weather parameter (name/id)

    [Diagram: five sharding nodes, labelled temperature, dew point, wind speed, probability of rain, …]
  15. Elastic Search

    • NoSQL Database
    • search and analytics engine
    • designed for horizontal scalability
    • developer-friendly query language
    • structured, unstructured, and time-series data
  16. PoC Architecture: Physical Deployment View

    [Diagram components: JMeter (3x), Load Balancer, Computation Service (3x), Elastic Search Node (13x)]
  17. Cluster Setup

    • JMeter
      • 1 master: c4.4xlarge
      • 2 workers: c4.4xlarge
      • Docker
    • Elastic Load Balancer
    • Computation Service
      • 3 nodes t2.micro
      • Spring Boot
      • Docker
    • ElasticSearch cluster:
      • 3 master nodes: m4.xlarge
      • 10 data nodes: c4.4xlarge
    • Monthly cost: ~4,000 USD
  18. Default Sharding

    # Documents     Doc Size (B)   AVG latency (ms)   Throughput (req/s)   CPU
    1,000,000,000   470            298                263                  350%

    • shard on artificial ID
    • Document contains
      • lat / lon
      • one forecast period
      • one elevation level
      • one weather parameter
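
As a rough illustration of the document layout listed on this slide, the sketch below builds one such document and multiplies the document count by the document size from the measurement. All field names, the example values, and the presence of a `value` field are assumptions; the deck only names the kinds of content.

```python
import json

# Hypothetical document: one grid point, one forecast period,
# one elevation level, one weather parameter.
document = {
    "lat": 52.5,
    "lon": 13.25,
    "forecast_period": 12,
    "elevation_level": 3,
    "parameter": "temperature",
    "value": 21.5,  # assumed field holding the forecast value
}

# Figures from the slide.
DOCUMENTS = 1_000_000_000
DOC_SIZE_BYTES = 470

# Raw payload only; replicas and index overhead are not included.
raw_volume_gb = DOCUMENTS * DOC_SIZE_BYTES / 1e9

print(json.dumps(document))
print(f"{raw_volume_gb:.0f} GB raw document volume")
```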
  19. Default Sharding

    Computation Service: for lat/lon + time, get n surrounding grid points
    • documents are equally distributed
    • n nodes must deliver data
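
The lookup of surrounding grid points can be sketched as follows. The sketch assumes a regular 0.25° latitude/longitude grid aligned to 0° and n = 4 (the corners of the enclosing grid cell); it ignores the poles and the date line. The function name is hypothetical.

```python
import math

GRID_STEP = 0.25  # degrees, horizontal resolution from the ECMWF slide


def surrounding_grid_points(lat: float, lon: float, step: float = GRID_STEP):
    """Return the four corners of the grid cell that contains (lat, lon)."""
    lat0 = math.floor(lat / step) * step
    lon0 = math.floor(lon / step) * step
    return [
        (lat0 + i * step, lon0 + j * step)
        for i in (0, 1)
        for j in (0, 1)
    ]


for point in surrounding_grid_points(52.52, 13.40):
    print(point)
```

With sharding on an artificial ID, each of these neighbouring documents is placed independently of its location, so a single position lookup can fan out to as many nodes as there are grid points.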
  20. Shard on Weather Parameter

    • Sharding Key = weather parameter (name/id)
    • 120 different parameters
    • Dependencies between parameters
    • Multiple parameters necessary to compute the “present weather”
    • Number of parameters required for a computation equals the number of shards that must deliver data
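
The fan-out described on this slide can be made concrete with a small sketch. It assumes one shard per parameter and a made-up set of inputs for the "present weather"; the parameter names are borrowed from the sharding diagram, the actual dependencies are not given in the deck.

```python
# Illustrative subset of the weather parameters.
PARAMETERS = ["temperature", "dewpoint", "windspeed", "probability_of_rain", "pressure"]

# Assumption: each parameter lives on its own shard.
SHARD_OF = {name: number for number, name in enumerate(PARAMETERS)}

# Hypothetical inputs needed to derive the "present weather".
PRESENT_WEATHER_INPUTS = ["temperature", "dewpoint", "windspeed", "probability_of_rain"]


def shards_touched(required_parameters, shard_of=SHARD_OF):
    """Return the set of shards that must answer for one computation."""
    return {shard_of[name] for name in required_parameters}


touched = shards_touched(PRESENT_WEATHER_INPUTS)
print(f"{len(touched)} shards must deliver data: {sorted(touched)}")
```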
  21. Sharding on Geo Location

    • Sharding Key = Geohash
    • 470 B Documents
    • 34 Shards
    • One node touched

    # Documents   Requests/s   AVG latency (ms)   Throughput (req/s)
    48,629,840    2000         26                 619
    48,629,840    8000         67                 1114

    Elastic Search does not support sharding on geo location. Custom hash function necessary.
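
One possible shape of such a custom hash function is sketched below: the standard geohash encoding, truncated to a short prefix, is hashed onto the 34 shards. Elasticsearch accepts a custom `routing` value on index and search requests; the sketch only computes a value of that kind and does not call any client library. The prefix length of 3 and the use of CRC32 are assumptions, and the function actually used in the PoC is not shown in the deck.

```python
import zlib

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
NUM_SHARDS = 34  # shard count from the slide


def geohash(lat: float, lon: float, precision: int = 3) -> str:
    """Encode a position as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits form one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def shard_for_position(lat: float, lon: float, num_shards: int = NUM_SHARDS) -> int:
    """Map a position to a shard via its geohash prefix."""
    routing_value = geohash(lat, lon, precision=3)
    return zlib.crc32(routing_value.encode("ascii")) % num_shards


print(geohash(57.64911, 10.40744, precision=5))  # prints "u4pru"
print(shard_for_position(57.25, 10.25), shard_for_position(57.5, 10.5))
```

Neighbouring grid points usually share a geohash prefix and therefore a shard, which matches the "one node touched" observation. Points on either side of a cell border can still map to different shards.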
  22. Sharding on Forecast Period

    • Sharding Key = Forecast Period
    • 470 B Documents
    • 100 Shards
    • One to three nodes touched

    # Documents     Marvel   Requests/s   AVG latency (ms)   Throughput (req/s)
    48,629,840      ON       8000         28                 2610
    1,055,126,000   ON       8000         111                683
    1,055,126,000   OFF      8000         80                 960
  23. Ingestion Numbers

    • 1:15 hours to download all chunks
    • Max need today: 71k documents / second
    • Shard Key = Forecast Period

    # Documents     # Params   # Elev.   AVG (documents/sec)
    211,025,200     10         2         ~400k/s
    1,055,126,000   50         10        ~67k/s
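
The ingestion figures translate into load times by simple division. The sketch below uses only the numbers from this slide; since the rates are approximate ("~"), the results are approximate as well.

```python
MAX_NEED_PER_SEC = 71_000  # "Max need today" from the slide

# Measured runs from the slide: document count and average ingestion rate.
RUNS = [
    {"documents": 211_025_200, "rate_per_sec": 400_000},
    {"documents": 1_055_126_000, "rate_per_sec": 67_000},
]


def ingest_seconds(documents: int, rate_per_sec: int) -> float:
    """Time needed to ingest all documents at a constant rate."""
    return documents / rate_per_sec


for run in RUNS:
    seconds = ingest_seconds(run["documents"], run["rate_per_sec"])
    meets_need = run["rate_per_sec"] >= MAX_NEED_PER_SEC
    print(
        f"{run['documents']:,} documents: "
        f"{seconds / 60:.1f} min, meets 71k/s need: {meets_need}"
    )
```

Under these figures the smaller run finishes in under nine minutes, while the larger run takes roughly 4.4 hours and stays slightly below the stated 71k documents per second.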
  24. Elastic Search will do the Job

    • transport client -> node client
    • dedicated master node
    • custom geo-hash function
  25. AWS & Infrastructure

    • Infrastructure as code!
    • Enables faster re-configuration & re-deployments
    • 124 scripts at the end
    • Embedded (Senior) Operator / DBA is key
    • Fast deployment pipeline is key for fast evolution of PoC
  26. Meteorological Research together with Technology

    • Sandbox vs production ready code
    • Development Approach (e.g. TDD, Pair Programming)
    • DevOps => MetDev
  27. Wrap Up & Prospect

    • Insights of a domain which affects everyone
    • Forecast in 200 ms for arbitrary global position
    • Future
      • Derived Values
      • Market specific forecasts
      • IoT (cars, mobiles, …)
      • Crowd sourcing
  28. Sharding Weather
      Practical examples with big data and ElasticSearch

    Timmo Freudl-Gierke
    @timmo_8, September 2015