Slide 1

Slide 1 text

Sharding Weather
Practical examples with big data and Elasticsearch
Timmo Freudl-Gierke
@timmo_8
September 2015

Slide 2

Slide 2 text

Agenda
• Value Chain of a private Weather Company
• Weather in numbers
• How to forecast weather
• Sharding Weather
• Elasticsearch PoC Measurements
• Lessons Learned
• Wrap Up & Prospect

Slide 3

Slide 3 text

Value Chain of a private Weather Company

Slide 4

Slide 4 text

Empowering the world to master the weather

Slide 5

Slide 5 text

WeatherPro, AlertsPro, RainToday, MeteoEarth

Slide 6

Slide 6 text


Slide 7

Slide 7 text


Slide 8

Slide 8 text

Storm Chaser

Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text


Slide 12

Slide 12 text


Slide 13

Slide 13 text

Skyview (Live Demo)

Slide 14

Slide 14 text

Sky View

Slide 15

Slide 15 text

Weather in numbers

Slide 16

Slide 16 text

John von Neumann
One of the designers of the electronic computer and a participant in the production of the first numerical weather prediction (forecast).

Slide 17

Slide 17 text

European Centre for Medium-Range Weather Forecasts

Slide 18

Slide 18 text

Numerical Weather Prediction Models

Slide 19

Slide 19 text

ECMWF Model
Issues per Day:           2
Forecast Periods:         57
Resolution (horizontal):  N640 => 0.25° x 0.25° => ~16 km
Resolution (Grid Points): 2,140,702
Elevation Levels:         10
Attributes:               124
Volume per day (GB):      1127.31
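The daily volume can be reproduced from the other rows of the table. The following is a minimal sketch, assuming each value is stored in 4 bytes and that "GB" means 2^30 bytes. Neither assumption is stated on the slide, but together they reproduce the 1127.31 figure.

```python
# Figures taken from the ECMWF model table.
issues_per_day = 2
forecast_periods = 57
grid_points = 2_140_702
elevation_levels = 10
attributes = 124

# Assumptions (not on the slide): 4 bytes per value, 1 GB = 2**30 bytes.
BYTES_PER_VALUE = 4
BYTES_PER_GB = 2**30

# One value per issue, forecast period, grid point, elevation level and attribute.
values_per_day = (issues_per_day * forecast_periods * grid_points
                  * elevation_levels * attributes)
volume_gb = values_per_day * BYTES_PER_VALUE / BYTES_PER_GB

print(f"{values_per_day:,} values/day, {volume_gb:.2f} GB/day")
```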

Slide 20

Slide 20 text

5 models from different providers

Slide 21

Slide 21 text

Observations
• WMO Weather Stations
  • 10k surface
  • 1k upper-air
  • 7k ships
  • 1k drifting buoys
  • 3k aircraft
• MeteoGroup Measurement Network
  • 1.5k Stations
• Radar
• Satellite

Slide 22

Slide 22 text

Opened last week

Slide 23

Slide 23 text

How to forecast weather

Slide 24

Slide 24 text

Today

Slide 25

Slide 25 text

Gridded Forecast Model
• Horizontal Grid (Latitude / Longitude)
• Vertical Grid (Height or Pressure)

Slide 26

Slide 26 text

Forecast Periods

Slide 27

Slide 27 text

Statistical Correction of Numeric Forecast => High Quality Weather Station Forecast

Slide 28

Slide 28 text

Challenges with Mobile Clients

Slide 29

Slide 29 text

Sunglasses or Umbrella?

Slide 30

Slide 30 text

Weather Station, Smart Phone User

Slide 31

Slide 31 text

Tomorrow

Slide 32

Slide 32 text

Demand
• High resolution
• Hyper Local
• On Demand

Slide 33

Slide 33 text

Why not just do it?
• New Meteorological “Magic” necessary
• “Modern” Technologies required
• Turning everything upside down

Slide 34

Slide 34 text

A little bit of Meteorological Magic

Slide 35

Slide 35 text

Compute on 50+ Grid Points

Slide 36

Slide 36 text

Topography

Slide 37

Slide 37 text

Land Usage Affects
http://overpass-turbo.eu

Slide 38

Slide 38 text

Technology Magic: On-the-fly calculation!

Slide 39

Slide 39 text

Pre-Processing: Model + Station Observations => Magic => Station Forecast => Forecast => User
On-the-Fly: Model => Magic => Forecast => User

Slide 40

Slide 40 text

Sharding Weather

Slide 41

Slide 41 text

Sharding
Sharding is the equivalent of "horizontal partitioning". When you shard a database, you create replicas of the schema, and then divide what data is stored in each shard based on a shard key.
https://www.quora.com/Whats-the-difference-between-sharding-and-partition
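The definition above can be illustrated with a few lines of code. This is a minimal sketch, not any particular database's implementation: it assumes a stable hash (CRC32) of the shard key modulo the number of shards, and the record fields are made up for illustration.

```python
import zlib
from collections import defaultdict

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(records, key_field, num_shards):
    """Divide records over shards; every shard holds the same schema, different rows."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(str(record[key_field]), num_shards)].append(record)
    return dict(shards)

# Hypothetical records; "param" plays the role of the shard key.
records = [
    {"param": "temperature", "value": 21.5},
    {"param": "windspeed", "value": 4.2},
    {"param": "temperature", "value": 19.0},
]
shards = partition(records, "param", num_shards=4)
for index, rows in sorted(shards.items()):
    print(index, rows)
```

Records with the same shard key always land on the same shard, which is what makes the choice of key so important for the query patterns discussed later.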

Slide 42

Slide 42 text

Linear Scalability
“Serve N times more requests just by adding N times more nodes.”

Slide 43

Slide 43 text

Weather Dimensions

Slide 44

Slide 44 text

Weather Dimensions
• Weather Param
• Location
• Time

Slide 45

Slide 45 text

Location

Slide 46

Slide 46 text

Sharding Node

Slide 47

Slide 47 text

Sharding Key = geohash
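A geohash turns a latitude/longitude pair into a short string in which nearby points usually share a common prefix, which is what makes it attractive as a location-based sharding key. The sketch below implements the standard public geohash encoding; the function name and the default precision are choices made for this illustration, not taken from the talk.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, 11))  # u4pruydqqvj
```

A shorter precision yields a larger cell, so a geohash prefix can serve as the shard key for everything inside that cell.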

Slide 48

Slide 48 text

Time

Slide 49

Slide 49 text

Sharding Key = Forecast Period

Slide 50

Slide 50 text

Weather Param

Slide 51

Slide 51 text

Sharding Key = weather parameter (name/id)
Sharding Nodes: temperature, dew point, windspeed, probability of rain, …

Slide 52

Slide 52 text

Elasticsearch PoC Measurements

Slide 53

Slide 53 text

Elasticsearch
• NoSQL Database
• search and analytics engine
• designed for horizontal scalability
• developer-friendly query language
• structured, unstructured, and time-series data

Slide 54

Slide 54 text

Measurement Use Cases

Slide 55

Slide 55 text

Use Case 1: Request Weather Forecast

Slide 56

Slide 56 text

Use Case 2: Ingest Weather Forecast Model

Slide 57

Slide 57 text

PoC Architecture: Physical Deployment View
3x JMeter => Load Balancer => 3x Computation Service => 13x Elasticsearch Node

Slide 58

Slide 58 text

Cluster Setup
• JMeter
  • 1 master: c4.4xlarge
  • 2 workers: c4.4xlarge
  • Docker
• Elastic Load Balancer
• Computation Service
  • 3 nodes: t2.micro
  • Spring Boot
  • Docker
• Elasticsearch cluster
  • 3 master nodes: m4.xlarge
  • 10 data nodes: c4.4xlarge
• Monthly cost: ~4,000 USD

Slide 59

Slide 59 text

Iterative Optimisation
Measure => Analyse Results => Find & Remove Bottleneck => Redeploy (repeat)

Slide 60

Slide 60 text

Default Sharding

# Documents     Doc Size (B)   AVG latency (ms)   Throughput (req/s)   CPU
1,000,000,000   470            298                263                  350%

• shard on artificial ID
• Document contains
  • lat / lon
  • one forecast period
  • one elevation level
  • one weather parameter
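To make the document layout concrete, here is a sketch of what one such document might look like, plus the raw data volume implied by the table. The field names and sample values are hypothetical; the slide only lists what a document contains, and the 470 B size is taken from the table rather than derived from this sample.

```python
import json

# Hypothetical field names and values; only the list of contents comes from the slide.
document = {
    "location": {"lat": 52.5, "lon": 13.25},  # lat / lon of one grid point
    "forecast_period": 12,                     # one forecast period
    "elevation_level": 2,                      # one elevation level
    "parameter": "temperature",                # one weather parameter
    "value": 17.3,
}

doc_count = 1_000_000_000
doc_size_bytes = 470  # document size reported in the table
raw_bytes = doc_count * doc_size_bytes

print(json.dumps(document))
print(f"raw volume: {raw_bytes / 1e9:.0f} GB before replicas and index overhead")
```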

Slide 61

Slide 61 text

Default Sharding
Computation Service: lat/lon + time => get n surrounding grid points
• documents are equally distributed
• n nodes must deliver data

Slide 62

Slide 62 text

Shard on Weather Parameter
• Sharding Key = weather parameter (name/id)
• 120 different parameters
• Dependencies between parameters
• Multiple parameters necessary to compute the “present weather”
• Number of parameters required for computation equals number of shards to deliver data

Slide 63

Slide 63 text

Sharding on Geo Location
• Sharding Key = Geohash
• Document size: 470 B
• 34 Shards
• One node touched

# Documents   Requests/s   AVG latency (ms)   Throughput (req/s)
48,629,840    2000         26                 619
48,629,840    8000         67                 1114

Elasticsearch does not support sharding on geo location. Custom hash function necessary.
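One way to get the "one node touched" behaviour is to use a geohash prefix as a custom routing value, so that all grid points in the same cell are stored on the same shard. The sketch below is an illustration only, not the custom hash function from the PoC: the prefix length and sample geohashes are made up, and CRC32 stands in for the engine's internal routing hash.

```python
import zlib

NUM_SHARDS = 34    # shard count from the slide
PREFIX_LENGTH = 3  # hypothetical: number of geohash characters used for routing

def routing_value(geohash: str) -> str:
    """All points whose geohash shares this prefix get the same routing value."""
    return geohash[:PREFIX_LENGTH]

def shard_for(geohash: str) -> int:
    # CRC32 is a stand-in; the real engine applies its own hash to the routing value.
    return zlib.crc32(routing_value(geohash).encode("ascii")) % NUM_SHARDS

# Illustrative geohashes of neighbouring grid points inside one cell.
neighbours = ["u33db2", "u33db8", "u33dc0", "u33d9z"]
shards_touched = {shard_for(g) for g in neighbours}
print(f"shards touched: {len(shards_touched)}")
```

Points on either side of a cell border have different prefixes, so a request near a border may still touch more than one shard.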

Slide 64

Slide 64 text

Sharding on Forecast Period
• Sharding Key = Forecast Period
• Document size: 470 B
• 100 Shards
• One to three nodes touched

# Documents     Marvel   Requests/s   AVG latency (ms)   Throughput (req/s)
48,629,840      ON       8000         28                 2610
1,055,126,000   ON       8000         111                683
1,055,126,000   OFF      8000         80                 960

Slide 65

Slide 65 text

How long does the ingestion process take?

Slide 66

Slide 66 text

Ingestion Numbers
• 1:15 hours to download all chunks
• Max need today: 71k documents / second
• Shard Key = Forecast Period

# Documents     # Params   # Elev.   AVG (documents/sec)
211,025,200     10         2         ~400k/s
1,055,126,000   50         10        ~67k/s
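The table gives document counts and average rates; dividing one by the other gives a rough ingestion time for each run. This is a back-of-the-envelope sketch that treats the approximate rates (~400k/s and ~67k/s) as exact and ignores download time.

```python
def ingestion_time_seconds(documents: int, docs_per_second: float) -> float:
    """Time to ingest all documents at a constant average rate."""
    return documents / docs_per_second

runs = [
    (211_025_200, 400_000),   # 10 params, 2 elevation levels
    (1_055_126_000, 67_000),  # 50 params, 10 elevation levels
]
for documents, rate in runs:
    seconds = ingestion_time_seconds(documents, rate)
    print(f"{documents:>13,} docs at {rate:,}/s: "
          f"{seconds / 60:.1f} min ({seconds / 3600:.2f} h)")
```

Under these assumptions the smaller run takes roughly 9 minutes and the larger one roughly 4.4 hours.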

Slide 67

Slide 67 text

Lessons Learned

Slide 68

Slide 68 text

Know your domain. This is key for sharding.

Slide 69

Slide 69 text

Elasticsearch will do the job
• transport client -> node client
• dedicated master node
• custom geo-hash function

Slide 70

Slide 70 text

AWS & Infrastructure
• Infrastructure as code!
• Enables faster re-configuration & re-deployments
• 124 scripts at the end
• Embedded (Senior) Operator / DBA is key
• Fast deployment pipeline is key for fast evolution of PoC

Slide 71

Slide 71 text

Meteorological Research together with Technology
• Sandbox vs production ready code
• Development Approach (e.g. TDD, Pair Programming)
• DevOps => MetDev

Slide 72

Slide 72 text

Team Vibes
Measure => Analyse Results => Remove Bottleneck => Redeploy (repeat)

Slide 73

Slide 73 text

Wrap Up & Prospect
• Insights into a domain that affects everyone
• Forecast in 200 ms for an arbitrary global position
• Future
  • Derived Values
  • Market specific forecasts
  • IoT (cars, mobiles, …)
  • Crowd sourcing

Slide 74

Slide 74 text

Sharding Weather
Practical examples with big data and Elasticsearch
Timmo Freudl-Gierke
@timmo_8
September 2015

Slide 75

Slide 75 text

We are looking for Java Super Heroes
Scrum, RESTful, SQL
Email: [email protected]
Subject: JAVA DEVELOPER - ENERGY