Slide 1

Slide 1 text

TURNING MASSIVE NASA DATA INTO A PIXEL-PERFECT WORLD ATLAS
CLEAR SKIES

Slide 2

Slide 2 text

MAPBOX
WE MAKE CUSTOM MAPS. WE NEEDED A GOOD SATELLITE BASEMAP.
Satellite Team:
Chris Herwig (@hrwgc) – [email protected]
Charlie Loyd (@vruba) – [email protected]
Bruno Sánchez-Andrade Nuño (@brunosan) – [email protected]

Slide 3

Slide 3 text

BASEMAPS
• Aesthetics and general accuracy both vital
• Blue Marble was a big inspiration but not our goal
• Goal: Avoid spatial interpolation
• Goal: Show peak growth everywhere at once
• MODIS was the clear choice

Slide 4

Slide 4 text

ALGORITHM

Slide 5

Slide 5 text

EXAMPLE: WEST AFRICA

Slide 6

Slide 6 text

QUALITY
• The quality function is the core of this whole process
• Algorithm assigns pixels scores based on how much they look like ground cover (as opposed to clouds, smoke, errors, missing data, etc.)
• Since it looks at every single input pixel (~5e12 pixels), it has to be very, very simple

Slide 7

Slide 7 text

DESIRABLE PIXELS APPROACH THE BOTTOM RIGHT!
Saturation (x) and Lightness (y)

Slide 8

Slide 8 text

• Only pixels in a triangle attached to the left edge and reaching across to the right edge are valid in this space, because we’ve defined saturation as the brightest RGB channel minus the dimmest.
• Under this definition, there are no highly saturated black or white pixels.
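
The saturation definition above is small enough to state in a few lines of shell. The sketch below is illustrative only: the slides define saturation but not lightness, so taking lightness as the midpoint of the brightest and dimmest channels (HSL-style) is an assumption, and `sat_light` is a hypothetical helper name. It requires bash.

```shell
#!/usr/bin/env bash
# Print "saturation lightness" for one 8-bit RGB pixel.
# Saturation follows the slide: brightest channel minus dimmest.
# Lightness is an ASSUMPTION (HSL-style midpoint); the deck does not define it.
sat_light() {
  local r=$1 g=$2 b=$3 max=$1 min=$1
  if (( g > max )); then max=$g; fi
  if (( b > max )); then max=$b; fi
  if (( g < min )); then min=$g; fi
  if (( b < min )); then min=$b; fi
  echo "$(( max - min )) $(( (max + min) / 2 ))"
}

sat_light 0 0 0         # black:    prints "0 0"
sat_light 255 255 255   # white:    prints "0 255"
sat_light 255 0 0       # pure red: prints "255 127"
```

Under these definitions black and white both have zero saturation, and only a fully saturated colour at mid lightness reaches the right edge, which is the triangle the slide describes.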

Slide 9

Slide 9 text

QUALITY

Slide 10

Slide 10 text

THE SORTED STACK

Slide 11

Slide 11 text

2

Slide 12

Slide 12 text

8

Slide 13

Slide 13 text

13

Slide 14

Slide 14 text

26

Slide 15

Slide 15 text

41

Slide 16

Slide 16 text

58

Slide 17

Slide 17 text

THE DATA

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

SOURCE IMAGERY
• NASA Lance-MODIS Rapid Response 9°×9° subsets
• 374 tiles (row/column subsets overlapping land)
• 730–1460 scenes per tile

Slide 20

Slide 20 text

LANCE MODIS SUBSETS

Slide 21

Slide 21 text

EXAMPLE: ROW 14, COLUMN 11, TERRA, OCT. 15, 2012

Slide 22

Slide 22 text

SOURCE IMAGERY
• Total desired source images: > 400,000
• Timeline was ASAP
• We talked with Ryan Boller (NASA GIBS) to figure out the best way to handle data acquisition

Slide 23

Slide 23 text

ACQUISITION
• Lance MODIS was good
• GIBS (Global Imagery Browse Service) was better in terms of bulk download speeds
• GIBS only had MODIS data back to May 2012
• Lance MODIS offered pre-cut subsets; GIBS data was via global WMS

Slide 24

Slide 24 text

DOWNLOAD PLAN
• Rows 0–7: days of the year 1–80, 265–366
• Rows 8–11: all days of the year
• Rows 12+: days of the year 81–264
• All 2011 imagery from Lance
• 2012-001–2012-137 from Lance
• 2012-138+ from GIBS
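
The plan above reduces to two small predicates: which days to fetch for a given tile row, and which service a given date comes from. The following bash sketch encodes just those rules. The function names `wanted_day` and `image_source` are hypothetical, and the `YYYYDDD` date form is assumed from the file names used elsewhere in the deck.

```shell
#!/usr/bin/env bash
# Succeeds if day-of-year DOY should be downloaded for tile row ROW.
wanted_day() {
  local row=$1
  local doy=$(( 10#$2 ))   # force base 10 so "089" is not read as octal
  if (( row <= 7 )); then
    (( doy <= 80 || doy >= 265 ))    # rows 0-7
  elif (( row <= 11 )); then
    return 0                          # rows 8-11: every day
  else
    (( doy >= 81 && doy <= 264 ))    # rows 12+
  fi
}

# Prints the service for a YYYYDDD date (assumes dates from 2011 onward).
image_source() {
  local date=$(( 10#$1 ))
  if (( date <= 2012137 )); then echo "lance"; else echo "gibs"; fi
}

# Row 17 on day 246 of 2012 falls inside the rows-12+ window.
if wanted_day 17 246; then
  echo "row 17, 2012246: download from $(image_source 2012246)"
fi
```

Since row 0 starts at 90° S in the deck's tile arithmetic, the row windows appear to pick each hemisphere's warm half of the year, with the tropics taken year-round.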

Slide 25

Slide 25 text

SOURCE IMAGERY
• Distributed downloads across 30–45 EC2 instances, each running 5–15 download processes
• 10 second wait time between successive downloads
• Random 0–3 second wait time between concurrent downloads (across machines)
• Amazon SQS: Task queuing system to ensure we downloaded each file exactly once across our cloud
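
One way to read the two wait rules is as a loop run by each download process: a short random stagger before each request, then a fixed pause after it. The bash sketch below shows only that pacing. The SQS queue is left out, the function names are hypothetical, and how the authors actually combined the two waits is not stated in the slides.

```shell
#!/usr/bin/env bash
# Random stagger of 0-3 whole seconds (uses bash's RANDOM).
stagger_seconds() {
  echo $(( RANDOM % 4 ))
}

# Read one URL per line from stdin and fetch each one politely.
# The arguments are the fetch command, e.g.: polite_downloads wget -T 45 --limit-rate=500k
polite_downloads() {
  (( $# > 0 )) || return 2          # a fetch command is required
  local url
  while IFS= read -r url; do
    sleep "$(stagger_seconds)"      # random 0-3 s so processes do not hit the server in step
    "$@" "$url"                     # run the fetch command on this URL
    sleep 10                        # fixed 10 s between successive downloads
  done
}
```

Passing the fetch command as arguments keeps the pacing separate from the download tool, so the same loop could wrap either of the download commands in the deck.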

Slide 26

Slide 26 text

LANCE-MODIS DOWNLOAD
wget \
  -T 45 \
  --limit-rate=500k \
  -O RRGlobal_$LANCE_ID.$DATE.$SATELLITE.250m.jpg \
  http://lance-modis.eosdis.nasa.gov/imagery/subsets/RRGlobal_$LANCE_ID/$DATE/RRGlobal_$LANCE_ID.$DATE.$SATELLITE.250m.jpg

Slide 27

Slide 27 text

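GIBS DOWNLOAD
# Convert a Lance-MODIS tile row/column into a 9°×9° bounding box.
function get_coords () {
  ROW="$1"
  COLUMN="$2"
  # Quote the expressions so the shell cannot glob-expand the "*".
  ULLON=$(echo "$COLUMN*9-180" | bc)
  LRLON=$(echo "$ULLON+9" | bc)
  LRLAT=$(echo "$ROW*9-90" | bc)
  ULLAT=$(echo "$LRLAT+9" | bc)
  # Upper-left lon/lat, then lower-right lon/lat.
  echo "$ULLON $ULLAT $LRLON $LRLAT"
}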
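
As a worked example of the tile arithmetic, the sketch below restates `get_coords` with shell integer arithmetic instead of `bc` (a substitution made here only so it runs without extra tools; the formulas are unchanged) and evaluates it for two tiles named elsewhere in the deck.

```shell
#!/usr/bin/env bash
# Same formulas as get_coords, using bash arithmetic rather than bc.
# Pass row and column without zero padding (e.g. 0, not 00 or 08).
get_coords() {
  local row=$1 column=$2
  local ullon=$(( column * 9 - 180 ))   # west edge
  local lrlon=$(( ullon + 9 ))          # east edge
  local lrlat=$(( row * 9 - 90 ))       # south edge
  local ullat=$(( lrlat + 9 ))          # north edge
  echo "$ullon $ullat $lrlon $lrlat"
}

get_coords 14 11   # row 14, column 11: prints "-81 45 -72 36"
get_coords 17 0    # row 17, column 0:  prints "-180 72 -171 63"
```

The output order (upper-left longitude and latitude, then lower-right longitude and latitude) is the order `gdal_translate -projwin` expects, and row 17, column 0 starts exactly at longitude −180.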

Slide 28

Slide 28 text

GIBS DOWNLOAD
gdal_translate \
  -of JPEG \
  -outsize 4096 4096 \
  -projwin $BBOX \
  -co QUALITY=80 \
  -a_srs EPSG:4326 \
  "http://map1.vis.earthdata.nasa.gov/twms-geo/twms.cgi? MODIS ${SATELLITE} tileset $DATE" \
  "RRGlobal_${LANCE_ID}.${DOY_DATE}.${SATELLITE}.250m.jpg"

Slide 29

Slide 29 text

PROCESSING

Slide 30

Slide 30 text

PROCESSING
Declouding for all 374 tiles: 639 computing hours
40 Amazon EC2 spot instances (M2.2XL) for 16 hours
Besides a few tests and re-dos, the bulk of the processing happened over a single weekend.

Slide 31

Slide 31 text

EC2MUX

Slide 32

Slide 32 text

EC2MUX

Slide 33

Slide 33 text

CHALLENGES

Slide 34

Slide 34 text

DATELINE LONGITUDE ±180 (That’s an image bug, not a deliberate highlight…)

Slide 35

Slide 35 text

RRGLOBAL_R17C00.2012246.TERRA.250M.JPG

Slide 36

Slide 36 text

RRGLOBAL_R17C00.2012246.TERRA.250M.JPG

Slide 37

Slide 37 text

COLOR

Slide 38

Slide 38 text

Rendering color for human consumption is always partly subjective, but some color treatments are more accurate than others.

Slide 39

Slide 39 text

ANTARCTICA

Slide 40

Slide 40 text

DECLOUDED ANTARCTICA

Slide 41

Slide 41 text

LIMA LANDSAT IMAGE MOSAIC OF ANTARCTICA

Slide 42

Slide 42 text

MOA MOSAIC OF ANTARCTICA

Slide 43

Slide 43 text

RESULTS

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

A PREVIEW OF MAPBOX’S NEXT ITERATION OF CLOUDLESS ATLAS
THE CLOUD V. CLOUDS

Slide 48

Slide 48 text

Artist’s Rendering of Landsat 7, Credit: NASA/Goddard

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

LANDSAT V. MODIS
Landsat’s resolution is great compared to MODIS. But it comes with relative difficulties:
• Frequency (≤ 1 pass per 16 days v. ≤ 2 per day)
• No atmospheric correction
• Tiling and SLC-Off
• Archival availability
• Raw file size

Slide 51

Slide 51 text

TILING AND SLC-OFF
Because both are essentially missing data problems, we have a technique: treat null bands and missing corners like clouds!
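
A minimal bash sketch of what "treat missing data like clouds" could mean in a score-and-pick scheme: missing pixels get a score below any real pixel, so they lose to whatever else is in the stack. Everything here is hypothetical and is not the authors' quality function. The score (saturation minus an HSL-style lightness), the assumption that missing data is stored as all-zero bands, and the names `pixel_score` and `best_pixel` are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical per-pixel score; higher is better.
pixel_score() {
  local r=$1 g=$2 b=$3 max=$1 min=$1
  # ASSUMPTION: null bands and empty scene corners arrive as all zeros.
  if (( r == 0 && g == 0 && b == 0 )); then
    echo "-256"   # below the lowest possible real score (-255)
    return
  fi
  if (( g > max )); then max=$g; fi
  if (( b > max )); then max=$b; fi
  if (( g < min )); then min=$g; fi
  if (( b < min )); then min=$b; fi
  # Saturation minus lightness: bright grey clouds score low.
  echo $(( (max - min) - (max + min) / 2 ))
}

# Read "R G B" lines (one candidate per scene) and print the best-scoring one.
best_pixel() {
  local r g b score best_score=-257 best=""
  while read -r r g b; do
    score=$(pixel_score "$r" "$g" "$b")
    if (( score > best_score )); then
      best_score=$score
      best="$r $g $b"
    fi
  done
  echo "$best"
}

# Missing data, a cloud, and a vegetated pixel: prints "40 90 30".
printf '%s\n' "0 0 0" "250 250 250" "40 90 30" | best_pixel
```

Because the missing-data score sits just below the worst real score, the same selection step handles clouds, SLC-Off gaps, and empty corners without a separate masking pass.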

Slide 52

Slide 52 text

ATMOSPHERIC CORRECTION
• ~2,000,000 source images meet project specs
• That’s a lot of LEDAPS!
• 200,000 computing hours for atmospheric correction

Slide 53

Slide 53 text

LEDAPS
• High-quality software routines improve L1 images substantially
• Documentation for using the software and for the underlying algorithm is not easily accessible
• No Landsat 8 support (yet)
• Compiling is a chore -> AWS CloudFormation + AMIs

Slide 54

Slide 54 text

BEFORE + AFTER

Slide 55

Slide 55 text

AVAILABILITY
• Fewer Landsat scenes available
• Scene bundles come with a lot of extra info
• Sometimes scenes with coverage we want weren’t collected or archived
• Solution: multiple endpoints

Slide 56

Slide 56 text

SIZE
• Individual MODIS images: 2–3 MB
• Individual Landsat scenes: 200–300 MB

Slide 57

Slide 57 text

TOTAL LANDSAT 5 AND 7 SCENES AVAILABLE FROM USGS THAT MEET PROJECT SPECS: 2,402,056

Slide 58

Slide 58 text

400 TB

Slide 59

Slide 59 text

• Did we mention we don’t have an in-house data center or dedicated servers?

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

CLOUD SHADOWS
Cloud shadows don’t really matter until you’re at about 100 m/px or sharper.
They trick a naïve algorithm because they’re darker, which usually means clearer, than other pixels.
So you have to get into fine-tuning the band math, which has other effects too!

Slide 62

Slide 62 text

CLOUD SHADOWS: BEFORE

Slide 63

Slide 63 text

CLOUD SHADOWS: AFTER

Slide 64

Slide 64 text

WORK CONTINUES!

Slide 65

Slide 65 text

QUESTIONS? COMMENTS?
Chris Herwig (@hrwgc) – [email protected]
Charlie Loyd (@vruba) – [email protected]
Bruno Sánchez-Andrade Nuño (@brunosan) – [email protected]

Slide 66

Slide 66 text

No content