Slide 1

Slide 1 text

WHY WE CHOSE MONGODB TO PUT BIG DATA ‘ON THE MAP’
June 2012
@nknize
+Nicholas Knize

Slide 2

Slide 2 text

TST PRODUCTS
“The 3D UDOP allows near real-time visibility of all SOUTHCOM Directorates information in one location… this capability allows for unprecedented situational awareness and information sharing”
– Gen. Doug Fraser
ACCOMPLISHING THE IMPOSSIBLE

Slide 3

Slide 3 text

ISPATIAL OVERVIEW
•  Expose enterprise data in a geo-temporal, user-defined environment
•  Provide a flexible and scalable spatial indexing framework for heterogeneous data
•  Visualize spatially referenced data on 3D globe & 2D maps
•  Manage real-time data feeds and mobile messaging
•  View data over geo-rectified imagery with 3D terrain
•  Support mission planning and simulation
•  Provide real-time collaboration and sharing

Slide 4

Slide 4 text

BIG DATA STORAGE CHARACTERISTICS
Desired Data Store Characteristics for ‘Big Data’
•  Horizontally scalable – Large volume / elastic
•  Vertically scalable – Heterogeneous data types (“Data Stack”)
•  Smartly distributed – Reduce the distance bits must travel
•  Fault tolerant – Replication strategy and consistency model
•  High availability – Node recovery
•  Fast – Reads or writes (can’t always have both)

Slide 5

Slide 5 text

NOSQL OPTIONS
Subset of Evaluated NoSQL Options
•  Cassandra
  –  Nice Bring Your Own Index (BYOI) design
  –  … but Java, Java, Java… memory management can be an issue
  –  Adding new nodes can be a pain (token changes, nodetool)
  –  Key-value store… good for simple data models
•  HBase
  –  Nice BigTable model
  –  Theory grounded heavily in C.A.P., inflexible trade-offs
  –  Complicated setup and maintenance
•  CouchDB
  –  Provides some geospatial functionality
  –  HEAVILY dependent on Map-Reduce model (complicated design)
  –  Erlang based – poor multi-threaded heap management

Slide 6

Slide 6 text

WHY TST LIKES MONGODB
Why MongoDB for Thermopylae?
•  Documents based on JavaScript Object Notation (JSON) – a GeoJSON match made in heaven!
•  C++ – No garbage collection overhead! Efficient memory management design reduces disk swapping and paging
•  Disk storage is memory mapped, enabling fast swapping when necessary
•  Built-in auto-failover with replica sets and fast recovery with journaling
•  Tunable consistency – Consistency defined at application layer
•  Schema flexible – friendly properties of SQL enable easy port
•  Provided initial spatial indexing support – point-based, limited!

Slide 7

Slide 7 text

MONGODB SPATIAL INDEXER
… The Spatial Indexer wasn’t quite right
•  MongoDB (like nearly all relational DBs) uses a B-tree
  –  Data structure for storing sorted data in logarithmic time
  –  Great for indexing numerical and text documents (attribute data)
  –  Cannot store multi-dimension (>2) data – NOT COMPLEX GEOMETRY FRIENDLY

Slide 8

Slide 8 text

DIMENSIONALITY REDUCTION
How does MongoDB solve the dimensionality problem?
•  Space Filling (Z) Curve
  –  A continuous line that passes through every point in a two-dimensional plane
•  Use Geohash to represent lat/lon values
  –  Interleave the bits of a lat/lon pair
  –  Base32 encode the result
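The two geohash steps above can be sketched in Python. This is a minimal illustration of the standard geohash algorithm (longitude bit first, alternating with latitude, five bits per base32 character); it is not MongoDB's internal 2d-index code, and the function name is hypothetical.

```python
# Standard geohash base32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Interleave lon/lat bisection bits, then base32-encode 5 bits per char."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, nbits = 0, 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # upper half -> bit 1
            rng[0] = mid
        else:
            value <<= 1               # lower half -> bit 0
            rng[1] = mid
        even = not even
        nbits += 1
        if nbits == 5:                # 5 bits -> one base32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Longer hashes name smaller cells, and points sharing a prefix usually lie close together. Because the result is a plain string, it can live in an ordinary B-tree index.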

Slide 9

Slide 9 text

GEOHASH B-TREE ISSUES
Issues with the Geohash B-tree approach
•  Neighbors aren’t so close!
  –  Neighboring points on the geoid may end up on opposite ends of the plane
  –  Impacts search efficiency
•  What about geometry?
  –  Doesn’t support > 2D
  –  Mongo uses multi-location documents, which really just index multiple points that link back to a single document
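The “neighbors aren’t so close” problem can be reproduced with Z-order (Morton) keys on a small integer grid, which use the same bit interleaving as a geohash before base32 encoding. This minimal sketch (function name hypothetical) shows two horizontally adjacent cells whose keys are far apart in B-tree order.

```python
def morton(x: int, y: int, bits: int = 16) -> int:
    """Z-order key: x bits go to even positions, y bits to odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Cells (3, 0) and (4, 0) touch each other on the grid ...
a, b = morton(3, 0), morton(4, 0)
print(a, b)  # 5 16
# ... but every key strictly between them belongs to some other cell,
# so a B-tree range scan from one to the other reads unrelated entries.
in_between = [(x, y) for x in range(8) for y in range(8) if a < morton(x, y) < b]
print(len(in_between))  # 10
```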

Slide 10

Slide 10 text

MULTI-LOCATION CLIPPING
Mongo Multi-location Document Clipping Issues ($within search doesn’t always work w/ multi-location)
[Figure: four cases (Case 1 to Case 4) of a multi-location document (aka polygon) tested against a search polygon; two cases succeed and two fail]
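The exact geometry of the four cases is not recoverable from the text, but one failure mode of indexing a polygon only by its vertices (as multi-location documents do) is easy to reproduce: a search window lying entirely inside the polygon contains none of its vertices. This is a hypothetical illustration, not MongoDB code; `vertex_index_hit` mimics a point-only index, and `polygon_touches_window` is a deliberately simplified geometric check.

```python
Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def point_in_polygon(p: Point, ring: list[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray toward +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside


def vertex_index_hit(ring: list[Point], window: Rect) -> bool:
    """What a point-only index can see: does any stored vertex fall in the window?"""
    return any(in_rect(v, window) for v in ring)


def polygon_touches_window(ring: list[Point], window: Rect) -> bool:
    """Simplified check: a vertex is in the window, or the window's center
    is inside the polygon (pure edge crossings are not detected)."""
    center = ((window[0] + window[2]) / 2, (window[1] + window[3]) / 2)
    return vertex_index_hit(ring, window) or point_in_polygon(center, ring)


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
window = (4.0, 4.0, 6.0, 6.0)  # entirely inside the square
print(vertex_index_hit(square, window))        # False: the document is missed
print(polygon_touches_window(square, window))  # True
```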

Slide 11

Slide 11 text

SOLUTIONS TO GEOHASH PROBLEM
Potential Solutions
•  Constrain the system to single point searches
  –  Multi-dimension support will be exponentially complex (won’t scale)
•  Interpolate points along the edge of the shape
  –  Multi-dimension support will be exponentially complex (won’t scale)
•  Customize the spatial indexer
  –  Selected approach
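The cost of the interpolation option can be seen by densifying a polygon's boundary into points. In this hypothetical sketch, halving the spacing doubles the number of index entries for a single shape, and the polygon's interior is still not represented by any point.

```python
import math

Point = tuple[float, float]


def densify(ring: list[Point], step: float) -> list[Point]:
    """Return points along each edge of a closed ring, at most `step` apart."""
    points = []
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        segments = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / step))
        for j in range(segments):  # edge endpoint is emitted by the next edge
            t = j / segments
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return points


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
for step in (1.0, 0.5, 0.25):
    print(step, len(densify(square, step)))  # 40, then 80, then 160 points
```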

Slide 12

Slide 12 text

CUSTOM TUNED SPATIAL INDEXER
Thermopylae Custom Tuned MongoDB for Geo
TST leverages Guttman’s 1984 research in R/R* Trees
•  R-Trees organize any-dimensional data by representing the data as a minimum bounding box.
•  Each node bounds its children. A node can have many objects in it (max: m, min: ⌈m/2⌉)
•  Splits and merges optimized by minimizing overlaps
•  The leaves point to the actual objects (stored on disk, probably)
•  Height balanced – tree height is always O(log n) (overlapping MBRs can force a search down several paths)
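The node rules above can be sketched as a small Python data structure. The capacity of 4 and the class names are hypothetical; the minimum fill follows the slide's max m, min ⌈m/2⌉ rule.

```python
import math
from dataclasses import dataclass, field
from functools import reduce


@dataclass(frozen=True)
class MBR:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def union(self, other: "MBR") -> "MBR":
        """Smallest box that covers both boxes."""
        return MBR(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                   max(self.max_x, other.max_x), max(self.max_y, other.max_y))

    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)


@dataclass
class Node:
    entries: list = field(default_factory=list)  # (MBR, child node or object id)
    is_leaf: bool = True
    max_entries: int = 4  # hypothetical capacity m

    def mbr(self) -> MBR:
        """A node's box bounds all of its children's boxes."""
        return reduce(MBR.union, (box for box, _ in self.entries))

    def min_entries(self) -> int:
        return math.ceil(self.max_entries / 2)

    def overflowing(self) -> bool:   # must split
        return len(self.entries) > self.max_entries

    def underflowing(self) -> bool:  # candidate for merge / reinsertion
        return len(self.entries) < self.min_entries()


leaf = Node(entries=[(MBR(0, 0, 2, 1), "a"), (MBR(1, 3, 4, 5), "b")])
print(leaf.mbr())           # MBR(min_x=0, min_y=0, max_x=4, max_y=5)
print(leaf.underflowing())  # False
```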

Slide 13

Slide 13 text

RTREE THEORY
Spatial Indexing at Scale with R-Trees
Spatial data represented as minimum bounding rectangles (2-dimension), boxes (3-dimension), and hyperrectangles (4-dimension).
Index represented as:
$$I = (I_0, I_1, \ldots, I_{n-1}), \qquad I_k = [\min_k, \max_k]$$
where n = number of dimensions and each I_k is the closed interval describing the MBR’s range along dimension k.
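With this representation, an n-dimensional MBR is just a tuple of [min, max] intervals, and two MBRs overlap exactly when their intervals overlap in every dimension. A minimal Python sketch; the function names and the 4-D sample boxes are hypothetical.

```python
import math

Interval = tuple[float, float]  # [min, max] along one dimension
Box = tuple[Interval, ...]      # I = (I_0, I_1, ..., I_{n-1})


def _same_dims(a: Box, b: Box) -> None:
    if len(a) != len(b):
        raise ValueError("boxes must have the same number of dimensions")


def intersects(a: Box, b: Box) -> bool:
    """Boxes overlap iff their intervals overlap along every dimension."""
    _same_dims(a, b)
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def contains(outer: Box, inner: Box) -> bool:
    """Every interval of `inner` lies within the matching interval of `outer`."""
    _same_dims(outer, inner)
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def volume(box: Box) -> float:
    """Area in 2-D, volume in 3-D, hypervolume in 4-D."""
    return math.prod(hi - lo for lo, hi in box)


# Hypothetical 4-D boxes: (x, y, z, t), e.g. position, altitude and time.
a = ((0, 10), (0, 10), (0, 100), (0, 60))
b = ((5, 15), (5, 15), (50, 150), (30, 90))
c = ((5, 15), (5, 15), (50, 150), (61, 90))  # overlaps a in x, y, z but not t
print(intersects(a, b), intersects(a, c))  # True False
print(volume(a))                           # 600000
```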

Slide 14

Slide 14 text

R*-TREE INDEX OBJECTIVES
R*-Tree Spatial Index Example
•  Sample insertion result for 4th order tree
[Figure: sample R*-tree over objects a through p]
•  Objectives:
  1.  Minimize area
  2.  Minimize overlaps
  3.  Minimize margins
  4.  Maximize inner node utilization
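The four objectives can be turned into numbers for any candidate grouping of entries. This Python sketch scores two hypothetical ways of splitting four 2-D rectangles into two nodes; the helper names and the capacity of 4 are assumptions for illustration.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bound(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    """Perimeter; small margins favour square-ish boxes."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def score(g1: list[Rect], g2: list[Rect], capacity: int) -> dict:
    b1, b2 = bound(g1), bound(g2)
    return {
        "area": area(b1) + area(b2),                          # 1. minimize
        "overlap": overlap(b1, b2),                           # 2. minimize
        "margin": margin(b1) + margin(b2),                    # 3. minimize
        "utilization": (len(g1) + len(g2)) / (2 * capacity),  # 4. maximize
    }


a, b, c, d = (0, 0, 1, 1), (1, 0, 2, 1), (5, 5, 6, 6), (6, 5, 7, 6)
good = score([a, b], [c, d], capacity=4)  # area 4, overlap 0, margin 12
bad = score([a, c], [b, d], capacity=4)   # area 72, overlap 30, margin 48
print(good)
print(bad)
```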

Slide 15

Slide 15 text

R*-TREE INSERT EXAMPLE
Insert
•  Similar to insertion into a B+-tree, but may insert into any leaf; leaf splits in case capacity is exceeded.
  –  Which leaf to insert into?
  –  How to split a node?
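For "how to split a node?", the R*-tree heuristic picks the split axis with the smallest total margin over all candidate distributions, then the distribution on that axis with the least overlap, breaking ties by total area. Below is a simplified 2-D Python sketch of that procedure; the function names and sample rectangles are hypothetical, and R*'s forced reinsertion is omitted.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bound(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def distributions(rects: list[Rect], axis: int, m: int):
    """Yield every (group1, group2) split with at least m entries per group,
    over entries sorted by lower and then by upper coordinate on `axis`."""
    by_lower = sorted(rects, key=lambda r: (r[axis], r[axis + 2]))
    by_upper = sorted(rects, key=lambda r: (r[axis + 2], r[axis]))
    for order in (by_lower, by_upper):
        for k in range(m, len(order) - m + 1):
            yield order[:k], order[k:]


def rstar_split(rects: list[Rect], m: int) -> tuple[list[Rect], list[Rect]]:
    # ChooseSplitAxis: smallest sum of margins over all distributions.
    axis = min((0, 1), key=lambda ax: sum(
        margin(bound(g1)) + margin(bound(g2))
        for g1, g2 in distributions(rects, ax, m)))
    # ChooseSplitIndex: least overlap, then least total area.
    return min(distributions(rects, axis, m),
               key=lambda g: (overlap(bound(g[0]), bound(g[1])),
                              area(bound(g[0])) + area(bound(g[1]))))


# A node with capacity 4 overflows with a 5th entry; minimum fill m = 2.
a, b, c, d, e = (0, 0, 1, 1), (1, 1, 2, 2), (5, 0, 6, 1), (6, 1, 7, 2), (7, 0, 8, 1)
left, right = rstar_split([a, b, c, d, e], m=2)
print(left, right)
# [(0, 0, 1, 1), (1, 1, 2, 2)] [(5, 0, 6, 1), (6, 1, 7, 2), (7, 0, 8, 1)]
```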

Slide 16

Slide 16 text

Insert – Leaf Selection
•  Follow a path from root to leaf.
•  At each node, move into the subtree whose MBR area increases least with the addition of the new rectangle.
[Figure: child nodes m, n, o, p]
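The leaf-selection rule above (least area enlargement, ties going to the smaller node) can be sketched in Python. The node layout and sample boxes for m, n, o, p are hypothetical; note that R*-trees additionally weigh overlap enlargement when choosing among nodes whose children are leaves.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, new: Rect) -> float:
    """How much the node's MBR area grows if `new` is added to it."""
    return area(union(mbr, new)) - area(mbr)


@dataclass
class Node:
    name: str
    mbr: Rect
    children: list["Node"] = field(default_factory=list)  # empty => leaf


def choose_leaf(root: Node, new: Rect) -> Node:
    node = root
    while node.children:
        # least area enlargement; ties go to the node with the smaller area
        node = min(node.children, key=lambda c: (enlargement(c.mbr, new), area(c.mbr)))
    return node


m = Node("m", (0, 0, 5, 5))
n = Node("n", (6, 0, 10, 5))
o = Node("o", (0, 6, 5, 10))
p = Node("p", (6, 6, 10, 10))
root = Node("root", (0, 0, 10, 10), [m, n, o, p])
print(choose_leaf(root, (7, 1, 8, 2)).name)          # n (no enlargement needed)
print(choose_leaf(root, (5.2, 5.2, 5.4, 5.4)).name)  # m
```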

Slide 17

Slide 17 text

Insert – Leaf Selection
•  Insert into m.

Slide 18

Slide 18 text

Insert – Leaf Selection
•  Insert into n.

Slide 19

Slide 19 text

Insert – Leaf Selection
•  Insert into o.

Slide 20

Slide 20 text

Insert – Leaf Selection
•  Insert into p.

Slide 21

Slide 21 text

Query
•  Start at root
•  Find all overlapping MBRs
•  Search subtrees recursively
[Figure: query window over nodes m, n, o, p and objects a through p]
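The query steps above map onto a short recursive function. This is a hypothetical in-memory Python sketch (not the MongoDB bucket layout); it prunes any subtree whose MBR misses the query window.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: Rect, b: Rect) -> bool:
    """Closed rectangles overlap iff they overlap on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: list["Node"] = field(default_factory=list)    # inner-node entries
    objects: dict[str, Rect] = field(default_factory=dict)  # leaf entries


def search(node: Node, window: Rect) -> list[str]:
    """Return ids of objects whose MBR overlaps `window`."""
    if not overlaps(node.mbr, window):
        return []  # prune: nothing below this node can match
    hits = [oid for oid, box in node.objects.items() if overlaps(box, window)]
    for child in node.children:
        hits.extend(search(child, window))  # recurse into subtrees
    return hits


m = Node((0, 0, 3, 3), objects={"a": (0, 0, 1, 1), "b": (2, 2, 3, 3)})
n = Node((6, 0, 9, 4), objects={"c": (6, 0, 7, 1), "d": (8, 3, 9, 4)})
root = Node((0, 0, 9, 4), children=[m, n])
print(search(root, (0.5, 0.5, 1.5, 1.5)))  # ['a']
print(search(root, (0, 0, 10, 0.5)))       # ['a', 'c']
```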

Slide 22

Slide 22 text

Query
•  Search m.
[Figure: query descending into node m]

Slide 23

Slide 23 text

R*-TREE MONGODB IMPLEMENTATION
R*-Tree Leverages B-Tree Base Data Structures (buckets)

Slide 24

Slide 24 text

GEO-SHARDING
Geo-Sharding – (in work)
Scalable Distributed R* Tree (SD-r*Tree): a balanced binary tree, distributed on a set of servers:
•  Each internal node has exactly two children
•  Each leaf node stores a subset of the indexed dataset
•  At each node, the heights of the subtrees differ by at most one
•  Each server stores one data node and one “routing” node
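Two of the structural invariants above (exactly two children per internal node, subtree heights differing by at most one) can be checked with a short recursive function. This is a hypothetical Python sketch: node names are made up, and data storage and server placement are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    name: str
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def checked_height(node: TreeNode) -> int:
    """Height of a valid SD-r*Tree-style shape; raises ValueError otherwise."""
    if node.left is None and node.right is None:
        return 0  # leaf (data node)
    if node.left is None or node.right is None:
        raise ValueError(f"{node.name}: internal nodes need exactly two children")
    hl, hr = checked_height(node.left), checked_height(node.right)
    if abs(hl - hr) > 1:
        raise ValueError(f"{node.name}: subtree heights {hl} and {hr} differ by more than one")
    return 1 + max(hl, hr)


# Hypothetical layout: r1 routes to data node d0 and to routing node r2,
# which routes to data nodes d1 and d2.
tree = TreeNode("r1", TreeNode("d0"), TreeNode("r2", TreeNode("d1"), TreeNode("d2")))
print(checked_height(tree))  # 2
```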

Slide 25

Slide 25 text

SD-r*Tree DATA STRUCTURE
SD-r*Tree Data Structure Illustration
[Figure: data nodes d0, d1, d2 and coverage nodes r1, r2, with the spatial coverage of objects a to e]
•  di = Data Node (Chunk)
•  ri = Coverage Node
Leveraged work from Litwin, Mouza, Rigaux 2007

Slide 26

Slide 26 text

SD-r*TREE STRUCTURE DISTRIBUTION
SD-r*Tree Structure Distribution
[Figure: nodes d0, d1, d2, r1, r2 distributed across GeoShard 1, GeoShard 2, and GeoShard 3, with a mongos router]

Slide 27

Slide 27 text

GEO-SHARDING ALTERNATIVE
GeoSharding Alternative – 3D / 4D Hilbert Scanning Order

Slide 28

Slide 28 text

BEYOND 4-DIMENSIONS
Next Steps: Beyond 4 Dimensions – X-Tree (Berchtold, Keim, Kriegel – 1996)
[Figure: X-tree with normal internal nodes, supernodes, and data nodes]
•  Avoid MBR overlaps
•  Avoid node splits (main cause of high overlap)
•  Introduce new node structure: Supernodes – large directory nodes of variable size
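The supernode idea can be sketched as a single decision: measure the overlap of the best available split, and if it is too high, enlarge the directory node into a supernode instead of splitting. In this Python sketch the overlap measure (intersection area over union area of the two groups' MBRs), the 20% threshold, and the function names are assumptions for illustration, not taken from the deck.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

MAX_OVERLAP = 0.2  # hypothetical threshold


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def overlap_ratio(a: Rect, b: Rect) -> float:
    """Shared area relative to the area covered by either box."""
    inter = intersection_area(a, b)
    return inter / (area(a) + area(b) - inter)


def handle_overflow(split_mbrs: tuple[Rect, Rect], capacity: int, block: int) -> tuple[str, int]:
    """Split if the best split overlaps little; otherwise grow a supernode."""
    if overlap_ratio(*split_mbrs) <= MAX_OVERLAP:
        return "split", capacity
    return "supernode", capacity + block  # extend the node by one more block


print(handle_overflow(((0, 0, 10, 10), (9, 0, 19, 10)), capacity=50, block=50))  # ('split', 50)
print(handle_overflow(((0, 0, 10, 10), (5, 0, 15, 10)), capacity=50, block=50))  # ('supernode', 100)
```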

Slide 29

Slide 29 text

X-TREE PERFORMANCE
X-Tree Performance Results (Berchtold, Keim, Kriegel – 1996)

Slide 30

Slide 30 text

CONCLUSION
T-Sciences Custom Tuned Spatial Indexer
•  Optimized Spatial Search – Finds intersecting MBRs and recurses into those nodes
•  Optimized Spatial Inserts – Uses the Hilbert value of the MBR centroid to guide search
  –  28% reduction in number of nodes touched
•  Optimized Deletes – Leverages R* split/merge approach for rebalancing the tree when nodes become over/under-full
•  Low maintenance – Leverages MongoDB’s automatic data compaction and partitioning
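The deck does not say how the Hilbert value is computed. As a hedged sketch, the classic 2-D Hilbert index algorithm can be applied to an MBR's centroid after snapping it to a grid. The world bounds, grid order, and function names here are hypothetical, and a 3D/4D scanning order would need a generalized algorithm.

```python
def hilbert_d(order: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve over a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the sub-curve has the standard orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def mbr_hilbert_value(mbr, world, order: int = 16) -> int:
    """Hilbert value of an MBR's centroid, snapped to a 2**order grid over `world`.

    Both `mbr` and `world` are (min_x, min_y, max_x, max_y) tuples.
    """
    n = 1 << order
    cx = (mbr[0] + mbr[2]) / 2
    cy = (mbr[1] + mbr[3]) / 2
    gx = min(n - 1, int((cx - world[0]) / (world[2] - world[0]) * n))
    gy = min(n - 1, int((cy - world[1]) / (world[3] - world[1]) * n))
    return hilbert_d(order, gx, gy)


for y in range(3, -1, -1):  # print a 4x4 grid, top row first
    print([hilbert_d(2, x, y) for x in range(4)])
# [5, 6, 9, 10]
# [4, 7, 8, 11]
# [3, 2, 13, 12]
# [0, 1, 14, 15]
```

Consecutive Hilbert values always land in edge-adjacent cells, which is why sorting entries by this value tends to keep spatial neighbors in the same node.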

Slide 31

Slide 31 text

EXAMPLE
Example Use Case – OSINT (Foursquare Data)
•  Sample Foursquare data set mashed with Government Intel Data
•  1 million geo-document test (points and polys)
•  4-server replica set
•  ~350 ms query response
•  ~300% improvement over PostGIS

Slide 32

Slide 32 text

FIND US
Community Support
•  Thermopylae contributes fixes to the codebase
  –  http://github.com/mongodb
•  TST will work with 10gen to fold into the baseline
•  Active developer collaboration
  –  IRC: #mongodb on freenode.net

Slide 33

Slide 33 text

THANK YOU
Questions?
Nicholas Knize
[email protected]

Slide 34

Slide 34 text

Backup

Slide 35

Slide 35 text

WHO ARE THESE GUYS?
Thermopylae Sciences & Technology – Who are we?
•  Advanced technology w/ 160+ employees
•  Core customers in national security, venues and events, military and police, and city planning
•  Partnered with Google and imagery providers
•  Long-term relationship focused – TS/SCI staff
TST + 10gen + Google = Game-changing approach
ENTERPRISE PARTNER

Slide 36

Slide 36 text

GOVERNMENT CUSTOMERS
Key Customers – Government
•  US Dept of State Bureau of Diplomatic Security
  –  Build and support a 30 TB Google Earth Globe, with multi-terabyte individual globes sent to embassies throughout the world. Integrated Google Earth and iSpatial framework.
•  US Army Intelligence and Security Command
  –  Provide expertise in managing technology integration – prime contractor providing operations, intelligence, and IT support worldwide. Partners include IBM, Lockheed Martin, Google, MIT, Carnegie Mellon. Integrated Google Earth and iSpatial framework.
•  US Southern Command
  –  Coordinate intelligence management systems’ spatial data collection, indexing, and distribution. Integrated Google Earth, iSpatial, and iHarvest.
  –  Index large-volume imagery and expose it for different services (Air Force, Navy, Army, Marines, Coast Guard)

Slide 37

Slide 37 text

COMMERCIAL CUSTOMERS
Key Customers – Commercial
[Logos: Cleveland Cavaliers, USGIF, Las Vegas Motor Speedway, Baltimore Grand Prix]
iSpatial framework serves thousands of mobile devices

Slide 38

Slide 38 text

RDBMS STRENGTHS
RDBMS Strengths
•  Battle tested, battle proven – Relational model dates back to 1969
•  Plethora of relational experience – Full-time DBAs, training & certs
•  Company backed – Safe choice for business / mission critical systems
•  Fewer alternatives – Non-relational is a 5-year-old know-it-all
•  Mostly standardized – SQL ISO/IEC 9075 accepted standard
•  Theoretically sound – Based on 100 years of first-order logic theory

Slide 39

Slide 39 text

ACID THEORY
Relational on ACID
•  Atomicity – If one fails, we all fail!
•  Consistency – All data constraints (normalized schema), cascades, triggers, etc. must be met before the transaction succeeds. (LATENCY)
•  Isolation – Synchronization; no operation can see a transaction that hasn’t yet completed
•  Durability – Once a transaction is committed it will remain committed, even in power loss, crashes, or other hardware errors.

Slide 40

Slide 40 text

RDBMS WEAKNESSES
RDBMS Weaknesses
•  Writes are accomplished using in-place update on disk (crazy disk swapping rate)
•  Table joins, updates, and large queries quickly outgrow disk cache, requiring many random disk seeks (performance bottleneck!!)
•  Strict consistency requirements impact scalability (e.g. Postgres uses Multiversion Concurrency Control, commonly resulting in stale data)
•  As data centers grow, the probability of node failure (due to disk writes, consistency, and atomic operations) increases

Slide 41

Slide 41 text

WHY NOSQL?
Why NoSQL?!? (CAVEATS)
•  Use the right tool for the job
•  Understand your needs!
•  Relational is not always bad
[Figure labels: Engineering with Constraints; Unbounded Engineering]