Building Inspector - Shape + Address consensus

Mauricio Giraldo

October 21, 2014

Transcript

  1. mauricio giraldo arteaga @mgiraldo NYPL Labs

  2. None
  3. bon jour

  4. my name is mauricio

  5. None
  6. research and circulating library system spanning the Bronx, Staten Island and Manhattan boroughs in NYC
  7. None
  8. NYPL Labs

  9. None
  10. i’m going to talk about maps

  11. The Great Map Data Extraction

  12. an adventure in three acts, a prologue and an epilogue
  13. prologue

  14. The Lionel Pincus and Princess Firyal Map Division

  15. None
  16. None
  17. None
  18. None
  19. None
  20. None
  21. 500,000+ maps 20,000+ books & atlases

  22. None
  23. None
  24. None
  25. None
  26. None
  27-39. footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
  40. these were produced for decades starting in the 1800s; by 1950 every town in the US with a population of at least 2,000 had been mapped
  41. data trapped in a legacy format

  42. we want all the data!

  43. f**k yeah historical data!

  44. citysdk.waag.org/buildings

  46. NYU Stern / Imaginaria3D

  48. maps.google.com

  50. None
  51. data

  52. it all starts with a photograph

  53. None
  54. but it is “just a photo” / but it is only a few clicks away
  55. None
  56. maps.nypl.org/warper

  57. None
  58. None
  59. geo-rectification or: “make it match Open Street Map”

  60. None
  61. None
  62. *this is a simulation. actual process is intensive. consult your mathematician before trying
  63. None
  64. None
  65. vectorization or: “draw the building shapes”

  66. None
  67. results from maps.nypl.org/warper

  68. hand-crafted, artisanal, locally-sourced data

  69. 500,000+ maps 20,000+ books & atlases

  70. 500,000+ maps 20,000+ books & atlases* (*imagine how many pages an atlas has)
  71. on the order of tens of millions of building footprints, counting NYC alone
  72. None
  73. ~120k footprints produced in three years by staff and volunteers

  74. None
  75. this will take us a few millennia* (*actual number pulled out of a hat)
  76. there has to be a better way

  77. act i: will there be polygons?

  78. requests to geo companies went unanswered

  79. None
  80. can we automate this?

  81. None
  82. ¿¡quoi!? @mgiraldo

  83. None
  84. None
  85. None
  86. None
  87. what is a building?

  88. None
  89-93. completely enclosed by black lines; dashed lines are not walls; > 20m2 (~180ft2); < 3,000m2 (~27,000ft2); not paper-colored
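The two size rules can be checked with a plain area computation. The sketch below is Python and is not taken from the vectorizer itself; it assumes each polygon is a list of (x, y) vertices in a projected coordinate system measured in meters. `polygon_area` and `plausible_building` are illustrative names, and only the metric thresholds from the slide are used.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


MIN_AREA_M2 = 20.0     # lower bound from the slide
MAX_AREA_M2 = 3000.0   # upper bound from the slide


def plausible_building(points):
    """True when the polygon's area falls strictly between the two bounds."""
    return MIN_AREA_M2 < polygon_area(points) < MAX_AREA_M2


house = [(0, 0), (10, 0), (10, 8), (0, 8)]   # 10 m x 8 m rectangle
speck = [(0, 0), (2, 0), (2, 2), (0, 2)]     # 2 m x 2 m, too small
print(plausible_building(house), plausible_building(speck))
```

The other rules (enclosure, dashed lines, paper color) need the image itself and are not covered by this check.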
  94. process

  95. github.com/NYPL/map-vectorizer

  96. None
  97. None
  98. None
  99. None
  100. completely enclosed by black lines; dashed lines are not walls; > 20m2 (~180ft2); < 3,000m2 (~27,000ft2); not paper-colored
  102. provide the best (possible) input image

  103. None
  104. None
  105. None
  106. None
  107. differences in resampling: cubic vs. nearest neighbor

  109. make the image a binary bitmap or: “black and white”
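A minimal sketch of the thresholding idea, assuming the scan has already been reduced to grayscale values from 0 to 255 stored as nested lists. The cut-off of 128 and the name `to_binary` are assumptions for illustration; the slides do not state which threshold or tool was actually used.

```python
def to_binary(gray, threshold=128):
    """Return a bitmap: 0 for dark (ink) pixels, 1 for light (paper) pixels."""
    return [[0 if value < threshold else 1 for value in row] for row in gray]


# A tiny 2 x 4 "scan": light paper with a few dark ink pixels.
gray = [
    [230, 228, 40, 231],
    [229, 35, 38, 227],
]
bitmap = to_binary(gray)
for row in bitmap:
    print(row)
```

A fixed global threshold is the simplest possible choice; unevenly lit or stained pages would likely need something adaptive.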

  110. None
  111. None
  112. polygonize or: “convert contiguous pixels to a single line shape”

  113. None
  114. gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test

  118. None
  119. no no no no no

  120. no no no no no yes yes

  121. simplify* (*for those polygons that we care about)

  122. completely enclosed by black lines; dashed lines are not walls; > 20m2 (~180ft2); < 3,000m2 (~27,000ft2); not paper-colored ✔ ✔
  123. None
  124. None
  125. alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple

  126-128. [figure: scattered points (﹡)]

  129. we need a set of points

  130. None
  131. pts = spsample(polygon, n=1000, type="hexagonal")

  132. pts = spsample(polygon, n=1000, type="regular")

  133. pts = spsample(polygon, n=1000, type="random")
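The three calls above use R's `spsample` to scatter points inside a polygon. As a rough standard-library Python sketch of the `type="random"` idea only, the code below uses rejection sampling against a ray-casting point-in-polygon test. `point_in_polygon` and `sample_points` are hypothetical helpers, not part of any library named in the talk, and the hexagonal and regular layouts are not reproduced.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' at each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(polygon, n, seed=0):
    """Draw n uniform random points inside the polygon (rejection sampling)."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# L-shaped footprint: the square with x > 2 and y > 2 is outside the building.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
pts = sample_points(l_shape, 200)
print(len(pts))
```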

  134. now we alpha shaping

  135. x.as = ashape(pts@coords, alpha=2.0)

  138. there are other point reduction algorithms, such as Ramer-Douglas-Peucker or Visvalingam-Whyatt curve simplification
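For comparison with the alpha shape approach, here is a short Python sketch of Ramer-Douglas-Peucker, one of the alternatives named on this slide. It is not the method the talk used. It works on an open polyline of (x, y) tuples; `epsilon` is the tolerance in the same units as the coordinates, and the value 0.1 below is arbitrary.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                 # a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Drop vertices that lie within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = rdp(points[: index + 1], epsilon)   # keep the farthest vertex
    right = rdp(points[index:], epsilon)
    return left[:-1] + right                   # avoid duplicating it


# A slightly wobbly wall followed by a right-angle corner.
wall = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3, 1), (3, 2)]
print(rdp(wall, 0.1))
```

The wobble along the first wall is removed while the corner survives, which is the behavior one would want for building outlines.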
  139. separate the buildings from the chaff

  140. completely enclosed by black lines; dashed lines are not walls; > 20m2 (~180ft2); < 3,000m2 (~27,000ft2); not paper-colored ✔ ✔ ✔ ✔
  141. None
  142. None
  143. [218, 211, 209]

  144. [218, 211, 209] paper [199, 179, 173], [179, 155, 157],

    [206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141] [205, 200, 186] not paper
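One way to apply reference colors like these is a nearest-color lookup in RGB space. The Python sketch below assumes, from the way the slide reads, that [218, 211, 209] is the paper reference and that the list that follows holds non-paper colors; only three of them are used here. The squared Euclidean distance and the names `squared_distance` and `classify` are illustrative choices, not necessarily what the vectorizer does.

```python
PAPER = [(218, 211, 209)]                                         # assumed paper reference
NOT_PAPER = [(199, 179, 173), (255, 225, 40), (161, 175, 190)]    # subset of the slide's list


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(rgb):
    """Label a color by its nearest reference color."""
    references = [(c, "paper") for c in PAPER] + [(c, "not paper") for c in NOT_PAPER]
    _, label = min(references, key=lambda ref: squared_distance(rgb, ref[0]))
    return label


print(classify((220, 212, 210)))   # close to the paper reference
print(classify((250, 220, 50)))    # close to the yellow reference
```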
  145. None
  146. None
  147. None
  148. this is good enough for our use case

  149. None
  150. None
  151. None
  152. ✔ ✔ ✔ ✔ ✔ completely enclosed by black lines; dashed lines are not walls; > 20m2 (~180ft2); < 3,000m2 (~27,000ft2); not paper-colored
  153. computer-vision for attribute recognition *bonus quest

  154. None
  155. None
  156. None
  157. 66,056 footprints produced in one day for an 1859 atlas of Manhattan
  158. caveats: adjacency not enforced; false positives/negatives; buildings may also overlap
  159. act ii: the vectorizer needs to prove itself

  160. None
  161. None
  162. None
  163. None
  164. collect multiple inspections for each item and let consensus surface on its own
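A simple way to picture consensus over repeated inspections is a majority vote with a minimum number of votes. This Python sketch is an assumption about how such a rule could look, not the project's actual rule: the thresholds `min_flags=3` and `min_share=0.75` and the function name `consensus` are hypothetical.

```python
from collections import Counter


def consensus(flags, min_flags=3, min_share=0.75):
    """Return the winning label, or None when there is no clear agreement yet."""
    if len(flags) < min_flags:
        return None                      # not enough inspections so far
    label, count = Counter(flags).most_common(1)[0]
    return label if count / len(flags) >= min_share else None


print(consensus(["YES", "YES", "YES", "FIX"]))   # clear majority
print(consensus(["YES", "NO"]))                  # too few inspections
print(consensus(["YES", "NO", "FIX", "YES"]))    # no agreement
```

Items that return None would simply stay in the queue until more people have looked at them.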
  165. footprint validation or: “tell us what the computer got right or wrong”
  166. are people willing to spend time checking building footprints? insurance atlases are not exactly the coolest type of maps
  167. None
  168. buildinginspector.nypl.org

  169. github.com/NYPL/building-inspector

  170. None
  171. None
  172. None
  173. None
  174. about a month later…

  175. None
  176. None
  177. None
  178. None
  179. 420k+ flags*, 70k+ unique polygons; consensus: ~84% YES, 7% FIX, 9% NO (*a “flag” is a YES/NO/FIX by one person for a given polygon)
  180. seems people are willing after all… we — our contributors

  182. act iii: the return of the inspector

  183. footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
  184. divide and conquer

  185. footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
  186. three new tasks for now… we really want it all!

  187. None
  188. footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
  189. check

  190. check YES

  191. check YES address color

  192. check YES FIX address color

  193. check YES FIX address color fix

  195. check YES FIX address color fix* (*footprints marked as “NO” go to building heaven)
  197. fix

  199. address

  201. classify color

  203. 865k+ flags

  204. check YES FIX address color fix

  205. check YES FIX address color fix for 80k+ unique polygons

    77k+ 5k+ 42k+ 26k+
  206. epilogue

  207. address and shape consensus or: how do we determine what the right building footprint and address look like?
  208. None
  209. None
  210. all points are useful: inclusiveness above all

  211. None
  212. None
  213. None
  214. None
  215. None
  216. None
  217. None
  218. None
  219. DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
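To make the clustering step concrete, here is a compact pure-Python sketch of DBSCAN on 2D points. It is a brute-force textbook version for illustration and not the project's code; `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself) are the algorithm's two standard parameters, and the values used below are arbitrary.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute force: every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE               # may be reclaimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster         # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:        # core point: keep expanding
                queue.extend(more)
    return labels


clicks = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
labels = dbscan(clicks, eps=0.5, min_pts=3)
print(labels)
```

Unlike k-means, the number of clusters does not have to be known in advance, which suits a map where the number of addresses per sheet is unknown.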

  220. bit.ly/nypl-consensus

  221-232. [figure sequence: address tags (﹡) reading 246 and 414 collapse into two consensus points (+), one per address]
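Once tags have been grouped, each group still has to be reduced to one position and one address. A plausible reduction, sketched in Python under the assumption that a cluster is a list of (x, y, text) tuples, takes the centroid of the positions and the most common text. The function name `address_consensus` and the sample coordinates are made up for illustration; the talk does not spell out the exact rule.

```python
from collections import Counter


def address_consensus(cluster):
    """cluster: list of (x, y, text) tags that were grouped together."""
    cx = sum(x for x, _, _ in cluster) / len(cluster)
    cy = sum(y for _, y, _ in cluster) / len(cluster)
    text, votes = Counter(t for _, _, t in cluster).most_common(1)[0]
    return (cx, cy), text, votes


# Three people tagged roughly the same spot; one misread the number.
tags = [(10.0, 20.0, "246"), (11.0, 21.0, "246"), (12.0, 22.0, "248")]
position, address, votes = address_consensus(tags)
print(position, address, votes)
```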

  233. None
  234. None
  235. DBSCAN for shapes also!

  236. None
  237. None
  238. None
  239. None
  240. None
  241. None
  242. all points are still useful

  243. None
  244-257. [figure sequence: polygon vertices (﹡) accumulate, then resolve into consensus vertices (+)]

  258. None
  259. None
  260. None
  261. None
  262. None
  263. None
  264. None
  265. resulting data available via an API

  266. resulting data available via an API in 100% recyclable GeoJSON
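For readers unfamiliar with GeoJSON, this Python sketch shows what a single footprint could look like as a Feature inside a FeatureCollection. The coordinates and the property names `address` and `consensus` are invented for illustration and are not taken from the actual API; `footprint_feature` is a hypothetical helper.

```python
import json


def footprint_feature(ring, properties):
    """Wrap a ring of [lon, lat] positions as a GeoJSON Polygon Feature."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]          # GeoJSON rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }


feature = footprint_feature(
    [[-73.99, 40.73], [-73.98, 40.73], [-73.98, 40.74], [-73.99, 40.74]],
    {"address": "246", "consensus": "YES"},   # hypothetical property names
)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```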

  267. None
  268-280. photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check / fix / color / address ↓ consensus ↓ data release
  281. not the end

  282. None
  283. None
  284. None
  285. ¡merci beaucoup! mauricio giraldo arteaga, @mgiraldo, NYPL Labs; slides at: bit.ly/nypl-ehess; images from: NYPL digital collections, Wikimedia Commons, Christopher Cannon, Flickr user wallyg, Giphy