Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building Inspector - Shape + Address consensus
Search
Mauricio Giraldo
October 21, 2014
Technology
0
220
Building Inspector - Shape + Address consensus
Mauricio Giraldo
October 21, 2014
Tweet
Share
More Decks by Mauricio Giraldo
See All by Mauricio Giraldo
Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales
mgiraldo
0
370
From food to buildings and beyond: what happens when a library opens its digital collections to human-computer collaboration
mgiraldo
2
190
Aprendizajes de trabajo en bibliotecas digitales
mgiraldo
0
160
building inspector
mgiraldo
0
100
Talk at the NYU ITP Data Art class / Spring 2017
mgiraldo
0
170
Humanidades Digitales en los laboratorios de la Biblioteca Pública de New York
mgiraldo
0
110
FOSS4G Nara/Tokyo
mgiraldo
0
2k
Human-Computer Collaboration at NYPL Labs
mgiraldo
2
480
NYPL Labs @ Eyeo Festival 2015
mgiraldo
1
750
Other Decks in Technology
See All in Technology
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.3k
AI Agent Agentic Workflow の可観測性 / Observability of AI Agent Agentic Workflow
yuzujoe
4
2.1k
AI アクセラレータチップ AWS Trainium/Inferentia に 今こそ入門
yoshimi0227
1
260
モノタロウ x クリエーションラインで実現する チームトポロジーにおける プラットフォームチーム・ ストリームアラインドチームの 効果的なコラボレーション
creationline
0
980
手軽に作れる電卓を作って イベントソーシングに親しもう CQRS+ESカンファレンス2026
akinoriakatsuka
0
450
歴史から学ぶ、Goのメモリ管理基礎
logica0419
14
2.9k
Qiita Bash アドカレ LT #1
okaru
0
190
Oracle Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
2
930
田舎で20年スクラム(後編):一個人が企業で長期戦アジャイルに挑む意味
chinmo
1
1.6k
善意の活動は、なぜ続かなくなるのか ーふりかえりが"構造を変える判断"になった半年間ー
matsukurou
0
580
ソフトとハード両方いけるデータ人材の育て方
waiwai2111
1
480
WebDriver BiDi 2025年のふりかえり
yotahada3
1
240
Featured
See All Featured
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
700
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
110
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.3k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
340
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
58
41k
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
0
120
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
150
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
GraphQLの誤解/rethinking-graphql
sonatard
74
11k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
The untapped power of vector embeddings
frankvandijk
1
1.5k
Transcript
mauricio giraldo arteaga @mgiraldo NYPL Labs
None
bon jour
my name is mauricio
None
research and circulating library system spanning the Bronx, Staten Island
and Manhattan boroughs in NYC
None
NYPL Labs
None
i’m going to talk about maps
The Great Map Data Extraction
an adventure in three acts and a prologue and an
epilogue
prologue
The Lionel Pincus and Princess Firyal Map Division
None
None
None
None
None
None
500,000+ maps 20,000+ books & atlases
None
None
None
None
None
year
street names year
use type street names year
use type street names name year
material use type street names name year
material use type street names name class year
material use type street names address name class year
material use type street names address floors name class year
material use type street names address floors name class year
skylights
material use type street names address floors name class year
skylights backyards
material use type street names address floors name class geo
location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
we got these for several decades since the 1800s and
by 1950 every town in the US with a population of 2,000 had been mapped
data trapped in a legacy format
we want all the data!
f**k yeah historical data!
citysdk.waag.org/buildings
citysdk.waag.org/buildings
NYU Stern / Imaginaria3D
NYU Stern / Imaginaria3D
maps.google.com
maps.google.com
None
data
it all starts with a photograph
None
but it is “just a photo” but it is only
a few clicks away
None
maps.nypl.org/warper
None
None
geo-rectification or: “make it match Open Street Map”
None
None
*this is a simulation. actual process is intensive. consult your
mathematician before trying
None
None
vectorization or: “draw the building shapes”
None
results from maps.nypl.org/warper
hand-crafted, artisanal, locally-sourced data
500,000+ maps 20,000+ books & atlases
500,000+ maps 20,000+ books & atlases* *imagine how many pages
an atlas has
in the order of dozens of millions building footprints if
counting NYC only
None
~120k footprints produced in three years by staff and volunteers
None
this will take us a few millenia* *actual number taken
out from a hat
there has to be a better way
act i: will there be polygons?
requests to geo companies went unanswered
None
can we automate this?
None
¿¡quoi!? @mgiraldo
None
None
None
None
what is a building?
None
completely enclosed by black lines
completely enclosed by black lines dashed lines are not walls
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
process
github.com/NYPL/map-vectorizer
None
None
None
None
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
provide the best (possible) input image
None
None
None
None
differences in resampling cubic nearest neighbor
differences in resampling cubic nearest neighbor
make the image a binary bitmap or: “black and white”
None
None
polygonize or: “convert contiguous pixels to a single line shape”
None
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
None
no no no no no
no no no no no yes yes
simplify* *for those polygons that we care about
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔
None
None
alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
we need a set of points
None
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
now we alpha shaping
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
there are other point reduction algorithms like Ramer-Douglas-Peucker or Whyatt
Curve Simplification
separate the buildings from the chaff
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔ ✔ ✔
None
None
[218, 211, 209]
[218, 211, 209] paper [199, 179, 173], [179, 155, 157],
[206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141] [205, 200, 186] not paper
None
None
None
this is good enough for our use case
None
None
None
✔ ✔ ✔ ✔ ✔ completely enclosed by black lines
dashed lines are not walls > 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
computer-vision for attribute recognition *bonus quest
None
None
None
66,056 footprints produced in one day for an 1859 atlas
of Manhattan
caveats: ! adjacency not enforced false positives/negatives buildings may also
overlap
act ii: the vectorizer needs to prove itself
None
None
None
None
multiple inspections for each item and let consensus surface on
its own
footprint validation or: “tell us what the computer got right
or wrong“
are people willing to spend time checking building footprints? insurance
atlases are not exactly the coolest type of maps
None
buildinginspector.nypl.org
github.com/NYPL/building-inspector
None
None
None
None
about a month later…
None
None
None
None
420k+ flags* 70k+ unique polygons ! consensus: ~84% YES, 7%
FIX, 9% NO *a “flag” is a YES/NO/FIX by one person for a given polygon
seems people are willing after all… we — our contributors
seems people are willing after all… we — our contributors
act iii: the return of the inspector
footprint material use type street names address floors name class
geo location year skylights backyards
divide and conquer
footprint material use type street names address floors name class
geo location year skylights backyards
three new tasks for now… we really want it all!
None
footprint material use type street names address floors name class
geo location year skylights backyards
check
check YES
check YES address color
check YES FIX address color
check YES FIX address color fix
check YES FIX address color fix
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
fix
fix
address
address
classify color
classify color
865k+ flags
check YES FIX address color fix
check YES FIX address color fix for 80k+ unique polygons
77k+ 5k+ 42k+ 26k+
epilogue
address and shape consensus or: how to determine what the
right building footprint and address looks like?
None
None
all points are useful inclusiveness above all
None
None
None
None
None
None
None
None
DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
bit.ly/nypl-consensus
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 414 + +
None
None
DBSCAN for shapes also!
None
None
None
None
None
None
all points are still useful
None
﹡
﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
None
None
None
None
None
None
None
resulting data available via an API
resulting data available via an API in 100% recyclable GeoJSON
None
photographing
photographing ↓
photographing ↓ geo-rectification
photographing ↓ geo-rectification ↓
photographing ↓ geo-rectification ↓ vectorization
photographing ↓ geo-rectification ↓ vectorization ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓ data release
not the end
None
None
None
¡merci beaucoup! mauricio giraldo arteaga @mgiraldo NYPL Labs slides at:
bit.ly/nypl-ehess images from: NYPL digital collections - Wikimedia Commons Christopher Cannon - Flickr user wallyg - Giphy