Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building Inspector - Shape + Address consensus
Search
Mauricio Giraldo
October 21, 2014
Technology
0
220
Building Inspector - Shape + Address consensus
Mauricio Giraldo
October 21, 2014
Tweet
Share
More Decks by Mauricio Giraldo
See All by Mauricio Giraldo
Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales
mgiraldo
0
360
From food to buildings and beyond: what happens when a library opens its digital collections to human-computer collaboration
mgiraldo
2
190
Aprendizajes de trabajo en bibliotecas digitales
mgiraldo
0
160
building inspector
mgiraldo
0
97
Talk at the NYU ITP Data Art class / Spring 2017
mgiraldo
0
170
Humanidades Digitales en los laboratorios de la Biblioteca Pública de New York
mgiraldo
0
110
FOSS4G Nara/Tokyo
mgiraldo
0
2k
Human-Computer Collaboration at NYPL Labs
mgiraldo
2
480
NYPL Labs @ Eyeo Festival 2015
mgiraldo
1
740
Other Decks in Technology
See All in Technology
MapKitとオープンデータで実現する地図情報の拡張と可視化
zozotech
PRO
1
140
Reinforcement Fine-tuning 基礎〜実践まで
ch6noota
0
180
エンジニアリングをやめたくないので問い続ける
estie
2
1.2k
AWS Trainium3 をちょっと身近に感じたい
bigmuramura
1
140
因果AIへの招待
sshimizu2006
0
970
コミューンのデータ分析AIエージェント「Community Sage」の紹介
fufufukakaka
0
490
プロンプトやエージェントを自動的に作る方法
shibuiwilliam
2
1.7k
文字列の並び順 / Unicode Collation
tmtms
3
570
多様なデジタルアイデンティティを攻撃からどうやって守るのか / 20251212
ayokura
0
430
学習データって増やせばいいんですか?
ftakahashi
2
320
ガバメントクラウド利用システムのライフサイクルについて
techniczna
0
190
評価駆動開発で不確実性を制御する - MLflow 3が支えるエージェント開発
databricksjapan
1
160
Featured
See All Featured
Building a Scalable Design System with Sketch
lauravandoore
463
34k
Become a Pro
speakerdeck
PRO
31
5.7k
The Pragmatic Product Professional
lauravandoore
37
7.1k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.3k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
970
Six Lessons from altMBA
skipperchong
29
4.1k
A Modern Web Designer's Workflow
chriscoyier
698
190k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Documentation Writing (for coders)
carmenintech
76
5.2k
Transcript
mauricio giraldo arteaga @mgiraldo NYPL Labs
None
bon jour
my name is mauricio
None
research and circulating library system spanning the Bronx, Staten Island
and Manhattan boroughs in NYC
None
NYPL Labs
None
i’m going to talk about maps
The Great Map Data Extraction
an adventure in three acts and a prologue and an
epilogue
prologue
The Lionel Pincus and Princess Firyal Map Division
None
None
None
None
None
None
500,000+ maps 20,000+ books & atlases
None
None
None
None
None
year
street names year
use type street names year
use type street names name year
material use type street names name year
material use type street names name class year
material use type street names address name class year
material use type street names address floors name class year
material use type street names address floors name class year
skylights
material use type street names address floors name class year
skylights backyards
material use type street names address floors name class geo
location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
we got these for several decades since the 1800s and
by 1950 every town in the US with a population of 2,000 had been mapped
data trapped in a legacy format
we want all the data!
f**k yeah historical data!
citysdk.waag.org/buildings
citysdk.waag.org/buildings
NYU Stern / Imaginaria3D
NYU Stern / Imaginaria3D
maps.google.com
maps.google.com
None
data
it all starts with a photograph
None
but it is “just a photo” but it is only
a few clicks away
None
maps.nypl.org/warper
None
None
geo-rectification or: “make it match Open Street Map”
None
None
*this is a simulation. actual process is intensive. consult your
mathematician before trying
None
None
vectorization or: “draw the building shapes”
None
results from maps.nypl.org/warper
hand-crafted, artisanal, locally-sourced data
500,000+ maps 20,000+ books & atlases
500,000+ maps 20,000+ books & atlases* *imagine how many pages
an atlas has
in the order of dozens of millions building footprints if
counting NYC only
None
~120k footprints produced in three years by staff and volunteers
None
this will take us a few millenia* *actual number taken
out from a hat
there has to be a better way
act i: will there be polygons?
requests to geo companies went unanswered
None
can we automate this?
None
¿¡quoi!? @mgiraldo
None
None
None
None
what is a building?
None
completely enclosed by black lines
completely enclosed by black lines dashed lines are not walls
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
process
github.com/NYPL/map-vectorizer
None
None
None
None
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
provide the best (possible) input image
None
None
None
None
differences in resampling cubic nearest neighbor
differences in resampling cubic nearest neighbor
make the image a binary bitmap or: “black and white”
None
None
polygonize or: “convert contiguous pixels to a single line shape”
None
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
None
no no no no no
no no no no no yes yes
simplify* *for those polygons that we care about
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔
None
None
alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
we need a set of points
None
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
now we alpha shaping
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
there are other point reduction algorithms like Ramer-Douglas-Peucker or Whyatt
Curve Simplification
separate the buildings from the chaff
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔ ✔ ✔
None
None
[218, 211, 209]
[218, 211, 209] paper [199, 179, 173], [179, 155, 157],
[206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141] [205, 200, 186] not paper
None
None
None
this is good enough for our use case
None
None
None
✔ ✔ ✔ ✔ ✔ completely enclosed by black lines
dashed lines are not walls > 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
computer-vision for attribute recognition *bonus quest
None
None
None
66,056 footprints produced in one day for an 1859 atlas
of Manhattan
caveats: ! adjacency not enforced false positives/negatives buildings may also
overlap
act ii: the vectorizer needs to prove itself
None
None
None
None
multiple inspections for each item and let consensus surface on
its own
footprint validation or: “tell us what the computer got right
or wrong“
are people willing to spend time checking building footprints? insurance
atlases are not exactly the coolest type of maps
None
buildinginspector.nypl.org
github.com/NYPL/building-inspector
None
None
None
None
about a month later…
None
None
None
None
420k+ flags* 70k+ unique polygons ! consensus: ~84% YES, 7%
FIX, 9% NO *a “flag” is a YES/NO/FIX by one person for a given polygon
seems people are willing after all… we — our contributors
seems people are willing after all… we — our contributors
act iii: the return of the inspector
footprint material use type street names address floors name class
geo location year skylights backyards
divide and conquer
footprint material use type street names address floors name class
geo location year skylights backyards
three new tasks for now… we really want it all!
None
footprint material use type street names address floors name class
geo location year skylights backyards
check
check YES
check YES address color
check YES FIX address color
check YES FIX address color fix
check YES FIX address color fix
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
fix
fix
address
address
classify color
classify color
865k+ flags
check YES FIX address color fix
check YES FIX address color fix for 80k+ unique polygons
77k+ 5k+ 42k+ 26k+
epilogue
address and shape consensus or: how to determine what the
right building footprint and address looks like?
None
None
all points are useful inclusiveness above all
None
None
None
None
None
None
None
None
DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
bit.ly/nypl-consensus
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 414 + +
None
None
DBSCAN for shapes also!
None
None
None
None
None
None
all points are still useful
None
﹡
﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
None
None
None
None
None
None
None
resulting data available via an API
resulting data available via an API in 100% recyclable GeoJSON
None
photographing
photographing ↓
photographing ↓ geo-rectification
photographing ↓ geo-rectification ↓
photographing ↓ geo-rectification ↓ vectorization
photographing ↓ geo-rectification ↓ vectorization ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓ data release
not the end
None
None
None
¡merci beaucoup! mauricio giraldo arteaga @mgiraldo NYPL Labs slides at:
bit.ly/nypl-ehess images from: NYPL digital collections - Wikimedia Commons Christopher Cannon - Flickr user wallyg - Giphy