Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building Inspector - Shape + Address consensus
Search
Mauricio Giraldo
October 21, 2014
Technology
0
200
Building Inspector - Shape + Address consensus
Mauricio Giraldo
October 21, 2014
Tweet
Share
More Decks by Mauricio Giraldo
See All by Mauricio Giraldo
Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales
mgiraldo
0
310
From food to buildings and beyond: what happens when a library opens its digital collections to human-computer collaboration
mgiraldo
2
150
Aprendizajes de trabajo en bibliotecas digitales
mgiraldo
0
130
building inspector
mgiraldo
0
83
Talk at the NYU ITP Data Art class / Spring 2017
mgiraldo
0
140
Humanidades Digitales en los laboratorios de la Biblioteca Pública de New York
mgiraldo
0
72
FOSS4G Nara/Tokyo
mgiraldo
0
1.7k
Human-Computer Collaboration at NYPL Labs
mgiraldo
2
400
NYPL Labs @ Eyeo Festival 2015
mgiraldo
1
630
Other Decks in Technology
See All in Technology
「我々はどこに向かっているのか」を問い続けるための仕組みづくり / Establishing a System for Continuous Inquiry about where we are
daitasu
0
170
Docker互換のセキュアなコンテナ実行環境「Podman」超入門
devops_vtj
6
3.2k
20240725 LLMによるDXのビジョンと、今何からやるべきか @Azure OpenAI Service Dev Day
nrryuya
3
1.2k
AWSサービスメニュー開発をしていてAWSを好きだ!と感じた瞬間
toru_kubota
0
130
AOAI Dev Day - Opening Session
yoshidashingo
2
440
年間一億円削減した時系列データベースのアーキテクチャ改善~不確実性の高いプロジェクトへの挑戦~
lycorptech_jp
PRO
3
2.9k
エンジニア向け会社紹介資料
caddi_eng
14
220k
【基調講演】変える、今ここから ― IoTとAIで紡ぐ未来
soracom
PRO
0
320
dxd2024-生成AIに振り回された3か月間の成功と失敗/dxd2024-link-and-motivation
lmi
2
260
公共領域から学ぶ クラウド移行についてエンジニアが意識していること
kawakawa2222
0
140
ペパボのオブザーバビリティ研修2024 説明資料
kesompochy
0
1.1k
大規模ドラレコデータ収集・機械学習基盤を支える AWS CDK 〜導入・運用事例紹介〜
pemugi
0
110
Featured
See All Featured
It's Worth the Effort
3n
181
27k
Intergalactic Javascript Robots from Outer Space
tanoku
266
26k
GraphQLとの向き合い方2022年版
quramy
36
13k
How to Ace a Technical Interview
jacobian
274
23k
Speed Design
sergeychernyshev
9
270
BBQ
matthewcrist
82
9k
Rails Girls Zürich Keynote
gr2m
93
13k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
245
1.2M
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
149
45k
From Idea to $5000 a Month in 5 Months
shpigford
377
46k
Six Lessons from altMBA
skipperchong
24
3.2k
How STYLIGHT went responsive
nonsquared
93
5k
Transcript
mauricio giraldo arteaga @mgiraldo NYPL Labs
None
bon jour
my name is mauricio
None
research and circulating library system spanning the Bronx, Staten Island
and Manhattan boroughs in NYC
None
NYPL Labs
None
i’m going to talk about maps
The Great Map Data Extraction
an adventure in three acts and a prologue and an
epilogue
prologue
The Lionel Pincus and Princess Firyal Map Division
None
None
None
None
None
None
500,000+ maps 20,000+ books & atlases
None
None
None
None
None
year
street names year
use type street names year
use type street names name year
material use type street names name year
material use type street names name class year
material use type street names address name class year
material use type street names address floors name class year
material use type street names address floors name class year
skylights
material use type street names address floors name class year
skylights backyards
material use type street names address floors name class geo
location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
we got these for several decades since the 1800s and
by 1950 every town in the US with a population of 2,000 had been mapped
data trapped in a legacy format
we want all the data!
f**k yeah historical data!
citysdk.waag.org/buildings
citysdk.waag.org/buildings
NYU Stern / Imaginaria3D
NYU Stern / Imaginaria3D
maps.google.com
maps.google.com
None
data
it all starts with a photograph
None
but it is “just a photo” but it is only
a few clicks away
None
maps.nypl.org/warper
None
None
geo-rectification or: “make it match Open Street Map”
None
None
*this is a simulation. actual process is intensive. consult your
mathematician before trying
None
None
vectorization or: “draw the building shapes”
None
results from maps.nypl.org/warper
hand-crafted, artisanal, locally-sourced data
500,000+ maps 20,000+ books & atlases
500,000+ maps 20,000+ books & atlases* *imagine how many pages
an atlas has
in the order of dozens of millions building footprints if
counting NYC only
None
~120k footprints produced in three years by staff and volunteers
None
this will take us a few millenia* *actual number taken
out from a hat
there has to be a better way
act i: will there be polygons?
requests to geo companies went unanswered
None
can we automate this?
None
¿¡quoi!? @mgiraldo
None
None
None
None
what is a building?
None
completely enclosed by black lines
completely enclosed by black lines dashed lines are not walls
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
process
github.com/NYPL/map-vectorizer
None
None
None
None
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
provide the best (possible) input image
None
None
None
None
differences in resampling cubic nearest neighbor
differences in resampling cubic nearest neighbor
make the image a binary bitmap or: “black and white”
None
None
polygonize or: “convert contiguous pixels to a single line shape”
None
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
None
no no no no no
no no no no no yes yes
simplify* *for those polygons that we care about
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔
None
None
alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
we need a set of points
None
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
now we alpha shaping
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
there are other point reduction algorithms like Ramer-Douglas-Peucker or Whyatt
Curve Simplification
separate the buildings from the chaff
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔ ✔ ✔
None
None
[218, 211, 209]
[218, 211, 209] paper [199, 179, 173], [179, 155, 157],
[206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141] [205, 200, 186] not paper
None
None
None
this is good enough for our use case
None
None
None
✔ ✔ ✔ ✔ ✔ completely enclosed by black lines
dashed lines are not walls > 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
computer-vision for attribute recognition *bonus quest
None
None
None
66,056 footprints produced in one day for an 1859 atlas
of Manhattan
caveats: ! adjacency not enforced false positives/negatives buildings may also
overlap
act ii: the vectorizer needs to prove itself
None
None
None
None
multiple inspections for each item and let consensus surface on
its own
footprint validation or: “tell us what the computer got right
or wrong“
are people willing to spend time checking building footprints? insurance
atlases are not exactly the coolest type of maps
None
buildinginspector.nypl.org
github.com/NYPL/building-inspector
None
None
None
None
about a month later…
None
None
None
None
420k+ flags* 70k+ unique polygons ! consensus: ~84% YES, 7%
FIX, 9% NO *a “flag” is a YES/NO/FIX by one person for a given polygon
seems people are willing after all… we — our contributors
seems people are willing after all… we — our contributors
act iii: the return of the inspector
footprint material use type street names address floors name class
geo location year skylights backyards
divide and conquer
footprint material use type street names address floors name class
geo location year skylights backyards
three new tasks for now… we really want it all!
None
footprint material use type street names address floors name class
geo location year skylights backyards
check
check YES
check YES address color
check YES FIX address color
check YES FIX address color fix
check YES FIX address color fix
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
fix
fix
address
address
classify color
classify color
865k+ flags
check YES FIX address color fix
check YES FIX address color fix for 80k+ unique polygons
77k+ 5k+ 42k+ 26k+
epilogue
address and shape consensus or: how to determine what the
right building footprint and address looks like?
None
None
all points are useful inclusiveness above all
None
None
None
None
None
None
None
None
DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
bit.ly/nypl-consensus
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 414 + +
None
None
DBSCAN for shapes also!
None
None
None
None
None
None
all points are still useful
None
﹡
﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
None
None
None
None
None
None
None
resulting data available via an API
resulting data available via an API in 100% recyclable GeoJSON
None
photographing
photographing ↓
photographing ↓ geo-rectification
photographing ↓ geo-rectification ↓
photographing ↓ geo-rectification ↓ vectorization
photographing ↓ geo-rectification ↓ vectorization ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓ data release
not the end
None
None
None
¡merci beaucoup! mauricio giraldo arteaga @mgiraldo NYPL Labs slides at:
bit.ly/nypl-ehess images from: NYPL digital collections - Wikimedia Commons Christopher Cannon - Flickr user wallyg - Giphy