$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
55
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
45
Python static typing with MyPy
kartones
0
71
High-impact refactors keeping the lights on
kartones
0
67
Remote Work
kartones
0
94
Intro to GameBoy Development
kartones
0
100
Myths & The Real World of OpenSource Development
kartones
0
49
CartoDB Tech Intro
kartones
0
49
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
tsgolintはいかにしてtypescript-goの非公開APIを呼び出しているのか
syumai
7
2.2k
20 years of Symfony, what's next?
fabpot
2
360
LLMで複雑な検索条件アセットから脱却する!! 生成的検索インタフェースの設計論
po3rin
3
740
251126 TestState APIってなんだっけ?Step Functionsテストどう変わる?
east_takumi
0
320
Integrating WordPress and Symfony
alexandresalome
0
150
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
160
Why Kotlin? 電子カルテを Kotlin で開発する理由 / Why Kotlin? at Henry
agatan
2
7.3k
ゲームの物理 剛体編
fadis
0
350
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
190
マスタデータ問題、マイクロサービスでどう解くか
kts
0
100
Flutter On-device AI로 완성하는 오프라인 앱, 박제창 @DevFest INCHEON 2025
itsmedreamwalker
1
110
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
110
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
141
7.2k
Speed Design
sergeychernyshev
33
1.4k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.2k
How to train your dragon (web standard)
notwaldorf
97
6.4k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Optimizing for Happiness
mojombo
379
70k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]