Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
Programming
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda
1) CSV Format Issues
2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro
.csv / MIME: text/csv
Unknown birthdate (80s?)
RFC 4180 (2005)
@Kartones Intro
Plain text
Simple format
Simple rules
@Kartones Usage
@Kartones CSV
0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
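The doubled quotes on this slide are how RFC 4180 escapes a quote inside a quoted field. As a minimal sketch (the sample row below is made up for illustration and is not taken from the talk), Python's standard `csv` module handles embedded commas and doubled quotes, while a naive split on commas does not:

```python
import csv
import io
import json

# Illustrative row (an assumption, not from the slides): a quoted field with a
# comma inside it, and a JSON value whose quotes are doubled per RFC 4180.
raw = '1083,"jump, jump up!","{""value"":""es""}"\r\n'

# Naive approach: splitting on commas breaks the quoted field in two.
naive_fields = raw.strip().split(",")

# csv.reader understands quoting, so the row comes back as three fields.
fields = next(csv.reader(io.StringIO(raw, newline="")))

# Once unescaped by the CSV parser, the last field is ordinary JSON.
payload = json.loads(fields[2])

print(len(naive_fields), len(fields), payload)
```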
@Kartones WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
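To make the WKT notation concrete, here is a small Python sketch that reads only the two simplest shapes from the slide, POINT and LINESTRING. The function name `parse_simple_wkt` is hypothetical, and a real importer would more likely rely on a full geometry library than on a regular expression:

```python
import re

# Matches only 2D POINT and LINESTRING; nested shapes such as POLYGON are
# deliberately rejected to keep the sketch short.
_SIMPLE_WKT = re.compile(r"^\s*(POINT|LINESTRING)\s*\(\s*([^()]*?)\s*\)\s*$",
                         re.IGNORECASE)


def parse_simple_wkt(text):
    """Return (kind, coordinates) for a POINT or LINESTRING WKT string."""
    match = _SIMPLE_WKT.match(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    kind = match.group(1).upper()
    # Coordinate pairs are comma-separated; values in a pair are space-separated.
    coords = [tuple(float(value) for value in pair.split())
              for pair in match.group(2).split(",")]
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT must contain exactly one coordinate pair")
        return kind, coords[0]
    return kind, coords


print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```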
@Kartones WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
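The hex string above can be unpacked by hand: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. The sketch below does this with Python's `struct` module. It also handles the SRID flag used by the PostGIS EWKB extension, on the assumption that the first field of the earlier CSV slide is such a value. The function name `decode_wkb_point` is hypothetical.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type
WKB_POINT = 1


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded 2D point in WKB or EWKB; return (x, y, srid)."""
    data = bytes.fromhex(hex_wkb)
    # First byte: 0 = big-endian (XDR), 1 = little-endian (NDR).
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
        geom_type &= ~EWKB_SRID_FLAG
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point, geometry type {geom_type}")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return x, y, srid


# The value from the WKB slide.
print(decode_wkb_point("000000000140000000000000004010000000000000"))
# The first field of the CSV slide, assuming it is EWKB.
print(decode_wkb_point("0101000020E610000000000000008049C000000000000038C0"))
```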
@Kartones GeoJSON
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
http://geojson.org/
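One detail that is easy to miss in the GeoJSON sample: coordinates are ordered longitude first, then latitude. A short Python sketch that reads the feature and re-emits it as WKT (the function name `feature_to_wkt_point` is hypothetical):

```python
import json


def feature_to_wkt_point(feature_json):
    """Convert a GeoJSON Point feature into a WKT POINT string."""
    feature = json.loads(feature_json)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON order is [longitude, latitude]; WKT uses "x y", the same order.
    lon, lat = geometry["coordinates"][:2]
    return f"POINT ({lon} {lat})"


sample = ('{ "type": "Feature", "geometry": { "type": "Point", '
          '"coordinates": [125.6, 10.1] }, '
          '"properties": { "name": "Dinagat Islands" } }')
print(feature_to_wkt_point(sample))
```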
@Kartones IMPORT ISSUES
@Kartones Typical
Huge files (>1GB)
Lots of rows (+2M)
Lots of columns (~1600)
XLS/XLSX -> CSV
@Kartones Typical
Stream HTTP downloaded file
Stream file between servers
Stream data import to DB
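The streaming idea can be sketched in Python as a generator that never holds the whole file in memory. This is an illustration under assumptions, not CartoDB's importer: `iter_batches` is a hypothetical helper, and the in-memory `StringIO` stands in for a downloaded file or network stream.

```python
import csv
import io
from itertools import islice


def iter_batches(text_stream, batch_size):
    """Yield (header, rows) pairs, reading at most batch_size rows at a time."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch


# Stand-in for a large file; any text-mode file object works the same way.
stream = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
for header, rows in iter_batches(stream, 2):
    print(header, rows)
```

Each batch can then be handed to the database step, so memory use depends on the batch size rather than on the file size.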
@Kartones Typical
@Kartones CartoDB-specific
Content guessing (e.g. lat/lon)
Type guessing
Fixing geometry errors
Sync tables -> No downtime allowed
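The slides do not say how the guessing works, so the following Python sketch is only one plausible heuristic, not CartoDB's implementation. The column-name sets and the functions `guess_type` and `guess_lat_lon` are assumptions made for illustration.

```python
# Hypothetical header names treated as latitude / longitude candidates.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def _all_parse(values, caster):
    """True if every value can be converted with caster."""
    try:
        for value in values:
            caster(value)
    except ValueError:
        return False
    return True


def guess_type(values):
    """Guess 'integer', 'float' or 'text' for a column of strings."""
    non_empty = [value for value in values if value.strip()]
    if not non_empty:
        return "text"
    if _all_parse(non_empty, int):
        return "integer"
    if _all_parse(non_empty, float):
        return "float"
    return "text"


def guess_lat_lon(header, rows):
    """Return {'lat': name, 'lon': name} if both columns are found, else None."""
    found = {}
    for index, name in enumerate(header):
        key = name.strip().lower()
        column = [row[index] for row in rows]
        if guess_type(column) == "text":
            continue
        numbers = [float(value) for value in column if value.strip()]
        # A name match alone is not enough; values must be in a valid range.
        if key in LAT_NAMES and all(-90 <= n <= 90 for n in numbers):
            found.setdefault("lat", name)
        elif key in LON_NAMES and all(-180 <= n <= 180 for n in numbers):
            found.setdefault("lon", name)
    return found if len(found) == 2 else None


header = ["name", "Latitude", "lon"]
rows = [["Madrid", "40.4", "-3.7"], ["Barcelona", "41.4", "2.2"]]
print(guess_lat_lon(header, rows))
```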
@Kartones DB-Specific
Leave DB indexes as last step
Prefer big INSERT to multiple UPDATE
GDAL’s ogr2ogr > Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
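The first two points can be illustrated with a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for the real database, which is an assumption made to keep the example self-contained; the table and index names are hypothetical. Rows are loaded in one batched insert inside a single transaction, and the index is built only after the data is in place.

```python
import sqlite3


def bulk_load(conn, rows):
    """Load all rows in one transaction, then build the index last."""
    with conn:  # commits on success, rolls back on error
        conn.execute("CREATE TABLE points (name TEXT, lon REAL, lat REAL)")
        # One batched insert instead of row-by-row updates.
        conn.executemany(
            "INSERT INTO points (name, lon, lat) VALUES (?, ?, ?)", rows
        )
        # Indexing after the load avoids updating the index on every insert.
        conn.execute("CREATE INDEX points_lon_lat_idx ON points (lon, lat)")


conn = sqlite3.connect(":memory:")
bulk_load(conn, [("Madrid", -3.7, 40.4), ("Dinagat Islands", 125.6, 10.1)])
print(conn.execute("SELECT COUNT(*) FROM points").fetchone()[0])
```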
@Kartones Questions? Thanks! kartones@cartodb.com