Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
CartoDB
Agenda: 1) CSV Format Issues 2) Import Issues
CSV FORMAT ISSUES
Intro: .csv / MIME: text/csv; unknown birthdate (80s?); RFC 4180 (2005)
Intro: plain text; simple format; simple rules
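The "simple rules" of RFC 4180 still leave room for surprises: fields may contain commas, doubled quotes, and even line breaks as long as they are quoted. A minimal sketch with Python's standard `csv` module follows; the sample data and the `parse_csv` helper are hypothetical, loosely modelled on values that appear later in the talk.

```python
import csv
import io

# Hypothetical sample: a comma inside a field, doubled quotes inside a
# quoted field, and a line break inside a quoted field.
RAW = (
    'name,description,metadata\r\n'
    'alien,"jump, jump up!","{""value"":""es""}"\r\n'
    'robot,"two\r\nlines",{}\r\n'
)


def parse_csv(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline="" keeps line endings untouched so the csv module can
    # handle line breaks that sit inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv(RAW)
for row in rows:
    print(row)
```

Splitting each line on commas would break all three tricky fields here, which is why a real CSV parser is worth using even for "simple" files.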
Usage
CSV: 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
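The long hexadecimal value in the sample looks like a geometry in PostGIS's extended WKB (EWKB) hex form. That is an assumption, since the slide does not name the encoding. Under that assumption, a rough sketch of decoding such a point with only the standard library could look like this (`decode_ewkb_point` is a hypothetical helper and only handles 2D points):

```python
import struct

SRID_FLAG = 0x20000000  # EWKB flag: an SRID follows the geometry type code
POINT_TYPE = 1


def decode_ewkb_point(hex_string):
    """Decode a hex-encoded WKB/EWKB 2D point into (srid, x, y)."""
    data = bytes.fromhex(hex_string)
    # First byte is the byte order: 1 = little endian, 0 = big endian.
    order = "<" if data[0] == 1 else ">"
    (type_code,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if type_code & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    if (type_code & 0xFF) != POINT_TYPE:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return srid, x, y


# The hex value from the slide's sample row.
print(decode_ewkb_point("0101000020E610000000000000008049C000000000000038C0"))
```

A column like this is opaque to spreadsheet tools, which is part of why geospatial CSV files are harder to import than they look.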
WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
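To illustrate what an importer has to do when a CSV cell contains WKT, here is a deliberately small parser for the two simplest geometry types above. It is only a sketch: `parse_simple_wkt` is a hypothetical helper, it rejects polygons and multi-geometries, and a real import would more likely rely on an existing geometry library.

```python
import re

# Matches "POINT (...)" or "LINESTRING (...)" with no nested parentheses.
WKT_RE = re.compile(r"^\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*$", re.IGNORECASE)


def parse_simple_wkt(text):
    """Parse a 2D POINT or LINESTRING into (type, [(x, y), ...])."""
    match = WKT_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    geom_type = match.group(1).upper()
    coords = []
    for pair in match.group(2).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        coords.append((float(x), float(y)))
    if geom_type == "POINT" and len(coords) != 1:
        raise ValueError("a POINT must have exactly one coordinate pair")
    return geom_type, coords


print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```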
WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
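The WKB layout for a point is a byte-order flag, a 32-bit geometry type, and two 64-bit floats. The sketch below encodes and decodes that layout with the standard `struct` module. The function names are hypothetical, and with the default big-endian order the output should correspond to the hex form shown for POINT(2.0 4.0).

```python
import struct

POINT_TYPE = 1


def encode_wkb_point(x, y, big_endian=True):
    """Encode a 2D point as WKB: byte order, type code, then x and y."""
    order_byte, order = (0, ">") if big_endian else (1, "<")
    return struct.pack(order + "BIdd", order_byte, POINT_TYPE, x, y)


def decode_wkb_point(data):
    """Decode a plain WKB 2D point into (x, y)."""
    order = ">" if data[0] == 0 else "<"
    _, geom_type, x, y = struct.unpack(order + "BIdd", data)
    if geom_type != POINT_TYPE:
        raise ValueError("not a WKB point")
    return x, y


wkb = encode_wkb_point(2.0, 4.0)
print(wkb.hex())
print(decode_wkb_point(wkb))
```

Both byte orders are valid WKB, so an importer cannot assume one or the other and has to read the first byte.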
GeoJSON
{"type": "Feature", "geometry": {"type": "Point", "coordinates": [125.6, 10.1]}, "properties": {"name": "Dinagat Islands"}}
http://geojson.org/
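A common source of import bugs is coordinate order: GeoJSON points are written as [longitude, latitude]. As a small sketch, a CSV row with separate latitude and longitude columns could be turned into a Feature like the one above. The column names `lat` and `lon` and the helper `row_to_feature` are assumptions for illustration.

```python
import json


def row_to_feature(row, lon_key="lon", lat_key="lat"):
    """Build a GeoJSON Feature dict from a CSV row given as a dict."""
    lon = float(row[lon_key])
    lat = float(row[lat_key])
    # Every column that is not a coordinate becomes a property.
    properties = {k: v for k, v in row.items() if k not in (lon_key, lat_key)}
    return {
        "type": "Feature",
        # GeoJSON order is [longitude, latitude], not [lat, lon].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


feature = row_to_feature({"name": "Dinagat Islands", "lat": "10.1", "lon": "125.6"})
print(json.dumps(feature))
```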
IMPORT ISSUES
Typical: huge files (>1 GB); lots of rows (2M+); lots of columns (~1600); XLS/XLSX -> CSV
Typical: stream the HTTP-downloaded file; stream the file between servers; stream the data import to the DB
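With files of this size, loading everything into memory is not an option, so each stage reads and forwards data in chunks. A minimal sketch of the idea follows; `iter_batches` is a hypothetical helper that works on any iterable of text lines, such as an open file or a wrapped HTTP response, and the batch size is arbitrary.

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield (header, rows) batches from an iterable of CSV text lines."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        # islice pulls at most batch_size rows, so memory use stays bounded.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch


# Usage with an in-memory stand-in for a large file.
source = io.StringIO("id,name\n1,a\n2,b\n3,c\n", newline="")
for header, rows in iter_batches(source, batch_size=2):
    print(header, rows)
```

Each batch can then be handed to the database loader before the next one is read, so download, transfer, and import can overlap.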
Typical
CartoDB-specific: content guessing (e.g. lat/lon); type guessing; fixing geometry errors; sync tables -> no downtime allowed
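Content and type guessing can be sketched with a couple of heuristics over a sample of rows. The code below is not CartoDB's implementation; the candidate column names, the type labels, and both helpers are assumptions made for illustration.

```python
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def guess_type(values):
    """Guess 'integer', 'float' or 'text' from a sample of string values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in non_empty:
                cast(value)
        except ValueError:
            continue  # at least one value does not fit; try the next type
        return name
    return "text"


def guess_lat_lon(header, sample_rows):
    """Return (lat_column, lon_column) if both look plausible, else None."""
    lowered = [h.strip().lower() for h in header]
    lat_idx = next((i for i, h in enumerate(lowered) if h in LAT_NAMES), None)
    lon_idx = next((i for i, h in enumerate(lowered) if h in LON_NAMES), None)
    if lat_idx is None or lon_idx is None:
        return None
    for row in sample_rows:
        try:
            lat, lon = float(row[lat_idx]), float(row[lon_idx])
        except (ValueError, IndexError):
            return None
        # Reject columns whose values fall outside valid coordinate ranges.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            return None
    return header[lat_idx], header[lon_idx]


print(guess_type(["1", "2", ""]))
print(guess_lat_lon(["name", "Latitude", "lng"], [["x", "10.1", "125.6"]]))
```

Heuristics like these misfire easily. A postal code column such as "08001" would be guessed as an integer here and lose its leading zero, which is the kind of hidden complexity the talk is about.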
DB-specific: create DB indexes as the last step; prefer one big INSERT over multiple UPDATEs; prefer GDAL's ogr2ogr over Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
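The first two points can be illustrated with a small sketch. It uses SQLite from Python's standard library purely as a stand-in (the talk is about a PostgreSQL/PostGIS setup), and the table, column, and index names are hypothetical. Rows are inserted in bulk inside one transaction, and the index is only built once the data is in place.

```python
import sqlite3


def bulk_load(conn, rows):
    """Load all rows in one transaction, then create the index last."""
    with conn:  # one transaction for table creation and the whole load
        conn.execute(
            "CREATE TABLE points (id INTEGER, name TEXT, lon REAL, lat REAL)"
        )
        # A batched insert instead of inserting and updating row by row.
        conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)", rows)
    with conn:
        # Building the index after the load avoids updating it per row.
        conn.execute("CREATE INDEX points_name_idx ON points (name)")


conn = sqlite3.connect(":memory:")
bulk_load(conn, [(i, f"point-{i}", 125.6, 10.1) for i in range(1000)])
print(conn.execute("SELECT COUNT(*) FROM points").fetchone()[0])
```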
Questions? Thanks!
[email protected]