Data on the Web
Will Farrington
June 08, 2011
Technology
It's an intro to data on the web for some folks new to web development.
Transcript
Data on the Web
Will Farrington
File I/O
File I/O
• Minimal reusability
• No "correct" format
• Hard to maintain
• Prone to problems caused by encoding changes
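The last bullet is easy to reproduce. A minimal Python sketch, assuming a file whose text was written as UTF-8 and read back by code that assumes Latin-1 (ISO-8859-1); the variable names are illustrative:

```python
# Assumed scenario: the writer used UTF-8, the reader assumes Latin-1.
original = "Project, número uno"

# Writer side: encode the text as UTF-8 bytes (what lands in the file).
raw_bytes = original.encode("utf-8")

# Reader side: Latin-1 maps every byte to some character, so no error is
# raised. The two UTF-8 bytes for "ú" silently become two wrong characters.
misread = raw_bytes.decode("latin-1")

# Decoding with the encoding the writer actually used round-trips cleanly.
correct = raw_bytes.decode("utf-8")

print(misread)  # Project, nÃºmero uno
print(correct)  # Project, número uno
```

Nothing in an ad hoc file records which encoding was used, which is part of why the talk moves on to standard formats.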
Standardize
CSV
Comma-Separated Values
CSV
• Used for tabular data
• Small footprint
• Widely recognized and supported format
• Many different flavors (delimiters, quoting rules, line endings); RFC 4180 describes a common one
• Supported by database systems and spreadsheets
Example CSV
    Id,Name,Desc,Points,Due
    1,Homework 1,Nothing special,15,6/7/2011
    15,"Project, número uno",,100,6/21/2001
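Parsing the example shows why a real CSV parser beats splitting on commas: the quoted field contains a comma, and the empty Desc field still counts as a column. A minimal sketch using Python's standard `csv` module on an in-memory copy of the rows above; the semicolon sample at the end is a made-up illustration of another flavor:

```python
import csv
import io

# The example rows, held in memory instead of in a file.
data = """Id,Name,Desc,Points,Due
1,Homework 1,Nothing special,15,6/7/2011
15,"Project, número uno",,100,6/21/2001
"""

# DictReader takes field names from the first row. The default "excel"
# dialect treats a double-quoted field as one value even if it has commas.
rows = list(csv.DictReader(io.StringIO(data)))

for row in rows:
    # CSV has no types: every value, including Points, comes back as a string.
    print(row["Id"], "|", row["Name"], "|", repr(row["Desc"]), "|", row["Points"])

# Another flavor (hypothetical sample): semicolon-delimited, same quoting.
alt = list(csv.reader(io.StringIO('1;"a;b";15\n'), delimiter=";"))
print(alt)  # [['1', 'a;b', '15']]
```

The reader has to be told which flavor it is looking at; nothing in the file itself says so.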
XML
Extensible Markup Language
XML
• Open, standard specification
• Unicode-friendly
• Came to prominence with Java and .NET
• Widely used on the web
• Good at representing tree-like data
Example XML
    <?xml version="1.0" encoding="UTF-8"?>
    <statuses type="array">
      <status>
        <created_at>Tue Jun 07 21:30:50 +0000 2011</created_at>
        <id>78212343649140736</id>
        <text>@skalnik Looks good.</text>
        <source><a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow">Twitter for Mac</a></source>
        <truncated>false</truncated>
        <favorited>false</favorited>
        <in_reply_to_status_id>78211453777231872</in_reply_to_status_id>
        <in_reply_to_user_id>15878923</in_reply_to_user_id>
        <in_reply_to_screen_name>skalnik</in_reply_to_screen_name>
        <retweet_count>0</retweet_count>
        <retweeted>false</retweeted>
        <user>
          <id>10403812</id>
        </user>
        <geo/>
        <coordinates/>
        <place/>
        <contributors/>
      </status>
    </statuses>
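Reading the example back shows both the tree structure and the type-system gap: every value arrives as text. A minimal sketch with Python's standard `xml.etree.ElementTree`, using a trimmed copy of the status above (some fields omitted):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example. Passed as bytes so the parser honors the
# encoding declaration, which must be the very first thing in the document.
xml_doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
  <status>
    <created_at>Tue Jun 07 21:30:50 +0000 2011</created_at>
    <id>78212343649140736</id>
    <text>@skalnik Looks good.</text>
    <source><a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow">Twitter for Mac</a></source>
    <truncated>false</truncated>
    <user>
      <id>10403812</id>
    </user>
    <geo/>
  </status>
</statuses>"""

root = ET.fromstring(xml_doc)  # the <statuses> element

for status in root.findall("status"):
    # Element text is always a string; converting to int/bool is on the caller.
    status_id = int(status.findtext("id"))
    truncated = status.findtext("truncated") == "true"
    # Paths walk the tree: <user><id> is a different field from the top-level <id>.
    user_id = int(status.findtext("user/id"))
    # The <a> inside <source> is a child element with its own attributes.
    client = status.find("source/a").text
    print(status_id, user_id, truncated, client)
```

Note that nothing in the document says `<truncated>` is a boolean; the sketch simply assumes the string "true" means true.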
Criticisms of XML
• Very verbose
• Parsers can be extremely complicated
• Does not map well to some type systems
• Does not represent highly structured data well
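The verbosity point can be measured by serializing one record both ways. A minimal Python sketch; the record is a hypothetical three-field subset of the status above, and compact JSON separators are chosen so the comparison is not padded by whitespace:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical subset of the status record.
record = {"id": 78212343649140736, "text": "@skalnik Looks good.", "retweet_count": 0}

# XML: every field needs an opening and a closing tag, and values become text.
status = ET.Element("status")
for key, value in record.items():
    ET.SubElement(status, key).text = str(value)
as_xml = ET.tostring(status, encoding="unicode")

# JSON: each key appears once, and numbers stay numbers.
as_json = json.dumps(record, separators=(",", ":"))

print(len(as_xml), as_xml)
print(len(as_json), as_json)
```

The gap grows with the number of fields, since each XML field name is written twice.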
JSON
JavaScript Object Notation
JSON
• Based on a subset of JavaScript (ECMA-262, 3rd Edition, 1999)
• Lightweight
• Simple to parse
• Designed to be human-readable
• Well-suited to structured data as well as trees
Example JSON
    [{
      "coordinates":null,
      "created_at":"Tue Jun 07 21:30:50 +0000 2011",
      "truncated":false,
      "favorited":false,
      "contributors":null,
      "text":"@skalnik Looks good.",
      "id":78212343649140736,
      "retweet_count":0,
      "geo":null,
      "retweeted":false,
      "in_reply_to_user_id":15878923,
      "source":"<a href=\"http://itunes.apple.com/us/app/twitter/id409789998?mt=12\" rel=\"nofollow\">Twitter for Mac</a>",
      "place":null,
      "in_reply_to_screen_name":"skalnik",
      "user":{"id":10403812},
      "in_reply_to_status_id":78211453777231872
    }]
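More on JSON
• eval() works but is unsafe: it executes whatever code the response contains
• JSON.parse() only parses data and never executes it
• Built-in browser support (JSON.parse() and JSON.stringify())
• Popular for AJAX, both single-domain and cross-domain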
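The JSON.parse() versus eval() distinction carries over to other languages. A minimal Python sketch, assuming `json.loads` as the counterpart of JSON.parse() and Python's built-in `eval` as the counterpart of JavaScript's eval(); the payload is trimmed from the JSON example and the untrusted string is made up:

```python
import json

# Trimmed copy of the status from the JSON example (some fields omitted).
payload = r'''[{
  "coordinates":null,
  "created_at":"Tue Jun 07 21:30:50 +0000 2011",
  "truncated":false,
  "text":"@skalnik Looks good.",
  "id":78212343649140736,
  "source":"<a href=\"http://itunes.apple.com/us/app/twitter/id409789998?mt=12\" rel=\"nofollow\">Twitter for Mac</a>",
  "user":{"id":10403812}
}]'''

# json.loads builds data structures and runs nothing.
statuses = json.loads(payload)   # JSON array  -> Python list
status = statuses[0]             # JSON object -> Python dict
# null -> None, false -> False, nested objects -> nested dicts,
# and the escaped quotes inside "source" are decoded.
print(status["coordinates"], status["truncated"], status["user"]["id"])
print(status["source"])

# A valid Python expression that is not JSON. eval(untrusted) would execute
# the call; the JSON parser rejects it as a syntax error instead.
untrusted = '__import__("os").getcwd()'
try:
    json.loads(untrusted)
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(rejected)
```

The same rule holds in the browser: treat a response as data to parse, never as code to run.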
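JSONP
• JSON with Padding
• The response wraps the JSON in a call to a callback function named by the page, and the page loads it through a <script> tag
• Used for cross-domain requests
• Alternative to Cross-Origin Resource Sharing (CORS)
• Only supports GET, since a <script> tag can only issue GET requests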
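On the server side, the "padding" is just the callback name wrapped around the JSON. A minimal Python sketch; `jsonp_response` and the callback-name pattern are hypothetical, and real services choose their own rules for which names they accept:

```python
import json
import re

# Assumed whitelist: dotted JavaScript-style identifiers such as "handle"
# or "app.onStatus". Anything else could inject arbitrary script.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")

def jsonp_response(callback, data):
    """Return a JSONP body: the JSON "padded" with a call to `callback`.

    The page loads this with <script src="...?callback=handle">, and the
    browser runs handle({...}) with the data as its argument.
    """
    # fullmatch (not match) so the whole name must fit the pattern.
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name: %r" % callback)
    return "%s(%s);" % (callback, json.dumps(data))

print(jsonp_response("handle", {"id": 10403812}))
# handle({"id": 10403812});
```

Because the result is executed as script, validating the callback name is what keeps this from becoming the eval() problem from the previous slide.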
BSON
• Binary JSON
• Superset of JSON's types: adds dates, binary data and more
• Used by MongoDB to store documents and to send them over the wire
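A few lines of encoding make "binary JSON" concrete. A minimal Python sketch of the element layout in the BSON specification, handling only strings, 32/64-bit integers, booleans and null; `encode_document` is a hypothetical helper, not a MongoDB driver API:

```python
import struct

def _cstring(s):
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator.
    return s.encode("utf-8") + b"\x00"

def encode_document(doc):
    """Encode a flat dict of str/int/bool/None values as a BSON document."""
    body = b""
    for key, value in doc.items():
        name = _cstring(key)
        if isinstance(value, bool):  # check bool first: bool is an int subclass
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            if -2**31 <= value < 2**31:
                body += b"\x10" + name + struct.pack("<i", value)  # int32
            else:
                body += b"\x12" + name + struct.pack("<q", value)  # int64
        elif isinstance(value, str):
            # Strings carry a little-endian length that includes their NUL.
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif value is None:
            body += b"\x0a" + name  # null has no payload
        else:
            raise TypeError("unsupported type: %r" % type(value))
    # The total length counts itself (4 bytes), the elements and the final NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode_document({"hello": "world"}))
```

The length prefixes let a reader skip over fields without parsing them, which a text format like JSON cannot offer.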
YAML
YAML Ain't Markup Language
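YAML
• Not often used over the network
• Popular for configuration files
• Human-readable
• Data-oriented
• The format itself contains no executable code, but some parsers can construct arbitrary objects, so untrusted YAML should go through a safe loader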
Example YAML
    ---
    - coordinates:
      created_at: Tue Jun 07 21:30:50 +0000 2011
      truncated: false
      favorited: false
      contributors:
      text: "@skalnik Looks good."
      id: 78212343649140736
      retweet_count: 0
      geo:
      retweeted: false
      in_reply_to_user_id: 15878923
      source: <a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow">Twitter for Mac</a>
      place:
      in_reply_to_screen_name: skalnik
      user:
        id: 10403812
      in_reply_to_status_id: 78211453777231872
What to do with all these formats?
APIs
Application Programming Interfaces
APIs
• Websites tell you what formats they support
• Websites document their URL structure
• Developers use these APIs to integrate products
• You can even consume your own APIs
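Consuming a documented API usually comes down to building the URL the docs describe and decoding the format you asked for. A minimal Python sketch; the base URL, the `statuses/user_timeline.json` path and its parameters are hypothetical, and `opener` is injectable so the code can run without a network:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical API; a real service documents its own base URL and paths.
BASE_URL = "https://api.example.com/1/"

def build_url(path, **params):
    """Join a documented path onto the base URL and add query parameters."""
    url = urllib.parse.urljoin(BASE_URL, path)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def fetch_json(url, opener=urllib.request.urlopen):
    """GET `url` and decode a JSON body (UTF-8 assumed)."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))

# Here the format is chosen by the ".json" suffix, one convention an API
# might document; others use an Accept header or a format parameter.
url = build_url("statuses/user_timeline.json", screen_name="wfarr", count=5)
print(url)
# https://api.example.com/1/statuses/user_timeline.json?screen_name=wfarr&count=5
```

Calling `fetch_json(url)` against a real endpoint would perform the request; the same functions work for consuming your own API.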
But...
Not everyone offers APIs
What do we do?
Screen-scraping
Screen-scraping
• Requests the full HTML for a page
• Parses out the content you want
• Slow: you download and parse the whole page to use a small part of it
• The website's layout may change and break your scraper
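A scraper fetches the page and keeps only the parts it recognizes. A minimal sketch using Python's standard `html.parser` on an inline string instead of a live request; the markup, the `h2` tag and the `title` class are hypothetical stand-ins for whatever the target site uses:

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="title"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "title" in classes:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside a matching <h2> is kept; everything else is skipped.
        if self.in_title:
            self.titles[-1] += data

page = """
<html><body>
  <div class="deck"><h2 class="title">First deck</h2></div>
  <div class="deck"><h2 class="title">Second deck</h2></div>
  <h2>Sponsored</h2>
</body></html>
"""

scraper = TitleScraper()
scraper.feed(page)
print(scraper.titles)  # ['First deck', 'Second deck']
```

If the site renames the `title` class, this scraper does not fail loudly; it just returns an empty list, which is the fragility the last bullet warns about.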
Demo!
Questions?
Will Farrington
http://speakerdeck.com/u/wfarr