Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data on the Web
Search
Will Farrington
June 08, 2011
Technology
3
190
Data on the Web
It's an intro to data on the web for some folks new to web development.
Will Farrington
June 08, 2011
Tweet
Share
More Decks by Will Farrington
See All by Will Farrington
test-queue makes your tests run fast
wfarr
0
450
Incident Response Done Right: From First Page to Postmortem
wfarr
0
570
Boxen: PuppetConf 2013
wfarr
6
870
Puppet at GitHub: PuppetConf 2013
wfarr
21
2.2k
Puppet at GitHub (PuppetCamp Raleigh 2013)
wfarr
1
460
Boxen: PuppetCamp SF 2013
wfarr
5
970
Boxen: MWRC
wfarr
5
230
Boxen: PuppetCamp ATL
wfarr
0
280
BOXEN
wfarr
43
5.5k
Other Decks in Technology
See All in Technology
GraphRAG グラフDBを使ったLLM生成(自作漫画DBを用いた具体例を用いて)
seaturt1e
1
150
IBC 2025 動画技術関連レポート / IBC 2025 Report
cyberagentdevelopers
PRO
2
210
re:Inventに行くまでにやっておきたいこと
nagisa53
0
580
ヘンリー会社紹介資料(エンジニア向け) / company deck for engineer
henryofficial
0
400
JSConf JPのwebsiteをGatsbyからNext.jsに移行した話 - Next.jsの多言語静的サイトと課題
leko
2
190
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
14
82k
組織全員で向き合うAI Readyなデータ利活用
gappy50
3
1.2k
オブザーバビリティと育てた ID管理・認証認可基盤の歩み / The Journey of an ID Management, Authentication, and Authorization Platform Nurtured with Observability
kaminashi
1
1k
ゼロコード計装導入後のカスタム計装でさらに可観測性を高めよう
sansantech
PRO
1
500
OSSで50の競合と戦うためにやったこと
yamadashy
3
1k
AI連携の新常識! 話題のMCPをはじめて学ぶ!
makoakiba
0
130
もう外には出ない。より快適なフルリモート環境を目指して
mottyzzz
13
11k
Featured
See All Featured
A Modern Web Designer's Workflow
chriscoyier
697
190k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.5k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
How to Ace a Technical Interview
jacobian
280
24k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
930
Build your cross-platform service in a week with App Engine
jlugia
233
18k
How to train your dragon (web standard)
notwaldorf
97
6.3k
Designing Experiences People Love
moore
142
24k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
Speed Design
sergeychernyshev
32
1.2k
Transcript
Data on the Web Will Farrington
File I/O
File I/O • Minimal reusability • No "correct" format •
Hard to maintain • Prone to problems caused by encoding changes
Standardize
CSV
Comma Separated Values
CSV • Used for tabular data • Small footprint •
Widely recognized and supported format • Many different flavors • Support in database systems and spreadsheets
Example CSV Id,Name,Desc,Points,Due 1,Homework 1,Nothing special,15,6/7/2011 15,"Project, número uno",,100,6/21/2001
XML
Extensible Markup Language
XML • Open, standard specification • Unicode-friendly • Came to
prominence with Java and .NET • Widely used on the web • Good at representing tree-like data
Example XML <?xml version="1.0" encoding="UTF-8"?> <statuses type="array"> <status> <created_at>Tue Jun
07 21:30:50 +0000 2011</created_at> <id>78212343649140736</id> <text>@skalnik Looks good.</text> <source><a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow">Twitter for Mac</a></source> <truncated>false</truncated> <favorited>false</favorited> <in_reply_to_status_id>78211453777231872</in_reply_to_status_id> <in_reply_to_user_id>15878923</in_reply_to_user_id> <in_reply_to_screen_name>skalnik</in_reply_to_screen_name> <retweet_count>0</retweet_count> <retweeted>false</retweeted> <user> <id>10403812</id> </user> <geo/> <coordinates/> <place/> <contributors/> </status> </statuses>
Criticisms of XML • Very verbose • Parsers can be
extremely complicated • Does not map well to some type systems • Does not represent highly structured data well
JSON
JavaScript Object Notation
JSON • Based on a subset of JavaScript circa 2003
• Lightweight • Simple to parse • Designed to be human-readable • Well-suited to structured data as well as trees
Example JSON [{ "coordinates":null, "created_at":"Tue Jun 07 21:30:50 +0000 2011",
"truncated":false, "favorited":false, "contributors":null, "text":"@skalnik Looks good.", "id":78212343649140736, "retweet_count":0, "geo":null, "retweeted":false, "in_reply_to_user_id":15878923, "source":"<a href=\"http://itunes.apple.com/us/app/twitter/id409789998?mt=12\" rel= \"nofollow\">Twitter for Mac</a>", "place":null, "in_reply_to_screen_name":"skalnik", "user":{"id":10403812}, "in_reply_to_status_id":78211453777231872 }]
More on JSON • eval() (is bad) • JSON.parse() •
Built-in browser support • Popular for AJAX: both single-domain and cross-domain
JSONP • JSON with Padding • Used for cross-domain requests
• Alternative to Cross-Origin Resource Sharing • Only supports GET
BSON • Binary JSON • Superset of JSON • Used
by MongoDB for storage of binary data
YAML
YAML Ain't Markup Language
YAML • Not often used over the network • Popular
for configuration files • Human-readable • Data-oriented • No execution means no injection
Example YAML --- - coordinates: created_at: Tue Jun 07 21:30:50
+0000 2011 truncated: false favorited: false contributors: text: "@skalnik Looks good." id: 78212343649140736 retweet_count: 0 geo: retweeted: false in_reply_to_user_id: 15878923 source: <a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow">Twitter for Mac</a> place: in_reply_to_screen_name: skalnik user: id: 10403812 in_reply_to_status_id: 78211453777231872
What to do with all these formats?
APIs
Application Programming Interfaces
APIs • Websites tell you what formats they support •
Websites document their URL structure • Developers use these APIs to integrate products • You can even consume your own APIs
But...
Not everyone offers APIs
What do?
Screen-scraping
Screen-scraping • Requests the full HTML for a page •
Parses out the content you want • Slow • Website layout may change and break yours
Demo!
Questions?
Will Farrington
[email protected]
http://speakerdeck.com/u/wfarr