How to scrape web contents in Clojure
ayato
January 09, 2016
Transcript
How to scrape web contents in Clojure
@_ayato_p
Ayapi (あやぴー), Clojurian, Cybozu Startups, Inc.
What is web scraping?
Web scraping is a computer software technique for extracting information from websites. (by Wikipedia)
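Problems
Web content is shaped roughly like a tree.
There are many similar pages, but they differ in subtle ways.
You have to write recursive code to walk the tree.
It is mostly tedious work.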
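Skyscraper
It walks the tree structure recursively for you.
You only write how to process each type of page.
It returns a lazy sequence.
It comes with a caching mechanism.
The scraping part depends on Enlive.
https://github.com/nathell/skyscraper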
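The idea on this slide (one processor per page type, recursive traversal, a lazy sequence as the result) can be sketched in plain Clojure. This is only an illustration of the concept, not Skyscraper's implementation or API: `fake-site`, `processors`, and `crawl` are hypothetical names, and an in-memory map stands in for real HTTP requests and HTML parsing.

```clojure
;; Hypothetical in-memory "site": URL -> page data. Stands in for HTTP fetches.
(def fake-site
  {"/"               {:type :index :links ["/user/a" "/user/b"]}
   "/user/a"         {:type :user :name "a" :links ["/user/a/post/1"]}
   "/user/b"         {:type :user :name "b" :links []}
   "/user/a/post/1"  {:type :post :title "hello"}})

;; One processor per page type. Each takes a page and the current context and
;; returns a seq of maps. A map containing :url means "keep crawling";
;; a map without :url is a finished result.
(def processors
  {:index (fn [page _ctx]
            (for [l (:links page)]
              {:url l :processor :user}))
   :user  (fn [page _ctx]
            (if (seq (:links page))
              (for [l (:links page)]
                {:user (:name page) :url l :processor :post})
              [{:user (:name page) :msg "no posts"}]))
   :post  (fn [page _ctx]
            [{:title (:title page)}])})

(defn crawl
  "Walks the site starting from the seed contexts and returns a lazy
  sequence of leaf results. Data gathered on the way down (such as :user)
  is merged into each result."
  [contexts]
  (lazy-seq
    (mapcat (fn [{:keys [url processor] :as ctx}]
              (if url
                (let [page    (fake-site url)
                      results ((processors processor) page ctx)
                      parent  (dissoc ctx :url :processor)]
                  (crawl (map #(merge parent %) results)))
                [ctx]))
            contexts)))

(crawl [{:url "/" :processor :index}])
;; => ({:user "a", :title "hello"} {:user "b", :msg "no posts"})
```

The caller only supplies the seed and the per-type processors; the recursion lives in one place. A real library adds fetching, parsing, and caching on top of the same shape.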
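Example
;; Assumes `s` aliases the Skyscraper namespace and `html` aliases
;; net.cgrand.enlive-html; the ns form is not shown on the slide.

;; Seed: the initial context that tells the scraper where to start.
(defn seed [username from until]
  (let [url (str "http://twilog.org/" username)]
    [{:username  username
      :from      from
      :until     until
      :url       url
      :processor ::user-page}]))

;; Processor for a user's top page. It returns either a message
;; (a leaf result) or the next page to visit.
(s/defprocessor user-page
  :cache-template "twilog/:username"
  :process-fn (fn [res {:keys [username]}]
                (let [not-registered (seq (html/select res [:div.box-info.box-icon]))
                      not-found      (seq (html/select res [:div.box-attention.box-icon]))]
                  (cond
                    not-registered [{:msg "This account was not registered."}]
                    not-found      [{:msg "This account was not found."}]
                    :else          [{:url       (str "http://twilog.org/" username "/archives")
                                     :processor ::archives-page}]))))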
Example
;; Entry point: takes a username and an optional options map.
;; `create-handler` is not shown on the slides.
(defn scrape [username & [{:as   options
                           :keys [html-cache processed-cache from until]
                           :or   {html-cache      true
                                  processed-cache true
                                  from            "00000000"
                                  until           "99999999"}}]]
  (let [handler (create-handler identity options)]
    (handler (s/scrape (seed username from until)
                       :html-cache html-cache
                       :processed-cache processed-cache))))
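The `scrape` function above takes an optional trailing options map and destructures it with `:keys`, `:or`, and `:as`. The sketch below isolates that pattern in a self-contained function. `describe` is a hypothetical name used only for illustration, and the default strings are copied from the slide.

```clojure
(defn describe
  "Takes a required username plus an optional trailing options map."
  [username & [{:keys [from until]
                :as   options
                :or   {from "00000000" until "99999999"}}]]
  {:username username
   :from     from
   :until    until
   :options  options})

;; No options map: the :or defaults apply and `options` is nil.
(describe "alice")
;; => {:username "alice", :from "00000000", :until "99999999", :options nil}

;; Partial options map: only the missing key falls back to its default.
(describe "alice" {:from "20150101"})
;; => {:username "alice", :from "20150101", :until "99999999", :options {:from "20150101"}}
```

Note that `:or` only affects the locally bound symbols. The map bound by `:as` is the caller's original argument, so anything that receives `options` directly does not see the defaults.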
Conclusion
Using Skyscraper will make you happy.
Clojure is the best!
Enjoy Clojure