How to scrape web contents in Clojure

ayato
January 09, 2016


Transcript

  1. Example

    (defn seed [username from until]
      (let [url (str "http://twilog.org/" username)]
        [{:username  username
          :from      from
          :until     until
          :url       url
          :processor ::user-page}]))

    (s/defprocessor user-page
      :cache-template "twilog/:username"
      :process-fn
      (fn [res {:keys [username]}]
        (let [not-registered (seq (html/select res [:div.box-info.box-icon]))
              not-found      (seq (html/select res [:div.box-attention.box-icon]))]
          (cond
            not-registered [{:msg "This account was not registered."}]
            not-found      [{:msg "This account was not found."}]
            :else          [{:url       (str "http://twilog.org/" username "/archives")
                             :processor ::archives-page}]))))

  2. Example

    (defn scrape
      [username & [{:as   options
                    :keys [html-cache processed-cache from until]
                    :or   {html-cache      true
                           processed-cache true
                           from            "00000000"
                           until           "99999999"}}]]
      (let [handler (create-handler identity options)]
        (handler (s/scrape (seed username from until)
                           :html-cache      html-cache
                           :processed-cache processed-cache))))
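
The seed map from slide 1 and the optional-argument defaults from slide 2 can be tried at a REPL without any scraping dependencies. The following is a minimal sketch in plain Clojure: `seed` is copied from slide 1, while `resolve-options` is a hypothetical helper (not part of the deck or of any library) that reuses the destructuring from `scrape` and simply returns the resolved values instead of starting a scrape. The username and date strings are made-up sample values.

```clojure
;; Copied from slide 1: builds the initial context that the scraper starts from.
(defn seed [username from until]
  (let [url (str "http://twilog.org/" username)]
    [{:username  username
      :from      from
      :until     until
      :url       url
      :processor ::user-page}]))

;; Hypothetical helper (an assumption for illustration): same destructuring as
;; `scrape` in slide 2, but it only reports what the options resolve to.
(defn resolve-options
  [username & [{:keys [html-cache processed-cache from until]
                :or   {html-cache      true
                       processed-cache true
                       from            "00000000"
                       until           "99999999"}}]]
  {:username        username
   :html-cache      html-cache
   :processed-cache processed-cache
   :from            from
   :until           until})

;; The seed is a one-element vector holding the first page to visit.
(seed "someuser" "20160101" "20160131")

;; No options map: every key falls back to its :or default.
(resolve-options "someuser")

;; Partial options map: only the supplied keys override the defaults.
(resolve-options "someuser" {:from "20160101" :html-cache false})
```

Note that `:or` only applies when a key is absent from the map, so an explicit `false` for `:html-cache` is kept rather than replaced by the default `true`.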