Casual Log Collection and Querying with fluent-plugin-riak

My talk at RubyKaigi 2013 http://rubykaigi.org/2013/talk/S70

UENISHI Kota

June 01, 2013

Transcript

  1. Casual Log Collection and Querying with fluent-plugin-riak @kuenishi from @basho

    2013/6/1 RubyKaigi
  2. Who the hell are you? •UENISHI, Kota (@kuenishi) •Basho Japan

    KK •devoted to Distributed Systems for ~6 yrs •msgpack-erlang, Jubatus
  3. Casual Log Collection •Aggregate Every Log with Fluentd •Put Them

    all into <Some Storage You Like> •Ask your Query to <Some Storage You Like>
  4. Whole Sketch

  5. fluentd: casual log collector http://www.flickr.com/photos/markchadwick/8757802771/ http://www.flickr.com/photos/usdagov/5681152426/ before: logs are scattered

    all over the servers in chaos after: all logs flow cleanly via fluentd in order
  6. Nagios MongoDB Hadoop Alerting Amazon S3 Analysis Archiving MySQL Apache

    Frontend Access logs syslogd App logs System logs Backend Databases
  7. Nagios MongoDB Hadoop Alerting Amazon S3 Analysis Archiving MySQL Apache

    Frontend Access logs syslogd App logs System logs Backend Databases filter / buffer / routing
  8. Nagios MongoDB Hadoop Alerting Amazon S3 Analysis Archiving MySQL Apache

    Frontend Access logs syslogd App logs System logs Backend Databases filter / buffer / routing Riak
  9. what’s Riak? •Distributed Key-Value Store •Focused on •Availability •Scalability •Easy

    Operation, Sleep
  10. when Riak? •Hadoop is too much •MongoDB is too small

    •Document DB aspect of Riak •put them all into Riak
  11. Not Only KVS •Aspect of Document Database •MapReduce in JavaScript

    / Erlang
  12. Buy it if interested

  13. fluent-plugin-riak JSON

  14. fluent.conf <match apache.**> type riak # define the cluster via

    pb ports nodes 192.168.0.1:8087 192.168.0.2:8087 </match>
  15. log everything as JSON { "host":"103.5.142.5", "user":"-", "method":"PUT", "path":"/buckets/moriyoshi/object/riaklogo.png", "code":"200",

    "size":"0", "referer":"", "agent":"", "time":"2013-05-27T05:42:09Z", "tag":"riak.cluster2" }, ...
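A record shaped like the one on slide 15 can be sketched in plain Ruby. This is only an illustration of the record layout, not the parser Fluentd itself uses: the regular expression, the `ACCESS_LOG` constant and the `to_record` helper are assumptions made for this example, and the input is assumed to be an Apache combined-format access log line.

```ruby
require 'json'
require 'time'

# Hypothetical pattern for an Apache combined-format access log line.
ACCESS_LOG = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<code>\d{3}) (?<size>\S+) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\z/

# Turn one log line into a flat hash with the same keys as the slide's JSON.
# Every value stays a string, as in the slide. Returns nil if the line does not match.
def to_record(line, tag)
  m = ACCESS_LOG.match(line)
  return nil unless m

  {
    'host' => m[:host], 'user' => m[:user],
    'method' => m[:method], 'path' => m[:path],
    'code' => m[:code], 'size' => m[:size],
    'referer' => m[:referer], 'agent' => m[:agent],
    # Normalize the Apache timestamp to UTC ISO 8601.
    'time' => Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z').utc.iso8601,
    'tag' => tag
  }
end
```

Calling `JSON.generate(to_record(line, 'riak.cluster2'))` then yields a JSON document that could be stored as a single value.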
  16. How to Query

  17. Ruby Client for Querying irb> q = client.bucket('fluentlog') irb> q

    = q.map("function(v){ return [v]; }").reduce("function(values){ return values; }", :keep => false) irb> r = q.run()
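The two JavaScript functions on slide 17 amount to an identity query: the map phase emits each stored value as a one-element list, and the reduce phase passes the collected list through unchanged. The following is a local Ruby sketch of that data flow only. It does not use the Riak client API, and `run_mapreduce`, `MAP_ALL` and `REDUCE_IDENTITY` are hypothetical names chosen for this example.

```ruby
# Ruby equivalents of the slide's JavaScript phase functions.
MAP_ALL = ->(v) { [v] }                    # function(v){ return [v]; }
REDUCE_IDENTITY = ->(values) { values }    # function(values){ return values; }

# Run a map phase over every object, concatenate the emitted lists,
# then hand the combined list to the reduce phase.
# `objects` is an in-memory array standing in for the values in a bucket.
def run_mapreduce(objects, map_fn, reduce_fn)
  mapped = objects.flat_map { |value| map_fn.call(value) }
  reduce_fn.call(mapped)
end
```

In a real cluster the map phase runs next to the data on each node and the reduce phase may be invoked more than once on partial results, which is why both functions take and return lists.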
  18. Debug distributed JS http://www.flickr.com/photos/heatsink/110859301/

  19. Any Other Rubyish way? http://www.flickr.com/photos/snazzyshot/5366645175/

  20. ripple

  21. github.com/basho/ripple •a rich Ruby toolkit for Riak, consisting of •Riak

    client •Riak-sessions •Ripple
  22. http://www.flickr.com/photos/toco/2612055052/

  23. None
  24. None
  25. Mohair: Not Only NoSQL http://www.flickr.com/photos/frank-wouters/2464743512/

  26. JSON { "host":"103.5.142.5", "user":"-", "method":"PUT", "path":"/buckets/moriyoshi/object/riaklogo.png", "code":"200", "size":"0", "referer":"", "agent":"",

    "time":"2013-05-27T05:42:09Z", "tag":"riak.cluster2" }, ...
  27. SQL create table apachelogs { host varchar(16), user varchar(256), method

    varchar(5), path varchar(1024), code integer, size integer, referer text, agent varchar(1024), time timestamp, tag varchar(1024) }
  28. “Mohair” for Querying > select * from fluentlog \ where

    method = "GET" group by host
  29. Converting SQL to MapReduce •SQL -(parslet)-> JS -> Riak mapred

    •the where clause runs in Map •group by, count(-) run in Reduce
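The split described on slide 29 can be sketched locally in Ruby: the where condition becomes a filter inside the map function, and group by with a count becomes a merge inside the reduce function. This is not Mohair's actual code or output. The `QUERY` hash stands in for an already-parsed statement, and `build_map`, `build_reduce` and `run_query` are hypothetical helpers. Counting rows per group is an assumption made to keep the example small.

```ruby
# Hypothetical parsed form of:
#   select host, count(*) from fluentlog where method = "GET" group by host
QUERY = { from: 'fluentlog', where: { 'method' => 'GET' }, group_by: 'host' }.freeze

# WHERE -> map phase: emit a partial count of 1 for each matching record,
# and nothing for records that fail any condition.
def build_map(where, group_by)
  lambda do |record|
    return [] unless where.all? { |col, val| record[col] == val }

    [{ group_by => record[group_by], 'count' => 1 }]
  end
end

# GROUP BY / count -> reduce phase: sum partial counts per group key.
# Output has the same shape as input, so it can safely be reduced again.
def build_reduce(group_by)
  lambda do |values|
    values.group_by { |v| v[group_by] }.map do |key, rows|
      { group_by => key, 'count' => rows.sum { |r| r['count'] } }
    end
  end
end

# Apply both phases to an in-memory list of records.
def run_query(query, records)
  map_fn = build_map(query[:where], query[:group_by])
  reduce_fn = build_reduce(query[:group_by])
  reduce_fn.call(records.flat_map { |r| map_fn.call(r) })
end
```

Keeping the reduce output in the same shape as its input matters because a distributed reduce may be applied repeatedly to partial results.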
  30. Chef’s Capricious Roadmap •Secondary Index Support •Query Optimization •types: timestamp,

    float •nested columns •insert / delete
  31. check it out! github: basho/riak kuenishi/fluent-plugin-riak kuenishi/mohair (kuenishi/fluent-logger-erlang)

  32. Conclusion •NoSQL is not NoSQL any more •put’em all into

    Riak via Fluentd •Query via SQL with Mohair •waiting for pull requests
  33. Questions? •riak-users-jp@lists.basho.com •Riak Meetup (7/10) •Riak SCR (twice in a

    month) •Software Design magazine, July issue (nginx/riak) •Database Engineer Training Handbook