Map-Reduce for BarCamp Tampa 2011

Bryce "BonzoESC" Kerley

October 26, 2011

Transcript

  1. You can’t buy a faster CPU

    The obvious solution is to buy more of them.
    Saturday, October 22, 11
  2. Distributed algorithms

    An algorithm that can run in multiple places at the same time. Examples: bitcoin mining, SETI@home.
  3. Your database is a POS

    SQL databases are consistent, so it’s hard to make them distributed (the features that make them consistent make them less partition-tolerant or less available).
  4. CAP Theorem

    Consistency: (all nodes see the same data at the same time)
    Availability: (a guarantee that every request receives a response about whether it was successful or failed)
    Partition tolerance: (the system continues to operate despite arbitrary message loss)
    PICK TWO
    http://en.wikipedia.org/wiki/CAP_theorem
  5. This is my ledger

    You want your ledger to be consistent; you’d rather it be temporarily unavailable than wrong. Use a good SQL database like Postgres or Oracle (if you’re rich).
  6. This is my list of status updates

    If people go to Facebook and it’s missing a different half of their updates every time they visit, and sometimes it’s completely down, they won’t be back. If it’s missing the most recent one, they won’t notice. Use something “eventually consistent.”
  7. Map

    For each object in collection:
      Transform into new object
    Return the transformed collection

    “Capitalize every letter in this sentence.”
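
The slide's example can be sketched in plain JavaScript, assuming a runtime such as Node.js and treating the sentence as a collection of characters. The variable names are illustrative only and do not come from the talk.

```javascript
// Map: transform every item and return the transformed collection.
const sentence = "Capitalize every letter in this sentence.";

const capitalized = sentence
  .split("")                      // the collection: one item per character
  .map((ch) => ch.toUpperCase())  // transform each item into a new item
  .join("");                      // reassemble the transformed collection

console.log(capitalized); // CAPITALIZE EVERY LETTER IN THIS SENTENCE.
```

Each character is transformed independently of the others, which is what lets a map step be split across machines.
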
  8. Reduce

    For each object in collection:
      Include each object into an aggregate
    Return aggregate

    “Add all the numbers in this array.”
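
A minimal sketch of the slide's example, again assuming plain JavaScript in something like Node.js. The array contents are made up for illustration.

```javascript
// Reduce: fold every item into a single aggregate and return it.
const numbers = [1, 2, 3, 4];

const total = numbers.reduce(
  (sum, n) => sum + n, // include each item in the running aggregate
  0                    // starting value of the aggregate
);

console.log(total); // 10
```

Because addition does not care about grouping, partial sums computed on separate machines can themselves be added together, which is the property a distributed reduce step relies on.
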
  9. Riak

    Disclaimer: I am an employee of Basho Technologies and talking about Riak is my job
  10. A popular bitcoin trading site that shall remain unnamed had their database leak a while ago, including password hashes. Somebody took the liberty of cracking a bunch of these. Let’s see what the most common three-character combinations in them are.
  11. Map

        function(v) {
          // Each input object carries the password text in its first value.
          var password = v.values[0].data;
          var trigrams = [];
          var length = password.length;
          // A string of n characters contains n - 2 trigrams.
          var trigram_count = length - 2;
          for (var i = 0; i < trigram_count; i++) {
            var slice = password.slice(i, i+3);
            // Emit {keys: [trigram], <trigram>: 1} so the reduce phase can merge counts.
            var gram = {'keys':[slice]};
            gram[slice] = 1;
            trigrams.push(gram);
          }
          return trigrams;
        }

    We want to break each password into a list of trigrams.
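  12. Reduce

        function(grams) {
          // accum.keys lists every trigram seen; accum[trigram] holds its count.
          var accum = {keys: []};
          var pile_count = grams.length;
          for (var i = 0; i < pile_count; i++) {
            var pile = grams[i];
            var key_count = pile.keys.length;
            for (var j = 0; j < key_count; j++) {
              var gram = pile.keys[j];
              if(!!accum[gram]) {
                // Already seen: add this pile's count to the running total.
                accum[gram] += pile[gram];
              } else {
                // First sighting: record the key and start its count.
                accum.keys.push(gram);
                accum[gram] = pile[gram];
              }
            }
          }
          // Same shape as the input piles, so the output can be reduced again.
          return [accum];
        }

    Then we want to count them.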
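
To see the two phases working together without a cluster, here is a small local sketch, assuming Node.js. The names `mapTrigrams` and `reduceTrigrams`, the sample passwords, and the two-batch split are all hypothetical stand-ins; a real store decides for itself how to batch and schedule the phases. The map loop here runs to `length - 2` so that the final trigram of each password is included.

```javascript
// Map phase: break one stored password into single-count trigram piles.
function mapTrigrams(v) {
  var password = v.values[0].data;
  var trigrams = [];
  for (var i = 0; i < password.length - 2; i++) {
    var slice = password.slice(i, i + 3);
    var gram = { keys: [slice] };
    gram[slice] = 1;
    trigrams.push(gram);
  }
  return trigrams;
}

// Reduce phase: merge any number of piles into one pile of totals.
function reduceTrigrams(grams) {
  var accum = { keys: [] };
  for (var i = 0; i < grams.length; i++) {
    var pile = grams[i];
    for (var j = 0; j < pile.keys.length; j++) {
      var gram = pile.keys[j];
      if (accum[gram]) {
        accum[gram] += pile[gram];
      } else {
        accum.keys.push(gram);
        accum[gram] = pile[gram];
      }
    }
  }
  return [accum];
}

// Hypothetical sample data, shaped like the objects the map function reads.
var samplePasswords = ["password", "passw0rd", "sword123"];
var inputs = samplePasswords.map(function (p) {
  return { values: [{ data: p }] };
});

// Run the map phase over every input and collect all the piles.
var mapped = [];
inputs.forEach(function (v) {
  mapped = mapped.concat(mapTrigrams(v));
});

// Reduce in two batches, then reduce the partial results, to imitate
// work that was split across nodes and merged afterwards.
var half = Math.floor(mapped.length / 2);
var partials = reduceTrigrams(mapped.slice(0, half))
  .concat(reduceTrigrams(mapped.slice(half)));
var result = reduceTrigrams(partials)[0];

// Rank trigrams from most to least common.
var ranked = result.keys
  .map(function (g) { return [g, result[g]]; })
  .sort(function (a, b) { return b[1] - a[1]; });

console.log(ranked.slice(0, 3));
```

The reduce output has the same shape as its input, which is why reducing the partial results a second time gives the same totals as reducing everything in one pass.
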
  13. Map-Reduce Query

        {
          "inputs": "passwords",
          "query": [
            { "map": { "language": "javascript", "source": " function(v) { …" } },
            { "reduce": { "language": "javascript", "source": " function(grams) {…" } }
          ]
        }

    Then we want to count them.