and values • Each document is identified by an ID and has a revision • Documents can have file attachments • Stored as JSON, so it’s easy to interface with
you must supply a revision number • Your change will be rejected if the revision is not the latest • All writes are serialized • No need for locks, but puts some responsibility on developers
<key, value> pairs • Example: Apache access.log • Each line is a record • Extract client IP address and number of bytes transferred • Emit IP address as key, number of bytes as value • For hourly rotating logs, the job can be split across 24 nodes* * In pratice, it’s a lot smarter than that
and all values for this specific key • Even if there are many Mappers on many computers; the results are aggregated before they are handed to Reducers • Example: Apache access.log • The Reducer is called once for each client IP (that’s our key), with a list of values (transferred bytes) • We simply sum up the bytes to get the total traffic per IP!