$ ls -l s3log.txt
-rw-r--r-- 1 mwunsch RTRHQ\Domain Users 13553855 May 16 14:08 s3log.txt
Slide 3
Slide 3 text
13553855
Slide 4
Slide 4 text
13M
Slide 5
Slide 5 text
That’s BIG DATA right?
Slide 6
Slide 6 text
I’m going to need
an Hadoop.
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
What about
small data?
Slide 9
Slide 9 text
What about
small data?
Slide 10
Slide 10 text
small
data
Slide 11
Slide 11 text
No content
Slide 12
Slide 12 text
Alfred Aho
Peter Weinberger
Brian Kernighan
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Alfred V. Aho
"AWK is a language for processing text files. A
file is treated as a sequence of records, and by
default each line is a record. Each line is broken
up into a sequence of fields, so we can think of
the first word in a line as the first field, the second
word as the second field, and so on. An AWK
program is a sequence of pattern-action
statements. AWK reads the input a line at a time.
A line is scanned for each pattern in the program,
and for each pattern that matches, the associated
action is executed."
Slide 15
Slide 15 text
AWK
- a language for processing text files
- each line is a record
- each line is broken up into a sequence of fields
- pattern-action statements
- for each pattern that matches, the associated
action is executed
Slide 16
Slide 16 text
condition { action }
An AWK program is a sequence of pattern-action
statements.
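A minimal sketch of one pattern-action statement, assuming a made-up three-line input (the names and numbers are hypothetical, not from the S3 log). The pattern tests the second field and the action prints the first field of each matching line.

```shell
# Hypothetical sample input: three whitespace-separated records.
sample='alpha 200
beta 404
gamma 200'

# Pattern: the second field equals 404. Action: print the first field.
out=$(printf '%s\n' "$sample" | awk '$2 == 404 { print $1 }')
echo "$out"
```

Only the line whose second field is 404 reaches the action; the other lines are read and skipped.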
Slide 17
Slide 17 text
awk
Slide 18
Slide 18 text
awk
nawk
gawk
mawk
jawk
Slide 19
Slide 19 text
YOU ALREADY
HAVE IT!
Slide 20
Slide 20 text
docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
Field Name        Example
Bucket Owner      79a59df900b949e55d96a1e6…
Bucket            mybucket
Time              [06/Feb/2014:00:00:38 +0000]
Remote IP         192.0.2.3
Requester         79a59df900b949e55d96a1e698f…
Request ID        3E57427F33A59F07
Operation         REST.PUT.OBJECT
Key               /photos/2014/08/puppy.jpg
Request-URI       "GET /mybucket/photos/2014/08/puppy.jpg?x-foo=bar"
HTTP status       200
Error Code        NoSuchBucket
Bytes Sent        2662992
Object Size       3462992
Total Time        70
Turn-Around Time  10
Referrer          "http://www.amazon.com/webservices"
User-Agent        "curl/7.15.1"
Version Id        3HL4kqtJvjVBH40Nrjfkd
Field Name        Example                          Apache Format String
Bucket Owner      79a59df900b949e55d96a1e6…
Bucket            mybucket
Time              [06/Feb/2014:00:00:38 +0000]     %t
Remote IP         192.0.2.3                        %h
Requester         79a59df900b949e55d96a1e698f…     %u
Request ID        3E57427F33A59F07
Operation         REST.PUT.OBJECT
Key               /photos/2014/08/puppy.jpg
Request-URI       "GET /mybucket/photos/2014/08…"  \"%r\"
HTTP status       200                              %>s
Error Code        NoSuchBucket
Bytes Sent        2662992                          %b
Object Size       3462992
Total Time        70
Turn-Around Time  10
Referrer          "http://www.amazon.com/…"        \"%{Referer}i\"
User-Agent        "curl/7.15.1"                    \"%{User-agent}i\"
Version Id        3HL4kqtJvjVBH40Nrjfkd
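A rough sketch of how these fields land in AWK's numbered fields under default whitespace splitting. The sample line is hypothetical: it is assembled from the example column, with OWNER and REQUESTER as placeholders for the truncated IDs. Because the Time value contains a space, it occupies both $3 and $4, which pushes Remote IP to $5, Operation to $8, and Key to $9.

```shell
# Hypothetical log line built from the example values above.
# OWNER and REQUESTER are placeholders, not real IDs.
line='OWNER mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 REQUESTER 3E57427F33A59F07 REST.PUT.OBJECT /photos/2014/08/puppy.jpg "GET /mybucket/photos/2014/08/puppy.jpg?x-foo=bar" 200 NoSuchBucket 2662992 3462992 70 10 "http://www.amazon.com/webservices" "curl/7.15.1" 3HL4kqtJvjVBH40Nrjfkd'

# $5 = Remote IP, $8 = Operation, $9 = Key (Time spans $3 and $4).
out=$(printf '%s\n' "$line" | awk '{ print $5, $8, $9 }')
echo "$out"
```

Fields after the Key are less predictable by position, since the quoted Request-URI, Referrer, and User-Agent values can themselves contain spaces.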
Slide 23
Slide 23 text
Map()
Slide 24
Slide 24 text
{ action }
condition
Slide 25
Slide 25 text
{ action }
Slide 26
Slide 26 text
{ print }
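As a small illustration, an action with no pattern runs for every record, and a bare print writes the whole record. The two-line input here is made up for the example.

```shell
# Hypothetical two-line input.
sample='first line
second line'

# No pattern, so the action runs on every line; bare print emits the line unchanged.
out=$(printf '%s\n' "$sample" | awk '{ print }')
echo "$out"
```

The output is identical to the input, so this program behaves like cat.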
Slide 27
Slide 27 text
Each line is broken up into a sequence of fields…
3252c3… www.abstractfactory.tv [09/May/2
Slide 28
Slide 28 text
Each line is broken up into a sequence of fields…
FS
0897 10897 18 17 "-" "Podcasts/2.0.2" -
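A brief sketch of what FS changes, using the tail of the fragment above as sample input. With the default FS, AWK splits on runs of whitespace, so quoted values keep their quote characters. Setting FS to a double quote (here through the -F option) splits on the quotes instead.

```shell
# Sample input: the last five whitespace-separated tokens of the fragment above.
line='18 17 "-" "Podcasts/2.0.2" -'

# Default FS (whitespace): five fields, and $4 still carries its quotes.
default_out=$(printf '%s\n' "$line" | awk '{ print NF, $4 }')

# FS set to a double quote: $4 is the text between the second pair of quotes.
quote_out=$(printf '%s\n' "$line" | awk -F'"' '{ print $4 }')

echo "$default_out"
echo "$quote_out"
```

The first command prints the field count and the quoted User-Agent; the second prints the User-Agent without quotes.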