Sed & Awk
Everything a PHP developer should know about sed & awk
Edition: PHPBenelux, Jan 29, 2011, Antwerp
http://en.wikipedia.org/wiki/File:Slender_Loris.jpg
Wednesday 25 April 12

Thijssen (32)
Senior Software Engineer, currently working at Enrise (4worx)
Development in PHP, Python, Perl, C, Java, assembly.
Certified MySQL DBE, MySQL DBA, LPIC-1, LPIC-2, Zend PHP5, Zend PHP5.3, Zend Framework, Ubuntu professional.
Blogs: http://www.adayinthelifeof.nl http://www.enrise.com/blog

✓ Might be easier to use...
✓ Might be faster to write...
✓ Might be better suited for the job...
✓ More efficient
But why learn new stuff when you can do it in PHP?

I'm not here to tell you HOW to use Sed & Awk.
I want to tell you that for certain jobs, tools like Sed & Awk are much better suited than PHP.
Know the capabilities of your tools and you become a better developer...

... zones
✓ Useful for data manipulation
✓ They work well together
✓ Both have a similar processing method
✓ Both rely heavily on regular expressions
✓ Nobody really harnesses their power

Useful for:
• complex find & replace
  Changing IP addresses or other data through many files.
• mutation of large datasets
  Only change data in certain blocks of code/data (for instance CSV, TXT, SQL files, docblocks etc.)
• retrieval of data
  Only print the next 10 lines after each 404 code read from an Apache log file, or print all docblocks and function headers.

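The log retrieval case could be sketched roughly like this. It assumes Apache's common or combined log format, where the status code is the ninth space-separated field. The function name `lines_after_404` and the sample log are made up for illustration, and the example prints 2 following lines instead of 10 to keep the data short.

```shell
# Print the N lines that follow every 404 entry (the 404 line itself is skipped).
# Assumption: the status code is field 9, as in Apache's combined log format.
lines_after_404() {
    # $1 = number of following lines to print, $2 = log file
    awk -v n="$1" '
        $9 == "404" { left = n; next }   # (re)start the countdown on each 404
        left > 0    { print; left-- }    # print while the countdown is running
    ' "$2"
}

# Hypothetical sample data, only here to make the sketch runnable.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [29/Jan/2011:10:00:00 +0100] "GET /a HTTP/1.1" 200 100
10.0.0.1 - - [29/Jan/2011:10:00:01 +0100] "GET /missing HTTP/1.1" 404 50
10.0.0.1 - - [29/Jan/2011:10:00:02 +0100] "GET /b HTTP/1.1" 200 200
10.0.0.1 - - [29/Jan/2011:10:00:03 +0100] "GET /c HTTP/1.1" 200 300
10.0.0.1 - - [29/Jan/2011:10:00:04 +0100] "GET /d HTTP/1.1" 200 400
EOF

lines_after_404 2 "$log"
```

With the sample data this prints the `/b` and `/c` requests: the two lines right after the 404.
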
Don't use sed when:
• You need to change one or two items
• You need aggregation or variables
Use sed when:
• You need to change hundreds or thousands of files
• “Complex” mutations
• Fast “one liners” in scripts

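A minimal sketch of the "many files" case, under a few assumptions: the `replace_ip` function, the `*.conf` file pattern and the addresses are hypothetical. It writes through a temporary file because the `-i` option is spelled differently in GNU and BSD sed. It is a plain substring replace, so an address such as 192.168.1.100 would also be hit when replacing 192.168.1.10; real data would need a stricter pattern.

```shell
# Replace one IP address with another in every *.conf file below a directory.
replace_ip() {
    # $1 = directory, $2 = old IP, $3 = new IP
    old=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots so they match literally
    find "$1" -type f -name '*.conf' | while IFS= read -r f; do
        # write to a temp file first, then move it over the original
        sed "s/$old/$3/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Hypothetical sample tree, only here to make the sketch runnable.
dir=$(mktemp -d)
printf 'listen 192.168.1.10:80\n' > "$dir/a.conf"
printf 'upstream 192.168.1.10\nbackup 172.16.0.1\n' > "$dir/b.conf"

replace_ip "$dir" 192.168.1.10 10.0.0.5
cat "$dir/a.conf" "$dir/b.conf"
```
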
foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed 's/foo/bar/g' foo.txt
  bar bar bar
  bar bar bar
  bar bar bar
  bar bar bar
  bar bar bar

foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed 's/foo/FOO/2' foo.txt
  foo bar FOO
  bar foo bar
  foo bar FOO
  bar bar foo
  foo FOO bar

foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed '1,3 s/foo/bar/g' foo.txt
  bar bar bar
  bar bar bar
  bar bar bar
  bar bar foo
  foo foo bar

foo.txt (with an empty line after the second line):
  foo bar foo
  bar foo bar

  foo bar foo
  bar bar foo
  foo foo bar

$ sed '1,/^$/ s/foo/bar/g' foo.txt
  bar bar bar
  bar bar bar

  foo bar foo
  bar bar foo
  foo foo bar

foo.txt (with an empty line after the second line):
  foo bar foo
  bar foo bar

  foo bar foo
  bar bar foo
  foo foo bar

$ sed '/^$/,$ s/foo/bar/g' foo.txt
  foo bar foo
  bar foo bar

  bar bar bar
  bar bar bar
  bar bar bar

foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed '1,3 ! s/foo/bar/g' foo.txt
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar bar
  bar bar bar

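Both ends of a range can also be regular expressions, which is what makes "only change data in certain blocks" possible. The sketch below limits a substitution to PHP docblocks. The `fix_docblocks` function and the sample file are hypothetical.

```shell
# Replace foo with bar only inside /** ... */ docblocks; code outside is untouched.
# The address /\/\*\*/,/\*\// is a range from the line that opens a docblock
# to the line that closes it.
fix_docblocks() {
    sed '/\/\*\*/,/\*\// s/foo/bar/' "$@"
}

# Hypothetical sample file, only here to make the sketch runnable.
src=$(mktemp)
cat > "$src" <<'EOF'
<?php
/**
 * @author foo
 */
$author = "foo";
EOF

fix_docblocks "$src"
```

Only the `@author` line inside the docblock changes; the assignment on the last line keeps its "foo".
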
On line 1 to 3: change foo's into bar's and prepend 'Line' to the line
Multiple commands per range:
  sed '1,3 { s/foo/bar/g ; s/.*/Line &/ ; }' file
or with one command per line:
  sed '1,3 {
    s/foo/bar/g
    s/.*/Line &/
  }' file

foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed '1,3 { s/foo/bar/g ; s/.*/Line &/ ; }' foo.txt
  Line bar bar bar
  Line bar bar bar
  Line bar bar bar
  bar bar foo
  foo foo bar

On line 1 to 3: change foo's into bar's
On line 5 to 7: change bar's into foo's
On all lines: add 'Line' in front of the line
Multiple ranges:
  sed '1,3 s/foo/bar/g
       5,7 s/bar/foo/g
       s/\(.*\)/Line \1/' file

foo.txt:
  foo bar foo
  bar foo bar
  foo bar foo
  bar bar foo
  foo foo bar

$ sed '1,3 s/foo/bar/g ; 5,7 s/bar/foo/g ; s/.*/Line: &/' foo.txt
  Line: bar bar bar
  Line: bar bar bar
  Line: bar bar bar
  Line: bar bar foo
  Line: foo foo foo

sed -n '
  1,3 { s/foo/bar/g ; p ; }
  4,$ { s/bar/foo/g ; p ; }
' file
Line 1 to 3 will replace 'foo' with 'bar' and print the line to output.
Line 4 to the end will replace 'bar' with 'foo' and print the line to output.

file.txt:
  foo bar foo bar
  bar bar foo foo
  foo bar foo bar
  bar bar foo foo
  foo bar foo bar
  foo bar foo bar
  bar bar foo foo
  bar bar foo foo
  cut
  this line is not added

$ sed -n '/^cut/ q ; 1,3 { s/foo/bar/g ; p ; } ; 4,$ { s/bar/foo/g ; p ; }' file.txt
  bar bar bar bar
  bar bar bar bar
  bar bar bar bar
  foo foo foo foo
  foo foo foo foo
  foo foo foo foo
  foo foo foo foo
  foo foo foo foo

a    append 'text' to output
s    substitute data
y    transform data (like 'tr')
d    delete line (don't print)
n    (print) and go to next line
N    add next line to pattern space
p    print pattern space
q    quit processing
r    append contents of file to the output
#    comment
=    print current line number

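A few of these commands can be tried on three lines of input. This is only a sketch; `demo_input` is a helper made up for the example.

```shell
# Three lines of sample input.
demo_input() { printf 'one\ntwo\nthree\n'; }

# =  prints the line number before each line
demo_input | sed '='

# y  maps characters one-to-one, like tr (o becomes 0, e becomes E)
demo_input | sed 'y/oe/0E/'

# d  deletes matching lines, q stops processing after line 2
demo_input | sed '/one/d ; 2q'
```

The last command prints only "two": line 1 is deleted and sed quits after printing line 2.
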
solfege.txt:
  do
  re \
  mi \
  fa
  so \
  la \
  si
  do

sed script:
  sed '
  :loop
  /\\$/ N
  s/\\\n */ /
  t loop
  ' solfege.txt

Step by step (pattern buffer and output):
  1. "do" does not end in a backslash, so it goes straight to the output.
  2. "re \" is read into the pattern buffer.
  3. N appends the next line, "mi \", to the pattern buffer.
  4. s joins the two lines into "re mi \" and t jumps back to :loop.
  5. N appends "fa" and s joins again: "re mi fa".
  6. No backslash is left, so "re mi fa" goes to the output.
  7. "so \" is read into the pattern buffer and the same cycle starts again.

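The loop can be tried without a file by feeding the lines through a pipe. This is a sketch with one deliberate change from the slide: the pattern also swallows the spaces in front of the backslash (` *\\\n */`), so the joined words end up separated by a single space. The function name `join_continuations` is made up.

```shell
# Join every line that ends in a backslash with the line that follows it.
join_continuations() {
    sed '
        :loop
        /\\$/ N
        s/ *\\\n */ /
        t loop
    ' "$@"
}

printf '%s\n' 'do' 're \' 'mi \' 'fa' 'so \' 'la \' 'si' 'do' | join_continuations
```

This prints four lines: "do", "re mi fa", "so la si" and "do".
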
sed '/regex/d' filename
  Delete all lines containing 'regex'
sed -n '1 ! G ; h ; $ p' filename
  Reverse the order of all lines in a file (makes use of the hold buffer)
sed 's/ *$//' filename
  Remove trailing spaces from every line

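The hold buffer trick is easier to follow when each command is spelled out. A small sketch; the function names are made up, and `strip_trailing` uses `[[:space:]]` so that tabs are removed as well, which goes slightly beyond the one-liner above.

```shell
# Reverse the order of lines using the hold buffer:
#   1!G  on every line but the first, append the hold buffer to the pattern space
#   h    copy the pattern space (current line plus all earlier lines) to the hold buffer
#   $p   on the last line, print the accumulated result
reverse_lines() { sed -n '1!G ; h ; $p' "$@"; }

printf 'one\ntwo\nthree\n' | reverse_lines

# Strip trailing spaces and tabs from every line.
strip_trailing() { sed 's/[[:space:]]*$//' "$@"; }

printf 'padded   \n' | strip_trailing
```
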
• Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan
• Written in 1977 at AT&T Bell Laboratories
• Multiple versions: AWK, NAWK, GAWK, MAWK and more...
• Pattern-directed scanning and processing language...

solfege.txt:
  do
  re
  mi
  fa
  sol
  la
  ti
  do

$ awk '
    BEGIN { print "start" }
    /o/   { print "I just saw an o in " $0 }
    END   { print "the end" }
  ' solfege.txt
  start
  I just saw an o in do
  I just saw an o in sol
  I just saw an o in do
  the end

• You can set the field and record separator
• FS="|"; RS="\t"
• $0 holds the complete record (line)
• $1 holds first field, $2 second field etc...
• NF holds the number of fields in the record ($NF is the last field)
• NR holds the number of the CURRENT record

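A short sketch of these variables at work, using made-up pipe-separated records. `describe_records` is a hypothetical helper; FS is set in a BEGIN block so that it already applies to the first record.

```shell
# Report the record number, the field count and the first and last field.
describe_records() {
    awk 'BEGIN { FS = "|" }
         { printf "record %d has %d fields, first=%s last=%s\n", NR, NF, $1, $NF }'
}

printf 'alice|admin|nl\nbob|dev\n' | describe_records
```

The first record is reported with 3 fields (first=alice, last=nl), the second with 2 (first=bob, last=dev).
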
Print the “user agents” from the logfile that triggered a 4xx code

$ awk -F' ' '$9 ~ /4[0-9][0-9]/ { FS="\""; $0=$0; print $6; FS=" " }' apache.log
  Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
  libwww-perl/5.805
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
  Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
  Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
  Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

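The same result can probably be had without switching FS back and forth, by splitting the record on double quotes with `split()`. This is an alternative sketch, not the method from the slide. It assumes the combined log format, where the user agent is the sixth quote-separated piece; the function name and the sample log lines are made up.

```shell
# Print the user agent of every request that ended in a 4xx status.
user_agents_4xx() {
    awk '$9 ~ /^4[0-9][0-9]$/ {
        split($0, part, "\"")   # part[6] is the user agent in combined log format
        print part[6]
    }' "$@"
}

# Hypothetical sample data, only here to make the sketch runnable.
access_log=$(mktemp)
cat > "$access_log" <<'EOF'
10.0.0.1 - - [29/Jan/2011:10:00:00 +0100] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0"
10.0.0.2 - - [29/Jan/2011:10:00:01 +0100] "GET /nope HTTP/1.1" 404 50 "-" "Googlebot/2.1"
10.0.0.3 - - [29/Jan/2011:10:00:02 +0100] "GET /admin HTTP/1.1" 403 60 "-" "libwww-perl/5.805"
EOF

user_agents_4xx "$access_log"
```
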
Awk one liner compared to PHP:

  <?php
  $hash = array();
  foreach( file( './apache.log' ) as $line ) {
      list( , , , , , , , , $status, $bytes ) = explode( ' ', $line );
      if( !isset( $hash[$status] ) ) {
          $hash[$status] = 0;
      }
      $hash[$status] += $bytes;
  }
  print_r($hash);

  $ awk -F' ' '{ totals[$9] += $10; } END { for (i in totals) { printf "%d : %d bytes\n", i, totals[i]; } }' apache.log

Not a whole lot different, but already more complex and this was just a simple example...
credits to @RichardJ #pfz channel

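To try the aggregation without a real log file, a few made-up lines are enough. In this sketch the `bytes_per_status` function and the sample data are hypothetical. The output goes through `sort` because awk does not guarantee the order in which `for (s in totals)` walks the array.

```shell
# Sum the bytes sent (field 10) per HTTP status code (field 9).
bytes_per_status() {
    awk '{ totals[$9] += $10 }
         END { for (s in totals) printf "%s : %d bytes\n", s, totals[s] }' "$@" | sort
}

# Hypothetical sample data, only here to make the sketch runnable.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [29/Jan/2011:10:00:00 +0100] "GET /a HTTP/1.1" 200 100
10.0.0.1 - - [29/Jan/2011:10:00:01 +0100] "GET /b HTTP/1.1" 200 300
10.0.0.1 - - [29/Jan/2011:10:00:02 +0100] "GET /c HTTP/1.1" 404 50
EOF

bytes_per_status "$sample_log"
```
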
Sed one liner compared to PHP:

  <?php
  $stdin = fopen("php://stdin", "r");
  // fgets() returns false at end of input, which ends the loop
  while (($line = fgets($stdin)) !== false) {
      if (preg_match("/^.o/", $line)) continue;
      print $line;
  }
  ?>

  sed '/^.o/d' file

Much more work....
credits to @RichardJ #pfz channel

... developer
• Parse php-errors files, syslog files, Apache's HTTP access logs.
• Conversion of files you get from your customers, who always assume you can do magic with a gazillion GBs of (unsorted) data (and now you can).

Sed & Awk:
• are powerful for simple one-liners but can also be used for complex programs
• integrate perfectly with other (unix) tools like uniq, sort, cut, find, grep, cat, etc...
• are a great way to automate complex and/or repetitive (editing) tasks

... your comfort zone for other (better) tools.
• Can you think of examples where you would use Sed or Awk (instead of PHP)?
http://files.sharenator.com/slender_loris_Worlds_strangest_looking_animals-s300x451-2279-580.jpg