Sed & Awk - the dynamic duo - PHPBenelux 2011

Joshua Thijssen

January 29, 2011
Transcript

  1. Sed & Awk - http://joind.in/2489

    Harness the power of Sed & Awk. Everything a PHP developer should know about sed & awk. Edition: PHPBenelux, Jan 29, 2011, Antwerp. http://en.wikipedia.org/wiki/File:Slender_Loris.jpg Wednesday 25 April 12
  8. First, who am I?

    Joshua Thijssen (32). Senior Software Engineer, currently working at Enrise (4worx). Development in PHP, Python, Perl, C, Java, assembly. Certified MySQL DBE, MySQL DBA, LPIC-1, LPIC-2, Zend PHP5, Zend PHP5.3, Zend Framework, Ubuntu professional. Blogs: http://www.adayinthelifeof.nl http://www.enrise.com/blog Email: [email protected] Twitter: @jaytaph
  11. Comfort zones

    You are doing this already: • SQL (where, order, limit, join) • Frameworks (Zend, Symfony, etc.) • jQuery, Dojo, etc.
  16. Comfort zones

    But why learn new stuff when you can do it in PHP? ✓ Might be easier to use... ✓ Might be faster to write... ✓ Might be better suited for the job... ✓ More efficient
  19. Comfort zones

    I don’t want to tell you HOW to use Sed & Awk. I want to tell you that for certain jobs, tools like Sed & Awk are much better suited than PHP. Know the capabilities of your tools and you become a better developer...
  24. Why Sed & Awk?

    ✓ Useful for data manipulation ✓ They work well together ✓ Both have a similar processing method ✓ Both rely heavily on regular expressions ✓ Hardly anybody really harnesses their power
  26. SED

    • is a Stream EDitor • applies rules to a stream of data, one line at a time • cannot go back in the stream (it only moves forward)
  29. Why use SED?

    Useful for: • mutation of large datasets: changing IP addresses or other data throughout many files. • complex find & replace: only change data in certain blocks of code/data (for instance CSV, TXT, SQL files, docblocks, etc.). • complex retrieval of data: only print the next 10 lines after each 404 code read from an Apache log file, or print all docblocks and function headers.
  31. When to use SED?

    Don’t use sed when: • you need to change only one or two items • you need aggregation or variables. Use sed when: • you need to change hundreds or thousands of files • you need “complex” mutations • you need fast “one-liners” in scripts
  32. SED

    Most common example: sed 's/foo/bar/g' old > new changes ‘foo’ into ‘bar’ throughout the file ‘old’ and places the output into the file ‘new’.
  33. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed 's/foo/bar/g' foo.txt
    bar bar bar
    bar bar bar
    bar bar bar
    bar bar bar
    bar bar bar
  34. SED

    Another common example: sed 's/foo//g' old > new deletes ‘foo’ throughout the file ‘old’ and places the output into the file ‘new’.
  35. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed 's/foo//g' foo.txt
     bar
    bar  bar
     bar
    bar bar
      bar
  36. SED

    A bit more advanced: sed 's/foo/FOO/2' old > new changes the second ‘foo’ on each line into ‘FOO’.
  37. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed 's/foo/FOO/2' foo.txt
    foo bar FOO
    bar foo bar
    foo bar FOO
    bar bar foo
    foo FOO bar
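
The three substitutions shown so far can be tried side by side. This is a minimal sketch rather than part of the original deck: the three-line sample file and the variable names `all`, `none` and `second` are made up for illustration.

```shell
# Hypothetical sample data, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'foo bar foo\nbar foo bar\nfoo foo bar\n' > "$tmp/foo.txt"

# g flag: replace every foo on every line.
all=$(sed 's/foo/bar/g' "$tmp/foo.txt")
# Empty replacement: delete every foo (the surrounding spaces stay behind).
none=$(sed 's/foo//g' "$tmp/foo.txt")
# Numeric flag: replace only the second foo on each line.
second=$(sed 's/foo/FOO/2' "$tmp/foo.txt")

printf '%s\n---\n%s\n---\n%s\n' "$all" "$none" "$second"
rm -r "$tmp"
```

Lines that contain fewer than two occurrences of ‘foo’ pass through the last command unchanged.
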
  38. SED

    Sed can use address ranges: sed '1,3 s/foo/bar/g' file changes all ‘foo’s to ‘bar’s on lines 1 to 3.
  39. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '1,3 s/foo/bar/g' foo.txt
    bar bar bar
    bar bar bar
    bar bar bar
    bar bar foo
    foo foo bar
  40. SED

    But you can also use a regex: sed '1,/^$/ s/foo/bar/g' file changes all ‘foo’s to ‘bar’s from line 1 up to the first empty line.
  41. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar

    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '1,/^$/ s/foo/bar/g' foo.txt
    bar bar bar
    bar bar bar

    foo bar foo
    bar bar foo
    foo foo bar
  42. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar

    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '/^$/,$ s/foo/bar/g' foo.txt
    foo bar foo
    bar foo bar

    bar bar bar
    bar bar bar
    bar bar bar
  43. SED

    A ! negates the address range: sed '1,3 ! s/foo/bar/g' file changes all ‘foo’s to ‘bar’s on every line except lines 1 to 3.
  44. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '1,3 ! s/foo/bar/g' foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar bar
    bar bar bar
  45. SED

    Multiple commands per range. For lines 1 to 3: change ‘foo’s into ‘bar’s and prepend ‘Line’ to the line. On one line:

    sed '1,3 { s/foo/bar/g ; s/.*/Line &/ ; }' file

    Or with one command per line:

    sed '1,3 {
      s/foo/bar/g
      s/.*/Line &/
    }' file
  46. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '1,3 { s/foo/bar/g ; s/.*/Line &/ ; }' foo.txt
    Line bar bar bar
    Line bar bar bar
    Line bar bar bar
    bar bar foo
    foo foo bar
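
The addressing forms covered so far (line numbers, a regex end point, negation and grouped commands) can be compared on one small file. This is a sketch under assumptions: the five-line input with an empty third line and all variable names are invented for the example.

```shell
# Hypothetical input: two lines, an empty line, two more lines.
tmp=$(mktemp -d)
printf 'foo 1\nfoo 2\n\nfoo 4\nfoo 5\n' > "$tmp/in.txt"

# Numeric range: lines 1 to 2 only.
by_number=$(sed '1,2 s/foo/bar/' "$tmp/in.txt")
# Regex end point: line 1 up to and including the first empty line.
to_blank=$(sed '1,/^$/ s/foo/bar/' "$tmp/in.txt")
# Negation: every line outside the range 1 to 2.
negated=$(sed '1,2!s/foo/bar/' "$tmp/in.txt")
# Grouping: two commands applied to the same range.
grouped=$(sed '1,2 {
  s/foo/bar/
  s/.*/Line &/
}' "$tmp/in.txt")

printf '%s\n---\n%s\n' "$negated" "$grouped"
rm -r "$tmp"
```

With this input the numeric range and the regex range select the same non-empty lines, so `by_number` and `to_blank` come out identical.
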
  47. SED

    Multiple ranges. On lines 1 to 3: change ‘foo’s into ‘bar’s. On lines 5 to 7: change ‘bar’s into ‘foo’s. On all lines: add ‘Line’ in front of the line.

    sed '1,3 s/foo/bar/g
    5,7 s/bar/foo/g
    s/\(.*\)/Line \1/' file
  48. SED

    $ cat foo.txt
    foo bar foo
    bar foo bar
    foo bar foo
    bar bar foo
    foo foo bar
    $ sed '1,3 s/foo/bar/g ; 5,7 s/bar/foo/g ; s/.*/Line: &/' foo.txt
    Line: bar bar bar
    Line: bar bar bar
    Line: bar bar bar
    Line: bar bar foo
    Line: foo foo foo
  49. SED

    sed -n '
      /^cut/ q
      1,3 { s/foo/bar/g ; p ; }
      4,$ { s/bar/foo/g ; p ; }
    ' file

    -n means ‘don’t print’ lines automatically; only the p command produces output.
  50. SED

    /^cut/ q: if a line starts with ‘cut’, end processing.
  51. SED

    1,3 { s/foo/bar/g ; p ; }: on lines 1 to 3, replace ‘foo’ with ‘bar’ and print the line to the output.
  52. SED

    4,$ { s/bar/foo/g ; p ; }: from line 4 to the end, replace ‘bar’ with ‘foo’ and print the line to the output.
  53. SED

    $ cat file.txt
    foo bar foo bar
    bar bar foo foo
    foo bar foo bar
    bar bar foo foo
    foo bar foo bar
    foo bar foo bar
    bar bar foo foo
    bar bar foo foo
    cut this line is not added
    $ sed -n '/^cut/ q ; 1,3 { s/foo/bar/g ; p ; } ; 4,$ { s/bar/foo/g ; p ; }' file.txt
    bar bar bar bar
    bar bar bar bar
    bar bar bar bar
    foo foo foo foo
    foo foo foo foo
    foo foo foo foo
    foo foo foo foo
    foo foo foo foo
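
The combination of -n, q and p can be reproduced on a smaller input. The following sketch assumes a made-up six-line file and shortens the first range to two lines; it is an illustration, not the deck's own data.

```shell
# Hypothetical input: four data lines, a "cut" marker, and a trailing line.
tmp=$(mktemp -d)
printf 'foo bar\nfoo bar\nfoo bar\nfoo bar\ncut\nnever printed\n' > "$tmp/file.txt"

# -n suppresses automatic printing, so only the explicit p commands emit lines.
# q stops processing at the first line that starts with "cut".
out=$(sed -n '
/^cut/ q
1,2 {
  s/foo/bar/g
  p
}
3,$ {
  s/bar/foo/g
  p
}
' "$tmp/file.txt")

printf '%s\n' "$out"
rm -r "$tmp"
```

Because q runs before either range is tested, neither the ‘cut’ line nor anything after it reaches the output.
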
  54. SED commands

    a   append ‘text’ to output
    s   substitute data
    y   transform data (like ‘tr’)
    d   delete line (don’t print)
    n   (print) and go to next line
    N   append the next line to the pattern space
    p   print pattern space
    q   quit processing
    r   copy the contents of a file to the output
    #   comment
    =   print the current line number
  55. Commenting out

    sed '/^\[testing/,/^\[/ s/.*/# --- &/' application.ini

    [production]
    ...
    bootstrap.path = APPLICATION_PATH "/Bootstrap.php"
    bootstrap.class = "Bootstrap"
    resources.frontController.controllerDirectory = APPLICATION_PATH "/controllers"
    [staging : production]
    [testing : production]
    phpSettings.display_startup_errors = 1
    phpSettings.display_errors = 1

    [development : production]
    phpSettings.display_startup_errors = 1
    ...
  56. Commenting out

    sed '/^\[testing/,/^\[/ s/.*/# --- &/' application.ini

    [production]
    ...
    bootstrap.path = APPLICATION_PATH "/Bootstrap.php"
    bootstrap.class = "Bootstrap"
    resources.frontController.controllerDirectory = APPLICATION_PATH "/controllers"
    [staging : production]
    # --- [testing : production]
    # --- phpSettings.display_startup_errors = 1
    # --- phpSettings.display_errors = 1
    # ---
    # --- [development : production]
    phpSettings.display_startup_errors = 1
    ...
  57. Commenting out

    sed '/^\[testing/,/^\[/ {
      /^\[/ b
      /^$/ b
      s/.*/# --- &/
    }' application.ini

    [production]
    ...
    bootstrap.path = APPLICATION_PATH "/Bootstrap.php"
    bootstrap.class = "Bootstrap"
    resources.frontController.controllerDirectory = APPLICATION_PATH "/controllers"
    [staging : production]
    [testing : production]
    # --- phpSettings.display_startup_errors = 1
    # --- phpSettings.display_errors = 1

    [development : production]
    phpSettings.display_startup_errors = 1
    ...
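
The refined command can be checked against a tiny INI file. This is a sketch under assumptions: the keys `a` to `d` and the file contents are placeholders, not the deck's application.ini.

```shell
# Hypothetical INI file with three sections separated by empty lines.
tmp=$(mktemp -d)
cat > "$tmp/application.ini" <<'EOF'
[production]
a = 1

[testing : production]
b = 2
c = 3

[development : production]
d = 4
EOF

# Within the range from [testing ...] to the next section header:
#   skip section headers and empty lines (b without a label jumps to the end
#   of the script), and comment out everything else.
commented=$(sed '/^\[testing/,/^\[/ {
  /^\[/ b
  /^$/ b
  s/.*/# --- &/
}' "$tmp/application.ini")

printf '%s\n' "$commented"
rm -r "$tmp"
```

Only the two settings inside the testing section gain the `# --- ` prefix; the section headers and the other sections are left alone.
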
  58. SED flow control

    b <label>   unconditionally branch to <label>
    : <label>   set <label>
    t <label>   conditionally branch to <label> (only if a substitution has succeeded)
  59. SED flow control

    solfege.txt:

    do
    re \
    mi \
    fa
    so \
    la \
    si
    do

    sed script:

    sed '
    :loop
    /\\$/ N
    s/\\\n */ /
    t loop
    ' solfege.txt

    Step by step (pattern buffer and output): ‘do’ does not end in a backslash, so it is printed unchanged. ‘re \’ does, so N appends the next line and the pattern buffer becomes ‘re \ mi \’; the substitution joins the two into ‘re mi \’ and t branches back to loop. N then appends ‘fa’, the substitution yields ‘re mi fa’, no further backslash is found, and ‘re mi fa’ is printed. Processing continues in the same way with ‘so \’.
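
The loop can be run as-is on a shortened input. In this sketch the sample file is an assumption: it places the backslash directly after the word, so that the joined line ends up with single spaces.

```shell
# Hypothetical input: "re\" and " mi\" are continued on the following line.
tmp=$(mktemp -d)
printf 'do\nre\\\n mi\\\n fa\nso\n' > "$tmp/solfege.txt"

# :loop        define a label
# /\\$/ N      if the pattern space ends in a backslash, append the next line
# s/\\\n */ /  replace backslash, newline and leading spaces with one space
# t loop       if the substitution succeeded, go back and check again
joined=$(sed '
:loop
/\\$/ N
s/\\\n */ /
t loop
' "$tmp/solfege.txt")

printf '%s\n' "$joined"
rm -r "$tmp"
```

The t command only branches after a successful substitution, which is what ends the loop once a line without a trailing backslash has been appended.
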
  74. Sed not powerful?

    sed '/regex/d' filename: delete all lines containing ‘regex’.
    sed -n '1 ! G ; h ; $ p' filename: reverse all lines in a file (makes use of the hold buffer).
    sed 's/ *$//' filename: remove trailing spaces from every line.
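
These one-liners are easy to verify on a few lines of input. A minimal sketch, assuming a made-up three-line file; the compact spelling `1!G;h;$p` is the same script as above without the optional spaces.

```shell
# Hypothetical three-line input.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/in.txt"

# Delete every line that matches the regex.
deleted=$(sed '/two/d' "$tmp/in.txt")
# Reverse the file: G appends the hold buffer to each line (except the first),
# h saves the growing result, and $p prints it once the last line is reached.
reversed=$(sed -n '1!G;h;$p' "$tmp/in.txt")
# Strip trailing spaces.
trimmed=$(printf 'padded   \n' | sed 's/ *$//')

printf '%s\n---\n%s\n---\n[%s]\n' "$deleted" "$reversed" "$trimmed"
rm -r "$tmp"
```
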
  75. Sed not powerful?

    Meet Sedtris: a fully functional Tetris clone written in SED. http://uuner.doslash.org/forfun/
  78. So use sed when:

    • Repetitive work on many files • “Complex” mutations • Fast “one-liners” in scripts, etc.
  81. AWK

    • AWK is a full-fledged programming language. • There is NO way I can teach you AWK in about 20 minutes. • But I’ll try...
  85. AWK

    • Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan • Written in 1977 at AT&T Bell Laboratories • Multiple versions: AWK, NAWK, GAWK, MAWK and more... • Pattern-directed scanning and processing language...
  86. AWK

    • [condition] { actions } • 2 special “patterns”: BEGIN and END
  87. Simple AWK

    $ cat solfege.txt
    do
    re
    mi
    fa
    sol
    la
    ti
    do
    $ awk '
    BEGIN { print "start" }
    /o/ { print "I just saw an o in " $0 }
    END { print "the end" }' solfege.txt
    start
    I just saw an o in do
    I just saw an o in sol
    I just saw an o in do
    the end
  88. Apache logfile (combined)

    Awk processes through “records” and “fields”; you can control the record and field separators.

    72.30.161.230 - - [18/Jan/2011:20:28:09 +0100] "GET /robots.txt HTTP/1.0" 200 387 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    72.30.161.230 - - [18/Jan/2011:20:28:10 +0100] "GET / HTTP/1.0" 200 7235 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
  89. Apache logfile (combined)

    Field separator “space”:

    72.30.161.230 - - [18/Jan/2011:20:28:09 +0100] "GET /robots.txt HTTP/1.0" 200 387 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    72.30.161.230 - - [18/Jan/2011:20:28:10 +0100] "GET / HTTP/1.0" 200 7235 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
  90. Apache logfile (combined)

    Field separator “double quote” ("):

    72.30.161.230 - - [18/Jan/2011:20:28:09 +0100] GET /robots.txt HTTP/1.0 200 387 - Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    72.30.161.230 - - [18/Jan/2011:20:28:10 +0100] "GET / HTTP/1.0" 200 7235 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
  96. Some global AWK knowledge

    • You can set the field and record separators • FS="|"; RS="\t" • $0 holds the complete record (line) • $1 holds the first field, $2 the second field, etc. • NF holds the number of fields in the record ($NF is the last field) • NR holds the number of the CURRENT record
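
A short run makes the difference between the built-in variables and the field references visible. This is a sketch with invented pipe-separated input; the variable name `out` is arbitrary.

```shell
# Hypothetical pipe-separated records.
# FS is set in BEGIN so that it applies to the first record as well.
# NR is the record number, NF the field count, $NF the last field.
out=$(printf 'a|b|c\nd|e\n' | awk '
BEGIN { FS = "|" }
{ print NR ": " NF " fields, last = " $NF }')

printf '%s\n' "$out"
```

The same separator could also be passed on the command line with `-F'|'`.
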
  97. Apache logfile (combined)

    Print the “user agents” from the logfile:

    $ awk -F\" '{ print $6 }' apache.log
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) NS8/0.9.6
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) NS8/0.9.6
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  98. Apache logfile (combined)

    Print the “user agents” from the logfile and count them (through external tools):

    $ awk -F\" '{ print $6 }' apache.log | sort | uniq -c
    2 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) NS8/0.9.6
    1 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    7 Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  99. Apache logfile (combined)

    Print the total bytes sent out per status code:

    $ awk -F' ' '{ totals[$9] += $10; } END { for (i in totals) { printf "%d : %d bytes\n", i, totals[i]; } }' apache.log
    200 : 26197250 bytes
    206 : 180578 bytes
    301 : 31072 bytes
    302 : 2991 bytes
    304 : 44715 bytes
    404 : 82866 bytes
    500 : 361783 bytes
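
The aggregation can be tried without a real access log. The sketch below assumes three invented log lines in combined format (the addresses and user agents are placeholders) and pipes the result through sort, because awk does not guarantee any order for `for (code in totals)`.

```shell
# Hypothetical access log in combined format.
tmp=$(mktemp -d)
cat > "$tmp/apache.log" <<'EOF'
192.0.2.1 - - [18/Jan/2011:20:28:09 +0100] "GET /robots.txt HTTP/1.0" 200 387 "-" "ExampleBot/1.0"
192.0.2.1 - - [18/Jan/2011:20:28:10 +0100] "GET / HTTP/1.0" 200 7235 "-" "ExampleBot/1.0"
192.0.2.2 - - [18/Jan/2011:20:28:11 +0100] "GET /missing HTTP/1.0" 404 210 "-" "ExampleBrowser/2.0"
EOF

# With the default whitespace separator, field 9 is the status code and
# field 10 the response size; totals is an associative array keyed by status.
totals=$(awk '{ totals[$9] += $10 }
END { for (code in totals) printf "%d : %d bytes\n", code, totals[code] }' "$tmp/apache.log" | sort)

printf '%s\n' "$totals"
rm -r "$tmp"
```
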
  100. Apache logfile (combined)

    Print the “user agents” from the logfile that triggered a 4xx code:

    $ awk -F' ' '$9 ~ /4[0-9][0-9]/ { FS="\""; $0=$0; print $6; FS=" " }' apache.log
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
    libwww-perl/5.805
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
    Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
    Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Googlebot/2.1 (+http://www.googlebot.com/bot.html)
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
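
An alternative to switching FS in the middle of a record is to split the line a second time with awk's `split()` function. This is a swapped-in variant, not the deck's approach, shown as a sketch on two invented log lines.

```shell
# Hypothetical access log: one 200 response and one 404 response.
tmp=$(mktemp -d)
cat > "$tmp/apache.log" <<'EOF'
192.0.2.1 - - [18/Jan/2011:20:28:10 +0100] "GET / HTTP/1.0" 200 7235 "-" "ExampleBot/1.0"
192.0.2.2 - - [18/Jan/2011:20:28:11 +0100] "GET /missing HTTP/1.0" 404 210 "-" "ExampleBrowser/2.0"
EOF

# Select records whose status (field 9, split on whitespace) is 4xx, then
# split the same record on double quotes: the sixth piece is the user agent.
agents=$(awk '$9 ~ /^4[0-9][0-9]$/ { split($0, quoted, "\""); print quoted[6] }' "$tmp/apache.log")

printf '%s\n' "$agents"
rm -r "$tmp"
```

Anchoring the pattern with ^ and $ keeps a value such as 1404 in field 9 from matching by accident.
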
  101. Apache logfile (combined)

    Awk one-liner compared to PHP (credits to @RichardJ, #pfz channel):

    <?php
    $hash = array();
    foreach( file( './apache.log' ) as $line ) {
        list( , , , , , , , , $status, $bytes ) = explode( ' ', $line );
        if( !isset( $hash[$status] ) ) {
            $hash[$status] = 0;
        }
        $hash[$status] += $bytes;
    }
    print_r($hash);

    $ awk -F' ' '{ totals[$9] += $10; } END { for (i in totals) { printf "%d : %d bytes\n", i, totals[i]; } }' apache.log

    Not a whole lot different, but already more complex, and this was just a simple example...
  102. Apache logfile (combined)

    Sed one-liner compared to PHP (credits to @RichardJ, #pfz channel):

    <?php
    $stdin = fopen("php://stdin", "r");
    while (!feof($stdin)) {
        $line = fgets($stdin);
        if (preg_match("/^.o/", $line)) continue;
        print $line;
    }
    ?>

    sed '/^.o/d' file

    Much more work....
  103. Practical uses for a (PHP) developer

    • Parse PHP error files, syslog files and Apache’s HTTP access logs. • Convert the files you get from your customers, who always assume you can do magic with a gazillion GBs of (unsorted) data (and now you can).
  106. In conclusion: sed & awk

    • are powerful for simple one-liners but can also be used for complex programs • integrate perfectly with other (Unix) tools like uniq, sort, cut, find, grep, cat, etc. • are a great way to automate complex and/or repetitive (editing) tasks
  108. In conclusion

    • Look outside your comfort zone for other (better) tools. • Can you think of examples where you would use Sed or Awk (instead of PHP)? http://files.sharenator.com/slender_loris_Worlds_strangest_looking_animals-s300x451-2279-580.jpg
  109. Thank you for your attention!

    Don’t forget to rate my talk on joind.in: http://joind.in/2489 http://farm5.static.flickr.com/4078/4790219776_2fe3c9af95_b.jpg