Command line & Data Science

Yatish Mehta
October 28, 2014

Transcript

  1. 1. csvkit
     • pip install csvkit
     • cat leads.csv | csvlook
     • csvstat leads.csv
     • csvgrep -c 6 -m samplecompany.com | csvlook
  2. 2. grep, sed, sort, uniq
     • cat wiki.txt | grep -oE '\w+' | tee words
     • < words grep '^a' | sort | uniq -c | sort -r
     • sed 's/data/tata/g' wiki.txt > wiki2.txt
  3. 3. jq JSON processor
     • brew install jq
     • < data.json jq '.[]'
     • < data.json jq '.[] | select(.age>22)'
     • cat data.json | jq '.[] | {isActive: ._id, name: .name}'
  4. 4. qstats
     • qstats one_hundred_milion.dat
       Min.      44.947
       1st Qu.   93.2553
       Median    100.001
       Mean      100.001
       3rd Qu.   106.747
       Max.      156.997
       Range     112.05
       Std Dev.  10.0002
       Length    100000000
     • Faster than awk, sort, R
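
For comparison with the qstats slide, here is a minimal sketch of the kind of single-pass awk summary that qstats is said to outperform. It is not taken from the deck. The file name sample.dat and its eight values are hypothetical stand-ins for the slide's data file, and the population form of the standard deviation is an assumption (qstats may compute it differently). Median and quartiles are left out because they need sorted input.

```shell
# Hypothetical sample data, one number per line (not the slide's data file).
printf '%s\n' 2 4 4 4 5 5 7 9 > sample.dat

# Single pass over the file: track min, max, sum, and sum of squares.
summary=$(awk '
  NR == 1 { min = $1; max = $1 }
  {
    if ($1 < min) min = $1
    if ($1 > max) max = $1
    sum += $1
    sumsq += $1 * $1
  }
  END {
    mean = sum / NR
    # Population standard deviation: an assumption, qstats may use the sample form.
    sd = sqrt(sumsq / NR - mean * mean)
    printf "Min. %g\nMean %g\nMax. %g\nRange %g\nStd Dev. %g\nLength %d\n", min, mean, max, max - min, sd, NR
  }' sample.dat)

echo "$summary"
```

On the sample values this reports a mean of 5, a range of 7, and a standard deviation of 2. The sum-of-squares formula keeps the script to one pass, but it can lose precision on large or tightly clustered data, so treat it as an illustration rather than a replacement for a dedicated tool.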