My !!con talk

Sasha Laundy

May 17, 2015

Transcript

  1. VERY high level

    trimmed = FOREACH loaded_data GENERATE userId, website;
    grouped = GROUP trimmed BY userId;
    counted = FOREACH grouped GENERATE group, COUNT(trimmed);
  2. I get this for FREE! • Mappin’ & reducin’ • HDFS in the CLOUD! • Clusters AND nodes! • A rockin’ query plan!
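To make the "mappin' & reducin'" concrete, here is a minimal single-process Python sketch of the same count-per-user job that the Pig script on slide 1 expresses. It is an illustration only, not how Pig actually compiles the script: the sample rows and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for loaded_data: (userId, website, extra)
loaded_data = [
    ("u1", "a.com", "x"),
    ("u2", "b.com", "y"),
    ("u1", "c.com", "z"),
]

def map_phase(rows):
    # Like "trimmed": keep only userId and website, keyed by userId.
    for user_id, website, *_ in rows:
        yield user_id, website

def shuffle(pairs):
    # Like "grouped": collect every value that shares a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Like "counted": one (userId, count) entry per group.
    return {key: len(values) for key, values in groups.items()}

counted = reduce_phase(shuffle(map_phase(loaded_data)))
print(counted)
```

On a real cluster the map and reduce steps run on many nodes and the shuffle moves data between them, which is the part the slide says comes for free.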
  3. ??

  4. in my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
  5. • Various schema? MONGO • Fast search? ELASTICSEARCH • Keep history? DATOMIC • Want very fast analytics queries? REDSHIFT.
  6. Redshift has lots more… • NODES so you can compute in parallel • cool QUERY PLANS based on your actual data! • Not actually a database. “Managed data warehouse service in the cloud” • So blazing fast!
  7. Really fast! …how fast? • 21,454,134 rows • COUNT(*) • Postgres: 586,931.216 ms (about 10 minutes) • Redshift: 1,561.359 ms (about 1.6 seconds) • 376 times faster! From http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
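The 376x figure follows directly from the two timings on the slide. A quick Python check, using only the numbers quoted above (the variable names are just for illustration):

```python
# Timings quoted on the slide, in milliseconds.
postgres_ms = 586_931.216
redshift_ms = 1_561.359

# Ratio of the two timings; rounds to the 376x quoted on the slide.
speedup = postgres_ms / redshift_ms
print(f"{speedup:.1f}x faster")

# The same timings in friendlier units.
print(f"Postgres: {postgres_ms / 60_000:.1f} min")
print(f"Redshift: {redshift_ms / 1_000:.2f} s")
```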
  8. 376x isn’t cool. You know what’s cool? 100,000x • Instead of native Python, a matrix! 100x • Speed from OpenBLAS compared to numpy 10x • Parallelization (for free from OpenBLAS) 10x • 100,000x