
My !!con talk


Sasha Laundy

May 17, 2015


Transcript

  1. VERY high level

    trimmed = FOREACH loaded_data GENERATE userId, website;
    grouped = GROUP trimmed BY userId;
    counted = FOREACH grouped GENERATE group, COUNT(grouped);
  2. I get this for FREE!
    • Mappin’ & reducin’
    • HDFS in the CLOUD!
    • Clusters AND nodes!
    • A rockin’ query plan!
  3. ??

  4. in my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
  5. • Various schema? MONGO
    • Fast search? ELASTICSEARCH
    • Keep history? DATOMIC
    • Want very fast analytics queries? REDSHIFT.
  6. Redshift has lots more…
    • NODES so you can compute in parallel
    • cool QUERY PLANS based on your actual data!
    • Not actually a database. “Managed data warehouse service in the cloud”
    • So blazing fast!
  7. Really fast! …how fast?
    • 21,454,134 rows
    • COUNT(*)
    • Postgres: 586,931.216 ms (10 minutes)
    • Redshift: 1,561.359 ms (1.5 seconds)
    376 times faster!
    from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
  8. 376x isn’t cool. You know what’s cool? 100,000x
    Instead of native Python, a matrix! 100x
    Speed from OpenBLAS compared to numpy 10x
    Parallelization (for free from OpenBLAS) 10x
    100,000x
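
For readers who do not know Pig, here is a rough single-machine sketch in plain Python of what the three steps on slide 1 (project, group, count) compute. The sample rows and the extra `timestamp` field are made up for illustration, and none of the distributed work Pig does for you is modelled here.

```python
# Hypothetical rows standing in for loaded_data. The "timestamp" field is an
# assumption, included only so the projection step has something to drop.
loaded_data = [
    {"userId": "u1", "website": "a.example", "timestamp": 1},
    {"userId": "u2", "website": "b.example", "timestamp": 2},
    {"userId": "u1", "website": "c.example", "timestamp": 3},
]

# Step 1 (trimmed): keep only userId and website from every row.
trimmed = [(row["userId"], row["website"]) for row in loaded_data]

# Step 2 (grouped): collect the trimmed rows under their userId.
grouped = {}
for user_id, website in trimmed:
    grouped.setdefault(user_id, []).append((user_id, website))

# Step 3 (counted): emit each group key with the number of rows in the group.
counted = {user_id: len(rows) for user_id, rows in grouped.items()}

print(counted)  # {'u1': 2, 'u2': 1}
```

The slide's point is that the Pig version stays this short even when the data is spread across a cluster.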
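
The "376 times faster" figure on slide 7 can be checked from the two timings quoted there. This is only the arithmetic, using the numbers as they appear on the slide; the variable names are mine.

```python
# Timings quoted on slide 7, in milliseconds.
postgres_ms = 586_931.216
redshift_ms = 1_561.359

speedup = postgres_ms / redshift_ms
postgres_minutes = postgres_ms / 60_000
redshift_seconds = redshift_ms / 1_000

print(f"Postgres: {postgres_minutes:.2f} min")  # Postgres: 9.78 min
print(f"Redshift: {redshift_seconds:.2f} s")    # Redshift: 1.56 s
print(f"Speedup:  {speedup:.0f}x")              # Speedup:  376x
```

The slide rounds the two durations to 10 minutes and 1.5 seconds; the ratio itself comes out at roughly 376.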