Data at FanDuel

John S.
September 30, 2014

How we use AWS's ElasticMapReduce and Redshift at FanDuel.


Transcript

  1.–6. (image-only slides; no transcript text)

  7. SELECT `user_id`, `sport`, TO_DATE(`date`) AS `d`,
            SUM(IF(`fee` > 0, 0, 1)) AS `free_plays`,
            SUM(IF(`fee` > 0, 1, 0)) AS `paid_plays`
     FROM `entry`
     GROUP BY `user_id`, `sport`, `d`;
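
     (A hedged aside, not from the deck: the same aggregation in Redshift
     syntax. Hive's IF() is not available in Redshift, so it becomes CASE,
     and TO_DATE() becomes TRUNC(); the table and column names are assumed
     to match the Hive table above.)

     -- Hypothetical Redshift equivalent of the Hive query above
     SELECT user_id,
            sport,
            TRUNC("date") AS d,
            SUM(CASE WHEN fee > 0 THEN 0 ELSE 1 END) AS free_plays,
            SUM(CASE WHEN fee > 0 THEN 1 ELSE 0 END) AS paid_plays
     FROM entry
     GROUP BY user_id, sport, TRUNC("date");
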
  8. (image-only slide; no transcript text)

  9. Row-orientated:
     • Rows stored sequentially on disk
     • Find disk location from index
     • Seek to location, retrieve full row
     Columnar:
     • Columns stored separately on disk
     • Read full column
     • Compression (run-length) is easier with columns of a single type
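
     (A hedged aside, not from the deck: Redshift exposes the columnar
     compression point directly as per-column encodings. The DDL below is
     illustrative; only the column names echo the Hive query earlier.)

     -- Hypothetical Redshift table: a low-cardinality column such as sport
     -- run-length encodes well because identical values sit adjacently
     -- in a columnar layout.
     CREATE TABLE entry (
         user_id    BIGINT        ENCODE delta,
         sport      VARCHAR(16)   ENCODE runlength,
         fee        DECIMAL(8,2)  ENCODE raw,
         entry_date TIMESTAMP     ENCODE raw
     );
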
  10. Row-orientated:  SELECT * FROM user WHERE id = 21803;
      Columnar:        SELECT AVG(amount) FROM deposits WHERE completed = today;
  11. (image-only slide; no transcript text)

  12. SELECT MIN(c), MAX(c) FROM user;
      SELECT MIN(c), MAX(c) FROM user WHERE c > previous_max_c;
      SELECT MIN(c), MAX(c) FROM user WHERE updated_at > previous_max_updated_at;
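
      (A hedged aside, not from the deck: one way the watermark queries above
      are typically driven. The loader stores the previous maximum between
      runs; the literal timestamp and the SELECT * extract are illustrative.)

      -- Hypothetical incremental extract: only pull rows changed since the
      -- last run, using the stored high-water mark.
      SELECT *
      FROM user
      WHERE updated_at > '2014-09-29 00:00:00';  -- previous_max_updated_at

      -- After loading, record the new watermark for the next run.
      SELECT MAX(updated_at) FROM user;
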
  13. hive> SELECT COUNT(*) FROM action_log;
      Total MapReduce jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks determined at compile time: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapred.reduce.tasks=<number>
      Starting Job = job_1410783831919_0150, Tracking URL = http://172.18.0.152:9046/proxy/application_1410783831919_0150/
      Kill Command = /home/hadoop/bin/hadoop job -kill job_1410783831919_0150
      Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 1
      2014-09-23 16:42:12,527 Stage-1 map = 0%, reduce = 0%
      2014-09-23 16:42:16,648 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 19.14 sec
      2014-09-23 16:42:17,746 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 51.65 sec
      2014-09-23 16:42:18,783 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 51.65 sec
      2014-09-23 16:42:19,818 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 53.22 sec
      MapReduce Total cumulative CPU time: 53 seconds 220 msec
      Ended Job = job_1410783831919_0150
      Counters:
      MapReduce Jobs Launched:
      Job 0: Map: 17  Reduce: 1  Cumulative CPU: 53.22 sec  HDFS Read: 4459072001  HDFS Write: 8  SUCCESS
      Total MapReduce CPU Time Spent: 53 seconds 220 msec
      OK
      Lots
      Time taken: 20.462 seconds, Fetched: 1 row(s)
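
      (A hedged aside, not from the deck: the log above names the knobs for
      controlling reducer count. A minimal example of applying one of them to
      a query with a reduce-heavy GROUP BY; the value 8 and the grouping
      column are illustrative.)

      hive> set mapred.reduce.tasks=8;
      hive> SELECT user_id, COUNT(*) FROM action_log GROUP BY user_id;
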
  14. (image-only slide; no transcript text)