Robust Causal Transfer Learning

Pooyan Jamshidi

September 25, 2020

Transcript

  1. Robust Causal Transfer Learning Identifying Causal Invariances Pooyan Jamshidi UofSC

    @pooyanjamshidi
  2. Artificial Intelligence and Systems Laboratory (AISys Lab) Machine Learning Computer

    Systems Autonomy Learning-enabled Autonomous Systems https://pooyanjamshidi.github.io/AISys/ 2
  3. Research Directions at AISys 3 Theory:
 - Transfer Learning
 - Causal Invariances
 - Structure Learning
 - Concept Learning
 - Physics-Informed
 
 Applications:
 - Systems
 - Autonomy
 - Robotics
 [Figure: research spectrum from well-known physics / big data to limited known physics / small data, bridged by Causal AI]
  4. Today’s most popular systems are configurable 4

  5. 5

  6. Systems are becoming increasingly more configurable and, therefore, their

    performance behavior is increasingly difficult to understand 6 [Figure: growth in the number of configuration parameters over release time for MySQL, Apache, and Hadoop (MapReduce, HDFS); increase in size of the configuration space] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  7. Empirical observations confirm that systems are becoming increasingly more configurable

    7 [Figure: number of configuration parameters over release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce, HDFS)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  8. – HDFS-4304 “all constants should be configurable, even if we

    can’t see any reason to configure them.”
  9. Configurations determine the performance behavior 9

    void Parrot_setenv(. . . name, . . . value) {
    #ifdef PARROT_HAS_SETENV
        my_setenv(name, value, 1);
    #else
        int name_len = strlen(name);
        int val_len = strlen(value);
        char* envs = glob_env;
        if (envs == NULL) {
            return;
        }
        strcpy(envs, name);
        strcpy(envs + name_len, "=");
        strcpy(envs + name_len + 1, value);
        putenv(envs);
    #endif
    }

    #ifdef LINUX
    extern int Parrot_signbit(double x) { . . .

    Compile-time options such as PARROT_HAS_SETENV and LINUX select different code paths, each with different speed and energy profiles.
  10. Configuration options interact, therefore creating a non-linear and complex performance

    behavior • Non-linear • Non-convex • Multi-modal 10 [Figure: latency (ms) as a cubic-interpolated surface over the number of counters and the number of splitters]
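The interaction effect described above can be sketched with a small synthetic model; the latency function, its coefficients, and the option names below are invented for illustration, not taken from the talk:

```python
import numpy as np

# Invented latency model over two options ("splitters", "counters"):
# a sinusoidal main effect plus an interaction term makes the surface
# non-linear, non-convex, and multi-modal.
def latency(splitters, counters):
    return (100.0
            + 20.0 * np.sin(splitters)                    # non-monotone main effect
            + 5.0 * counters                              # linear main effect
            - 15.0 * np.sin(splitters * counters / 3.0))  # interaction term

# Enumerate a 6x6 grid of configurations and find the extremes.
grid = [(s, c) for s in range(1, 7) for c in range(1, 7)]
vals = {cfg: latency(*cfg) for cfg in grid}
best, worst = min(vals, key=vals.get), max(vals, key=vals.get)
print(best, vals[best], worst, vals[worst])
```

Exhaustive enumeration works on this toy grid; real systems have far too many options for that, which is what motivates the sampling-based approaches later in the talk.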
  11. 11 How do we understand the performance behavior of real-world highly-configurable

    systems that scale well… … and enable developers/users to reason about qualities (performance, energy) and to make trade-offs?
  12. SocialSensor as a case study to motivate configuration optimization • Identifying

    trending topics • Identifying user-defined topics • Social media search 12
  13. SocialSensor is a data processing pipeline 13 Content Analysis Orchestrator

    Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  14. They expected their user base to grow significantly over a

    short period 14 [Figure: the SocialSensor pipeline (Content Analysis, Orchestrator, Crawling, Search and Integration) annotated with expected growth (100X, 10X) and a real-time requirement]
  15. 15 How can we gain better performance without using

    more resources?
  16. 16 Let’s try out different system configurations!

  17. Opportunity: Data processing engines in the pipeline were all configurable

    > 100 > 100 > 100 options each; a combined configuration space on the order of 2^300
  18. More combinations than estimated atoms in the universe
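A back-of-the-envelope check of this claim, assuming roughly 300 binary options across the pipeline (a rough reading of the previous slide's combined space; the exact count is illustrative):

```python
# ~300 binary options give 2^300 configurations, versus the commonly
# cited estimate of ~10^80 atoms in the observable universe.
num_binary_options = 300            # assumed for illustration
configs = 2 ** num_binary_options
atoms_in_universe = 10 ** 80
print(configs > atoms_in_universe)  # True
print(len(str(configs)))            # 91 decimal digits
```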

  19. The default configuration is typically bad and the optimal configuration

    is noticeably better than the median 19 • Default is bad • 2X-10X faster than worst • Noticeably faster than median [Figure: throughput (ops/sec) vs. average write latency (µs), with the default and optimal configurations marked]
  20. 100X more users • cloud resources reduced by 20% • outperformed expert recommendation

  21. Identifying the root cause of performance faults is difficult •

    Code was transplanted from TX1 to TX2 • TX2 is more powerful, but software was 2x slower than TX1 • Three misconfigurations: ◦ Wrong compilation flags for compiling CUDA (didn't use 'dynamic' flag) ◦ Wrong CPU/GPU modes (didn't use TX2 optimized cores) ◦ Wrong Fan mode (didn't change to handle thermal throttling) Fig 1. Performance fault on NVIDIA TX2 https://forums.developer.nvidia.com/t/50477 21
  22. Fixing performance faults is difficult • These were not in

    the default settings • Took 1 month to fix in the end... • We need to do this better Fig 1. Performance fault on NVIDIA TX2 https://forums.developer.nvidia.com/t/50477 22
  23. Performance distributions are multi-modal and have long tails • Certain

    configurations can cause performance to take abnormally large values
 • Faulty configurations take the tail values (worse than 99.99th percentile)
 • Certain configurations can cause faults on multiple performance objectives. 
 23
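The tail-based notion of a faulty configuration above can be sketched on synthetic data (the latency distribution and fault values are invented; only the 99.99th-percentile cutoff comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic latencies: a log-normal body plus a small cluster of faulty
# configurations far out in the tail (multi-modal, long-tailed).
body = rng.lognormal(mean=3.0, sigma=0.3, size=100_000)
faults = rng.normal(loc=500.0, scale=10.0, size=100)
latencies = np.concatenate([body, faults])

# Flag configurations worse than the 99.99th percentile as candidate faults.
threshold = np.percentile(latencies, 99.99)
flagged = latencies[latencies > threshold]
print(threshold, len(flagged))
```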
  24. Debugging based on statistical correlation can be misleading

    [Figure: causal graphs over Swap Mem, GPU Growth, and Latency for GPU Growth = 33% vs. 66% and Swap = 1Gb vs. 4Gb] • The correlation between GPU Growth and Latency is as strong as that between Swap Mem and Latency, but considerably less noisy. • Therefore, a feature selection method based on correlation that ignores the causal structure prefers GPU Growth as the predictor of Latency, which is misleading. 24
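The pitfall on this slide can be reproduced with a few lines of synthetic data (variable names follow the slide; the structural equations and coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Invented structural model: swap memory causally drives latency, while
# GPU growth is just a clean, non-causal proxy that co-varies with it.
swap_true = rng.choice([1.0, 4.0], size=n)          # GB, the real cause
gpu_growth = 0.33 + 0.11 * swap_true                # clean proxy, no causal path
swap_observed = swap_true + rng.normal(0, 1.5, n)   # noisy measurement
latency = 10.0 * swap_true + rng.normal(0, 8.0, n)  # causal effect + noise

corr_gpu = np.corrcoef(gpu_growth, latency)[0, 1]
corr_swap = np.corrcoef(swap_observed, latency)[0, 1]
print(corr_gpu, corr_swap)  # the non-causal proxy correlates more cleanly
```

Correlation-based feature selection would rank gpu_growth first, yet intervening on it cannot change latency; only the causal parent, swap memory, can.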
  25. Why knowing the underlying causal structure matters: a transfer

    learning scenario • The relation between X1 and X2 is about as strong as the relation between X2 and X3, but noisier. • {X3} and {X1, X3} are preferred over {X1}, because predicting Y from X1 leads to: • a larger variance than predicting Y from X3; • a larger bias than predicting Y from both X1 and X3. 25 Magliacane, Sara, et al. "Domain adaptation by using causal inference to predict invariant conditional distributions." Advances in Neural Information Processing Systems. 2018.
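A minimal simulation of the bias/variance trade-off described on this slide, using an invented linear chain X1 → Y → X3 in which the upstream link is noisier than the downstream one (the structure and noise levels are assumptions for illustration, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Invented chain X1 -> Y -> X3: the X1 -> Y link is noisy, Y -> X3 is clean.
x1 = rng.normal(0, 1, n)
y = x1 + rng.normal(0, 1.0, n)   # noisy upstream relation
x3 = y + rng.normal(0, 0.3, n)   # cleaner downstream relation

def fit_mse(features):
    # Ordinary least squares with an intercept; in-sample mean squared error.
    X = np.column_stack(features + [np.ones(n)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta) ** 2))

mse_x1, mse_x3, mse_both = fit_mse([x1]), fit_mse([x3]), fit_mse([x1, x3])
print(mse_x1, mse_x3, mse_both)  # X3 beats X1; using both is best
```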
  26. CAUPER: Causal Performance Debugger localizes and repairs performance faults with approx.

    50 samples 26
  27. CAUPER is centered around causal structure discovery 27
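The basic primitive behind constraint-based causal structure discovery is a conditional independence test; here is a minimal sketch using partial correlation on an invented chain (the variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Invented chain: option -> cache_misses -> latency.
option = rng.normal(0, 1, n)
cache_misses = option + rng.normal(0, 1, n)
latency = cache_misses + rng.normal(0, 1, n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, given):
    # Correlation between a and b after controlling for `given`.
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / np.sqrt((1 - r_ag**2) * (1 - r_bg**2))

print(corr(option, latency))                        # clearly nonzero
print(partial_corr(option, latency, cache_misses))  # vanishes given the mediator
```

Constraint-based algorithms such as PC or FCI run many such tests to recover and orient edges; this sketch only shows the single-test building block.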

  28. • We measure the Individual Treatment Effect (ITE) of each repair:

    • the difference between the probability that the performance fault is fixed after a repair and the probability that the fault persists after the repair. • The larger the value, the more likely we are to repair the fault. • We pick the repair with the largest ITE. CAUPER iteratively explores potential performance repairs 28
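The selection rule on this slide can be sketched directly; the repair names and probabilities below are hypothetical (the names merely echo the TX2 misconfigurations from slide 21):

```python
def ite(p_fixed_after, p_faulty_after):
    # Individual Treatment Effect of a repair: the difference between the
    # probability the fault is fixed and the probability it persists.
    return p_fixed_after - p_faulty_after

# Hypothetical candidate repairs with invented probability estimates.
candidate_repairs = {
    "enable_dynamic_cuda_flag": ite(0.85, 0.15),
    "switch_to_optimized_cores": ite(0.60, 0.40),
    "set_fan_mode_for_throttling": ite(0.55, 0.45),
}

# Pick the repair with the largest ITE.
best_repair = max(candidate_repairs, key=candidate_repairs.get)
print(best_repair)  # -> enable_dynamic_cuda_flag
```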
  29. CAUPER is able to find more accurate causes compared with

    statistical debugging 29
  30. CAUPER is able to find comparable and even better repairs

    for performance faults compared with performance optimization 30 [Figure: Latency-Gain, Heat-Gain, and Energy-Gain of CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]
  31. CAUPER is able to find repairs with lower costs 31

    [Figure: search time of CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]
  32. Team effort 32 Rahul Krishna Columbia Shahriar Iqbal UofSC M.

    A. Javidian Purdue Baishakhi Ray Columbia Christian Kästner CMU Norbert Siegmund Leipzig Miguel Velez CMU Sven Apel Saarland Lars Kotthoff Wyoming Marco Valtorta UofSC
  33. 33