Slide 1

Robust Causal Transfer Learning: Identifying Causal Invariances
Pooyan Jamshidi, UofSC
@pooyanjamshidi

Slide 2

Artificial Intelligence and Systems Laboratory (AISys Lab)
Machine Learning + Computer Systems + Autonomy → Learning-enabled Autonomous Systems
https://pooyanjamshidi.github.io/AISys/

Slide 3

Research Directions at AISys

Theory:
 - Transfer Learning
 - Causal Invariances
 - Structure Learning
 - Concept Learning
 - Physics-Informed

Applications:
 - Systems
 - Autonomy
 - Robotics

[Figure: spectrum from well-known physics / big data to limited known physics / small data, with Causal AI in between]

Slide 4

Today’s most popular systems are configurable.

Slide 5


Slide 6

Systems are becoming increasingly more configurable and, therefore, it is becoming more difficult to understand their performance behavior.

[Figure: number of configuration parameters vs. release time for several systems, including Hadoop MapReduce and HDFS]

2^180 / 2^18 = 2^162: increase in the size of the configuration space.

[Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]

Slide 7

Empirical observations confirm that systems are becoming increasingly more configurable.

[Figure: number of configuration parameters vs. release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce, HDFS)]

[Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]

Slide 8

“all constants should be configurable, even if we can’t see any reason to configure them.” – HDFS-4304

Slide 9

Configurations determine the performance behavior.

void Parrot_setenv(. . . name, . . . value) {
#ifdef PARROT_HAS_SETENV
    my_setenv(name, value, 1);
#else
    int name_len = strlen(name);
    int val_len = strlen(value);
    char* envs = glob_env;
    if (envs == NULL) {
        return;
    }
    strcpy(envs, name);
    strcpy(envs + name_len, "=");
    strcpy(envs + name_len + 1, value);
    putenv(envs);
#endif
}

#ifdef LINUX
extern int Parrot_signbit(double x) {
...

Compile-time options such as PARROT_HAS_SETENV and LINUX select different code paths and thereby affect speed and energy.

Slide 10

Configuration options interact, creating a non-linear and complex performance behavior (see the toy sketch below):
 - Non-linear
 - Non-convex
 - Multi-modal

[Figure: latency (ms) as a function of the number of counters and the number of splitters, cubic interpolation over a finer grid]
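To make the interaction point concrete, here is a tiny made-up performance model (arbitrary coefficients and option names, not measurements of any real system): a single interaction term between two options is already enough to make the latency surface non-linear, non-convex, and multi-modal.

# Illustrative only: arbitrary numbers, not measured data.
import numpy as np

def latency_ms(counters, splitters):
    base = 120.0
    effect_c = 8.0 * counters                               # individual option effects
    effect_s = -15.0 * splitters
    interaction = 40.0 * np.sin(counters * splitters / 3.0)  # interaction makes the surface wavy
    return base + effect_c + effect_s + interaction

grid_c, grid_s = np.meshgrid(np.arange(1, 7), np.arange(2, 19, 2))
surface = latency_ms(grid_c, grid_s)
print("min/max latency over the grid:", surface.min(), surface.max())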

Slide 11

How do we understand the performance behavior of real-world, highly configurable systems in a way that scales well… …and enables developers and users to reason about qualities (performance, energy) and to make tradeoffs?

Slide 12

SocialSensor as a case study to motivate configuration optimization:
 - Identifying trending topics
 - Identifying user-defined topics
 - Social media search

Slide 13

SocialSensor is a data processing pipeline.

[Figure: pipeline with Crawling, Orchestrator, Content Analysis, and Search and Integration components. Labels: Tweets: 5k-20k/min from the Internet; every 10 min: 100k tweets; Tweets: 10M; crawled items are fetched, stored, and pushed between components]

Slide 14

They expected their user base to increase significantly over a short period.

[Figure: the same pipeline, annotated with 100X and 10X load increases and a real-time requirement]

Slide 15

How can we gain better performance without using more resources?

Slide 16

Let’s try out different system configurations!

Slide 17

Opportunity: the data processing engines in the pipeline were all configurable, each with more than 100 configuration options, for a combined configuration space of roughly 2^300.

Slide 18

More combinations than the estimated number of atoms in the universe.
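A quick back-of-the-envelope check of that claim, assuming roughly 300 binary options across the three engines (as suggested on the previous slide) and the commonly quoted estimate of about 10^80 atoms in the observable universe:

# Rough arithmetic only; 300 binary options is an assumption from the previous slide.
config_space = 2 ** 300                  # ~300 independent binary options
atoms_in_universe = 10 ** 80             # rough, commonly quoted estimate
print(config_space > atoms_in_universe)  # True
print(len(str(config_space)) - 1)        # 2^300 has 91 digits, i.e. about 10^90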

Slide 19

The default configuration is typically bad, and the optimal configuration is noticeably better than the median.

[Figure: throughput (ops/sec) vs. average write latency (µs), with the default and optimal configurations highlighted]

 • The default is bad
 • The optimal is 2X-10X faster than the worst
 • The optimal is noticeably faster than the median

Slide 20

100X more users, cloud resources reduced by 20%, outperformed the expert recommendation.

Slide 21

Identifying the root cause of performance faults is difficult:
 ● Code was transplanted from the NVIDIA TX1 to the TX2
 ● The TX2 is more powerful, but the software was 2x slower than on the TX1
 ● Three misconfigurations:
   ○ Wrong compilation flags for compiling CUDA (didn't use the 'dynamic' flag)
   ○ Wrong CPU/GPU modes (didn't use the TX2-optimized cores)
   ○ Wrong fan mode (didn't change it to handle thermal throttling)

Fig 1. Performance fault on NVIDIA TX2 (https://forums.developer.nvidia.com/t/50477)

Slide 22

Fixing performance faults is difficult:
 ● These were not in the default settings
 ● Took 1 month to fix in the end...
 ● We need to do this better

Fig 1. Performance fault on NVIDIA TX2 (https://forums.developer.nvidia.com/t/50477)

Slide 23

Performance distributions are multi-modal and have long tails:
 • Certain configurations can cause performance to take abnormally large values
 • Faulty configurations take the tail values (worse than the 99.99th percentile), as sketched below
 • Certain configurations can cause faults in multiple performance objectives
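A minimal sketch of this tail-based view of performance faults, on hypothetical data (the mixture and thresholds are made up, not CAUPER's code): samples beyond the 99.99th percentile of the observed latency distribution are flagged as faults.

# Hypothetical long-tailed, multi-modal latency samples (ms).
import numpy as np

rng = np.random.default_rng(0)
latency = np.concatenate([
    rng.normal(120, 10, 99_000),   # typical behavior
    rng.normal(900, 150, 1_000),   # rare pathological configurations
])
threshold = np.percentile(latency, 99.99)
faulty = latency > threshold       # tail values count as performance faults
print(f"fault threshold ~ {threshold:.0f} ms, flagged {faulty.sum()} of {latency.size} samples")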

Slide 24

Debugging based on statistical correlation can be misleading.

[Figure: scatter plots of Latency vs. GPU Growth (33% and 66%) and Latency vs. Swap Mem (1Gb and 4Gb), together with the causal graph over GPU Growth, Swap Mem, and Latency]

 ● The correlation between GPU Growth and Latency is as strong as the correlation between Swap Mem and Latency, but considerably less noisy.
 ● Therefore, a feature selection method based on correlation that ignores the causal structure prefers GPU Growth as the predictor for Latency, which is misleading (see the sketch below).
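The following toy simulation illustrates the point. The causal structure and all numbers are hypothetical (Swap Mem → Latency → GPU Growth), chosen only so that the non-causal variable wins a correlation contest; it is not the model or data behind the figure.

# Hypothetical structural model: Swap Mem causes Latency; GPU Growth is a
# low-noise downstream effect of Latency, so it correlates strongly with it.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def simulate(do_gpu_growth=None, do_swap=None):
    swap = rng.choice([1.0, 4.0], size=n) if do_swap is None else np.full(n, do_swap)
    latency = 200.0 - 30.0 * swap + rng.normal(0, 25, n)   # Swap Mem causally reduces Latency
    gpu_growth = 0.2 * latency + rng.normal(0, 1, n)       # GPU Growth tracks Latency almost noiselessly
    if do_gpu_growth is not None:                          # an intervention overrides the mechanism
        gpu_growth = np.full(n, do_gpu_growth)
    return swap, gpu_growth, latency

swap, gpu, lat = simulate()
print("|corr(GPU Growth, Latency)|:", round(abs(np.corrcoef(gpu, lat)[0, 1]), 3))   # ~0.99: looks like the best predictor
print("|corr(Swap Mem,  Latency)|:", round(abs(np.corrcoef(swap, lat)[0, 1]), 3))   # weaker/noisier, but it is the real cause

print("mean Latency, observational          :", round(lat.mean(), 1))
_, _, lat_do_gpu = simulate(do_gpu_growth=66.0)
_, _, lat_do_swap = simulate(do_swap=4.0)
print("mean Latency under do(GPU Growth=66%):", round(lat_do_gpu.mean(), 1))   # essentially unchanged
print("mean Latency under do(Swap Mem=4Gb)  :", round(lat_do_swap.mean(), 1))  # drops: Swap Mem is the real lever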

Slide 25

Why knowing the underlying causal structure matters: a transfer learning scenario
 • The relation between X1 and X2 is about as strong as the relation between X2 and X3, but noisier.
 • {X3} and {X1, X3} are preferred over {X1}, because predicting Y from X1 leads to:
   • a larger variance than predicting Y from X3, and
   • a larger bias than predicting Y from both X1 and X3 (a toy version is sketched below).

Magliacane, Sara, et al. "Domain adaptation by using causal inference to predict invariant conditional distributions." Advances in Neural Information Processing Systems, 2018.
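A toy linear version of this kind of scenario (my own construction, not the exact example or graph from Magliacane et al.): in a chain X1 → X2 → X3 → Y with a noisy X1 → X2 link, predicting Y from the direct cause X3 has lower variance, and it stays reliable when the X1 → X2 mechanism shifts in the target domain, while a predictor built on X1 alone becomes biased.

# Hypothetical chain X1 -> X2 -> X3 -> Y; coefficients and noise levels are made up.
import numpy as np

rng = np.random.default_rng(1)

def domain(n, x1_to_x2=1.0):
    x1 = rng.normal(0, 1, n)
    x2 = x1_to_x2 * x1 + rng.normal(0, 1.5, n)   # noisy link; its mechanism may shift across domains
    x3 = x2 + rng.normal(0, 0.3, n)              # strong, stable link
    y = x3 + rng.normal(0, 0.2, n)
    return x1, x3, y

def fit(x, y):                                    # 1-D least squares: y ~ a*x + b
    a, b = np.polyfit(x, y, 1)
    return lambda x_new: a * x_new + b

x1_s, x3_s, y_s = domain(5_000)                   # source domain
f_x1, f_x3 = fit(x1_s, y_s), fit(x3_s, y_s)

x1_t, x3_t, y_t = domain(5_000, x1_to_x2=2.0)     # target domain: X1 -> X2 mechanism changed
mse = lambda f, x, y: np.mean((f(x) - y) ** 2)
print("source MSE from X1:", round(mse(f_x1, x1_s, y_s), 2), " from X3:", round(mse(f_x3, x3_s, y_s), 2))
print("target MSE from X1:", round(mse(f_x1, x1_t, y_t), 2), " from X3:", round(mse(f_x3, x3_t, y_t), 2))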

Slide 26

CAUPER (Causal Performance Debugger) localizes and repairs performance faults using approximately 50 samples.

Slide 27

CAUPER is centered around causal structure discovery.
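To make "causal structure discovery" concrete, here is a minimal, generic constraint-based (PC-style) skeleton-discovery sketch, assuming roughly linear-Gaussian data and Fisher-z partial-correlation tests. It illustrates the general technique only; it is not CAUPER's actual algorithm or implementation.

# Generic PC-style skeleton discovery (illustrative sketch, not CAUPER's code).
from itertools import combinations
import numpy as np
from scipy import stats

def independent(data, i, j, cond, alpha=0.01):
    """Fisher-z test of X_i independent of X_j given X_cond via partial correlation."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = np.clip(r, -0.999999, 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat)) > alpha      # high p-value -> treat as independent

def pc_skeleton(data, alpha=0.01):
    """Start from a complete undirected graph; delete edges between pairs that are
    conditionally independent given growing subsets of their neighbors."""
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in list(adj[i]):
                if j not in adj[i]:
                    continue
                for cond in combinations(adj[i] - {j}, level):
                    if independent(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return adj   # adjacency sets of the recovered skeleton

# Tiny usage example on synthetic data with a known chain A -> B -> C:
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + 0.5 * rng.normal(size=2000)
c = b + 0.5 * rng.normal(size=2000)
print(pc_skeleton(np.column_stack([a, b, c])))   # expect edges A-B and B-C only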

Slide 28

CAUPER iteratively explores potential performance repairs:
 ● We measure the Individual Treatment Effect (ITE) of each repair:
 ● the difference between the probability that the performance fault is fixed after the repair and the probability that the fault is still present after the repair.
 ● The larger the value, the more likely we are to repair the fault.
 ● We pick the repair with the largest ITE (a small sketch of this selection follows below).
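A minimal sketch of the selection step, with a hypothetical data layout and made-up outcome samples (the repair names and numbers are illustrative, not CAUPER's): score each candidate repair by the difference between the estimated probability that the fault is fixed and the probability that it persists, then pick the repair with the largest score.

# Hypothetical outcomes per candidate repair: 1 = fault fixed after applying it, 0 = still faulty.
import numpy as np

def ite_score(outcomes):
    p_fixed = np.mean(outcomes)
    return p_fixed - (1.0 - p_fixed)     # larger -> more likely to repair the fault

candidate_repairs = {
    "set_cuda_dynamic_flag": np.array([1, 1, 0, 1, 1]),
    "enable_tx2_cores":      np.array([1, 0, 0, 1, 0]),
    "set_fan_mode_max":      np.array([0, 0, 1, 0, 0]),
}
print({r: ite_score(o) for r, o in candidate_repairs.items()})
best = max(candidate_repairs, key=lambda r: ite_score(candidate_repairs[r]))
print("chosen repair:", best)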

Slide 29

CAUPER finds more accurate root causes compared with statistical debugging.

Slide 30

CAUPER finds comparable, and sometimes better, repairs for performance faults compared with performance optimization (SMAC).

[Figure: latency, heat, and energy gain of CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]

Slide 31

CAUPER is able to find repairs with lower costs.

[Figure: time to find a repair for CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]

Slide 32

Team effort:
 - Rahul Krishna (Columbia)
 - Shahriar Iqbal (UofSC)
 - M. A. Javidian (Purdue)
 - Baishakhi Ray (Columbia)
 - Christian Kästner (CMU)
 - Norbert Siegmund (Leipzig)
 - Miguel Velez (CMU)
 - Sven Apel (Saarland)
 - Lars Kotthoff (Wyoming)
 - Marco Valtorta (UofSC)

Slide 33
