Robust Causal Transfer Learning

Pooyan Jamshidi

September 25, 2020

Transcript

  1. Robust Causal Transfer Learning Identifying Causal Invariances Pooyan Jamshidi UofSC

    @pooyanjamshidi
  2. Artificial Intelligence and Systems Laboratory (AISys Lab) Machine Learning Computer

    Systems Autonomy Learning-enabled Autonomous Systems https://pooyanjamshidi.github.io/AISys/ 2
  3. Research Directions at AISys 3 Theory:
 - Transfer Learning
 - Causal Invariances
 - Structure Learning
 - Concept Learning
 - Physics-Informed
 
 Applications:
 - Systems
 - Autonomy
 - Robotics
 [Figure: research spectrum from well-known physics / big data to limited known physics / small data, bridged by Causal AI]
  4. Today’s most popular systems are configurable 4

  5. 5

  6. Systems are becoming increasingly more configurable and, therefore, their

    performance behavior is increasingly difficult to understand 6 [Figure: growth in the number of configuration parameters over release time for MySQL, Apache, and Hadoop (MapReduce, HDFS); increase in size of the configuration space] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  7. Empirical observations confirm that systems are becoming increasingly more configurable

    7 [Figure: number of configuration parameters over release time for Storage-A, MySQL, Apache, and Hadoop (MapReduce, HDFS)] [Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
  8. – HDFS-4304 “all constants should be configurable, even if we

    can’t see any reason to configure them.”
  9. Configurations determine the performance behavior 9

    void Parrot_setenv(. . . name, . . . value) {
    #ifdef PARROT_HAS_SETENV
        my_setenv(name, value, 1);
    #else
        int name_len = strlen(name);
        int val_len = strlen(value);
        char* envs = glob_env;
        if (envs == NULL) {
            return;
        }
        strcpy(envs, name);
        strcpy(envs + name_len, "=");
        strcpy(envs + name_len + 1, value);
        putenv(envs);
    #endif
    }

    #ifdef LINUX
    extern int Parrot_signbit(double x) { . . .

    Compile-time options such as PARROT_HAS_SETENV and LINUX select different code paths, each with different speed and energy profiles.
  10. Configuration options interact, therefore creating a non-linear and complex performance

    behavior • Non-linear • Non-convex • Multi-modal 10 [Figure: latency (ms) as a cubic-interpolated surface over the number of counters and the number of splitters]
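The interaction effect described above can be sketched with a small synthetic model; the latency function, its coefficients, and the option names below are invented for illustration, not taken from the talk:

```python
import numpy as np

# Invented latency model over two options ("splitters", "counters"):
# a sinusoidal main effect plus an interaction term makes the surface
# non-linear, non-convex, and multi-modal.
def latency(splitters, counters):
    return (100.0
            + 20.0 * np.sin(splitters)                    # non-monotone main effect
            + 5.0 * counters                              # linear main effect
            - 15.0 * np.sin(splitters * counters / 3.0))  # interaction term

# Enumerate a 6x6 grid of configurations and find the extremes.
grid = [(s, c) for s in range(1, 7) for c in range(1, 7)]
vals = {cfg: latency(*cfg) for cfg in grid}
best, worst = min(vals, key=vals.get), max(vals, key=vals.get)
print(best, vals[best], worst, vals[worst])
```

Exhaustive enumeration works on this toy grid; real systems have far too many options for that, which is what motivates the sampling-based approaches later in the talk.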
  11. 11 How do we understand the performance behavior of real-world highly-configurable

    systems that scale well… … and enable developers/users to reason about qualities (performance, energy) and to make trade-offs?
  12. SocialSensor as a case study to motivate configuration optimization • Identifying

    trending topics • Identifying user-defined topics • Social media search 12
  13. SocialSensor is a data processing pipeline 13 Content Analysis Orchestrator

    Crawling Search and Integration Tweets: [5k-20k/min] Every 10 min: [100k tweets] Tweets: [10M] Fetch Store Push Store Crawled items Fetch Internet
  14. They expected their user base to grow significantly over a

    short period 14 [Figure: the SocialSensor pipeline (Content Analysis, Orchestrator, Crawling, Search and Integration) annotated with expected growth (100X, 10X) and a real-time requirement]
  15. 15 How can we gain better performance without using

    more resources?
  16. 16 Let’s try out different system configurations!

  17. Opportunity: Data processing engines in the pipeline were all configurable

    > 100 > 100 > 100 options each; a combined configuration space on the order of 2^300
  18. More combinations than estimated atoms in the universe
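A back-of-the-envelope check of this claim, assuming roughly 300 binary options across the pipeline (a rough reading of the previous slide's combined space; the exact count is illustrative):

```python
# ~300 binary options give 2^300 configurations, versus the commonly
# cited estimate of ~10^80 atoms in the observable universe.
num_binary_options = 300            # assumed for illustration
configs = 2 ** num_binary_options
atoms_in_universe = 10 ** 80
print(configs > atoms_in_universe)  # True
print(len(str(configs)))            # 91 decimal digits
```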

  19. The default configuration is typically bad and the optimal configuration

    is noticeably better than the median 19 • Default is bad • 2X-10X faster than worst • Noticeably faster than median [Figure: throughput (ops/sec) vs. average write latency (µs), with the default and optimal configurations marked]
  20. 100X more users • cloud resources reduced by 20% • outperformed expert recommendation

  21. Identifying the root cause of performance faults is difficult •

    Code was transplanted from TX1 to TX2 • TX2 is more powerful, but software was 2x slower than TX1 • Three misconfigurations: ◦ Wrong compilation flags for compiling CUDA (didn't use 'dynamic' flag) ◦ Wrong CPU/GPU modes (didn't use TX2 optimized cores) ◦ Wrong Fan mode (didn't change to handle thermal throttling) Fig 1. Performance fault on NVIDIA TX2 https://forums.developer.nvidia.com/t/50477 21
  22. Fixing performance faults is difficult • These were not in

    the default settings • Took 1 month to fix in the end... • We need to do this better Fig 1. Performance fault on NVIDIA TX2 https://forums.developer.nvidia.com/t/50477 22
  23. Performance distributions are multi-modal and have long tails • Certain

    configurations can cause performance to take abnormally large values
 • Faulty configurations take the tail values (worse than 99.99th percentile)
 • Certain configurations can cause faults on multiple performance objectives. 
 23
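The tail-based notion of a faulty configuration above can be sketched on synthetic data (the latency distribution and fault values are invented; only the 99.99th-percentile cutoff comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic latencies: a log-normal body plus a small cluster of faulty
# configurations far out in the tail (multi-modal, long-tailed).
body = rng.lognormal(mean=3.0, sigma=0.3, size=100_000)
faults = rng.normal(loc=500.0, scale=10.0, size=100)
latencies = np.concatenate([body, faults])

# Flag configurations worse than the 99.99th percentile as candidate faults.
threshold = np.percentile(latencies, 99.99)
flagged = latencies[latencies > threshold]
print(threshold, len(flagged))
```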
  24. Debugging based on statistical correlation can be misleading

    [Figure: causal graphs over Swap Mem, GPU Growth, and Latency for GPU Growth = 33% vs. 66% and Swap = 1Gb vs. 4Gb] • The correlation between GPU Growth and Latency is as strong as that between Swap Mem and Latency, but considerably less noisy. • Therefore, a feature selection method based on correlation that ignores the causal structure prefers GPU Growth as the predictor of Latency, which is misleading. 24
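The pitfall on this slide can be reproduced with a few lines of synthetic data (variable names follow the slide; the structural equations and coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Invented structural model: swap memory causally drives latency, while
# GPU growth is just a clean, non-causal proxy that co-varies with it.
swap_true = rng.choice([1.0, 4.0], size=n)          # GB, the real cause
gpu_growth = 0.33 + 0.11 * swap_true                # clean proxy, no causal path
swap_observed = swap_true + rng.normal(0, 1.5, n)   # noisy measurement
latency = 10.0 * swap_true + rng.normal(0, 8.0, n)  # causal effect + noise

corr_gpu = np.corrcoef(gpu_growth, latency)[0, 1]
corr_swap = np.corrcoef(swap_observed, latency)[0, 1]
print(corr_gpu, corr_swap)  # the non-causal proxy correlates more cleanly
```

Correlation-based feature selection would rank gpu_growth first, yet intervening on it cannot change latency; only the causal parent, swap memory, can.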
  25. Why knowing the underlying causal structure matters: a transfer

    learning scenario • The relation between X1 and X2 is about as strong as the relation between X2 and X3, but noisier. • {X3} and {X1, X3} are preferred over {X1}, because predicting Y from X1 leads to: • a larger variance than predicting Y from X3; • a larger bias than predicting Y from both X1 and X3. 25 Magliacane, Sara, et al. "Domain adaptation by using causal inference to predict invariant conditional distributions." Advances in Neural Information Processing Systems. 2018.
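A minimal simulation of the bias/variance trade-off described on this slide, using an invented linear chain X1 → Y → X3 in which the upstream link is noisier than the downstream one (the structure and noise levels are assumptions for illustration, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Invented chain X1 -> Y -> X3: the X1 -> Y link is noisy, Y -> X3 is clean.
x1 = rng.normal(0, 1, n)
y = x1 + rng.normal(0, 1.0, n)   # noisy upstream relation
x3 = y + rng.normal(0, 0.3, n)   # cleaner downstream relation

def fit_mse(features):
    # Ordinary least squares with an intercept; in-sample mean squared error.
    X = np.column_stack(features + [np.ones(n)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta) ** 2))

mse_x1, mse_x3, mse_both = fit_mse([x1]), fit_mse([x3]), fit_mse([x1, x3])
print(mse_x1, mse_x3, mse_both)  # X3 beats X1; using both is best
```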
  26. CAUPER: Causal Performance Debugger localizes and repairs performance faults with approx.

    50 samples 26
  27. CAUPER is centered around causal structure discovery 27
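The basic primitive behind constraint-based causal structure discovery is a conditional independence test; here is a minimal sketch using partial correlation on an invented chain (the variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Invented chain: option -> cache_misses -> latency.
option = rng.normal(0, 1, n)
cache_misses = option + rng.normal(0, 1, n)
latency = cache_misses + rng.normal(0, 1, n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, given):
    # Correlation between a and b after controlling for `given`.
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / np.sqrt((1 - r_ag**2) * (1 - r_bg**2))

print(corr(option, latency))                        # clearly nonzero
print(partial_corr(option, latency, cache_misses))  # vanishes given the mediator
```

Constraint-based algorithms such as PC or FCI run many such tests to recover and orient edges; this sketch only shows the single-test building block.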

  28. • We measure the Individual Treatment Effect (ITE) of each repair:

    • the difference between the probability that the performance fault is fixed after a repair and the probability that the fault persists after the repair. • The larger the value, the more likely we are to repair the fault. • We pick the repair with the largest ITE. CAUPER iteratively explores potential performance repairs 28
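The selection rule on this slide can be sketched directly; the repair names and probabilities below are hypothetical (the names merely echo the TX2 misconfigurations from slide 21):

```python
def ite(p_fixed_after, p_faulty_after):
    # Individual Treatment Effect of a repair: the difference between the
    # probability the fault is fixed and the probability it persists.
    return p_fixed_after - p_faulty_after

# Hypothetical candidate repairs with invented probability estimates.
candidate_repairs = {
    "enable_dynamic_cuda_flag": ite(0.85, 0.15),
    "switch_to_optimized_cores": ite(0.60, 0.40),
    "set_fan_mode_for_throttling": ite(0.55, 0.45),
}

# Pick the repair with the largest ITE.
best_repair = max(candidate_repairs, key=candidate_repairs.get)
print(best_repair)  # -> enable_dynamic_cuda_flag
```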
  29. CAUPER is able to find more accurate causes compared with

    statistical debugging 29
  30. CAUPER is able to find comparable and even better repairs

    for performance faults compared with performance optimization 30 [Figure: Latency-Gain, Heat-Gain, and Energy-Gain of CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]
  31. CAUPER is able to find repairs with lower costs 31

    [Figure: search time of CAUPER vs. SMAC across workloads X5k, X10k, X20k, and X50k]
  32. Team effort 32 Rahul Krishna Columbia Shahriar Iqbal UofSC M.

    A. Javidian Purdue Baishakhi Ray Columbia Christian Kästner CMU Norbert Siegmund Leipzig Miguel Velez CMU Sven Apel Saarland Lars Kotthoff Wyoming Marco Valtorta UofSC
  33. 33