Slide 1

Slide 1 text

Robust, Performant Compiler Optimization Environments for AI Research
Chris Cummins
[email protected]

Slide 2

Slide 2 text

The Team
Chris Cummins, Hugh Leather, Yuandong Tian, Benoit Steiner, Jiadong Guo, Bram Wasti, Jia Liu, Olivier Teytaud, Jason Ansel, Sahir Gomez, Somya Jain, Brandon Cui
+ many community contributors!
https://compilergym.ai

Slide 3

Slide 3 text

Machine Learning in Compilers
Compiler wins benefit everyone:
- Faster binaries = Happier users 😀
- Lower power = $$$ saved 💸
Tuning compiler optimizations by hand is time-consuming and hard.
Machine learning promises:
- bigger wins
- less effort
https://chriscummins.cc/pub/2020-fdl.pdf

Slide 4

Slide 4 text

Machine Learning in Compilers
Program:
    void LinearAlgebraOp::AnalyzeInputs(
        OpKernelContext* context, TensorInputs* inputs,
        TensorShapes* input_matrix_shapes, TensorShape* batch_shape) {
      int input_rank = -1;
      for (int i = 0; i < NumMatrixInputs(context); ++i) {
        const Tensor& in = context->input(i);
        if (i == 0) {
          input_rank = in.dims();
          OP_REQUIRES(
              context, input_rank >= 2,
              errors::InvalidArgument(
                  "Input tensor ", i, " must have rank >= 2"));
IR: (CFG, DFG, AST, ...)
Features: #. instructions, loop nest level, arithmetic intensity, trip counts

Slide 5

Slide 5 text

Machine Learning in Compilers
Features    Best Param
...         ...

Slide 6

Slide 6 text

Machine Learning in Compilers
[Diagram: a table of (Features, Best Param) examples feeds a Supervised Machine Learner, which produces a Model mapping Features to Param.]

Slide 7

Slide 7 text

Machine Learning in Compilers
[Diagram: several Models, each mapping Features to Param.]

Slide 8

Slide 8 text

Machine Learning in Compilers
[Diagram: many Models, each mapping Features to Param. The Features of a New Program go in and a Predicted param comes out.]
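The pipeline on Slides 4 to 8 (extract features, record the best parameter per program, fit a model, predict for a new program) can be sketched in a few lines. This is only an illustration and not CompilerGym code: the feature vectors, the unroll-factor labels, and the choice of a 1-nearest-neighbour model are all hypothetical stand-ins.

```python
from math import dist

# Hypothetical training set of (features, best parameter) pairs.
# Features are (instruction count, loop nest level, trip count);
# the label is an imagined best loop-unroll factor.
TRAINING = [
    ((120.0, 1.0, 8.0), 8),
    ((450.0, 2.0, 64.0), 4),
    ((900.0, 3.0, 1024.0), 1),
]


def predict_param(features):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, best_param = min(TRAINING, key=lambda row: dist(row[0], features))
    return best_param


new_program = (400.0, 2.0, 32.0)  # features of an unseen program
print(predict_param(new_program))
```

A real system would normalise the features and use a stronger learner; the point here is only the shape of the data flow from features to a predicted parameter.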

Slide 9

Slide 9 text

The Reality of Compiler Research

Slide 10

Slide 10 text

The Reality of Compiler Research
1. I have a brilliant idea! 💡
2. Prepare the compiler
3. Download/build benchmark suites
4. Write / build feature extractors
5. Write ad-hoc scripts to glue everything together
6. Test environment, 100s of bugs
7. Existential crisis
8. Result! 📝
Side notes: (consult mailing lists, learn how to use gdb) (RTFM, learn a new build system) (they never work OOTB) (chase after prior art authors, reimplement crusty old code)

Slide 11

Slide 11 text

The Reality of Compiler Research
1. I have a brilliant idea! 💡
2. Prepare the compiler
3. Download/build benchmark suites
4. Write / build feature extractors
5. Write ad-hoc scripts to glue everything together
6. Test environment, 100s of bugs
7. Existential crisis
8. Result! 📝
Side notes: (consult mailing lists, learn how to use gdb) (RTFM, learn a new build system) (they never work OOTB) (chase after prior art authors, reimplement crusty old code)
???

Slide 13

Slide 13 text

CompilerGym Goals
1. Lower the barrier to entry to AI for compilers
2. Advance the state-of-the-art in AI for compilers
3. Provide common benchmarks for compiler tasks

Slide 14

Slide 14 text

CompilerGym
Simple Python Frontend / Gnarly Backend
This page intentionally left blank.
CompilerGym Environments: Halide ... etc
- Action Space: Optimization Choices
- Action: Optimization Choice
- Observation: View of Program
- Reward: Performance Metric

Slide 15

Slide 15 text

CompilerGym
Built on OpenAI Gym: write your agent in Python or plug in existing agents.
Simple Python Frontend / Gnarly Backend
This page intentionally left blank.
CompilerGym Environments: Halide ... etc
- Action Space: Optimization Choices
- Action: Optimization Choice
- Observation: View of Program
- Reward: Performance Metric

Slide 16

Slide 16 text

CompilerGym
User sees plain old numpy array / string / nx.Graph
Simple Python Frontend / Gnarly Backend
This page intentionally left blank.
CompilerGym Environments: Halide ... etc
- Action Space: Optimization Choices
- Action: Optimization Choice
- Observation: View of Program
- Reward: Performance Metric

Slide 17

Slide 17 text

CompilerGym
Interface is self-describing, no compiler expertise required
Simple Python Frontend / Gnarly Backend
This page intentionally left blank.
CompilerGym Environments: Halide ... etc
- Action Space: Optimization Choices
- Action: Optimization Choice
- Observation: View of Program
- Reward: Performance Metric

Slide 18

Slide 18 text

CompilerGym Environments
llvm-v0
- Problem: Selecting and ordering LLVM optimization passes
- Rewards: Runtime or code size
- Action Space: 124-dimensional discrete choice
- Observation Space Types: IR string, numeric feature vectors, pre-trained embeddings, directed multigraphs
gcc-v0
- Problem: Tuning command line flags
- Rewards: Code size
- Action Space: 504-dimensional tuple, mixture of {discrete,int,float}
- Observation Space Types: Assembly string, IR string, numeric feature vector
loop_tool-v0
- Problem: Loop nest code generation
- Rewards: Throughput (FLOPS)
- Action Space: Discrete choice with dynamic dimensionality
- Observation Space Types: IR string or numeric feature vector
+ more on the way!
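To make the three ingredients of an environment (action space, observation, reward) concrete, here is a toy Gym-style environment written with the standard library only. It is a stand-in and not the CompilerGym API: the class name, the four "passes", and their instruction savings are all hypothetical.

```python
import random


class ToyCompilerEnv:
    """Stand-in for a Gym-style compiler environment (not the CompilerGym API)."""

    # Hypothetical "passes": each removes a fixed number of instructions, once.
    PASS_SAVINGS = [0, 3, 5, 1]

    def __init__(self):
        self.num_actions = len(self.PASS_SAVINGS)  # discrete action space size
        self.applied = set()
        self.instruction_count = 0

    def reset(self):
        self.applied = set()
        self.instruction_count = 100
        return self.instruction_count  # observation: a view of the program

    def step(self, action):
        # A pass only pays off the first time it is applied.
        saved = 0 if action in self.applied else self.PASS_SAVINGS[action]
        self.applied.add(action)
        self.instruction_count -= saved
        done = len(self.applied) == self.num_actions
        # Gym-style 4-tuple: observation, reward, done, info
        return self.instruction_count, saved, done, {}


env = ToyCompilerEnv()
observation = env.reset()
rng = random.Random(0)
total_reward = 0
for _ in range(20):
    observation, reward, done, info = env.step(rng.randrange(env.num_actions))
    total_reward += reward
    if done:
        break
print(observation, total_reward)
```

The reward here is the number of instructions removed by the step, loosely mirroring a code-size reward; the real environments expose much richer observation spaces.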

Slide 19

Slide 19 text

Using Gym
    import gym
    with gym.make("Breakout-v0") as env:  # Create the environment
        observation = env.reset()
        for _ in range(1000):
            action = env.action_space.sample()  # Agent goes here
            observation, reward, done, info = env.step(action)
            if done:
                observation = env.reset()

Slide 20

Slide 20 text

Using CompilerGym
Now we are doing compiler optimizations!
    import gym
    import compiler_gym
    with gym.make("llvm-v0") as env:
        observation = env.reset()
        for _ in range(1000):
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                observation = env.reset()

Slide 21

Slide 21 text

Using CompilerGym
Integrating with a build
    import gym
    import compiler_gym
    with gym.make("llvm-v0") as env:
        observation = env.reset(env.make_benchmark("myprog.c"))
        for _ in range(1000):
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                observation = env.reset()
        env.write_bitcode("my-compiled-code.bc")

Slide 22

Slide 22 text

Using CompilerGym
Drop in support for RL frameworks
    import compiler_gym
    from ray import tune
    from ray.rllib.agents.ppo import PPOTrainer
    tune.register_env(
        "CompilerGym",
        lambda _: compiler_gym.wrappers.TimeLimit(
            max_episode_steps=45,
            env=compiler_gym.make(
                "llvm-autophase-ic-v0",
                benchmark="cbench-v1/qsort",
            ),
        ),
    )
    tune.run(PPOTrainer, config={"env": "CompilerGym"})

Slide 23

Slide 23 text

Using CompilerGym
Evaluate different program representations
    import compiler_gym
    from ray import tune
    from ray.rllib.agents.ppo import PPOTrainer
    tune.register_env(
        "CompilerGym",
        lambda _: compiler_gym.wrappers.TimeLimit(
            max_episode_steps=45,
            env=compiler_gym.make(
                "llvm-autophase-ic-v0",
                benchmark="cbench-v1/qsort",
                observation_space="Autophase",
            ),
        ),
    )
    tune.run(PPOTrainer, config={"env": "CompilerGym"})

Slide 24

Slide 24 text

Using CompilerGym
Evaluate across programs
    import compiler_gym
    from ray import tune
    from ray.rllib.agents.ppo import PPOTrainer
    tune.register_env(
        "CompilerGym",
        lambda _: compiler_gym.wrappers.TimeLimit(
            max_episode_steps=45,
            env=compiler_gym.make(
                "llvm-autophase-ic-v0",
                benchmark="cbench-v1/qsort",
                observation_space="Autophase",
            ),
        ),
    )
    tune.run(PPOTrainer, config={"env": "CompilerGym"})

Slide 25

Slide 25 text

Using CompilerGym
Evaluate different algorithms
    import compiler_gym
    from ray import tune
    from ray.rllib.agents.ppo import PPOTrainer
    tune.register_env(
        "CompilerGym",
        lambda _: compiler_gym.wrappers.TimeLimit(
            max_episode_steps=45,
            env=compiler_gym.make(
                "llvm-autophase-ic-v0",
                benchmark="cbench-v1/qsort",
                observation_space="Autophase",
            ),
        ),
    )
    tune.run(PPOTrainer, config={"env": "CompilerGym"})

Slide 26

Slide 26 text

Using CompilerGym
Drop in support for gradient-free techniques
+8% IC reduction over -Oz!
    from functools import lru_cache
    from typing import Tuple
    import compiler_gym
    import nevergrad as ng
    env = compiler_gym.make("llvm-ic-v0")
    # Cache results so that repeated action sequences are not re-evaluated.
    @lru_cache
    def eval_actions(actions: Tuple[int, ...]) -> float:
        env.reset()
        env.step(actions)
        return -env.episode_reward  # negated: the optimizer minimizes
    params = ng.p.Choice(
        choices=range(env.action_space.n),
        repetitions=15,
    )
    optimizer = ng.optimizers.NGOpt(
        parametrization=params, budget=1000
    )
    recommendation = optimizer.minimize(eval_actions)
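The same pattern (a cached cost function over action sequences, driven by a black-box optimizer) can be tried without any dependencies. The sketch below swaps nevergrad's NGOpt for plain random search and swaps the compiler environment for a toy cost function, so the constants and the cost function are hypothetical and only the structure carries over.

```python
import random
from functools import lru_cache
from typing import Tuple

NUM_ACTIONS = 6  # hypothetical action-space size
EPISODE_LEN = 4  # hypothetical number of actions per sequence


@lru_cache(maxsize=None)
def cost(actions: Tuple[int, ...]) -> int:
    """Toy cost standing in for -env.episode_reward (lower is better)."""
    # Each distinct action in the sequence lowers the cost by its own value.
    return 100 - sum(set(actions))


def random_search(budget: int, seed: int = 0) -> Tuple[int, ...]:
    """Sample `budget` random sequences and keep the cheapest one."""
    rng = random.Random(seed)
    best = tuple(rng.randrange(NUM_ACTIONS) for _ in range(EPISODE_LEN))
    for _ in range(budget - 1):
        candidate = tuple(rng.randrange(NUM_ACTIONS) for _ in range(EPISODE_LEN))
        if cost(candidate) < cost(best):
            best = candidate
    return best


best = random_search(budget=200)
print(best, cost(best))
```

Passing the sequence as a tuple matters: `lru_cache` needs hashable arguments, which is why the cost function takes a tuple of actions instead of a list.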

Slide 27

Slide 27 text

Using CompilerGym
Greedy search for LLVM phase ordering
    from time import time
    from compiler_gym.envs import CompilerEnv
    def greedy(env: CompilerEnv, search_time_seconds: int):
        def eval_action(env, action):
            # Try the action on a fork so the real environment is unchanged.
            with env.fork() as fkd:
                return (fkd.step(action)[1], action)
        end_time = time() + search_time_seconds
        while time() < end_time:
            best = max(eval_action(env, action) for action in range(env.action_space.n))
            # Stop when no action gives a positive reward or the episode ends.
            if best[0] <= 0 or env.step(best[1])[2]:
                return
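The greedy loop relies on being able to fork the environment, try an action on the copy, and throw the copy away. A dependency-free sketch of that idea follows. `ToyEnv`, its `SAVINGS` table, and the step-count budget (in place of the slide's wall-clock budget) are hypothetical, and `fork()` is emulated with `copy.deepcopy`.

```python
import copy


class ToyEnv:
    """Stand-in environment with a fork() method (not the CompilerGym API)."""

    SAVINGS = [0, 4, 2, 7]  # hypothetical one-time saving per action

    def __init__(self):
        self.size = 50
        self.used = set()

    def fork(self):
        return copy.deepcopy(self)  # independent copy of the current state

    def step(self, action):
        reward = 0 if action in self.used else self.SAVINGS[action]
        self.used.add(action)
        self.size -= reward
        return reward


def greedy(env, max_steps=10):
    """Repeatedly apply the action with the best one-step reward."""
    taken = []
    for _ in range(max_steps):
        # Evaluate every action on a fork so the real environment is untouched.
        reward, action = max(
            (env.fork().step(a), a) for a in range(len(env.SAVINGS))
        )
        if reward <= 0:
            break  # no action improves the program any further
        env.step(action)
        taken.append(action)
    return taken


env = ToyEnv()
print(greedy(env), env.size)
```

Each iteration costs one fork per action, so the approach scales with the size of the action space; that is manageable for a discrete space of around a hundred actions.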

Slide 28

Slide 28 text

Using CompilerGym
Hill climbing for GCC flag tuning
    import random
    # FLAGS and objective() are not defined on the slide.
    def hill_climb(env: CompilerEnv):
        best = float("inf")
        for _ in range(FLAGS.gcc_search_budget):
            with env.fork() as fkd:
                # Nudge every option by up to 5 positions, clamped to its valid range.
                fkd.choices = [
                    random.randint(
                        max(-1, x - 5), min(len(env.gcc_spec.options[i]) - 1, x + 5)
                    )
                    for i, x in enumerate(env.choices)
                ]
                cost = objective(fkd)
                # Keep the perturbed choices only if they beat the current ones.
                if cost < objective(env):
                    best = cost
                    env.choices = fkd.choices
        return best
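The hill-climbing step (perturb every option within a small window, keep the result only if it is better) can be exercised on a toy problem. In this sketch the option sizes, the target setting, and the distance-based objective are hypothetical, and the lower bound is 0 where the slide uses -1.

```python
import random

# Hypothetical number of settings available for each of four options.
OPTION_SIZES = [3, 10, 2, 6]
TARGET = [2, 7, 0, 3]  # imagined best setting, used only by the toy objective


def objective(choices):
    """Toy cost: distance from TARGET. A real objective would compile and measure."""
    return sum(abs(c - t) for c, t in zip(choices, TARGET))


def hill_climb(choices, budget, seed=0):
    rng = random.Random(seed)
    best_cost = objective(choices)
    for _ in range(budget):
        # Perturb every option by a small random amount, clamped to its range.
        candidate = [
            rng.randint(max(0, x - 2), min(size - 1, x + 2))
            for x, size in zip(choices, OPTION_SIZES)
        ]
        cost = objective(candidate)
        if cost < best_cost:  # accept only strict improvements
            choices, best_cost = candidate, cost
    return choices, best_cost


start = [0, 0, 0, 0]
choices, cost = hill_climb(start, budget=500)
print(choices, cost)
```

Unlike the slide's version, this sketch tracks the cost of the current choices in `best_cost` so the objective is evaluated once per candidate.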

Slide 29

Slide 29 text

Other Goodies
Command line binaries for code-free experiments

Slide 30

Slide 30 text

Other Goodies
Automatic validation of results
    $ python -m compiler_gym.bin.validate results.csv --env=llvm-ic-v0
    ✅ cbench-v1/adpcm 1.0083
    ✅ cbench-v1/bitcount 1.0197
    ✅ cbench-v1/blowfish 1.1005
    ✅ cbench-v1/bzip2 1.1755
    ✅ cbench-v1/crc32 1.0000
    ✅ cbench-v1/dijkstra 1.0387
    ✅ cbench-v1/ghostscript 1.0349
    ✅ cbench-v1/gsm 1.1434
    ✅ cbench-v1/ispell 1.2948
    ✅ cbench-v1/jpeg-c 1.0590
    ✅ cbench-v1/jpeg-d 1.0575
    ✅ cbench-v1/lame 1.0991
    ❌ cbench-v1/patricia Failed 20 of 100 validators: [1/20] LeakSanitizer: detected memory leaks
    ✅ cbench-v1/qsort 1.0219
    ✅ cbench-v1/rijndael 1.1152
    ✅ cbench-v1/sha 1.6340
    ✅ cbench-v1/stringsearch 1.2209
    ✅ cbench-v1/stringsearch2 0.9948
    ❌ cbench-v1/susan Expected reward 1.0570 but received reward 1.0397
    ✅ cbench-v1/tiff2bw 1.1117
    ✅ cbench-v1/tiff2rgba 1.1215
    ✅ cbench-v1/tiffdither 1.1122
    ✅ cbench-v1/tiffmedian 1.1030
    ----------------------------------------------------
    Number of validated results: 21 of 23
    Mean walltime per benchmark: 3719.405s (std: 160.974s)
    Geometric mean IrInstructionCountOz: 1.103 (std: 0.133)
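For readers unfamiliar with the summary statistic, a geometric mean of per-benchmark rewards can be computed as below. The values are the 21 rewards marked as validated in the output above. The slide does not show which rows feed its own summary line, so this sketch is not expected to reproduce the reported figure exactly.

```python
from math import exp, log

# Rewards of the 21 benchmarks marked as validated in the output above.
validated = [
    1.0083, 1.0197, 1.1005, 1.1755, 1.0000, 1.0387, 1.0349,
    1.1434, 1.2948, 1.0590, 1.0575, 1.0991, 1.0219, 1.1152,
    1.6340, 1.2209, 0.9948, 1.1117, 1.1215, 1.1122, 1.1030,
]


def geometric_mean(values):
    """exp of the mean of logs, the usual form for averaging ratios."""
    return exp(sum(log(v) for v in values) / len(values))


print(len(validated), round(geometric_mean(validated), 3))
```

A geometric mean is the conventional choice here because each reward is a ratio relative to a baseline, and ratios compose multiplicatively.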

Slide 31

Slide 31 text

Other Goodies
Public leaderboards. Submit your results here!

Slide 32

Slide 32 text

CompilerGym Architecture
[Diagram, layers from top to bottom:]
- Frontend Library: User-facing APIs, Error handling and recovery, Command Line Tools, ...
- Compiler Service Interface
- LLVM Compiler Service: Compiler APIs, Feature Extractors
- CUDA Compiler Service: Compiler APIs, Feature Extractors
- ...

Slide 33

Slide 33 text

Adding a New Compiler
[Diagram, layers from top to bottom:]
- Frontend Library: User-facing APIs, Error handling and recovery, Command Line Tools, ...
- Compiler Service Interface
- LLVM Compiler Service: Compiler APIs, Feature Extractors
- CUDA Compiler Service: Compiler APIs, Feature Extractors
- Your compiler here!
- ...

Slide 34

Slide 34 text

Adding a New Compiler
    from typing import List
    from compiler_gym.service import CompilationSession
    from compiler_gym.service.proto import ActionSpace, ObservationSpace
    from compiler_gym.service.runtime import create_and_run_compiler_gym_service
    class MyCompilationSession(CompilationSession):
        """Class representing a compilation session."""
        # what your compiler can do:
        action_spaces: List[ActionSpace] = []
        # what features your compiler provides:
        observation_spaces: List[ObservationSpace] = []
        def __init__(self, action_space, benchmark):
            pass  # start a new compilation session
        def apply_action(self, action):
            pass  # apply an action
        def get_observation(self, observation_space):
            pass  # compute and return an observation
    if __name__ == "__main__":
        create_and_run_compiler_gym_service(MyCompilationSession)
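To show what filling in those three methods might look like, here is a toy session that mirrors the method names on the slide but does not import compiler_gym. The class name, the action names, the observation names, and the "program" (a whitespace-separated string) are all hypothetical.

```python
class ToySession:
    """Mirrors the three methods on the slide, without importing compiler_gym."""

    # Hypothetical action names and observation names.
    action_spaces = [["dce", "inline", "unroll"]]
    observation_spaces = ["instruction_count", "history"]

    def __init__(self, action_space, benchmark):
        # Start a new session: remember the action space and "parse" the program.
        self.action_space = action_space
        self.instruction_count = len(benchmark.split())
        self.history = []

    def apply_action(self, action):
        # Record the action and pretend it removed one instruction.
        self.history.append(self.action_space[action])
        self.instruction_count = max(1, self.instruction_count - 1)

    def get_observation(self, observation_space):
        if observation_space == "instruction_count":
            return self.instruction_count
        if observation_space == "history":
            return list(self.history)
        raise KeyError(observation_space)


session = ToySession(ToySession.action_spaces[0], "load add mul store ret")
session.apply_action(1)
print(session.get_observation("instruction_count"), session.get_observation("history"))
```

The session object holds all per-episode state, which is what lets the surrounding service create, fork, and discard sessions independently.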

Slide 35

Slide 35 text

Adding a New Compiler
    #include "compiler_gym/service/CompilationSession.h"
    #include "compiler_gym/service/runtime/Runtime.h"
    using namespace compiler_gym;
    struct MyCompilationSession : public CompilationSession {
      vector<ActionSpace> getActionSpaces() {...}
      vector<ObservationSpace> getObservationSpaces() {...}
      Status init(const ActionSpace& actionSpace, const Benchmark& benchmark) {...}
      Status applyAction(const Action& action, bool& endOfEpisode, bool& actionSpaceChanged) {...}
      Status setObservation(const ObservationSpace& observationSpace, Observation& observation) {...}
    };
    int main(int argc, char** argv) {
      runtime::createAndRunService<MyCompilationSession>(argc, argv, "My compiler service");
    }

Slide 36

Slide 36 text

The Future
Scaling Search Space Size
Timeline: Today, 6-12 months, 1-5 years
- Phase ordering: 10^200
- Command line: 10^4000
- Loop level: 1010
- Instruction level, all opts: 1010 10 5
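A rough sense of how such search-space sizes arise: with a discrete action space of 124 choices (the llvm-v0 figure from Slide 18), the number of ordered action sequences of length n is 124^n. The slide does not say how its phase-ordering figure (read here as 10^200) was derived, so the calculation below is only an assumption-laden illustration.

```python
from math import log10

NUM_ACTIONS = 124  # size of the llvm-v0 action space quoted on Slide 18


def log10_sequences(length):
    """log10 of the number of distinct action sequences of a given length."""
    return length * log10(NUM_ACTIONS)


# Shortest sequence length whose sequence count reaches 10^200.
length = 1
while log10_sequences(length) < 200:
    length += 1

# 45 is the episode length used in the RLlib example on Slide 22.
print(length, round(log10_sequences(45), 1))
```

Under this counting, sequences of just under a hundred passes already give a space on the order of 10^200, and even 45-step episodes span roughly 10^94 sequences.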

Slide 37

Slide 37 text

CompilerGym
Compilers are fun! But there is a high barrier to entry.
CompilerGym reduces that to "pip install".
Batteries included, actively developed. Try it out!
facebookresearch/CompilerGym
https://compilergym.ai
[email protected]