CompilerGym CGO'22

Robust, Performant Compiler Optimization Environments for AI Research
Chris Cummins
cummins@fb.com

Chris Cummins, Hugh Leather, Yuandong Tian, Benoit Steiner, Jiadong Guo, Bram Wasti, Jia Liu, Olivier Teytaud, Jason Ansel, Sahir Gomez, Somya Jain, Brandon Cui
The Team at + many community contributors!
https://compilergym.ai

Machine Learning in Compilers
Compiler wins benefit everyone:
- Faster binaries = happier users 😀
- Lower power = $$$ saved 💸
Tuning compiler optimizations by hand is time-consuming and hard.
Machine learning promises:
- bigger wins
- less effort
https://chriscummins.cc/pub/2020-fdl.pdf

[Figure: Program → IR (CFG, DFG, AST, ...) → Features (# instructions, loop nest level, arithmetic intensity, trip counts). The example program is a C++ function, LinearAlgebraOp<InputScalar, OutputScalar>::AnalyzeInputs(), which checks that its input tensors have rank >= 2.]
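As a rough illustration of the last step of that pipeline, here is a minimal sketch that turns a toy, LLVM-like instruction listing into a numeric feature vector. The textual IR format, the opcode groups, and the arithmetic-intensity proxy (arithmetic ops per memory op) are assumptions for illustration; it covers only instruction counts and arithmetic intensity, and it is not CompilerGym's feature extractor.

```python
from typing import Dict, List

# Hypothetical opcode groups for a toy, LLVM-like textual IR.
ARITHMETIC_OPS = {"add", "sub", "mul", "fadd", "fsub", "fmul", "fdiv"}
MEMORY_OPS = {"load", "store"}


def opcode(instruction: str) -> str:
    """Return the opcode of "%x = add i32 %a, %b" or "store i32 %x, ptr %p"."""
    tokens = instruction.split()
    if len(tokens) >= 3 and tokens[1] == "=":
        return tokens[2]
    return tokens[0] if tokens else ""


def extract_features(instructions: List[str]) -> Dict[str, float]:
    """Map a toy IR listing to a small numeric feature vector."""
    ops = [opcode(inst) for inst in instructions]
    n_arith = sum(op in ARITHMETIC_OPS for op in ops)
    n_mem = sum(op in MEMORY_OPS for op in ops)
    return {
        "num_instructions": float(len(ops)),
        "num_arithmetic": float(n_arith),
        "num_memory": float(n_mem),
        # Arithmetic ops per memory op (denominator clamped to 1).
        "arithmetic_intensity": n_arith / max(n_mem, 1),
    }
```

For a five-instruction listing with two loads, a mul, an add, and a store, this yields 5 instructions, 2 arithmetic ops, 3 memory ops, and an intensity of 2/3.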

Machine Learning in Compilers
[Figure, built up over several slides: each training program is reduced to a (features, best param) pair; a supervised machine learner trains a model on these pairs; the model then maps a new program's features to a predicted param.]
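The supervised pipeline above fits in a few lines. This is a minimal sketch that uses 1-nearest-neighbour as a stand-in for the "supervised machine learner"; the two-element feature vectors (say, instruction count and loop nest depth) and the integer param (say, an unroll factor) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A training example: (feature vector of a program, best param found for it).
Example = Tuple[Sequence[float], int]


class NearestNeighbourModel:
    """1-nearest-neighbour stand-in for a supervised learner.

    Training memorizes (features, best param) pairs; prediction returns the
    best param of the closest training program by Euclidean distance.
    """

    def __init__(self) -> None:
        self.examples: List[Example] = []

    def fit(self, examples: List[Example]) -> "NearestNeighbourModel":
        self.examples = list(examples)
        return self

    def predict(self, features: Sequence[float]) -> int:
        if not self.examples:
            raise ValueError("model has not been trained")
        _, best_param = min(
            self.examples, key=lambda ex: math.dist(ex[0], features)
        )
        return best_param
```

A real system would swap in a stronger model, but the contract stays the same: features in, predicted param out, with no search at prediction time.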

The Reality of Compiler Research
1. I have a brilliant idea! 💡
2. Prepare the compiler (RTFM, learn a new build system)
3. Download/build benchmark suites (they never work OOTB)
4. Write / build feature extractors (chase after prior art authors, reimplement crusty old code)
5. Write ad-hoc scripts to glue everything together
6. Test environment, 100s of bugs (consult mailing lists, learn how to use gdb)
7. Existential crisis
8. Result! 📝


CompilerGym Goals
1. Lower the barrier to entry to AI for compilers
2. Advance the state of the art in AI for compilers
3. Provide common benchmarks for compiler tasks

Simple Python Frontend, Gnarly Backend
This page intentionally left blank.
CompilerGym
[Figure: an agent picks an Action (an optimization choice) from the Action Space (optimization choices) and applies it to one of the CompilerGym Environments (Halide, ... etc.), which returns an Observation (a view of the program) and a Reward (a performance metric).]
- Built on OpenAI Gym: write your agent in Python or plug in existing agents.
- The user sees a plain old numpy array / string / nx.Graph.
- The interface is self-describing; no compiler expertise is required.
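To make the Action / Observation / Reward contract concrete without installing anything, here is a minimal, dependency-free sketch of an environment with the same reset()/step() shape. Everything in it is hypothetical: the three "passes", their percentage effects on an instruction count, and the Discrete stand-in for a Gym action space. It is not how CompilerGym implements its environments.

```python
import random
from typing import Dict, List, Optional, Tuple


class Discrete:
    """Minimal stand-in for a discrete action space with n choices."""

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self._rng = rng

    def sample(self) -> int:
        return self._rng.randrange(self.n)


class ToyCompilerEnv:
    """Toy environment with a Gym-style reset()/step() interface.

    Observation: the program's instruction count.
    Action: index of a hypothetical optimization pass.
    Reward: instructions removed by that pass (negative if code grew).
    """

    # Hypothetical passes: (name, percent of instructions removed).
    PASSES: List[Tuple[str, int]] = [("dce", 10), ("gvn", 5), ("inline", -2)]

    def __init__(self, initial_size: int = 1000, max_steps: int = 10,
                 seed: Optional[int] = None) -> None:
        self.initial_size = initial_size
        self.max_steps = max_steps
        self.action_space = Discrete(len(self.PASSES), random.Random(seed))
        self.size = initial_size
        self.steps = 0

    def reset(self) -> int:
        self.size = self.initial_size
        self.steps = 0
        return self.size

    def step(self, action: int) -> Tuple[int, int, bool, Dict[str, str]]:
        name, percent = self.PASSES[action]
        removed = self.size * percent // 100
        self.size -= removed
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.size, removed, done, {"pass": name}
```

Replacing `gym.make("Breakout-v0")` with `ToyCompilerEnv(seed=0)` (and dropping the `with`, since the toy class is not a context manager) lets the random-agent loop from the Gym slide run as written.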

CompilerGym Environments
llvm-v0
- Problem: selecting and ordering LLVM optimization passes
- Rewards: runtime or code size
- Action space: 124-dimensional discrete choice
- Observation space types: IR string, numeric feature vectors, pre-trained embeddings, directed multigraphs
gcc-v0
- Problem: tuning command-line flags
- Rewards: code size
- Action space: 504-dimension tuple, mixture of {discrete, int, float}
- Observation space types: assembly string, IR string, numeric feature vector
loop_tool-v0
- Problem: loop nest code generation
- Rewards: throughput (FLOPS)
- Action space: discrete choice with dynamic dimensionality
- Observation space types: IR string or numeric feature vector
+ more on the way!

Using Gym
```python
import gym

# Create the environment.
with gym.make("Breakout-v0") as env:
    observation = env.reset()
    for _ in range(1000):
        action = env.action_space.sample()  # Agent goes here.
        observation, reward, done, info = env.step(action)
        if done:
            observation = env.reset()
```

Using CompilerGym
```python
import gym
import compiler_gym  # Importing registers the CompilerGym environments with gym.

with gym.make("llvm-v0") as env:
    observation = env.reset()
    for _ in range(1000):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            observation = env.reset()
```
Now we are doing compiler optimizations!

Using CompilerGym: integrating with a build
```python
import gym
import compiler_gym

with gym.make("llvm-v0") as env:
    # Start the episode on your own C source file.
    observation = env.reset(env.make_benchmark("myprog.c"))
    for _ in range(1000):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            observation = env.reset()
    # Write the optimized program out as LLVM bitcode.
    env.write_bitcode("my-compiled-code.bc")
```

Using CompilerGym: drop-in support for RL frameworks
```python
import compiler_gym
import compiler_gym.wrappers
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

# Register a factory that builds a step-limited CompilerGym environment.
tune.register_env(
    "CompilerGym",
    lambda _: compiler_gym.wrappers.TimeLimit(
        max_episode_steps=45,
        env=compiler_gym.make(
            "llvm-autophase-ic-v0",
            benchmark="cbench-v1/qsort",
        ),
    ),
)
tune.run(PPOTrainer, config={"env": "CompilerGym"})
```

Using CompilerGym: the same setup, with each part swappable
```python
tune.register_env(
    "CompilerGym",
    lambda _: compiler_gym.wrappers.TimeLimit(
        max_episode_steps=45,
        env=compiler_gym.make(
            "llvm-autophase-ic-v0",
            benchmark="cbench-v1/qsort",    # Evaluate across programs.
            observation_space="Autophase",  # Evaluate different program representations.
        ),
    ),
)
tune.run(PPOTrainer, config={"env": "CompilerGym"})  # Evaluate different algorithms.
```
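Phase-ordering episodes may not end on their own, so the config above caps them with compiler_gym.wrappers.TimeLimit at 45 steps. A minimal stand-alone sketch of what such a step-limit wrapper does follows; StepLimit, NeverEndingEnv, and the "truncated" info key are hypothetical names, not CompilerGym's implementation.

```python
from typing import Any, Dict, Tuple


class StepLimit:
    """Sketch of a step-limit wrapper in the spirit of TimeLimit.

    Forwards reset() and step() to the wrapped environment and forces
    done=True once max_episode_steps steps have been taken in an episode.
    """

    def __init__(self, env: Any, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self, *args: Any, **kwargs: Any) -> Any:
        self.elapsed_steps = 0
        return self.env.reset(*args, **kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        observation, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps and not done:
            # Mark the episode as cut short rather than naturally finished.
            info = {**info, "truncated": True}
            done = True
        return observation, reward, done, info


class NeverEndingEnv:
    """Toy inner environment whose episodes never end on their own."""

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        return 0, 1.0, False, {}
```

Wrapping, rather than editing the environment, keeps the episode length a training-time choice: the same environment can be run with different limits per experiment.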

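Using CompilerGym: drop-in support for gradient-free techniques
```python
from functools import lru_cache
from typing import Tuple

import compiler_gym
import nevergrad as ng

env = compiler_gym.make("llvm-ic-v0")


@lru_cache  # Don't re-run the compiler for a sequence that was already scored.
def eval_actions(actions: Tuple[int, ...]) -> float:
    env.reset()
    env.step(actions)
    return -env.episode_reward  # nevergrad minimizes, so negate the reward.


# Search over sequences of 15 actions, each drawn from the full action space.
params = ng.p.Choice(
    choices=range(env.action_space.n),
    repetitions=15,
)
optimizer = ng.optimizers.NGOpt(parametrization=params, budget=1000)
recommendation = optimizer.minimize(eval_actions)
```
+8% instruction count (IC) reduction over -Oz!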
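The nevergrad setup is black-box search over fixed-length action sequences with cached evaluations. The sketch below shows the same shape using plain random search as a deliberately simple, stdlib-only stand-in for NGOpt (which is far more sophisticated). The objective passed in is hypothetical; with CompilerGym it would reset the environment, step through the sequence, and return the negated episode reward, as eval_actions does above.

```python
import random
from functools import lru_cache
from typing import Callable, Tuple


def random_search(
    evaluate: Callable[[Tuple[int, ...]], float],
    num_actions: int,
    sequence_length: int,
    budget: int,
    seed: int = 0,
) -> Tuple[Tuple[int, ...], float]:
    """Minimize evaluate() over fixed-length action sequences by random sampling.

    Results are cached, so sampling the same sequence twice costs one evaluation.
    """
    rng = random.Random(seed)
    cached = lru_cache(maxsize=None)(evaluate)
    best_seq: Tuple[int, ...] = ()
    best_cost = float("inf")
    for _ in range(budget):
        seq = tuple(rng.randrange(num_actions) for _ in range(sequence_length))
        cost = cached(seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost
```

Caching matters here because each real evaluation means recompiling the program; a cheap cache lookup replaces that cost whenever the optimizer revisits a sequence.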

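Using CompilerGym: greedy search for LLVM phase ordering
```python
from time import time

from compiler_gym.envs import CompilerEnv


def greedy(env: CompilerEnv, search_time_seconds: int):
    def eval_action(env, action):
        # Apply the action to a forked copy of the environment and
        # return (reward, action) without touching the original.
        with env.fork() as fkd:
            return (fkd.step(action)[1], action)

    end_time = time() + search_time_seconds
    while time() < end_time:
        # Pick the action with the highest immediate reward.
        best = max(eval_action(env, action) for action in range(env.action_space.n))
        # Stop if no action improves the reward, or if applying it ends the episode.
        if best[0] <= 0 or env.step(best[1])[2]:
            return
```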
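The greedy loop relies on being able to try an action on a copy of the environment. This self-contained sketch reproduces the algorithm against a hypothetical ToyEnv whose fork() is a deep copy (standing in for CompilerGym's env.fork(), which the slide uses as a context manager). The pass effects are invented: each pass changes the code size by a fixed percentage the first time it runs and does nothing afterwards.

```python
import copy
import time
from typing import Dict, List, Set, Tuple


class ToyEnv:
    """Hypothetical env: state is a code size; each pass works only once."""

    PERCENT_REMOVED = [10, 5, -2]  # Hypothetical passes 0, 1, 2.

    def __init__(self, size: int = 1000) -> None:
        self.size = size
        self.applied: Set[int] = set()

    @property
    def num_actions(self) -> int:
        return len(self.PERCENT_REMOVED)

    def fork(self) -> "ToyEnv":
        # Independent copy, so trial actions leave the original untouched.
        return copy.deepcopy(self)

    def step(self, action: int) -> Tuple[int, int, bool, Dict]:
        if action in self.applied:
            removed = 0
        else:
            removed = self.size * self.PERCENT_REMOVED[action] // 100
        self.applied.add(action)
        self.size -= removed
        return self.size, removed, False, {}


def greedy(env: ToyEnv, search_time_seconds: float) -> List[int]:
    """Repeatedly apply the single action with the highest immediate reward."""

    def eval_action(action: int) -> Tuple[int, int]:
        return env.fork().step(action)[1], action

    applied: List[int] = []
    end_time = time.time() + search_time_seconds
    while time.time() < end_time:
        best_reward, best_action = max(
            eval_action(a) for a in range(env.num_actions)
        )
        if best_reward <= 0:
            break  # No action improves the program any further.
        applied.append(best_action)
        if env.step(best_action)[2]:
            break  # The episode ended.
    return applied
```

Starting from 1000 instructions, greedy applies pass 0 (saving 100), then pass 1 (saving 45), and stops because every remaining choice has a non-positive reward.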

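Using CompilerGym: hill climbing for GCC flag tuning
```python
import random

from compiler_gym.envs import CompilerEnv

# FLAGS and objective(env) are assumed to be defined elsewhere.


def hill_climb(env: CompilerEnv):
    best = objective(env)
    for _ in range(FLAGS.gcc_search_budget):
        with env.fork() as fkd:
            # Perturb every flag choice by up to +/-5, clamped to [-1, num_values - 1].
            fkd.choices = [
                random.randint(
                    max(-1, x - 5), min(len(env.gcc_spec.options[i]) - 1, x + 5)
                )
                for i, x in enumerate(env.choices)
            ]
            cost = objective(fkd)
            # Keep the candidate only if it beats the best cost so far.
            if cost < best:
                best = cost
                env.choices = fkd.choices
    return best
```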
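The same random-neighbour hill climbing can be written against a plain list of integer choices, which makes it testable without GCC. This is a minimal sketch under stated assumptions: num_values[i] plays the role of len(env.gcc_spec.options[i]), the [-1, num_values[i] - 1] range mirrors the clamping on the slide, and the objective is any caller-supplied cost function.

```python
import random
from typing import Callable, List, Sequence, Tuple


def hill_climb(
    choices: List[int],
    num_values: Sequence[int],
    objective: Callable[[List[int]], float],
    budget: int,
    radius: int = 5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Random-neighbour hill climbing over a vector of integer flag choices.

    Each iteration perturbs every entry by at most `radius` (clamped to
    [-1, num_values[i] - 1]) and keeps the candidate only if it lowers
    the objective.
    """
    rng = random.Random(seed)
    best = list(choices)
    best_cost = objective(best)
    for _ in range(budget):
        candidate = [
            rng.randint(max(-1, x - radius), min(num_values[i] - 1, x + radius))
            for i, x in enumerate(best)
        ]
        cost = objective(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Because candidates are only accepted on strict improvement, the returned cost is never worse than the starting configuration's.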

Other Goodies: command-line binaries for code-free experiments

Other Goodies: automatic validation of results
```
$ python -m compiler_gym.bin.validate results.csv --env=llvm-ic-v0
✅  cbench-v1/adpcm          1.0083
✅  cbench-v1/bitcount       1.0197
✅  cbench-v1/blowfish       1.1005
✅  cbench-v1/bzip2          1.1755
✅  cbench-v1/crc32          1.0000
✅  cbench-v1/dijkstra       1.0387
✅  cbench-v1/ghostscript    1.0349
✅  cbench-v1/gsm            1.1434
✅  cbench-v1/ispell         1.2948
✅  cbench-v1/jpeg-c         1.0590
✅  cbench-v1/jpeg-d         1.0575
✅  cbench-v1/lame           1.0991
❌  cbench-v1/patricia       Failed 20 of 100 validators:
                             [1/20] LeakSanitizer: detected memory leaks
✅  cbench-v1/qsort          1.0219
✅  cbench-v1/rijndael       1.1152
✅  cbench-v1/sha            1.6340
✅  cbench-v1/stringsearch   1.2209
✅  cbench-v1/stringsearch2  0.9948
❌  cbench-v1/susan          Expected reward 1.0570 but received reward 1.0397
✅  cbench-v1/tiff2bw        1.1117
✅  cbench-v1/tiff2rgba      1.1215
✅  cbench-v1/tiffdither     1.1122
✅  cbench-v1/tiffmedian     1.1030
----------------------------------------------------
Number of validated results: 21 of 23
Mean walltime per benchmark: 3719.405s (std: 160.974s)
Geometric mean IrInstructionCountOz: 1.103 (std: 0.133)
```
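The summary line reports a geometric mean of per-benchmark IrInstructionCountOz ratios. A geometric mean is the usual way to average ratios, because a 2x improvement and a 2x regression cancel out exactly. A minimal sketch of the computation, done in log space to avoid overflow on long lists; the function name is hypothetical and this is not the validate tool's own code.

```python
import math
from typing import Iterable


def geometric_mean(ratios: Iterable[float]) -> float:
    """Geometric mean of positive ratios, computed as exp(mean(log(r)))."""
    values = list(ratios)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive ratios")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For example, the ratios 2.0 and 0.5 average to 1.0 (no net change), whereas their arithmetic mean would misleadingly report 1.25.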

Other Goodies: public leaderboards. Submit your results here!

CompilerGym Architecture
[Figure: a Frontend Library (user-facing APIs, error handling and recovery, command line tools, ...) sits on top of a Compiler Service Interface, beneath which are per-compiler services (LLVM Compiler Service, CUDA Compiler Service, ...), each providing compiler APIs and feature extractors.]

Adding a New Compiler
[Figure: the same architecture, with an extra slot next to the LLVM and CUDA compiler services labelled "Your compiler here!"]

Adding a New Compiler
```python
from typing import List

from compiler_gym.service import CompilationSession
from compiler_gym.service.proto import ActionSpace, ObservationSpace
from compiler_gym.service.runtime import create_and_run_compiler_gym_service


class MyCompilationSession(CompilationSession):
    """Class representing a compilation session."""

    # What your compiler can do:
    action_spaces: List[ActionSpace] = []

    # What features your compiler provides:
    observation_spaces: List[ObservationSpace] = []

    def __init__(self, action_space, benchmark):
        pass  # Start a new compilation session.

    def apply_action(self, action):
        pass  # Apply an action.

    def get_observation(self, observation_space):
        pass  # Compute and return an observation.


if __name__ == "__main__":
    create_and_run_compiler_gym_service(MyCompilationSession)
```
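To see what each of the three hooks is responsible for, here is a self-contained mock with the same shape. It deliberately does not subclass CompilerGym's CompilationSession or use its protobuf types, so it runs without CompilerGym installed; the space names, the two actions, and the list-of-strings "IR" are all hypothetical.

```python
from typing import List, Union


class ToyCompilationSession:
    """Mock of the three session hooks for a hypothetical toy compiler."""

    action_spaces: List[str] = ["toy-passes"]
    observation_spaces: List[str] = ["ir", "instruction_count"]
    ACTIONS: List[str] = ["drop-nops", "do-nothing"]

    def __init__(self, action_space: str, benchmark: List[str]) -> None:
        # Start a new compilation session on a copy of the benchmark program.
        if action_space not in self.action_spaces:
            raise ValueError(f"unknown action space: {action_space}")
        self.ir = list(benchmark)

    def apply_action(self, action: int) -> bool:
        """Apply one action; return True if it changed the program."""
        before = list(self.ir)
        if self.ACTIONS[action] == "drop-nops":
            self.ir = [inst for inst in self.ir if inst != "nop"]
        return self.ir != before

    def get_observation(self, observation_space: str) -> Union[str, int]:
        """Compute and return the requested view of the program."""
        if observation_space == "ir":
            return "\n".join(self.ir)
        if observation_space == "instruction_count":
            return len(self.ir)
        raise ValueError(f"unknown observation space: {observation_space}")
```

The split mirrors the slide: the session owns the program state, actions mutate it, and observations are computed on demand rather than after every action.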

Adding a New Compiler
```cpp
#include "compiler_gym/service/CompilationSession.h"
#include "compiler_gym/service/runtime/Runtime.h"

using namespace compiler_gym;

struct MyCompilationSession : public CompilationSession {
  vector<ActionSpace> getActionSpaces() {...}
  vector<ObservationSpace> getObservationSpaces() {...}
  Status init(const ActionSpace& actionSpace, const Benchmark& benchmark) {...}
  Status applyAction(const Action& action, bool& endOfEpisode, bool& actionSpaceChanged) {...}
  Status setObservation(const ObservationSpace& observationSpace, Observation& observation) {...}
};

int main(int argc, char** argv) {
  runtime::createAndRunService<MyCompilationSession>(argc, argv, "My compiler service");
}
```

The Future: Scaling Search Space Size (today, 6-12 months, 1-5 years)
Phase ordering: 10^200; command line: 10^4000; loop level: 1010; instruction level, all opts: 1010 10 5
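Search spaces like these grow exponentially: a phase-ordering space with |A| actions and episodes of length n contains |A|^n sequences, and a set of independent flags contains the product of each flag's number of values. Working in log10 keeps such numbers manageable. A minimal sketch of both counts; any episode lengths or per-flag value counts plugged in are illustrative assumptions.

```python
import math
from typing import Sequence


def log10_sequence_space(num_actions: int, episode_length: int) -> float:
    """log10 of num_actions ** episode_length (ordered action sequences)."""
    return episode_length * math.log10(num_actions)


def log10_flag_space(values_per_flag: Sequence[int]) -> float:
    """log10 of the product of per-flag choice counts (flag configurations)."""
    return sum(math.log10(v) for v in values_per_flag)
```

For instance, 10 actions over 200 steps gives log10 = 200, and 4000 independent flags with 10 values each gives log10 = 4000.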

Compilers are fun! But the barrier to entry is high. CompilerGym reduces that to "pip install".
Batteries included, actively developed. Try it out!
https://compilergym.ai
facebookresearch/CompilerGym
cummins@fb.com
