
CompilerGym CGO'22

Interest in applying Artificial Intelligence (AI) techniques to compiler optimizations is increasing rapidly, but compiler research has a high barrier to entry. Unlike in other domains, compiler and AI researchers do not have access to the datasets and frameworks that enable fast iteration and development of ideas, and getting started requires a significant engineering investment. What is needed is an easy, reusable experimental infrastructure for real-world compiler optimization tasks that can serve as a common benchmark for comparing techniques and as a platform to accelerate progress in the field.

We introduce CompilerGym, a set of environments for real-world compiler optimization tasks and a toolkit for exposing new optimization tasks to compiler researchers. CompilerGym enables anyone to experiment on production compiler optimization problems through an easy-to-use package, regardless of their experience with compilers. We build upon the popular OpenAI Gym interface, enabling researchers to interact with compilers using Python and a familiar API.

We describe the CompilerGym architecture and implementation, characterize the optimization spaces and computational efficiency of three included compiler environments, and provide extensive empirical evaluations. Compared to prior works, CompilerGym offers larger datasets and optimization spaces, is 27× more computationally efficient, is fault-tolerant, and is capable of detecting reproducibility bugs in the underlying compilers.

By making it easy for anyone to experiment with compilers, irrespective of their background, we aim to accelerate progress in the AI and compiler research domains.

Chris Cummins

March 24, 2022

Transcript

  1. Chris Cummins, Hugh Leather, Yuandong Tian, Benoit Steiner, Jiadong Guo, Bram Wasti, Jia Liu, Olivier Teytaud, Jason Ansel, Sahir Gomez, Somya Jain, Brandon Cui

    The team, plus many community contributors! https://compilergym.ai
  2. Machine Learning in Compilers

    Compiler wins benefit everyone:
    - Faster binaries = happier users 😀
    - Lower power = $$$ saved 💸

    Tuning compiler optimizations by hand is time-consuming and hard. Machine learning promises:
    - bigger wins
    - less effort

    https://chriscummins.cc/pub/2020-fdl.pdf
  3. Machine Learning in Compilers

    Program → IR → Features

        void LinearAlgebraOp<InputScalar, OutputScalar>::AnalyzeInputs(
            OpKernelContext* context, TensorInputs* inputs,
            TensorShapes* input_matrix_shapes, TensorShape* batch_shape) {
          int input_rank = -1;
          for (int i = 0; i < NumMatrixInputs(context); ++i) {
            const Tensor& in = context->input(i);
            if (i == 0) {
              input_rank = in.dims();
              OP_REQUIRES(
                  context, input_rank >= 2,
                  errors::InvalidArgument("Input tensor ", i,
                                          " must have rank >= 2"));

    IR: CFG, DFG, AST, ...
    Features: # instructions, loop nest level, arithmetic intensity, trip counts
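
To make "features" concrete, here is a minimal sketch that turns a tiny LLVM-IR-like snippet into an opcode histogram, from which the "# instructions" feature falls out. The IR snippet and the one-instruction-per-line parsing rule are simplifying assumptions; CompilerGym's own observation spaces (for example Autophase or InstCount) compute real feature vectors from LLVM IR.

```python
from collections import Counter
from typing import Dict


def opcode_histogram(ir: str) -> Dict[str, int]:
    """Count instructions per opcode in a tiny, simplified subset of LLVM-IR text.

    Assumes one instruction per line, e.g. "%x = add i32 %a, %b" or
    "ret i32 %y". This is an illustration, not a real IR parser.
    """
    counts: Counter = Counter()
    for raw_line in ir.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, function headers/footers and block labels.
        if not line or line.startswith((";", "define", "}")) or line.endswith(":"):
            continue
        tokens = line.split()
        # "%x = add ..." has the opcode third; "ret i32 %y" has it first.
        opcode = tokens[2] if len(tokens) > 2 and tokens[1] == "=" else tokens[0]
        counts[opcode] += 1
    return dict(counts)


EXAMPLE_IR = """
define i32 @square_of_sum(i32 %a, i32 %b) {
entry:
  %x = add i32 %a, %b
  %y = mul i32 %x, %x
  ret i32 %y
}
"""

features = opcode_histogram(EXAMPLE_IR)
num_instructions = sum(features.values())  # the "# instructions" feature
```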
  4. Machine Learning in Compilers

    [Diagram: (features, best param) pairs from training programs → supervised machine learner → model]
  5. Machine Learning in Compilers

    [Diagram: a model built from many (features, param) pairs]
  6. Machine Learning in Compilers

    [Diagram: the model, built from many (features, param) pairs, takes a new program's features and outputs a predicted param]
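
The supervised pipeline in slides 4 to 6 can be sketched with the simplest possible "model": 1-nearest-neighbour lookup, which predicts the param that worked best for the most similar training program. The training data, the two features, and the unroll-factor parameter are hypothetical; real systems use richer models.

```python
import math
from typing import List, Sequence, Tuple

# (program feature vector, best parameter value found for that program)
TrainingExample = Tuple[Sequence[float], int]


def predict_param(training: List[TrainingExample], features: Sequence[float]) -> int:
    """1-nearest-neighbour: reuse the best param of the most similar training program."""
    if not training:
        raise ValueError("need at least one training example")
    _, best_param = min(training, key=lambda example: math.dist(example[0], features))
    return best_param


# Hypothetical training data: features = (instruction count, loop nest level),
# param = a loop unroll factor found by exhaustive search offline.
training = [
    ((120.0, 1.0), 2),
    ((950.0, 3.0), 8),
    ((400.0, 2.0), 4),
]
predicted = predict_param(training, (900.0, 3.0))  # features of a new program
```

Because instruction counts dwarf loop depths numerically, the raw Euclidean distance here is dominated by the first feature; scaling each feature to a common range is the usual remedy.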
  7. The Reality of Compiler Research

    1. I have a brilliant idea! 💡
    2. Prepare the compiler
    3. Download/build benchmark suites
    4. Write / build feature extractors
    5. Write ad-hoc scripts to glue everything together
    6. Test environment, 100s of bugs
    7. Existential crisis
    8. Result! 📝

    Along the way: consult mailing lists, learn how to use gdb; RTFM, learn a new build system; they never work OOTB; chase after prior art authors, reimplement crusty old code.
  8. The Reality of Compiler Research ???
  9. The Reality of Compiler Research
  10. CompilerGym Goals

    1. Lower the barrier to entry to AI for compilers
    2. Advance the state of the art in AI for compilers
    3. Provide common benchmarks for compiler tasks
  11. CompilerGym

    Simple Python Frontend | Gnarly Backend ("This page intentionally left blank.")

    CompilerGym Environments: Halide, ... etc.
    - Action: an optimization choice, from an action space of optimization choices
    - Observation: a view of the program
    - Reward: a performance metric
  12. CompilerGym

    Built on OpenAI Gym: write your agent in Python or plug in existing agents.
  13. CompilerGym

    The user sees a plain old NumPy array, string, or nx.Graph.
  14. CompilerGym

    The interface is self-describing: no compiler expertise required.
  15. CompilerGym Environments

    |                         | llvm-v0 | gcc-v0 | loop_tool-v0 |
    |-------------------------|---------|--------|--------------|
    | Problem                 | Selecting and ordering LLVM optimization passes | Tuning command-line flags | Loop nest code generation |
    | Rewards                 | Runtime or code size | Code size | Throughput (FLOPS) |
    | Action space            | 124-dimensional discrete choice | 504-dimensional tuple, mixture of {discrete, int, float} | Discrete choice with dynamic dimensionality |
    | Observation space types | IR string, numeric feature vectors, pre-trained embeddings, directed multigraphs | Assembly string, IR string, numeric feature vector | IR string or numeric feature vector |

    More environments on the way!
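
The action spaces in the table differ in kind, not just size. Below is a minimal sketch of sampling one point from a mixed tuple space like gcc-v0's. The three dimension types and the example space are hypothetical illustrations, not CompilerGym's own space classes.

```python
import random
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Discrete:
    n: int  # choose one of 0 .. n-1


@dataclass
class IntRange:
    low: int   # inclusive
    high: int  # inclusive


@dataclass
class FloatRange:
    low: float
    high: float


Space = Union[Discrete, IntRange, FloatRange]


def sample(spaces: List[Space], rng: random.Random) -> List[Union[int, float]]:
    """Draw one value per dimension of a tuple space."""
    values: List[Union[int, float]] = []
    for space in spaces:
        if isinstance(space, Discrete):
            values.append(rng.randrange(space.n))
        elif isinstance(space, IntRange):
            values.append(rng.randint(space.low, space.high))
        else:
            values.append(rng.uniform(space.low, space.high))
    return values


# A hypothetical three-dimensional space; gcc-v0's real tuple has 504 entries.
spaces: List[Space] = [Discrete(3), IntRange(0, 64), FloatRange(0.0, 1.0)]
action = sample(spaces, random.Random(0))
```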
  16. Using Gym

        import gym

        with gym.make("Breakout-v0") as env:  # Create the environment
            observation = env.reset()
            for _ in range(1000):
                action = env.action_space.sample()  # Agent goes here
                observation, reward, done, info = env.step(action)
                if done:
                    observation = env.reset()
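
The loop above needs only `reset()` and `step()`. To run the same loop without installing Gym or a compiler, here is a toy environment that follows the classic four-tuple `step()` contract. The pass names, the instruction categories, and the reward definition (instructions removed) are hypothetical.

```python
import random
from typing import Dict, Tuple


class ToyCompilerEnv:
    """A toy 'compiler' with the classic Gym reset()/step() contract.

    The program is a bag of instruction counts; each action is a hypothetical
    pass that deletes one category of instructions. Observation = total
    instruction count, reward = instructions removed by the step.
    """

    ACTIONS = ["dce", "gvn", "nop"]
    REMOVES = {"dce": "dead", "gvn": "redundant", "nop": None}

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.reset()

    def reset(self) -> int:
        self.program: Dict[str, int] = {"dead": 3, "redundant": 4, "useful": 10}
        self.steps = 0
        return self._observe()

    def _observe(self) -> int:
        return sum(self.program.values())

    def step(self, action: int) -> Tuple[int, int, bool, dict]:
        before = self._observe()
        category = self.REMOVES[self.ACTIONS[action]]
        if category is not None:
            self.program[category] = 0
        self.steps += 1
        after = self._observe()
        done = self.steps >= self.max_steps
        return after, before - after, done, {"pass": self.ACTIONS[action]}


rng = random.Random(0)
env = ToyCompilerEnv()
observation = env.reset()
total_reward = 0
for _ in range(20):
    action = rng.randrange(len(env.ACTIONS))  # agent goes here
    observation, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        observation = env.reset()
```

Defining the reward as the difference between consecutive observations means an episode's total reward equals its overall reduction in instruction count.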
  17. Using CompilerGym

        import gym
        import compiler_gym  # registers the CompilerGym environments with Gym

        with gym.make("llvm-v0") as env:
            observation = env.reset()
            for _ in range(1000):
                action = env.action_space.sample()
                observation, reward, done, info = env.step(action)
                if done:
                    observation = env.reset()

    Now we are doing compiler optimizations!
  18. Using CompilerGym

        import gym
        import compiler_gym

        with gym.make("llvm-v0") as env:
            # Start the episode from your own C source file.
            observation = env.reset(env.make_benchmark("myprog.c"))
            for _ in range(1000):
                action = env.action_space.sample()
                observation, reward, done, info = env.step(action)
                if done:
                    observation = env.reset()
            # Save the optimized program as LLVM bitcode.
            env.write_bitcode("my-compiled-code.bc")

    Integrating with a build.
  19. Using CompilerGym

        import compiler_gym
        from ray import tune
        from ray.rllib.agents.ppo import PPOTrainer

        tune.register_env(
            "CompilerGym",
            lambda _: compiler_gym.wrappers.TimeLimit(
                max_episode_steps=45,
                env=compiler_gym.make(
                    "llvm-autophase-ic-v0",
                    benchmark="cbench-v1/qsort",
                ),
            ),
        )
        tune.run(PPOTrainer, config={"env": "CompilerGym"})

    Drop-in support for RL frameworks.
  20. Using CompilerGym

        import compiler_gym
        from ray import tune
        from ray.rllib.agents.ppo import PPOTrainer

        tune.register_env(
            "CompilerGym",
            lambda _: compiler_gym.wrappers.TimeLimit(
                max_episode_steps=45,
                env=compiler_gym.make(
                    "llvm-autophase-ic-v0",
                    benchmark="cbench-v1/qsort",
                    observation_space="Autophase",
                ),
            ),
        )
        tune.run(PPOTrainer, config={"env": "CompilerGym"})

    Evaluate different program representations.
  21. Using CompilerGym

    Same code as slide 20. Evaluate across programs.
  22. Using CompilerGym

    Same code as slide 20. Evaluate different algorithms.
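
Phase-ordering episodes can go on indefinitely, which is why the RLlib examples wrap the environment in a 45-step time limit. Here is a standalone sketch of what such a wrapper does; `EpisodeStepLimit` and the `truncated` info key are illustrative names, not the Gym or CompilerGym implementation.

```python
from typing import Any, Tuple


class EpisodeStepLimit:
    """Wrapper that ends an episode after `max_episode_steps` calls to step().

    Illustrative only; not the Gym or CompilerGym implementation.
    """

    def __init__(self, env: Any, max_episode_steps: int):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, *args, **kwargs):
        self._elapsed_steps = 0
        return self.env.reset(*args, **kwargs)

    def step(self, action) -> Tuple[Any, float, bool, dict]:
        observation, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if not done and self._elapsed_steps >= self.max_episode_steps:
            # Mark that the episode was cut short rather than finished naturally.
            done = True
            info = dict(info, truncated=True)
        return observation, reward, done, info


class EndlessEnv:
    """Toy environment whose episodes never end on their own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


env = EpisodeStepLimit(EndlessEnv(), max_episode_steps=45)
env.reset()
steps, done, info = 0, False, {}
while not done:
    _, _, done, info = env.step(0)
    steps += 1
```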
  23. Using CompilerGym

        from functools import lru_cache
        from typing import Tuple

        import compiler_gym
        import nevergrad as ng

        env = compiler_gym.make("llvm-ic-v0")

        @lru_cache
        def eval_actions(actions: Tuple[int, ...]) -> float:
            env.reset()
            env.step(actions)
            return -env.episode_reward  # Nevergrad minimizes, so negate the reward

        params = ng.p.Choice(
            choices=range(env.action_space.n),
            repetitions=15,
        )
        optimizer = ng.optimizers.NGOpt(
            parametrization=params, budget=1000
        )
        recommendation = optimizer.minimize(eval_actions)

    Drop-in support for gradient-free techniques: +8% IC (instruction count) reduction over -Oz!
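
NGOpt picks among Nevergrad's optimizers automatically. To show the pattern in isolation, this sketch swaps in plain random search, the simplest gradient-free method, over fixed-length action sequences, keeping the same `lru_cache` memoization as `eval_actions`. The cost function is a hypothetical stand-in for resetting an environment, applying the actions, and negating the episode reward.

```python
import random
from functools import lru_cache
from typing import Optional, Tuple

NUM_ACTIONS = 4       # hypothetical action-space size
SEQUENCE_LENGTH = 5   # hypothetical episode length


@lru_cache(maxsize=None)
def eval_actions(actions: Tuple[int, ...]) -> float:
    """Toy cost of applying `actions` in order (lower is better)."""
    cost = 100.0
    for position, action in enumerate(actions):
        # Larger action ids lower the cost more; earlier positions weigh more.
        cost -= action * (SEQUENCE_LENGTH - position)
    return cost


def random_search(budget: int, seed: int = 0) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Sample `budget` random sequences and keep the cheapest one."""
    rng = random.Random(seed)
    best_actions: Optional[Tuple[int, ...]] = None
    best_cost = float("inf")
    for _ in range(budget):
        candidate = tuple(rng.randrange(NUM_ACTIONS) for _ in range(SEQUENCE_LENGTH))
        cost = eval_actions(candidate)  # repeated candidates are served from the cache
        if cost < best_cost:
            best_actions, best_cost = candidate, cost
    return best_actions, best_cost


actions, cost = random_search(budget=200)
```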
  24. Using CompilerGym: greedy search for LLVM phase ordering

        from time import time

        from compiler_gym.envs import CompilerEnv

        def greedy(env: CompilerEnv, search_time_seconds: int):
            def eval_action(env, action):
                # Try the action on a forked copy, leaving env untouched.
                with env.fork() as fkd:
                    return (fkd.step(action)[1], action)  # (reward, action)

            end_time = time() + search_time_seconds
            while time() < end_time:
                best = max(eval_action(env, action) for action in range(env.action_space.n))
                # Stop when no action improves the reward, or when the episode ends.
                if best[0] <= 0 or env.step(best[1])[2]:
                    return
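
The greedy search can be exercised without a compiler. This sketch uses a toy environment whose `fork()` is a `copy.deepcopy`, and a step budget in place of the wall-clock budget; the passes and their effects are hypothetical. CompilerGym forks are used inside `with` because they are separate environments that must be closed, which a deep copy of a plain object does not need.

```python
import copy
from typing import Dict, List, Tuple


class ToyEnv:
    """Toy phase-ordering environment with hypothetical passes.

    The program is a bag of instruction counts. Reward is the number of
    instructions an action removes (negative if the pass adds code).
    """

    ACTIONS = ["dce", "gvn", "inline"]

    def __init__(self) -> None:
        self.program: Dict[str, int] = {"dead": 3, "redundant": 4, "useful": 10}

    def size(self) -> int:
        return sum(self.program.values())

    def fork(self) -> "ToyEnv":
        # Independent copy: stepping it leaves `self` untouched.
        return copy.deepcopy(self)

    def step(self, action: int) -> Tuple[int, int, bool]:
        before = self.size()
        name = self.ACTIONS[action]
        if name == "dce":
            self.program["dead"] = 0
        elif name == "gvn":
            self.program["redundant"] = 0
        else:  # "inline" grows the code in this toy model
            self.program["useful"] += 2
        return self.size(), before - self.size(), False  # (obs, reward, done)


def greedy(env: ToyEnv, max_steps: int) -> List[str]:
    """Repeatedly apply the single action with the highest immediate reward."""
    applied: List[str] = []
    for _ in range(max_steps):
        best_reward, best_action = max(
            (env.fork().step(action)[1], action) for action in range(len(env.ACTIONS))
        )
        if best_reward <= 0:
            break  # no action improves the program any more
        _, _, done = env.step(best_action)
        applied.append(env.ACTIONS[best_action])
        if done:
            break
    return applied


env = ToyEnv()
sequence = greedy(env, max_steps=10)
```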
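  25. Using CompilerGym: hill climbing for GCC flag tuning

        import random

        from compiler_gym.envs import CompilerEnv

        # FLAGS (command-line flags) and objective(env) are assumed to be
        # defined elsewhere in the full script.
        def hill_climb(env: CompilerEnv):
            best = float("inf")
            for _ in range(FLAGS.gcc_search_budget):
                with env.fork() as fkd:
                    # Perturb each option by up to +/-5, clamped to [-1, len(options[i]) - 1].
                    fkd.choices = [
                        random.randint(
                            max(-1, x - 5), min(len(env.gcc_spec.options[i]) - 1, x + 5)
                        )
                        for i, x in enumerate(env.choices)
                    ]
                    cost = objective(fkd)
                    if cost < objective(env):
                        best = cost
                        env.choices = fkd.choices
            return best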
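
A self-contained version of the same hill-climbing loop, with a hypothetical four-option flag spec and a toy objective (distance from a hidden best setting) standing in for compiled code size. As on the slide, every option is perturbed at once and only strict improvements are accepted.

```python
import random
from typing import List

# Hypothetical flag spec: option i takes values -1 .. OPTION_SIZES[i] - 1,
# matching the clamping range used on the slide.
OPTION_SIZES = [4, 10, 3, 8]
HIDDEN_BEST = [2, 7, -1, 5]  # hypothetical optimum for the toy objective


def objective(choices: List[int]) -> int:
    """Toy stand-in for code size: distance from a hidden best setting."""
    return sum(abs(c - b) for c, b in zip(choices, HIDDEN_BEST))


def hill_climb(start: List[int], budget: int, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    best, best_cost = list(start), objective(start)
    for _ in range(budget):
        # Perturb every option by up to +/-5, clamped to its valid range.
        candidate = [
            rng.randint(max(-1, x - 5), min(OPTION_SIZES[i] - 1, x + 5))
            for i, x in enumerate(best)
        ]
        cost = objective(candidate)
        if cost < best_cost:  # accept strict improvements only
            best, best_cost = candidate, cost
    return best


start = [-1, -1, -1, -1]
result = hill_climb(start, budget=500)
```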
  26. Other Goodies: automatic validation of results

        $ python -m compiler_gym.bin.validate results.csv --env=llvm-ic-v0
        ✅ cbench-v1/adpcm           1.0083
        ✅ cbench-v1/bitcount        1.0197
        ✅ cbench-v1/blowfish        1.1005
        ✅ cbench-v1/bzip2           1.1755
        ✅ cbench-v1/crc32           1.0000
        ✅ cbench-v1/dijkstra        1.0387
        ✅ cbench-v1/ghostscript     1.0349
        ✅ cbench-v1/gsm             1.1434
        ✅ cbench-v1/ispell          1.2948
        ✅ cbench-v1/jpeg-c          1.0590
        ✅ cbench-v1/jpeg-d          1.0575
        ✅ cbench-v1/lame            1.0991
        ❌ cbench-v1/patricia        Failed 20 of 100 validators:
                                     [1/20] LeakSanitizer: detected memory leaks
        ✅ cbench-v1/qsort           1.0219
        ✅ cbench-v1/rijndael        1.1152
        ✅ cbench-v1/sha             1.6340
        ✅ cbench-v1/stringsearch    1.2209
        ✅ cbench-v1/stringsearch2   0.9948
        ❌ cbench-v1/susan           Expected reward 1.0570 but received reward 1.0397
        ✅ cbench-v1/tiff2bw         1.1117
        ✅ cbench-v1/tiff2rgba       1.1215
        ✅ cbench-v1/tiffdither      1.1122
        ✅ cbench-v1/tiffmedian      1.1030
        ----------------------------------------------------
        Number of validated results: 21 of 23
        Mean walltime per benchmark: 3719.405s (std: 160.974s)
        Geometric mean IrInstructionCountOz: 1.103 (std: 0.133)
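
The summary reports a geometric mean, the natural average for ratios such as instruction count relative to -Oz: a 2× gain and a 2× loss average to exactly 1. A minimal helper, computed in log space:

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean of positive ratios, computed in log space for stability."""
    logs = [math.log(v) for v in values]  # raises ValueError for v <= 0
    if not logs:
        raise ValueError("geometric mean of an empty sequence")
    return math.exp(sum(logs) / len(logs))
```

`math.log` rejects zero or negative inputs with a `ValueError`, which is the right behaviour for ratios that should always be positive.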
  27. CompilerGym Architecture

    - Frontend Library: user-facing APIs, error handling and recovery, command line tools, ...
    - Compiler Service Interface
    - Compiler services (LLVM Compiler Service, CUDA Compiler Service, ...), each with its own compiler APIs and feature extractors
  28. Adding a New Compiler

    The same architecture, with a new compiler service alongside the LLVM and CUDA compiler services: "Your compiler here!"
  29. Adding a New Compiler

        from typing import List

        from compiler_gym.service import CompilationSession
        from compiler_gym.service.proto import ActionSpace, ObservationSpace
        from compiler_gym.service.runtime import create_and_run_compiler_gym_service

        class MyCompilationSession(CompilationSession):
            """Class representing a compilation session."""

            # what your compiler can do:
            action_spaces: List[ActionSpace] = []

            # what features your compiler provides:
            observation_spaces: List[ObservationSpace] = []

            def __init__(self, action_space, benchmark):
                pass  # start a new compilation session

            def apply_action(self, action):
                pass  # apply an action

            def get_observation(self, observation_space):
                pass  # compute and return an observation

        if __name__ == "__main__":
            create_and_run_compiler_gym_service(MyCompilationSession)
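
The division of labour in `MyCompilationSession` can be tried out in plain Python. This toy session is not CompilerGym's API: its "compiler" edits lines of text, and `apply_action` returns a single "had no effect" flag. It only illustrates the three responsibilities of a session: declaring actions, declaring observations, and updating state when an action arrives.

```python
from typing import List, Union


class ToyCompilationSession:
    """Standalone sketch of a compilation session (not CompilerGym's API).

    The "compiler" here rewrites lines of text; it only illustrates the three
    responsibilities: declaring actions, declaring observations, and updating
    state when an action is applied.
    """

    action_names: List[str] = ["strip-comments", "dedupe-lines"]
    observation_names: List[str] = ["line-count", "source"]

    def __init__(self, benchmark: str):
        # Start a new session from the benchmark's source text.
        self.lines = benchmark.splitlines()

    def apply_action(self, action: int) -> bool:
        """Apply an action; return True if it had no effect."""
        before = list(self.lines)
        name = self.action_names[action]
        if name == "strip-comments":
            self.lines = [line for line in self.lines if not line.lstrip().startswith("#")]
        elif name == "dedupe-lines":
            self.lines = list(dict.fromkeys(self.lines))  # keeps first occurrences in order
        return self.lines == before

    def get_observation(self, name: str) -> Union[int, str]:
        """Compute and return the named observation."""
        if name == "line-count":
            return len(self.lines)
        if name == "source":
            return "\n".join(self.lines)
        raise KeyError(f"unknown observation space: {name}")


session = ToyCompilationSession("x = 1\n# note\nx = 1\ny = 2")
```

In CompilerGym a session like this runs behind the compiler service interface, in a separate process from the user-facing frontend.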
  30. Adding a New Compiler

        #include "compiler_gym/service/CompilationSession.h"
        #include "compiler_gym/service/runtime/Runtime.h"

        using namespace compiler_gym;

        struct MyCompilationSession : public CompilationSession {
          std::vector<ActionSpace> getActionSpaces() {...}
          std::vector<ObservationSpace> getObservationSpaces() {...}
          Status init(const ActionSpace& actionSpace, const Benchmark& benchmark) {...}
          Status applyAction(const Action& action, bool& endOfEpisode,
                             bool& actionSpaceChanged) {...}
          Status setObservation(const ObservationSpace& observationSpace,
                                Observation& observation) {...}
        };

        int main(int argc, char** argv) {
          runtime::createAndRunService<MyCompilationSession>(argc, argv, "My compiler service");
        }
  31. The Future: Scaling Search Space Size

    - Today: phase ordering: 10^200; command line: 10^4000
    - 6-12 months: loop level: 1010
    - 1-5 years: instruction level, all opts: 1010 10 5
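
The slide does not say how the 10^200 figure was derived. One illustrative reading, assuming every ordered sequence of llvm-v0's 124 actions counts as distinct (repeats allowed): find the shortest sequence length whose number of possible sequences reaches 10^200, using exact integer arithmetic.

```python
def min_sequence_length(num_actions: int, log10_target: int) -> int:
    """Smallest L such that num_actions ** L >= 10 ** log10_target.

    Uses exact integer arithmetic, so there is no floating-point rounding.
    """
    if num_actions < 2:
        raise ValueError("need at least two actions for the space to grow")
    length, size, target = 0, 1, 10 ** log10_target
    while size < target:
        size *= num_actions
        length += 1
    return length


# Assumption: every ordered sequence of llvm-v0's 124 actions is distinct.
length = min_sequence_length(124, 200)
```

Under that assumption, sequences of about 96 passes already give 10^200 possibilities, and each extra pass multiplies the count by 124.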
  32. CompilerGym

    Compilers are fun! But they have a high barrier to entry. CompilerGym reduces that to "pip install". Batteries included, actively developed. Try it out!

    https://compilergym.ai · facebookresearch/CompilerGym · [email protected]