Slide 1

Slide 1 text

Infusing AI and ML into Chip Design for Faster Delivery, Better Performance
Jinwook Jung*, Jenn Kazda, Derren Dunn, Gi-Joon Nam*, Raghu Ganti, Mudhakar Srivatsa, Carlos Costa, Rama Divakaruni
IBM Research, Yorktown Heights, NY
*: Today’s presenters

Slide 2

Slide 2 text

Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray code examples
4. Results

Slide 3

Slide 3 text

Background: Chip Design Flow
Chip Design Flow
- Multiple stages involving many tools
- Large storage requirement
  - PDK and cell libraries (>1TB)
  - Tool installation (~500GB or more)
  - User workspace (~10TB per project)
- Single established design flow being applied to multiple blocks and units
[Figure: chip design flow diagram. Stages: RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Timing Closure, Physical Verification (DFM, Fill, DRC/LVS, ERC). Inputs: RTL, Library, PDK. Output: GDS for tape-out.]

Slide 4

Slide 4 text

Background: Chip Design Flow
Chip Design Flow and Flow Parameters
- Flow parameters: change tool behavior
  - Spend more time to search for better structure
  - Do more power optimization at placement
  - Prefer higher layer for clock routing
- EDA tools have hundreds of parameters affecting design outcome
  - Finding optimal parameter setting is difficult
[Figure: chip design flow diagram with a Parameter input at each stage. Captions: "Same design and flow with different flow parameters", "Multiple designs".]

Slide 5

Slide 5 text

Background: Chip Design Flow
Chip Design Flow as Code
- Chip design flow itself is “code” → Maintained in code repositories
- Design flow is continuously evolving, e.g.,
  - Floorplan and timing constraint updates
  - New tool, PDK, library versions
  - Flow step and parameter updates
- Desire for CI/CD* of design flow code
*CI/CD: Continuous integration and continuous delivery
[Figure: code repositories labeled project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow]

Slide 6

Slide 6 text

Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray code examples
4. Results

Slide 7

Slide 7 text

AI/ML-Infused Chip Design Flow Orchestration
1. Containerized design flow enablement
  - Custom container images for design flow execution
  - Volume resources for PDK/libraries, tools, workspace
  - Batch job deployment capability
2. ML-based design flow tuning with containerized design flow
  - Implemented ML-based flow tuner using containerized design flow
  - Applied tuner to IBM’s chip designs and delivered better design outcome
3. Cloud-native design flow CI/CD
  - CI/CD pipeline around containerized flow
  - IBM design team uses it for nightly flow build
[Figure: Kubernetes cluster with compute nodes running pods and containers built from a custom container image, driven by flow descriptions, with Tools PV, User PV, and PDK/Lib PV]
[Figure: Kubernetes cluster with Jenkins pod, Worker pod, MongoDB pod, and Dashboard pod. Labels: https://github.ibm.com/zeus2-pd, edaas/centos, metrics-dashboard, jenkins, mongodb, custom-container-image, project-X-flow.]

Slide 8

Slide 8 text

Background: Chip Design Flow
Chip Design Flow: Characteristics
- Multiple stages involving many tools
- Large storage requirement
  - PDK and cell libraries (>1TB)
  - Tool installation (~500GB or more)
  - User workspace (~10TB per project)
- Single established design flow being applied to multiple blocks and units
[Figure: chip design flow diagram with callouts (1), (2), (3). Stages: RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Timing Closure, Physical Verification (DFM, Fill, DRC/LVS, ERC). Inputs: RTL, Library, PDK. Output: GDS for tape-out.]

Slide 9

Slide 9 text

Containerized Chip Design Flow Enablement
- Custom container images of tools for design flow execution
- Volume resources for hosting PDK/libraries, tools, and user workspace
- Parallel workload distribution capability enabled by Ray
  - Given user’s flow descriptions, multiple flow executions are distributed
*PV: Persistent Volume
**Pod: Construct for running containers in cloud orchestrator like Kubernetes
[Figure: Kubernetes cluster with compute nodes running pods and containers built from a custom container image, driven by flow descriptions, with Tools PV, User PV, and PDK/Lib PV; callouts (1), (2), (3)]

Slide 10

Slide 10 text

Background: Chip Design Flow
Chip Design Flow and Flow Parameters
- Flow parameters: change tool behavior
  - Spend more time to search for better structure
  - Do more power optimization at placement
  - Prefer higher layer for clock routing
- EDA tools have hundreds of parameters affecting design outcome
  - Finding optimal parameter setting is difficult
[Figure: chip design flow diagram with a Parameter input at each stage. Captions: "Multiple designs", "Same design and flow with different flow parameters".]

Slide 11

Slide 11 text

ML-Based Chip Design Flow Tuner
- Design flow agnostic implementation → no dependency on tools and flows
- Flexible interface: parameter definition, max iteration, parallelism, etc.
- Various parameter search algorithms supported (e.g., hyperopt, skopt)

Pseudo code
Input: config, tunable parameters
Output: best results with parameters
1. Read configuration and parameters
2. iteration := 0
3. while iteration++ < max_iteration do:
4.     Create new parameter setting
5.     Execute design flow
6.     Compute reward
7. return best results with parameters

[Figure: Flow Tuner block diagram. ML configuration: search algorithm, max iteration, max concurrent jobs, flow execution script. Design flow parameters: parameter names, ranges or choices, default values. The tuner applies parameters to the design flow (black box) and computes a reward.]
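The pseudocode above can be sketched in plain Python. This is a minimal illustration, not the presenters' implementation: it uses random search where the slide names hyperopt and skopt, and `toy_flow`, `SPACE`, and the parameter names `synth_effort` and `placement_density` are hypothetical stand-ins for a real design flow and its tunable parameters.

```python
import random
from typing import Callable, Dict, List, Tuple, Union

# A tunable parameter is either a numeric (low, high) range or a list of choices.
ParamSpec = Union[Tuple[float, float], List[str]]
Config = Dict[str, Union[float, str]]


def sample_config(space: Dict[str, ParamSpec], rng: random.Random) -> Config:
    """Step 4: create a new parameter setting (plain random search here)."""
    config: Config = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config


def tune_flow(
    run_flow: Callable[[Config], float],
    space: Dict[str, ParamSpec],
    max_iteration: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Run the loop from the pseudocode and return the best setting found."""
    rng = random.Random(seed)
    best_config: Config = {}
    best_reward = float("-inf")
    for _ in range(max_iteration):
        config = sample_config(space, rng)
        reward = run_flow(config)  # Steps 5-6: run the black-box flow, get reward
        if reward > best_reward:
            best_config, best_reward = config, reward
    return best_config, best_reward


def toy_flow(config: Config) -> float:
    """Hypothetical stand-in for a real design flow; higher reward is better."""
    bonus = {"low": 0.0, "medium": 0.5, "high": 1.0}[str(config["synth_effort"])]
    return bonus - abs(float(config["placement_density"]) - 0.7)


SPACE: Dict[str, ParamSpec] = {
    "synth_effort": ["low", "medium", "high"],  # hypothetical parameter
    "placement_density": (0.5, 0.9),            # hypothetical parameter
}

best_config, best_reward = tune_flow(toy_flow, SPACE, max_iteration=50)
print(best_config, best_reward)
```

Because the tuner only calls `run_flow(config)` and reads back a number, it stays independent of any particular tool or flow, which is the "design flow agnostic" property the slide describes.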

Slide 12

Slide 12 text

Chip Design Flow Tuning with Containerized Flow
- Extended chip design flow tuner using Ray and Tune
- Tuning trials are distributed by Ray and executed within Kubernetes cluster
- Using parallel search algorithm, we can run multiple tuning trials simultaneously
*PV: Persistent Volume
**Pod: Construct for running containers in cloud orchestrator like Kubernetes
[Figure: Flow Tuner sending parameters to pods and containers (custom container image) on the compute nodes of a Kubernetes cluster, with Tools PV, User PV, and PDK/Lib PV. Caption: "Flow tuning trials distributed by Ray".]

Slide 13

Slide 13 text

Background: Chip Design Flow
Chip Design Flow as Code
- Chip design flow itself is “code” → Maintained in code repositories
- Design flow is continuously evolving, e.g.,
  - Floorplan and timing constraint updates
  - New tool, PDK, library versions
  - Flow step and parameter updates
- Desire for CI/CD* of design flow code
*CI/CD: Continuous integration and continuous delivery
[Figure: code repositories labeled project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow]

Slide 14

Slide 14 text

Cloud-Native Design Flow CI/CD
- Developed a CI/CD pipeline that uses the containerized chip design flow
- At IBM, we use it for a nightly design flow build covering the entire chip design flow
  - Synthesis → Floorplan → Placement → CTS → Routing → DFM/Fill/DRC/LVS
[Figure: Kubernetes cluster with Jenkins pod, scalable worker pods, database pod, and dashboard pod serving a metrics dashboard; source repository project-X-flow]

Slide 15

Slide 15 text

ML-Driven Continuous Design Flow Evolution at Scale
- Continuously and autonomously evolving design flow enabled by
  - Cloud-native containerized design flow enablement
  - ML-based design flow tuning
[Figure: Kubernetes cluster with Jenkins pod, scalable worker pods, database pod, and dashboard pod serving a metrics dashboard; project-X-flow repository and knowledge database. Annotations: "Design and flow updates invoking CI/CD pipeline", "Many workers run at scale with different parameters". Plot "Continuous QoR Improvement": quality of result versus time.]

Slide 16

Slide 16 text

Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray deployment and code examples
4. Results

Slide 17

Slide 17 text

Deploying Ray Cluster on Containerized Design Flow
- Containerized design flow running on IBM Cloud
- Ray Kubernetes Operator* to deploy Ray on containerized design flow
  - Used custom Docker image for running both Ray and chip design tools
  - Modified Ray Helm Chart to mount volumes properly
*Link: https://docs.ray.io/en/latest/cluster/kubernetes.html#deploying-on-kubernetes
[Figure: Kubernetes cluster with compute nodes running pods and containers built from a custom container image, driven by flow descriptions, with Tools PV, User PV, and PDK/Lib PV]

Slide 18

Slide 18 text

Dockerfile Example

    FROM centos:8

    # For kubectl and Ray (kubectl is assumed to be available from a configured repository)
    RUN dnf -y update \
     && dnf install -y python39 kubectl \
     && alternatives --set python /usr/bin/python3 \
     && python -m pip install --no-cache-dir kopf kubernetes \
     && python -m pip install --no-cache-dir pandas numpy matplotlib \
     && python -m pip install --no-cache-dir "ray[tune]"

    # For chip design flow tools
    RUN ...

Slide 19

Slide 19 text

Ray Code Example: Floorplan Utilization Sweeping
- Design floorplan utilization
  - Lower utilization: easy, but bad for performance, power, and area
  - Higher utilization: difficult, but good for performance, power, and area
- How to find an optimal floorplan utilization?
  - Find maximum achievable floorplan utilization by sweeping
  - Individual flow execution is distributed across containerized flow by Ray
[Figure: two floorplans labeled "Utilization: 50%" and "Utilization: 70%"]
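The sweep described above can be sketched as follows. To keep the example runnable without a cluster, it uses a standard-library thread pool as a stand-in for Ray; with Ray, `run_flow_at_utilization` would typically be decorated with `@ray.remote`, launched with `.remote(...)`, and collected with `ray.get(...)`. The function names, the 72% overflow threshold, and the feasibility criterion are all hypothetical and are not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def run_flow_at_utilization(utilization: float) -> Dict[str, float]:
    """Hypothetical stand-in for one full flow run at a given utilization.

    A real implementation would launch the design flow and parse its reports.
    This toy model simply pretends routing overflows above 72% utilization.
    """
    overflow = max(0.0, utilization - 0.72)
    return {"utilization": utilization, "routing_overflow": overflow}


def is_feasible(result: Dict[str, float]) -> bool:
    """Assumed success criterion: the run finishes with no routing overflow."""
    return result["routing_overflow"] == 0.0


def sweep(utilizations: List[float], max_workers: int = 4) -> Optional[float]:
    """Run all sweep points concurrently; return the highest feasible one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_flow_at_utilization, utilizations))
    feasible = [r["utilization"] for r in results if is_feasible(r)]
    return max(feasible) if feasible else None


# Sweep 50% to 90% utilization in 5% steps.
points = [p / 100 for p in range(50, 91, 5)]
best = sweep(points)
print(best)
```

Each sweep point is independent of the others, so the runs can be spread over as many workers as the cluster provides.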

Slide 20

Slide 20 text

Ray Code Example: Floorplan Utilization Sweeping

Slide 21

Slide 21 text

Ray Code Example: Floorplan Utilization Sweeping

Slide 22

Slide 22 text

Ray Code Example: Chip Design Flow Tuning
- Regard chip design flow parameters as hyperparameters tuned by Tune
- Each tuning trial invokes the chip design flow and returns a quality metric as its reward

    import ray
    from ray import tune

    ray.init(...)

    # Runs the design flow as a black box and returns a quality metric.
    def execute_design_flow(config) -> float:
        ...

    # Trial function: one flow execution per trial, reported to Tune as the reward.
    def evaluate_design_flow(config):
        reward = execute_design_flow(config)
        tune.report(reward=reward)

    # algo and flow_parameters are defined elsewhere (not shown on the slide).
    all_trials = tune.run(evaluate_design_flow,
                          search_alg=algo,
                          config=flow_parameters,
                          ...)

[Figure: Flow Tuner block diagram. Configuration: search algorithm, max iteration, max concurrent jobs, flow execution script. Design flow parameters are applied to the design flow (black box), and a reward is computed.]
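The slide leaves the body of `execute_design_flow` elided. One way it could wrap a flow execution script is sketched below, under assumptions that are not from the slides: parameters are passed to the script as `FLOW_<NAME>` environment variables, the script prints a JSON object with a `power_mw` field as its last line of output, and the function takes an extra `command` argument for illustration. `FAKE_FLOW` is a hypothetical stand-in for a real flow script.

```python
import json
import os
import subprocess
import sys
from typing import Dict, List


def execute_design_flow(config: Dict[str, object], command: List[str]) -> float:
    """Run a flow script with parameters in the environment; return its reward.

    Assumed contract (not from the slides): parameters are passed as
    FLOW_<NAME> environment variables, and the script prints one JSON object
    with a "power_mw" field as its last line of standard output.
    """
    env = dict(os.environ)
    for name, value in config.items():
        env[f"FLOW_{name.upper()}"] = str(value)
    completed = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    metrics = json.loads(completed.stdout.strip().splitlines()[-1])
    # Negate power so that a search maximizing the reward minimizes power.
    return -float(metrics["power_mw"])


# Hypothetical stand-in for a real flow script: power falls as effort rises.
FAKE_FLOW = [
    sys.executable,
    "-c",
    "import json, os; e = int(os.environ['FLOW_EFFORT']); "
    "print(json.dumps({'power_mw': 100 - 5 * e}))",
]

reward = execute_design_flow({"effort": 2}, FAKE_FLOW)
print(reward)
```

Keeping the tool invocation behind a script and a small metric contract is one way to preserve the black-box boundary between the tuner and the design flow.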

Slide 23

Slide 23 text

Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray code examples
4. Results

Slide 24

Slide 24 text

IBM Telum: On-Chip AI Accelerator Example
- Team demonstrated:
  - The design environment can be fully built on an IBM Cloud Kubernetes cluster
  - The QoRs can be significantly improved via “automated parameter tuning”
[Figure: IBM Telum chip. Labels: AI Compute Array (~30 mm²), Macro-level, Unit-level.]

Slide 25

Slide 25 text

Design Flow Tuning Result
- Applied flow tuner to IBM internal design flow targeting 14nm technology
  - Took an expert designer’s optimized design result as baseline
  - Tuning objective: power minimization
  - Tuning configuration: 17 synthesis and P&R parameters, 50 iterations
- Achieved ~9% additional power reduction compared to designer’s baseline
[Figure: results chart. Labels: Baseline, Optimization 1, Optimization 2, Optimization 3, Designer’s manual optimization, Flow tuner result.]

Slide 26

Slide 26 text

Conclusion
- Chip design challenge:
  - Reduce overall turn-around time while delivering better quality
- We built a cloud-based, containerized chip design flow with:
  - Distributed chip design workload execution enabled by Ray
  - Parallel automatic flow parameter tuning enabled by Tune
- We used our platform for IBM’s chip designs and delivered better design outcomes with reduced overall turn-around time
- More details are in IBM Research’s Anyscale blog post: https://www.anyscale.com/blog/infusing-ai-and-ml-into-integrated-circuit-design-for-faster-chip-delivery