AI/ML-infused digital IC design workflows on the hybrid cloud

As the complexity of modern hardware systems explodes, fast and effective design space exploration for better integrated circuit (IC) implementations is becoming harder to achieve because of its growing demand for computational resources. Recent years have seen increasing use of decision intelligence in IC design flows to navigate the design solution space in a more systematic and intelligent manner. To address this problem, we have been working on AI/ML-infused IC design orchestration in order to 1) enable the IC design environment on a hybrid cloud platform, so that workloads can easily be scaled up or down according to computational demand; and 2) produce higher quality of results (QoR) in shorter total turnaround time (TAT). In this work, we illustrate how we provide scalable IC design workload execution that produces higher-performance designs by using AI/ML-driven automatic parameter tuning. We first demonstrate that we can build a cloud-based IC design environment, including a containerized digital design flow, on Kubernetes clusters. Then, we extend the containerized design flow with automatic parameter tuning using AI/ML techniques. Finally, we demonstrate that the automatic parameter tuning can be executed in a more scalable and distributed manner using the Ray platform. We use actual design environment setups, code snippets, and results from product IC designs as evidence that the proposed method can produce higher-quality IC designs using Ray-based automatic parameter tuning.

Speakers: Gi-Joon Nam & Jinwook Jung

Anyscale

June 23, 2022

Transcript

  1. Infusing AI and ML into Chip Design for Faster Delivery, Better Performance

    Jinwook Jung*, Jenn Kazda, Derren Dunn, Gi-Joon Nam*, Raghu Ganti, Mudhakar Srivatsa, Carlos Costa, Rama Divakaruni
    IBM Research, Yorktown Heights, NY
    *: Today’s presenters
  2. Outline

    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results
  3. Chip Design Flow (section: Background: Chip Design Flow)

    § Multiple stages involving many tools
    § Large storage requirement
      - PDK and cell libraries (>1TB)
      - Tool installation (~500GB or more)
      - User workspace (~10TB per project)
    § Single established design flow being applied to multiple blocks and units
    [Figure: Chip Design Flow. Labels: RTL, RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification (DFM, Fill, DRC/LVS, ERC), Timing Closure, Library, PDK, GDS, Tape-out]
  4. Chip Design Flow and Flow Parameters (section: Background: Chip Design Flow)

    § Flow parameters: change tool behavior
      - Spend more time to search for better structure
      - Do more power optimization at placement
      - Prefer higher layer for clock routing
    § EDA tools have hundreds of parameters affecting design outcome
      - Finding optimal parameter setting is difficult
    [Figure: chip design flow with a Parameter attached to individual stages. Labels: RTL, RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification, Timing Closure, Library, PDK, GDS, Tape-out. Captions: "Same design and flow with different flow parameters", "Multiple designs"]
  5. Chip Design Flow as Code (section: Background: Chip Design Flow)

    § Chip design flow itself is “code” → Maintained in code repositories
    § Design flow is continuously evolving, e.g.,
      - Floorplan and timing constraint updates
      - New tool, PDK, library versions
      - Flow step and parameter updates
    § Desire for CI/CD* of design flow code
    *CI/CD: Continuous integration and continuous delivery
    [Figure: flow repositories project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow]
  6. Outline

    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results
  7. AI/ML-Infused Chip Design Flow Orchestration

    1. Containerized design flow enablement
      - Custom container images for design flow execution
      - Volume resources for PDK/libraries, tools, workspace
      - Batch job deployment capability
    2. ML-based design flow tuning with containerized design flow
      - Implemented ML-based flow tuner using containerized design flow
      - Applied tuner to IBM’s chip designs and delivered better design outcome
    3. Cloud-native design flow CI/CD
      - CI/CD pipeline around containerized flow
      - IBM design team uses it for nightly flow build
    [Figure: Kubernetes cluster with Jenkins, Worker, MongoDB, and Dashboard pods; repository https://github.ibm.com/zeus2-pd containing edaas/centos, metrics-dashboard, jenkins, mongodb, custom-container-image, project-X-flow]
    [Figure: Kubernetes cluster whose compute nodes run pods and containers from a custom container image, with Tools PV, User PV, and PDK/Lib PV, driven by flow descriptions]
  8. Chip Design Flow: Characteristics (section: Background: Chip Design Flow)

    § Multiple stages involving many tools
    § Large storage requirement
      - PDK and cell libraries (>1TB)
      - Tool installation (~500GB or more)
      - User workspace (~10TB per project)
    § Single established design flow being applied to multiple blocks and units
    [Figure: Chip Design Flow, annotated with markers (3), (1), (2). Labels: RTL, RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification (DFM, Fill, DRC/LVS, ERC), Timing Closure, Library, PDK, GDS, Tape-out]
  9. Containerized Chip Design Flow Enablement

    § Custom container images of tools for design flow execution
    § Volume resources for hosting PDK/libraries, tools, and user workspace
    § Parallel workload distribution capability enabled by Ray
      - Given user’s flow descriptions, multiple flow executions are distributed
    *PV: Persistent Volume
    **Pod: Construct for running containers in cloud orchestrator like Kubernetes
    [Figure: Kubernetes cluster whose compute nodes run pods and containers from a custom container image, with Tools PV, User PV, and PDK/Lib PV, driven by flow descriptions; annotated with markers (3), (1), (2)]
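The three ingredients on this slide (a custom image, persistent volumes, and batch deployment) map naturally onto a Kubernetes Job. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The image name, claim names, mount paths, and flow script are hypothetical placeholders, not the names used by the IBM team.

```python
import json


def make_flow_job(name, image, command, volumes):
    """Build a Kubernetes batch/v1 Job manifest for one design flow run.

    `volumes` maps a volume name to (PVC claim name, mount path, read_only).
    """
    mounts = [
        {"name": vol, "mountPath": path, "readOnly": read_only}
        for vol, (_claim, path, read_only) in volumes.items()
    ]
    volume_specs = [
        {"name": vol, "persistentVolumeClaim": {"claimName": claim}}
        for vol, (claim, _path, _read_only) in volumes.items()
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed flow run automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "flow",
                        "image": image,
                        "command": command,
                        "volumeMounts": mounts,
                    }],
                    "volumes": volume_specs,
                }
            },
        },
    }


# Hypothetical claims and paths: tools and PDK/libraries are mounted
# read-only, the user workspace is writable.
FLOW_VOLUMES = {
    "tools": ("tools-pvc", "/tools", True),
    "pdk-lib": ("pdk-lib-pvc", "/pdk", True),
    "workspace": ("user-workspace-pvc", "/workspace", False),
}

job = make_flow_job(
    name="block-a-flow",
    image="registry.example.com/custom-container-image:latest",
    command=["/workspace/run_flow.sh", "block_a"],
    volumes=FLOW_VOLUMES,
)
print(json.dumps(job, indent=2))
```

Generating one such manifest per block or unit is one way to apply a single established flow to many designs as batch jobs.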
  10. Chip Design Flow and Flow Parameters (section: Background: Chip Design Flow)

    § Flow parameters: change tool behavior
      - Spend more time to search for better structure
      - Do more power optimization at placement
      - Prefer higher layer for clock routing
    § EDA tools have hundreds of parameters affecting design outcome
      - Finding optimal parameter setting is difficult
    [Figure: chip design flow with a Parameter attached to individual stages. Labels: RTL, RTL Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification, Timing Closure, Library, PDK, GDS, Tape-out. Captions: "Multiple designs", "Same design and flow with different flow parameters"]
  11. ML-Based Chip Design Flow Tuner

    § Design flow agnostic implementation → no dependency on tools and flows
    § Flexible interface: parameter definition, max iteration, parallelism, etc.
    § Various parameter search algorithms supported (e.g., hyperopt, skopt)
    Flow tuner inputs
      - Configuration: search algorithm, max iteration, max concurrent jobs, flow execution script
      - Design flow parameters: parameter names, ranges or choices, default values
    Pseudo code
      Input: config, tunable parameters
      Output: best results with parameters
      1. Read configuration and parameters
      2. iteration := 0
      3. while iteration++ < max_iteration do:
      4.     Create new parameter setting
      5.     Execute design flow
      6.     Compute reward
      7. return best results with parameters
    [Figure: the Flow Tuner applies parameters to the design flow (black box) and computes a reward. Labels: ML, Flow Tuner, Configuration, Design flow parameters, Apply parameter, Compute reward]
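The pseudo code on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tuner described in the talk: it uses plain random search in place of hyperopt or skopt, evaluates the default setting first as a starting point, and replaces the real flow with a toy function. The parameter names and the toy reward are assumptions made for the example.

```python
import random


def sample_parameters(space, rng):
    """Draw one setting: a (low, high) tuple is a float range, a list is a set of choices."""
    setting = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            setting[name] = rng.uniform(low, high)
        else:
            setting[name] = rng.choice(domain)
    return setting


def tune_flow(run_flow, space, defaults, max_iteration, seed=0):
    """Black-box tuning loop following the slide's pseudo code (random search)."""
    rng = random.Random(seed)
    best_params = dict(defaults)
    best_reward = run_flow(best_params)          # start from the default setting
    for _ in range(max_iteration):
        params = sample_parameters(space, rng)   # create new parameter setting
        reward = run_flow(params)                # execute design flow, compute reward
        if reward > best_reward:
            best_params, best_reward = params, reward
    return best_params, best_reward              # best results with parameters


# Hypothetical tunable parameters: names, ranges or choices, default values.
SPACE = {
    "placement_effort": ["low", "medium", "high"],
    "target_utilization": (0.5, 0.8),
}
DEFAULTS = {"placement_effort": "medium", "target_utilization": 0.6}


def toy_flow(params):
    """Toy stand-in for a flow run; a real run would launch the EDA tools."""
    effort_bonus = {"low": 0.0, "medium": 0.5, "high": 1.0}[params["placement_effort"]]
    return effort_bonus - abs(params["target_utilization"] - 0.7)


best_params, best_reward = tune_flow(toy_flow, SPACE, DEFAULTS, max_iteration=50)
print(best_params, best_reward)
```

Because the loop only calls `run_flow(params)` and reads back a number, it stays independent of the tools and flow behind it, which is the property the first bullet describes.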
  12. Chip Design Flow Tuning with Containerized Flow

    § Extended chip design flow tuner using Ray and Tune
    § Tuning trials are distributed by Ray and executed within Kubernetes cluster
    § Using parallel search algorithm, we can run multiple tuning trials simultaneously
    *PV: Persistent Volume
    **Pod: Construct for running containers in cloud orchestrator like Kubernetes
    [Figure: the Flow Tuner sends parameters to pods** and containers on the compute nodes of a Kubernetes cluster, built from a custom container image, with Tools PV*, User PV, and PDK/Lib PV. Caption: "Flow tuning trials distributed by Ray"]
  13. Chip Design Flow as Code (section: Background: Chip Design Flow)

    § Chip design flow itself is “code” → Maintained in code repositories
    § Design flow is continuously evolving, e.g.,
      - Floorplan and timing constraint updates
      - New tool, PDK, library versions
      - Flow step and parameter updates
    § Desire for CI/CD* of design flow code
    *CI/CD: Continuous integration and continuous delivery
    [Figure: flow repositories project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow]
  14. Cloud-Native Design Flow CI/CD

    § Developed CI/CD pipeline utilizing containerized chip design flow
    § In IBM, we use it for nightly design flow build covering entire chip design flow
      - Synthesis → Floorplan → Placement → CTS → Routing → DFM/Fill/DRC/LVS
    [Figure: Kubernetes cluster with a Jenkins pod, a metrics dashboard (Dashboard pod), scalable Worker pods, and a Database pod, building project-X-flow]
  15. ML-Driven Continuous Design Flow Evolution at Scale

    § Continuously and autonomously evolving design flow enabled by
      - Cloud-native containerized design flow enablement
      - ML-based design flow tuning
    [Figure: Kubernetes cluster with a Jenkins pod, a metrics dashboard (Dashboard pod), scalable Worker pods, and a Database pod, building project-X-flow. Captions: "Design and flow updates invoking CI/CD pipeline", "Many workers run at scale with different parameters", "Knowledge database"]
    [Figure: plot titled "Continuous QoR Improvement", quality of result versus time]
  16. Outline

    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray deployment and code examples
    4. Results
  17. Deploying Ray Cluster on Containerized Design Flow

    § Containerized design flow running on IBM Cloud
    § Ray Kubernetes Operator* to deploy Ray on containerized design flow
      - Used custom Docker image for running both Ray and chip design tools
      - Modified Ray Helm Chart to mount volumes properly
    *Link: https://docs.ray.io/en/latest/cluster/kubernetes.html#deploying-on-kubernetes
    [Figure: Kubernetes cluster whose compute nodes run pods and containers from a custom container image, with Tools PV, User PV, and PDK/Lib PV, driven by flow descriptions]
  18. Dockerfile Example

    FROM centos:8

    # For kubectl
    RUN dnf install -y kubectl

    # For Ray (every line of the chained RUN except the last ends with a backslash)
    RUN dnf -y update \
     && dnf install -y python39 kubectl \
     && alternatives --set python /usr/bin/python3 \
     && python -m pip install kopf --no-cache-dir \
     && python -m pip install kubernetes --no-cache-dir \
     && python -m pip install pandas numpy matplotlib --no-cache-dir \
     && python -m pip install ray --no-cache-dir \
     && python -m pip install "ray[tune]" --no-cache-dir

    # For chip design flow tools
    RUN ...
  19. Ray Code Example: Floorplan Utilization Sweeping

    § Design floorplan utilization
      - Lower utilization: easy, but bad for performance, power, and area
      - Higher utilization: difficult, but good for performance, power, and area
    § How to find an optimal floorplan utilization?
      - Find maximum achievable floorplan utilization by sweeping
      - Individual flow execution is distributed across containerized flow by Ray
    [Figure: two floorplans, labeled "Utilization: 50%" and "Utilization: 70%"]
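A utilization sweep is an embarrassingly parallel job: every candidate utilization is an independent flow run, and the answer is the highest value that still succeeds. The sketch below shows that pattern with a local thread pool standing in for Ray, so it runs without a cluster. With Ray, the same structure would typically use a function decorated with `@ray.remote` and `ray.get` on the returned references. The pass/fail model and the 72% threshold are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow_at(utilization):
    """Toy stand-in for one full flow run; returns True if the design closes.

    Hypothetical model: this design stops closing above 72% utilization.
    """
    return utilization <= 0.72


def sweep_utilization(candidates, run_flow, max_workers=4):
    """Run every candidate in parallel and return (best, results).

    `best` is the highest utilization whose flow run succeeded, or None
    if no candidate succeeded.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(run_flow, candidates))
    results = dict(zip(candidates, outcomes))
    passing = [u for u, ok in results.items() if ok]
    return (max(passing) if passing else None), results


candidates = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80]
best, results = sweep_utilization(candidates, run_flow_at)
print("maximum achievable utilization:", best)
```

Running all candidates at once costs more compute than a sequential search, but the wall-clock time is roughly that of a single flow run, which is the trade-off a scalable cluster makes attractive.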
  20. Ray Code Example: Chip Design Flow Tuning

    § Regard chip design flow parameters as hyperparameters tuned by Tune
    § Each tuning trial invokes chip design flow, returning quality metric as reward

    import ray
    from ray import tune

    ray.init(...)

    # Runs the design flow with the given parameters and returns its quality metric.
    def execute_design_flow(config) -> float:
        ...

    # Trainable called by Tune once per trial; reports the reward back to Tune.
    def evaluate_design_flow(config):
        reward = execute_design_flow(config)
        tune.report(reward=reward)

    all_trials = tune.run(evaluate_design_flow, search_alg=algo, config=flow_parameters, ...)

    [Figure: the Flow Tuner applies parameters to the design flow (black box) and computes a reward. Configuration: search algorithm, max iteration, max concurrent jobs, flow execution script. Input: design flow parameters]
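The slide leaves open how a flow run becomes a single reward number. One possible shape, sketched below, is to negate the quantity being minimized and subtract a penalty when a constraint is violated, so that a higher reward is always better. The metric names, units, and penalty weight are assumptions for illustration and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class FlowMetrics:
    """Hypothetical summary of one flow run."""
    total_power_mw: float
    worst_slack_ps: float  # negative means timing is violated


def compute_reward(metrics, slack_penalty_per_ps=1.0):
    """Higher is better: minimize power and penalize any timing violation."""
    violation_ps = max(0.0, -metrics.worst_slack_ps)
    return -metrics.total_power_mw - slack_penalty_per_ps * violation_ps


clean = FlowMetrics(total_power_mw=100.0, worst_slack_ps=5.0)
lower_power = FlowMetrics(total_power_mw=90.0, worst_slack_ps=0.0)
violating = FlowMetrics(total_power_mw=80.0, worst_slack_ps=-50.0)

for run in (clean, lower_power, violating):
    print(run, compute_reward(run))
```

In this toy scoring, the run with the lowest power still ranks last because it misses timing, which keeps the search from trading correctness for power.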
  21. Outline

    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results
  22. IBM Telum: On-Chip AI Accelerator Example

    § Team demonstrated:
      - The design environment can be fully built on IBM Cloud Kubernetes Cluster
      - The QoRs can be significantly improved via “automated parameter tuning”
    [Figure: AI Compute Array (~30 mm²), shown at macro level and unit level]
  23. Design Flow Tuning Result

    § Applied flow tuner to IBM internal design flow targeting 14nm technology
      - Took an expert designer’s optimized design result as baseline
      - Tuning objective: power minimization
      - Tuning configuration: 17 synthesis and P&R parameters, 50 iterations
    § Achieved ~9% additional power reduction compared to designer’s baseline
    [Figure: chart with bars Baseline, Optimization 1, Optimization 2, Optimization 3, labeled "Designer’s manual optimization" and "Flow tuner result"]
  24. Conclusion

    § Chip design challenge:
      - Reduce overall turn-around time while delivering better quality
    § We built cloud-based, containerized chip design flow with:
      - Distributed chip design workload execution enabled by Ray
      - Parallel automatic flow parameter tuning enabled by Tune
    § We used our platform for IBM’s chip designs and delivered better design outcome with reduced overall turn-around time
    § Find more details from IBM Research’s Anyscale blog post: https://www.anyscale.com/blog/infusing-ai-and-ml-into-integrated-circuit-design-for-faster-chip-delivery