AI/ML-infused digital IC design workflows on the hybrid cloud

Infusing AI and ML into Chip Design for Faster Delivery, Better Performance
Jinwook Jung*, Jenn Kazda, Derren Dunn, Gi-Joon Nam*, Raghu Ganti, Mudhakar Srivatsa, Carlos Costa, Rama Divakaruni
IBM Research, Yorktown Heights, NY
*: Today’s presenters

2 Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray code examples
4. Results

3 Background: Chip Design Flow
• Multiple stages involving many tools
• Large storage requirements
  - PDK and cell libraries (>1 TB)
  - Tool installation (~500 GB or more)
  - User workspace (~10 TB per project)
• A single established design flow is applied to multiple blocks and units
[Figure: Chip design flow: RTL → Synthesis → Floorplanning → Placement → Clock Tree Synthesis → Routing → Physical Verification (DFM, Fill; DRC/LVS; ERC) → GDS → Tape-out; also labeled: Timing Closure, Library, PDK]

4 Chip Design Flow and Flow Parameters
• Flow parameters change tool behavior, e.g.:
  - Spend more time searching for a better structure
  - Do more power optimization at placement
  - Prefer higher metal layers for clock routing
• EDA tools have hundreds of parameters affecting the design outcome
  - Finding the optimal parameter setting is difficult
[Figure: The flow RTL → Synthesis → Floorplanning → Placement → Clock Tree Synthesis → Routing → Physical Verification → GDS → Tape-out with a parameter set at each stage; the same design and flow run with different flow parameters, across multiple designs]
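To put “difficult” in numbers: with k parameters that each take one of m_i candidate values, exhaustive search must run the full flow once per point of the search space S. Taking the 17 parameters of the tuning run reported later in the deck and assuming, purely for illustration, 4 candidate values per parameter:

```latex
|S| = \prod_{i=1}^{k} m_i, \qquad k = 17,\; m_i = 4 \;\Rightarrow\; |S| = 4^{17} = 2^{34} = 17{,}179{,}869{,}184 \approx 1.7 \times 10^{10}
```

Each point is a complete flow run, so exhaustive search is out of reach; a tuner instead samples the space with a search algorithm under a fixed iteration budget.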

5 Chip Design Flow as Code
• The chip design flow itself is “code” → maintained in code repositories
• The design flow is continuously evolving, e.g.:
  - Floorplan and timing constraint updates
  - New tool, PDK, and library versions
  - Flow step and parameter updates
• Desire for CI/CD* of design flow code
[Figure: per-project flow repositories project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow]
*CI/CD: Continuous integration and continuous delivery

7 AI/ML-Infused Chip Design Flow Orchestration
1. Containerized design flow enablement
   - Custom container images for design flow execution
   - Volume resources for PDK/libraries, tools, and workspace
   - Batch job deployment capability
2. ML-based design flow tuning with the containerized design flow
   - Implemented an ML-based flow tuner using the containerized design flow
   - Applied the tuner to IBM’s chip designs and delivered better design outcomes
3. Cloud-native design flow CI/CD
   - CI/CD pipeline built around the containerized flow
   - IBM’s design team uses it for nightly flow builds
[Figure: Kubernetes cluster with Jenkins, worker, MongoDB, and dashboard pods; container images edaas/centos, metrics-dashboard, jenkins, mongodb, and custom-container-image plus the project-X-flow repository from https://github.ibm.com/zeus2-pd; compute nodes run pods from a custom container image against flow descriptions, mounting the Tools, PDK/Lib, and User PVs]

9 Containerized Chip Design Flow Enablement
• Custom container images of tools for design flow execution
• Volume resources for hosting PDK/libraries, tools, and user workspace
• Parallel workload distribution enabled by Ray
  - Given a user’s flow descriptions, multiple flow executions are distributed
[Figure: Kubernetes cluster; compute nodes run pods** whose containers use a custom container image, execute flow descriptions, and mount the Tools, PDK/Lib, and User PVs*]
*PV: Persistent Volume
**Pod: Construct for running containers in a cloud orchestrator such as Kubernetes
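A minimal sketch of how one flow execution could be packaged as a Kubernetes Pod that mounts the three shared volumes from the slide (tools, PDK/libraries, user workspace). The claim names, mount paths, image, and command are hypothetical; the deck does not specify them. The manifest is built as a plain dict using standard Pod spec fields (`volumes`, `persistentVolumeClaim.claimName`, `volumeMounts`).

```python
from typing import Any, Dict, List

# Hypothetical PVC names and mount paths; the deck does not specify them.
# (volume name, PVC claim name, mount path, read-only?)
VOLUMES = [
    ("tools", "eda-tools-pvc", "/tools", True),        # shared tool installation
    ("pdk-lib", "pdk-lib-pvc", "/pdk", True),          # shared PDK and cell libraries
    ("user", "user-workspace-pvc", "/workspace", False),  # per-project workspace
]


def flow_pod_manifest(name: str, image: str, flow_cmd: List[str]) -> Dict[str, Any]:
    """Build a Pod manifest (as a dict) that runs one flow execution in a
    custom container image with the three shared volumes mounted."""
    volumes = [
        {"name": vol, "persistentVolumeClaim": {"claimName": claim, "readOnly": ro}}
        for vol, claim, _, ro in VOLUMES
    ]
    mounts = [
        {"name": vol, "mountPath": path, "readOnly": ro}
        for vol, _, path, ro in VOLUMES
    ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # A flow run is a batch job: do not restart the container on exit.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "flow",
                    "image": image,
                    "command": flow_cmd,
                    "volumeMounts": mounts,
                }
            ],
            "volumes": volumes,
        },
    }
```

Tools and PDK/libraries are mounted read-only because many pods share them, while the workspace is writable. Sharing one claim across pods on different nodes requires a storage class that supports the ReadOnlyMany or ReadWriteMany access mode. The dict can be serialized with `json.dumps` and submitted with `kubectl apply -f`, which accepts JSON.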

11 ML-Based Chip Design Flow Tuner
• Design-flow-agnostic implementation → no dependency on specific tools or flows
• Flexible interface: parameter definitions, maximum iterations, parallelism, etc.
• Various parameter search algorithms supported (e.g., hyperopt, skopt)
Input: ML configuration (search algorithm, max iterations, max concurrent jobs, flow execution script) and tunable design flow parameters (parameter names, ranges or choices, default values)
Output: best results with their parameters
The design flow itself is treated as a black box: the tuner applies a parameter setting, runs the flow, and computes a reward.
Pseudo code:
    read configuration and parameters
    iteration := 0
    while iteration++ < max_iteration do:
        create new parameter setting
        apply parameters and execute design flow
        compute reward; keep the best result seen so far
    return best results with parameters
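The loop above can be sketched in Python as follows. This is a minimal sketch, not the authors' tuner: it uses plain random search as a stand-in for the model-based algorithms (hyperopt, skopt) the slide mentions, and `run_flow` is a hypothetical callable that applies a setting, runs the black-box flow, and returns a reward (higher is better).

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

# A parameter is either a (low, high) numeric range or a list of discrete choices.
ParamSpec = Union[Tuple[float, float], List[Any]]


def sample_setting(space: Dict[str, ParamSpec], rng: random.Random) -> Dict[str, Any]:
    """Create a new parameter setting by uniform random sampling."""
    setting = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            setting[name] = rng.uniform(low, high)
        else:
            setting[name] = rng.choice(spec)
    return setting


def tune_flow(
    run_flow: Callable[[Dict[str, Any]], float],
    space: Dict[str, ParamSpec],
    defaults: Dict[str, Any],
    max_iteration: int,
    seed: int = 0,
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Black-box tuning loop; returns the best setting and its reward."""
    rng = random.Random(seed)
    best_setting: Optional[Dict[str, Any]] = None
    best_reward = float("-inf")
    for iteration in range(max_iteration):
        # The first iteration evaluates the default values as a baseline.
        setting = dict(defaults) if iteration == 0 else sample_setting(space, rng)
        reward = run_flow(setting)  # apply parameters, run flow, compute reward
        if reward > best_reward:
            best_setting, best_reward = setting, reward
    return best_setting, best_reward
```

Because the flow is only ever seen through `run_flow`, the loop has no dependency on specific tools, which mirrors the design-flow-agnostic property on the slide. Replacing `sample_setting` with a suggester that learns from past rewards is where a real search algorithm would plug in.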

12 Chip Design Flow Tuning with Containerized Flow
• Extended the chip design flow tuner using Ray and Tune
• Tuning trials are distributed by Ray and executed within the Kubernetes cluster
• With a parallel search algorithm, multiple tuning trials run simultaneously
[Figure: the flow tuner sends parameters to flow tuning trials distributed by Ray across pods on compute nodes in a Kubernetes cluster; each pod runs a custom container image and mounts the Tools, PDK/Lib, and User persistent volumes (PVs)]

14 Cloud-Native Design Flow CI/CD
• Developed a CI/CD pipeline utilizing the containerized chip design flow
• At IBM, we use it for nightly design flow builds covering the entire chip design flow
  - Synthesis → Floorplan → Placement → CTS → Routing → DFM/Fill/DRC/LVS
[Figure: Kubernetes cluster with a Jenkins pod, scalable worker pods, a database pod, and a metrics dashboard pod, driven by the project-X-flow repository]
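A minimal sketch of the stage ordering a nightly build enforces: each stage runs as a subprocess in flow order, and the build stops at the first failing stage because later stages consume its output. The stage names follow the slide; the per-stage commands are hypothetical placeholders, and the Jenkins and worker-pod orchestration is not modeled.

```python
import subprocess
from typing import Dict, List, Sequence, Tuple

# Stage order from the slide; the commands for each stage are hypothetical.
STAGES = ["synthesis", "floorplan", "placement", "cts", "routing", "dfm_fill_drc_lvs"]


def run_nightly(
    commands: Dict[str, Sequence[str]],
    stages: Sequence[str] = STAGES,
) -> Tuple[bool, List[Tuple[str, int]]]:
    """Run each stage's command in order; stop at the first non-zero exit code.

    Returns (build_passed, [(stage, exit_code), ...]) for the stages that ran.
    """
    results: List[Tuple[str, int]] = []
    for stage in stages:
        proc = subprocess.run(list(commands[stage]), capture_output=True, text=True)
        results.append((stage, proc.returncode))
        if proc.returncode != 0:
            # Later stages depend on this stage's output, so skip them.
            return False, results
    return True, results
```

The per-stage exit codes (and, in practice, parsed QoR metrics) are the kind of record a pipeline would push to the database pod for the metrics dashboard.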

15 ML-Driven Continuous Design Flow Evolution at Scale
• Continuously and autonomously evolving design flow, enabled by
  - Cloud-native containerized design flow enablement
  - ML-based design flow tuning
[Figure: design and flow updates to project-X-flow invoke the CI/CD pipeline (Jenkins, scalable worker, database, and metrics dashboard pods in a Kubernetes cluster); many workers run at scale with different parameters and feed a knowledge database; a plot of quality of result over time labeled “Continuous QoR Improvement”]

16 Outline
1. Background: Chip design flow
2. AI/ML-infused chip design flow orchestration
3. Ray deployment and code examples
4. Results

17 Deploying Ray Cluster on Containerized Design Flow
• Containerized design flow running on IBM Cloud
• Ray Kubernetes Operator* used to deploy Ray on the containerized design flow
  - Used a custom Docker image for running both Ray and the chip design tools
  - Modified the Ray Helm chart to mount the volumes properly
[Figure: Kubernetes cluster; compute nodes run pods from a custom container image against flow descriptions, mounting the Tools, PDK/Lib, and User PVs]
*Link: https://docs.ray.io/en/latest/cluster/kubernetes.html#deploying-on-kubernetes

18 Dockerfile Example

    FROM centos:8

    # For kubectl (assumes a repository that provides the kubectl package is configured)
    RUN dnf install -y kubectl

    # For Ray
    RUN dnf -y update \
     && dnf install -y python39 \
     && alternatives --set python /usr/bin/python3 \
     && python -m pip install --no-cache-dir kopf kubernetes \
     && python -m pip install --no-cache-dir pandas numpy matplotlib \
     && python -m pip install --no-cache-dir "ray[tune]"

    # For chip design flow tools
    RUN ...

19 Ray Code Example: Floorplan Utilization Sweeping
• Design floorplan utilization
  - Lower utilization: easier to implement, but worse for performance, power, and area
  - Higher utilization: harder to implement, but better for performance, power, and area
• How to find the optimal floorplan utilization?
  - Find the maximum achievable floorplan utilization by sweeping
  - Individual flow executions are distributed across the containerized flow by Ray
[Figure: example floorplans at 50% and 70% utilization]

20 Ray Code Example: Floorplan Utilization Sweeping

21 Ray Code Example: Floorplan Utilization Sweeping
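The code on these two slides did not survive in the transcript, so the following is only a hypothetical sketch of the idea from slide 19, not the authors' code. Each utilization value becomes one Ray task, and the highest utilization whose run succeeds is reported. `run_flow`, the metric names `drc_violations` and `wns_ns`, and the success criterion (no DRC violations, non-negative worst slack) are assumptions. `ray.remote`, `.remote()`, and `ray.get` are standard Ray core calls.

```python
from typing import Callable, Dict, Optional, Sequence

FlowResult = Dict[str, float]


def max_achievable_utilization(results: Dict[float, FlowResult]) -> Optional[float]:
    """Highest utilization whose flow run met the (hypothetical) success
    criteria: no DRC violations after routing and non-negative worst slack."""
    feasible = [
        util
        for util, r in results.items()
        if r["drc_violations"] == 0 and r["wns_ns"] >= 0.0
    ]
    return max(feasible) if feasible else None


def sweep_with_ray(
    run_flow: Callable[[float], FlowResult], utils: Sequence[float]
) -> Dict[float, FlowResult]:
    """Run one flow execution per utilization value as a Ray task."""
    import ray  # imported here so the helper above works without Ray installed

    ray.init(ignore_reinit_error=True)
    remote_flow = ray.remote(run_flow)  # wrap the flow launcher as a remote task
    refs = [remote_flow.remote(u) for u in utils]  # tasks are scheduled in parallel
    return dict(zip(utils, ray.get(refs)))  # block until every run finishes
```

A sweep such as `sweep_with_ray(run_flow, [0.50, 0.55, 0.60, 0.65, 0.70])` followed by `max_achievable_utilization(...)` gives the answer in one pass. On a cluster deployed by the operator, `ray.init(address="auto")` would connect to the running cluster instead of starting a local one, and `ray.remote(num_cpus=...)` can reserve cores per flow run.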

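22 Ray Code Example: Chip Design Flow Tuning
• Chip design flow parameters are treated as hyperparameters tuned by Tune
• Each tuning trial invokes the chip design flow and returns a quality metric as its reward

    import ray
    from ray import tune

    ray.init(...)

    def execute_design_flow(config) -> float:
        # Apply the parameter setting in `config`, run the design flow
        # (a black box), and return its quality metric
        ...

    def evaluate_design_flow(config):
        # One tuning trial: run the flow and report the reward to Tune
        reward = execute_design_flow(config)
        tune.report(reward=reward)  # Ray 1.x function API; newer Ray releases use a different reporting call

    # algo: search algorithm; flow_parameters: search space of flow parameters
    all_trials = tune.run(
        evaluate_design_flow,
        search_alg=algo,
        config=flow_parameters,
        # ... (further options elided on the slide)
    )

[Figure: the flow tuner (search algorithm, max iterations, max concurrent jobs, flow execution script) applies design flow parameters to the black-box design flow and computes the reward]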
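Each trial reports a single reward, so multi-metric flow results have to be folded into one number. A minimal sketch for a power-minimization objective, assuming hypothetical metric names (`power_mw`, `wns_ns`) and an arbitrary penalty weight: the reward is the relative power saving over a baseline minus a penalty proportional to any negative worst slack. This is an illustrative choice, not the reward used in the talk.

```python
from typing import Dict


def power_reward(
    metrics: Dict[str, float],
    baseline_power_mw: float,
    penalty_per_ns: float = 10.0,
) -> float:
    """Reward for a power-minimization trial (higher is better).

    Relative power saving versus the baseline, minus a penalty that grows
    with the amount of negative worst slack (timing violation) in ns.
    """
    saving = (baseline_power_mw - metrics["power_mw"]) / baseline_power_mw
    timing_penalty = penalty_per_ns * max(0.0, -metrics["wns_ns"])
    return saving - timing_penalty
```

With this shape, a run that saves 20% power but misses timing by 0.05 ns scores 0.2 - 0.5 = -0.3, below a timing-clean run that saves 9%. The penalty weight therefore decides how strongly the search avoids timing violations. A function like this would sit at the end of `execute_design_flow`, producing the value passed to `tune.report`.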

24 IBM Telum: On-Chip AI Accelerator Example
• The team demonstrated that:
  - The design environment can be fully built on an IBM Cloud Kubernetes cluster
  - QoR can be significantly improved via “automated parameter tuning”
[Figure: AI compute array (~30 mm²), shown at macro level and unit level]

25 Design Flow Tuning Result
• Applied the flow tuner to an IBM internal design flow targeting 14nm technology
  - Baseline: an expert designer’s optimized design result
  - Tuning objective: power minimization
  - Tuning configuration: 17 synthesis and P&R parameters, 50 iterations
• Achieved ~9% additional power reduction compared to the designer’s baseline
[Figure: results for Baseline, Optimization 1, Optimization 2, and Optimization 3 (designer’s manual optimization), and the flow tuner result]

26 Conclusion
• Chip design challenge:
  - Reduce overall turnaround time while delivering better quality
• We built a cloud-based, containerized chip design flow with:
  - Distributed chip design workload execution enabled by Ray
  - Parallel automatic flow parameter tuning enabled by Tune
• We used our platform for IBM’s chip designs and delivered better design outcomes with reduced overall turnaround time
• Find more details in IBM Research’s Anyscale blog post: https://www.anyscale.com/blog/infusing-ai-and-ml-into-integrated-circuit-design-for-faster-chip-delivery

Anyscale
