AI/ML-infused digital IC design workflows on the hybrid cloud

As the complexity of modern hardware systems explodes, fast and effective design space exploration for better integrated circuit (IC) implementations is becoming increasingly difficult because of its growing demand for computational resources. In recent years, decision intelligence has been used more and more in IC design flows to navigate the design solution space in a systematic and intelligent manner. To address this challenge, we have been working on AI/ML-infused IC design orchestration in order to 1) enable the IC design environment on a hybrid cloud platform so that workloads can easily be scaled up or down according to computational demand; and 2) produce higher quality of results (QoR) in a shorter total turnaround time (TAT). In this work, we illustrate how we provide scalable IC design workload execution that produces higher-performance designs by using AI/ML-driven automatic parameter tuning. We first demonstrate that we can build a cloud-based IC design environment, including a containerized digital design flow, on Kubernetes clusters. Then, we extend the containerized design flow with automatic parameter tuning using AI/ML techniques. Finally, we demonstrate that the automatic parameter tuning can be executed in a more scalable and distributed manner using the Ray platform. We use actual design environment setups, code snippets, and results from product IC designs as evidence that the proposed method can produce higher-quality IC designs using Ray-based automatic parameter tuning.

Speakers: Gi-Joon Nam & Jinwook Jung

Anyscale

June 23, 2022

Transcript

  1. Infusing AI and ML into Chip Design
    for Faster Delivery, Better Performance
    Jinwook Jung*, Jenn Kazda, Derren Dunn, Gi-Joon Nam*,
    Raghu Ganti, Mudhakar Srivatsa, Carlos Costa, Rama Divakaruni
    IBM Research, Yorktown Heights, NY
    *: Today’s presenters

  2. Outline
    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results

  3. Chip Design Flow
    Background: Chip Design Flow
    • Multiple stages involving many tools
    • Large storage requirement
      - PDK and cell libraries (>1TB)
      - Tool installation (~500GB or more)
      - User workspace (~10TB per project)
    • Single established design flow being applied to multiple blocks and units
    [Figure: Chip Design Flow. Labels: RTL, Library, PDK; Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification (DFM, Fill, DRC/LVS, ERC), Timing Closure; GDS, GDS Tape-out.]

  4. Chip Design Flow and Flow Parameters
    Background: Chip Design Flow
    • Flow parameters: change tool behavior
      - Spend more time to search for better structure
      - Do more power optimization at placement
      - Prefer higher layer for clock routing
    • EDA tools have hundreds of parameters affecting design outcome
      - Finding optimal parameter setting is difficult
    [Figure: Same design and flow with different flow parameters, producing multiple designs. Flow labels: RTL, Library, PDK; Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification, Timing Closure; GDS, GDS Tape-out.]
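
To see why finding a good setting is difficult, the sketch below counts the combinations in a small discrete parameter space. The parameter names and choices are purely hypothetical illustrations, not options of any real EDA tool.

```python
from itertools import product
from math import prod

# Hypothetical flow parameters and their allowed choices (illustrative only).
flow_parameters = {
    "synthesis_effort": ["low", "medium", "high"],
    "placement_power_opt": [False, True],
    "clock_routing_layer": ["lower", "higher"],
    "target_utilization": [0.5, 0.6, 0.7],
}

def count_settings(parameters):
    """Number of distinct parameter settings in a discrete search space."""
    return prod(len(choices) for choices in parameters.values())

def enumerate_settings(parameters):
    """Yield every setting as a dict mapping parameter name to chosen value."""
    names = list(parameters)
    for values in product(*parameters.values()):
        yield dict(zip(names, values))

print(count_settings(flow_parameters))  # 3 * 2 * 2 * 3 = 36
print(3 ** 17)                          # 17 three-way parameters: 129140163
```

Each setting corresponds to one complete flow run, so exhaustive enumeration stops being practical long before the hundreds of parameters that the tools expose.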

  5. Chip Design Flow as Code
    Background: Chip Design Flow
    • Chip design flow itself is “code”
      → Maintained in code repositories
    • Design flow is continuously evolving, e.g.,
      - Floorplan and timing constraint updates
      - New tool, PDK, library versions
      - Flow step and parameter updates
    • Desire for CI/CD* of design flow code
    *CI/CD: Continuous integration and continuous delivery
    [Figure: code repositories project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow.]

  6. Outline
    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results

  7. AI/ML-Infused Chip Design Flow Orchestration
    1. Containerized design flow enablement
      - Custom container images for design flow execution
      - Volume resources for PDK/libraries, tools, workspace
      - Batch job deployment capability
    2. ML-based design flow tuning with containerized design flow
      - Implemented ML-based flow tuner using containerized design flow
      - Applied tuner to IBM’s chip designs and delivered better design outcome
    3. Cloud-native design flow CI/CD
      - CI/CD pipeline around containerized flow
      - IBM design team uses it for nightly flow build
    [Figure: Kubernetes cluster of compute nodes running pods (containers from a custom container image) that execute flow descriptions, with persistent volumes (PV) for Tools, User, and PDK/Lib.]
    [Figure: Kubernetes cluster with Jenkins, Worker, MongoDB, and Dashboard pods. Labels: https://github.ibm.com/zeus2-pd, edaas/centos, metrics-dashboard, jenkins, mongodb, custom-container-image, project-X-flow.]

  8. Chip Design Flow: Characteristics
    Background: Chip Design Flow
    • Multiple stages involving many tools
    • Large storage requirement
      - PDK and cell libraries (>1TB)
      - Tool installation (~500GB or more)
      - User workspace (~10TB per project)
    • Single established design flow being applied to multiple blocks and units
    [Figure: Chip Design Flow with callouts (1), (2), and (3). Labels: RTL, Library, PDK; Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification (DFM, Fill, DRC/LVS, ERC), Timing Closure; GDS, GDS Tape-out.]

  9. Containerized Chip Design Flow Enablement
    • Custom container images of tools for design flow execution
    • Volume resources for hosting PDK/libraries, tools, and user workspace
    • Parallel workload distribution capability enabled by Ray
      - Given user’s flow descriptions, multiple flow executions are distributed
    *PV: Persistent Volume
    **Pod: Construct for running containers in cloud orchestrator like Kubernetes
    [Figure: Kubernetes cluster with callouts (1), (2), and (3): compute nodes running pods (containers from a custom container image) that execute flow descriptions, with PVs for Tools, User, and PDK/Lib.]

  10. Chip Design Flow and Flow Parameters
    Background: Chip Design Flow
    • Flow parameters: change tool behavior
      - Spend more time to search for better structure
      - Do more power optimization at placement
      - Prefer higher layer for clock routing
    • EDA tools have hundreds of parameters affecting design outcome
      - Finding optimal parameter setting is difficult
    [Figure: Same design and flow with different flow parameters, producing multiple designs. Flow labels: RTL, Library, PDK; Synthesis, Floorplanning, Placement, Clock Tree Synthesis, Routing, Physical Verification, Timing Closure; GDS, GDS Tape-out.]

  11. ML-Based Chip Design Flow Tuner
    • Design flow agnostic implementation → no dependency on tools and flows
    • Flexible interface: parameter definition, max iteration, parallelism, etc.
    • Various parameter search algorithms supported (e.g., hyperopt, skopt)
    Flow Tuner inputs
    Configuration:
      • Search algorithm
      • Max iteration
      • Max concurrent jobs
      • Flow execution script
    Design flow parameters:
      • Parameter names
      • Ranges or choices
      • Default values
    [Figure: the ML-based Flow Tuner applies parameters to the design flow (black box) and computes a reward from the result.]
    Pseudo code
    Input: config, tunable parameters
    Output: best results with parameters
    1. Read configuration and parameters
    2. iteration := 0
    3. while iteration++ < max_iteration do:
    4.     Create new parameter setting
    5.     Execute design flow
    6.     Compute reward
    7. return best results with parameters
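
The pseudo code can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' tuner: plain random search stands in for the ML search algorithms named on the slide (hyperopt, skopt), and `mock_flow`, the parameter names, and the reward formula are hypothetical placeholders for a real black-box flow run.

```python
import random

def sample_setting(space, rng):
    """Draw one setting; each domain is a list of choices or a (low, high) range."""
    setting = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            setting[name] = rng.uniform(*domain)
        else:
            setting[name] = rng.choice(domain)
    return setting

def tune_flow(run_flow, space, max_iteration, seed=0):
    """Random-search tuner loop; returns (best_reward, best_setting)."""
    rng = random.Random(seed)
    best_reward, best_setting = float("-inf"), None
    for _ in range(max_iteration):
        setting = sample_setting(space, rng)   # create new parameter setting
        reward = run_flow(setting)             # execute design flow, compute reward
        if reward > best_reward:
            best_reward, best_setting = reward, setting
    return best_reward, best_setting

def mock_flow(setting):
    """Hypothetical black-box flow: the reward is the negative of a made-up power value."""
    power = 100.0 - 10.0 * setting["utilization"]
    if setting["power_opt"]:
        power -= 5.0
    return -power

space = {"utilization": (0.5, 0.7), "power_opt": [False, True]}
best_reward, best_setting = tune_flow(mock_flow, space, max_iteration=50)
print(best_reward, best_setting)
```

Because the tuner only calls `run_flow(setting)` and reads back a number, it stays independent of the tools and flow behind that function, which is the design-flow-agnostic property the slide describes.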

  12. Chip Design Flow Tuning with Containerized Flow
    • Extended chip design flow tuner using Ray and Tune
    • Tuning trials are distributed by Ray and executed within Kubernetes cluster
    • Using parallel search algorithm, we can run multiple tuning trials simultaneously
    *PV: Persistent Volume
    **Pod: Construct for running containers in cloud orchestrator like Kubernetes
    [Figure: the Flow Tuner sends parameters to flow tuning trials distributed by Ray across pods** (containers from a custom container image) on the compute nodes of a Kubernetes cluster, with PVs* for Tools, User, and PDK/Lib.]

  13. Chip Design Flow as Code
    Background: Chip Design Flow
    • Chip design flow itself is “code”
      → Maintained in code repositories
    • Design flow is continuously evolving, e.g.,
      - Floorplan and timing constraint updates
      - New tool, PDK, library versions
      - Flow step and parameter updates
    • Desire for CI/CD* of design flow code
    *CI/CD: Continuous integration and continuous delivery
    [Figure: code repositories project-a-flow, project-b-flow, project-c-flow, project-d-flow, project-e-flow, project-X-flow.]

  14. Cloud-Native Design Flow CI/CD
    • Developed CI/CD pipeline utilizing containerized chip design flow
    • At IBM, we use it for nightly design flow builds covering the entire chip design flow
      - Synthesis → Floorplan → Placement → CTS → Routing → DFM/Fill/DRC/LVS
    [Figure: Kubernetes cluster with a Jenkins pod, scalable worker pods, a database pod, and a dashboard pod (metrics dashboard), driven by project-X-flow.]

  15. ML-Driven Continuous Design Flow Evolution at Scale
    • Continuously and autonomously evolving design flow enabled by
      - Cloud-native containerized design flow enablement
      - ML-based design flow tuning
    [Figure: design and flow updates in project-X-flow invoke the CI/CD pipeline on the Kubernetes cluster (Jenkins pod, scalable worker pods, database pod, dashboard pod with metrics dashboard); many workers run at scale with different parameters and feed a knowledge database.]
    [Chart: quality of result versus time, labeled “Continuous QoR Improvement”.]
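
One way to picture the knowledge database is as a table of runs that every worker appends to and that later builds can query. The sketch below is only an illustration under assumptions: the standard-library `sqlite3` module stands in for the database pod, and the table layout, build names, parameters, and QoR values are all hypothetical.

```python
import json
import sqlite3

def open_knowledge_db(path=":memory:"):
    """Create the runs table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, build TEXT, parameters TEXT, qor REAL)"
    )
    return conn

def record_run(conn, build, parameters, qor):
    """Store one flow run: which build, which parameters, and the QoR it reached."""
    conn.execute(
        "INSERT INTO runs (build, parameters, qor) VALUES (?, ?, ?)",
        (build, json.dumps(parameters, sort_keys=True), qor),
    )
    conn.commit()

def best_run(conn):
    """Return (build, parameters, qor) of the best run so far (higher QoR is better)."""
    row = conn.execute(
        "SELECT build, parameters, qor FROM runs ORDER BY qor DESC, id ASC LIMIT 1"
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]), row[2])

# Hypothetical nightly builds with made-up QoR scores.
conn = open_knowledge_db()
record_run(conn, "nightly-1", {"power_opt": False}, 0.80)
record_run(conn, "nightly-2", {"power_opt": True}, 0.86)
record_run(conn, "nightly-3", {"power_opt": True}, 0.84)
print(best_run(conn))
```

Keeping the parameters next to the QoR they produced lets a later tuning run start from the best known setting instead of from scratch.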

  16. Outline
    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray deployment and code examples
    4. Results

  17. Deploying Ray Cluster on Containerized Design Flow
    • Containerized design flow running on IBM Cloud
    • Ray Kubernetes Operator* to deploy Ray on containerized design flow
      - Used custom Docker image for running both Ray and chip design tools
      - Modified Ray Helm Chart to mount volumes properly
    *Link: https://docs.ray.io/en/latest/cluster/kubernetes.html#deploying-on-kubernetes
    [Figure: Kubernetes cluster of compute nodes running pods (containers from a custom container image) that execute flow descriptions, with PVs for Tools, User, and PDK/Lib.]

  18. Dockerfile Example
    FROM centos:8
    # For kubectl (assumes a package repository that provides kubectl is configured)
    RUN dnf install -y kubectl
    # For Ray
    RUN dnf -y update \
        && dnf install -y python39 \
        && alternatives --set python /usr/bin/python3 \
        && python -m pip install kopf --no-cache-dir \
        && python -m pip install kubernetes --no-cache-dir \
        && python -m pip install pandas numpy matplotlib --no-cache-dir \
        && python -m pip install ray --no-cache-dir \
        && python -m pip install "ray[tune]" --no-cache-dir
    # For chip design flow tools
    RUN ...

  19. Ray Code Example: Floorplan Utilization Sweeping
    • Design floorplan utilization
      - Lower utilization: easy, but bad for performance, power, and area
      - Higher utilization: difficult, but good for performance, power, and area
    • How to find an optimal floorplan utilization?
      - Find maximum achievable floorplan utilization by sweeping
      - Individual flow execution is distributed across containerized flow by Ray
    [Figure: floorplans at 50% and 70% utilization.]

  20. Ray Code Example: Floorplan Utilization Sweeping
    [Slide content not captured in this transcript.]

  21. Ray Code Example: Floorplan Utilization Sweeping
    [Slide content not captured in this transcript.]
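
A minimal sketch of such a sweep, not the presenters' code: every candidate utilization is run in parallel and the highest one that still passes is kept. To stay runnable with the standard library alone, a thread pool stands in for Ray here; with Ray, the flow function would typically be decorated with `@ray.remote`, launched with `.remote(...)`, and collected with `ray.get(...)`. The function `run_flow_at_utilization` and its 65% pass threshold are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_flow_at_utilization(utilization):
    """Hypothetical flow run: returns (utilization, passed).

    A real version would launch the containerized flow and check its
    results; in this mock, a run "passes" up to 65% utilization.
    """
    return utilization, utilization <= 0.65

def sweep_utilization(candidates, max_workers=4):
    """Run all candidates in parallel; return the highest passing one, or None."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_flow_at_utilization, candidates))
    passing = [u for u, passed in results if passed]
    return max(passing) if passing else None

candidates = [0.50, 0.55, 0.60, 0.65, 0.70]
best = sweep_utilization(candidates)
print(best)  # 0.65 with the mock pass/fail rule above
```

The runs are independent of each other, so the sweep finishes in roughly the time of one flow run when enough workers are available.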

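  22. Ray Code Example: Chip Design Flow Tuning
    • Treat chip design flow parameters as hyperparameters tuned by Tune
    • Each tuning trial invokes the chip design flow, returning a quality metric as reward
    import ray
    from ray import tune

    ray.init(...)  # connect to the Ray cluster; arguments elided on the slide

    def execute_design_flow(config) -> float:
        # Run the chip design flow with the given parameters; return a quality metric
        ...

    def evaluate_design_flow(config):
        # One tuning trial: run the flow and report its quality metric as the reward
        reward = execute_design_flow(config)
        tune.report(reward=reward)  # function-based reporting API of Ray 1.x

    # algo: a Tune search algorithm; flow_parameters: the parameter search space
    all_trials = tune.run(evaluate_design_flow,
                          search_alg=algo,
                          config=flow_parameters, ...)
    [Figure: Flow Tuner (search algorithm, max iteration, max concurrent jobs, flow execution script) takes the configuration and design flow parameters, applies parameters to the design flow (black box), and computes a reward.]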

  23. Outline
    1. Background: Chip design flow
    2. AI/ML-infused chip design flow orchestration
    3. Ray code examples
    4. Results

  24. IBM Telum: On-Chip AI Accelerator Example
    • Team demonstrated:
      - The design environment can be fully built on an IBM Cloud Kubernetes cluster
      - The QoR can be significantly improved via “automated parameter tuning”
    [Figure: AI Compute Array (~30 mm²), with macro-level and unit-level views.]

  25. Design Flow Tuning Result
    • Applied flow tuner to IBM internal design flow targeting 14nm technology
      - Took an expert designer’s optimized design result as baseline
      - Tuning objective: power minimization
      - Tuning configuration: 17 synthesis and P&R parameters, 50 iterations
    • Achieved ~9% additional power reduction compared to designer’s baseline
    [Chart: Baseline, Optimization 1, Optimization 2, and Optimization 3 (designer’s manual optimization) compared with the flow tuner result.]

  26. Conclusion
    • Chip design challenge:
      - Reduce overall turn-around time while delivering better quality
    • We built a cloud-based, containerized chip design flow with:
      - Distributed chip design workload execution enabled by Ray
      - Parallel automatic flow parameter tuning enabled by Tune
    • We used our platform for IBM’s chip designs and delivered better design outcomes with reduced overall turn-around time
    • Find more details in IBM Research’s Anyscale blog post:
      https://www.anyscale.com/blog/infusing-ai-and-ml-into-integrated-circuit-design-for-faster-chip-delivery