Artificial Intelligence and Systems Laboratory (AISys): A Research Overview

A research overview of the AISys lab at the University of South Carolina (USC).

Pooyan Jamshidi

April 15, 2023

Transcript

  1. Artificial Intelligence and Systems Laboratory (AISys)
    Research Overview
    Pooyan Jamshidi
    University of South Carolina
    https://pooyanjamshidi.github.io/AISys/

  2. Saeid Ghafouri (PhD student)
    Fatemeh Ghofrani (PhD student)
    Abir Hossen (PhD student)
    Shahriar Iqbal (PhD student)
    Sonam Kharde (Postdoc)
    Hamed Damirchi (PhD student, co-advised by Forest Agostinelli)
    Mehdi Yaghouti (Postdoc)
    Samuel Whidden (Undergraduate)
    Rasool Sharifi (PhD student)
    Kimia Noorbakhsh (Undergraduate)

  3. Building reliable models that produce causal explanations for performance
    debugging and transfer better to new environments
    [Figure: two plots of Throughput (FPS) against Cache Misses; the second has
    a legend of cache policies (LRU, FIFO, LIFO, MRU).]
    Cache Policy affects Throughput via Cache Misses.
    [Diagram: Cache Policy → Cache Misses → Throughput]
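
The causal chain on this slide can be illustrated with a toy structural model. This is only a sketch, not the lab's model: the miss counts, the linear throughput equation, and the names `cache_misses`, `throughput`, and `simulate` are made-up assumptions. It shows what "via Cache Misses" means in practice. Once the number of cache misses is fixed by an intervention, changing the policy no longer changes throughput.

```python
from typing import Optional

# Hypothetical numbers, invented for illustration only (not measurements).
MISSES_BY_POLICY = {"LRU": 100_000, "FIFO": 130_000, "LIFO": 170_000, "MRU": 200_000}


def cache_misses(policy: str) -> int:
    """Structural equation: Cache Policy -> Cache Misses."""
    return MISSES_BY_POLICY[policy]


def throughput(misses: int) -> float:
    """Structural equation: Cache Misses -> Throughput (FPS).

    Throughput depends on misses only; the policy never appears here.
    """
    return 30.0 - misses / 10_000


def simulate(policy: str, do_misses: Optional[int] = None) -> float:
    """Run the chain; `do_misses` overrides the mediator (an intervention)."""
    misses = cache_misses(policy) if do_misses is None else do_misses
    return throughput(misses)


# Observationally, the policy changes throughput through the miss count.
observed = {p: simulate(p) for p in MISSES_BY_POLICY}

# With misses held fixed, the policy has no remaining effect.
intervened = {p: simulate(p, do_misses=150_000) for p in MISSES_BY_POLICY}
```

In this toy model `observed` differs across policies while every entry of `intervened` is identical, which is the signature of a fully mediated effect.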

  4. FlexiBO: a multi-objective optimization approach that trades off information
    gain against the cost of design evaluations
    • FlexiBO is a cost-aware approach for multi-objective optimization that
    iteratively selects a design and an objective for evaluation.

    • It allows us to trade off the additional information gained through an
    evaluation against the cost incurred by that evaluation.
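
The selection step described above can be sketched as choosing the (design, objective) pair with the best gain per unit cost. This is a toy illustration under stated assumptions, not FlexiBO's actual acquisition function: the function `select_next`, the objective names `accuracy` and `energy`, and all gain and cost values are hypothetical.

```python
from typing import Dict, Tuple

Candidate = Tuple[str, str]  # (design, objective)


def select_next(gain: Dict[Candidate, float], cost: Dict[str, float]) -> Candidate:
    """Return the (design, objective) pair with the highest gain per unit cost.

    `gain` holds an estimated information gain for evaluating each pair;
    `cost` holds the evaluation cost of each objective.
    """
    return max(gain, key=lambda c: gain[c] / cost[c[1]])


# Hypothetical estimates: accuracy is expensive to evaluate, energy is cheap.
gain = {
    ("d1", "accuracy"): 0.30,
    ("d1", "energy"): 0.20,
    ("d2", "accuracy"): 0.50,
    ("d2", "energy"): 0.10,
}
cost = {"accuracy": 10.0, "energy": 2.0}

choice = select_next(gain, cost)
```

With these made-up numbers the cheap energy evaluation of `d1` wins even though the accuracy evaluation of `d2` has the largest raw gain, which is the trade-off the slide describes.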

  6. Finding root causes of configuration issues in highly-configurable robots
    We discovered that root causes of task failures in robots could be captured
    by causal effect estimation of task inputs and robot configurations.

  7. Sim-to-real by enabling causal transfer learning
    Causal models learned in simulation can be transferred to real robots to
    find the root causes of failures of physical robots.

  9. Looking for winning tickets in over-parametrized networks
    • How robust are the discovered sub-networks
    (e.g., to adversarial attacks or distribution shift)?

    • Is there an always-winning lottery ticket
    hidden in a randomly initialized network?

    • Is it possible to train the sparse sub-network efficiently?

  11. Is there anything special about contrastive learning in terms of adversarial
    robustness?
    • Contrastive learning (CL) without label information is less robust than
    other learning schemes.

    • Semi-supervised learning (SL-CL or SCL-CL) is more robust than CL.

  12. Is there anything special about contrastive learning in terms of adversarial
    robustness?
    • Adversarial training causes similar representations between consecutive
    layers.

    • Fully adversarial fine-tuning can improve clean accuracy (red line) and
    robustness (blue line) by eliminating these similarities.

    • The lack of differentiated layer-wise representations after adversarial
    training may hinder neural networks from achieving high clean/adversarial
    accuracy.

  14. Hardware-aware partitioning and mapping for multi-chiplet and multi-card
    AI inference systems
    [Figure: framework diagram. Panels: Framework Inputs, Framework Outputs,
    Workload Computation Graph, Heterogeneous System Interconnect Graph,
    Inter-Chiplet Interconnect Graph, Intra-Chiplet Interconnect Graph
    (Vendor A, Vendor B), Partitioned Computation Graph, Mapping,
    Pipeline Schedule (Module# - Chiplet# over Time).]

  16. Hardware-aware partitioning and mapping for multi-chiplet and multi-card
    AI inference systems
    [Figure: the framework diagram from slide 14 with an added "Set of Chiplets"
    element (labels CA, CB).]

  18. Reconciling high accuracy, cost-efficiency, and low latency of inference
    serving systems
    • Model variants offer different accuracy/latency trade-offs.

    • Models’ performance varies under different resource assignments.
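
The two observations above suggest a simple selection rule, sketched here under stated assumptions: among profiled (variant, resource assignment) pairs, pick the most accurate one that meets a latency target within a core budget. The `Variant` type, the `pick_variant` function, and every number in `PROFILES` are hypothetical and are not taken from the lab's system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float    # hypothetical top-1 accuracy
    latency_ms: float  # hypothetical latency under this resource assignment
    cpu_cores: int     # cores assigned to the variant


def pick_variant(
    variants: Iterable[Variant], latency_slo_ms: float, core_budget: int
) -> Optional[Variant]:
    """Most accurate variant within the latency target and core budget.

    Ties on accuracy are broken in favour of fewer cores (lower cost).
    Returns None when nothing is feasible.
    """
    feasible = [
        v for v in variants
        if v.latency_ms <= latency_slo_ms and v.cpu_cores <= core_budget
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda v: (v.accuracy, -v.cpu_cores))


# Made-up profiles: the same variant gets faster with more cores.
PROFILES = [
    Variant("small", 0.70, 40.0, 1),
    Variant("medium", 0.76, 90.0, 2),
    Variant("medium", 0.76, 60.0, 4),
    Variant("large", 0.80, 150.0, 4),
    Variant("large", 0.80, 95.0, 8),
]

tight_budget = pick_variant(PROFILES, latency_slo_ms=100.0, core_budget=4)
loose_budget = pick_variant(PROFILES, latency_slo_ms=100.0, core_budget=8)
```

Under the 4-core budget the rule settles for the medium variant on 2 cores, while the 8-core budget makes the large variant feasible, so accuracy is bought with extra resources.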

  21. A new paradigm that integrates probabilistic model checking with causal
    inference to enable planning and verification in autonomous systems
    • RQ1: How can structural causal models be integrated with probabilistic model
    checking to provide a framework for planning tasks in autonomous systems?

    • RQ2: How can counterfactual reasoning be integrated with probabilistic model
    checking to analyze the effect of interventions that have not been observed in
    the system's behavior?


  22. Independent modular networks for learning robust and disentangled
    representations
    • Modular networks can automatically
    decompose the shapes into different
    learnable representations.

    • Introducing the ID classifier improves the decomposition significantly:
    a large majority of the images for each shape pass through a single
    module.

  24. Pretrained language models are symbolic mathematics solvers too!
    • Does this pretrained model let us use less data for fine-tuning?

    • Does the result of this fine-tuning depend
    on the languages used for pretraining?

    • How robust is this fine-tuned model to distribution shift between the
    fine-tuning data and the test data?

  26. Unit cycle architecture: an intrinsically faster approach
    • Problem:
      • Single/multi-cycle microarchitectures waste time because the clock
        period is dictated by the critical path.
      • If the longest instruction takes 1100 ps, then every instruction
        takes 1100 ps in a single-cycle design.

    • Solution:
      • Use a timer to measure the time.
      • Set the timer to the duration of the instruction.
      • When the timer runs out, move to the next instruction.

    Benchmark program: Square root

                            Single-Cycle   Multi-Cycle    Unit-Cycle
    Clock Period (ps)              1,100           300           100
    Cycles Executed                  360         1,316         2,748
    Execution Time (ps)          396,000       394,800       274,800

    Unit-Cycle is more than 40% faster than Single-Cycle or Multi-Cycle.
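
The figures in the table can be checked with a few lines of arithmetic: execution time is clock period times cycles executed, and the speedup is the ratio of execution times. The clock periods and cycle counts below are copied from the slide; the names `DESIGNS`, `execution_time_ps`, `times`, and `speedup` are only illustrative.

```python
# (clock period in ps, cycles executed) for the square-root benchmark,
# as listed on the slide.
DESIGNS = {
    "Single-Cycle": (1100, 360),
    "Multi-Cycle": (300, 1316),
    "Unit-Cycle": (100, 2748),
}


def execution_time_ps(clock_period_ps: int, cycles: int) -> int:
    """Execution time = clock period x number of cycles."""
    return clock_period_ps * cycles


times = {name: execution_time_ps(*spec) for name, spec in DESIGNS.items()}

# Speedup of Unit-Cycle over each design: baseline time / Unit-Cycle time.
speedup = {name: t / times["Unit-Cycle"] for name, t in times.items()}

for name in DESIGNS:
    print(f"{name:12s} {times[name]:>8,d} ps  speedup of Unit-Cycle: {speedup[name]:.2f}x")
```

Both ratios come out at roughly 1.44, which matches the "more than 40% faster" claim when "faster" is read as speedup; expressed as a reduction in execution time, the same numbers give about 30%.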

  27. Artificial Intelligence and Systems Laboratory (AISys)
    https://pooyanjamshidi.github.io/AISys/
    Research Areas:
    - Causal AI
    - ML for Systems
    - Systems for ML
    - Adversarial ML
    - Robot Learning
    - Representation Learning
    [Figure: sponsor and collaborator logos]