Asian Solid-State Circuits Conference (A-SSCC 2021)
Edge Computing
Use case                                    Energy
Sending 400 samples to the cloud            4 mJ
Sending just anomaly detection results      1 µJ
Orders of magnitude lower.
[Figure: edge device output labeled Normal/Anomaly]
− Can maximize reuse of “inputs”, “weights”, or “outputs”
− Systematic analysis shows that similar efficiency can be achieved regardless of the data-reuse strategy
[Figure: two dataflow panels, “Maximize weight reuse” and “Maximize output reuse”]
[Ref] X. Yang, “Interstellar: Using Halide's scheduling language to analyze DNN accelerators,” ASPLOS 2020.
RiSE (Rising Star Express) Forum, IEEE Asian Solid-State Circuits Conference (A-SSCC 2021)
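The two reuse strategies named on this slide can be illustrated as two loop orders over the same computation. The sketch below uses a 1-D convolution as a stand-in workload; the function names and the choice of workload are illustrative assumptions, not taken from the talk or from the Interstellar paper.

```python
def conv1d_weight_stationary(x, w):
    """Outer loop over weights: each weight is fetched once and reused for every output."""
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for k, wk in enumerate(w):          # weight wk stays "stationary"
        for i in range(n_out):
            y[i] += wk * x[i + k]       # partial sums are revisited many times
    return y


def conv1d_output_stationary(x, w):
    """Outer loop over outputs: each partial sum stays local until it is complete."""
    n_out = len(x) - len(w) + 1
    y = []
    for i in range(n_out):
        acc = 0                         # accumulator stays "stationary"
        for k, wk in enumerate(w):
            acc += wk * x[i + k]        # weights are re-fetched for every output
        y.append(acc)
    return y


if __name__ == "__main__":
    x, w = [1, 2, 3, 4, 5], [1, 0, -1]
    print(conv1d_weight_stationary(x, w))
    print(conv1d_output_stationary(x, w))
```

Both functions return the same result; only the order of memory accesses differs, which is the degree of freedom the dataflow analysis explores.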
− How can we go further?
• One extreme option: Analog computing
− Required DNN arithmetic precision is low (INT2~INT8)
− Analog computation can achieve higher efficiency, if not limited by noise

ResNet50 ImageNet top-1 accuracy, weight+activation quantized network with PACT:
Precision   Full (FP32)   INT2    INT3    INT4    INT5
Top-1       0.769         0.722   0.753   0.765   0.767

[Ref] J. Choi, “PACT: Parameterized Clipping Activation for Quantized Neural Networks,” arXiv:1805.06085
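As a rough illustration of what low-precision activation quantization looks like, the sketch below applies a PACT-style forward pass to a single value: clip to [0, alpha], then round onto 2^k − 1 uniform steps. The function name is hypothetical, and alpha is fixed here for simplicity, whereas in PACT it is a parameter learned during training.

```python
def pact_quantize(x, alpha, bits):
    """Clip x to [0, alpha], then quantize onto 2**bits - 1 uniform steps.

    alpha is a fixed constant in this sketch (an assumption); in PACT it is learned.
    """
    levels = 2 ** bits - 1
    clipped = min(max(x, 0.0), alpha)   # same as 0.5 * (|x| - |x - alpha| + alpha)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round(clipped * levels / alpha) * alpha / levels


if __name__ == "__main__":
    for value in (-1.0, 2.9, 10.0):
        print(value, "->", pact_quantize(value, alpha=6.0, bits=2))
```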
At high precision (>8b), energy increases exponentially due to kT/C noise
− Digital is efficient for binary precision; analog offers not much advantage there
[Figure: plot with “Binary” and “~8b” regions marked]
[Ref] B. Murmann, “Mixed-Signal Computing for Deep Neural Network Inference,” TVLSI 2021.
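One way to see why kT/C noise makes energy grow exponentially with precision is the common textbook bound obtained by setting the sampled thermal noise equal to the quantization noise. The sketch below assumes that bound; it is not the model used in the cited paper, and the function name and the 300 K default are illustrative assumptions.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def min_sampling_energy(bits, temperature_k=300.0):
    """Energy C * V_FS**2 for which kT/C noise equals `bits`-bit quantization noise.

    Assumed condition: kT/C = (V_FS / 2**bits)**2 / 12,
    which gives C * V_FS**2 = 12 * kT * 4**bits.
    """
    return 12.0 * K_BOLTZMANN * temperature_k * 4 ** bits


if __name__ == "__main__":
    for b in (2, 4, 6, 8, 10):
        print(f"{b:2d} bits: {min_sampling_energy(b):.3e} J")
```

Under this assumption each additional bit costs 4x more energy, so the noise-limited energy grows as 4^bits.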
Sweet spot is INT3-6, where analog is not limited by noise
− Ideally, analog MAC’s energy increases linearly in this region
[Figure: plot with the “Sweet spot INT3~6” region marked]
[Ref] B. Murmann, “Mixed-Signal Computing for Deep Neural Network Inference,” TVLSI 2021.
multi-bit analog computation methods that can cover the INT3-6 sweet spot:
− Charge-based computing
− Phase-based computing
• Aiming to replace the Multiply-and-Accumulate (MAC) circuit
[Figure: MAC circuit multiplying W[N] by IN[N] and accumulating]
of vector N is done in the analog domain → realize binary MAC
− Can integrate weight memory and processing as “in-memory computing”
1. Process IN[i]*W[i]
2. Store outputs as charge
3. Column caps are shorted to realize analog accumulation
4. Readout by ADC
[Figure: array with Inputs [1:N] and cells W[0]·IN[0] through W[N]·IN[N]; accumulation via charge; 8b SAR ADC; N = 2304]
[Ref] H. Valavi, “A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute,” JSSC 2019.
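The four steps above can be mimicked with an idealized behavioral model. This sketch assumes equal unit capacitors, bits encoded as 0/1 with the product taken as a logical AND, and no noise or mismatch; these are simplifying assumptions for illustration and do not describe the cited chip's actual circuit.

```python
def charge_domain_binary_mac(inputs, weights, vdd=1.0, adc_bits=8):
    """Idealized charge-domain binary MAC over 0/1 inputs and weights."""
    n = len(inputs)
    # Steps 1-2: each cell forms a 1-bit product and stores it as charge on its own cap.
    cell_voltages = [vdd * (i & w) for i, w in zip(inputs, weights)]
    # Step 3: shorting N equal caps shares the charge, giving the mean cell voltage.
    v_shared = sum(cell_voltages) / n
    # Step 4: an ideal ADC converts the shared voltage to a digital code.
    code = round(v_shared / vdd * (2 ** adc_bits - 1))
    return v_shared, code


if __name__ == "__main__":
    n = 2304  # vector length shown on the slide
    ins = [1] * n
    wts = [1] * (n // 2) + [0] * (n // 2)
    print(charge_domain_binary_mac(ins, wts))
```

Note that with N = 2304 the exact sum needs more than 8 bits, so the 8b readout in this model is a quantized version of the accumulated value.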
− Binary computation can be extended to arbitrary precision by “bit-serial” processing

      1010
   x  0101
   -------
      1010
     0000
    1010
   0000
   -------
    110010

A 4b x 4b multiplication is broken up into 16 binary multiply-and-adds
[Ref] C. Eckert, “Neural Cache: Bit-serial in-cache acceleration of deep neural networks,” ISCA 2018.
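The decomposition into binary multiply-and-adds can be written out directly. The following is a minimal sketch for unsigned operands; the function name is hypothetical and the code illustrates only the arithmetic, not any particular hardware schedule.

```python
def bit_serial_multiply(a, b, bits=4):
    """Multiply two unsigned `bits`-bit integers using only 1-bit ANDs and shifted adds."""
    total = 0
    binary_ops = 0
    for i in range(bits):                               # bit i of a
        for j in range(bits):                           # bit j of b
            and_bit = ((a >> i) & 1) & ((b >> j) & 1)   # one binary multiply
            total += and_bit << (i + j)                 # add with the proper bit weight
            binary_ops += 1
    return total, binary_ops


if __name__ == "__main__":
    product, ops = bit_serial_multiply(0b1010, 0b0101)
    print(bin(product), ops)
```

For the slide's example, 1010 x 0101 gives 110010 (decimal 50) using 16 binary operations.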
• Target low-area and 8-bit MAC resolution
− Realize analog computation for wide application with low cost
Time-domain approach [Miyashita, ASSCC2017]
→ Accumulates pulse length
→ Multiple DTCs required (DTC: digital-to-time converter)
Proposed phase-domain approach [Yoshioka, VLSI2018][Toyama, ASSCC2018]
→ Accumulates phase
→ Only a single DTC + Gated Ring Oscillator (GRO)
→ Requires digital cells only; small area and scalable
[Figure: time-domain circuit with four DTCs, and phase-domain circuit with one DTC feeding a GRO; signals IN, Weight, Output]
                 Phase Domain   Digital
MAC Resolution   1~8 bit        1~64 bit
Norm. Area/Bit   1.2            1
Norm. Power      0.125          1
Gated Ring Oscillator (GRO) operation
[Figure: DTC and GRO block diagram with inputs Din and W, and a timing diagram of GRO phase (up to 2π) and counter value. Example values: Seq. 1 has Din = 3, W = 1; Seq. 2 has Din = 24, W = 0.5; the counter goes from 0 to 1.]
1. DTC outputs a pulse corresponding to Din (Din * tinv)
2. GRO phase advances while the DTC pulse is high
3. Phase is saved by gating
Steps 1~3 are repeated for the number of MACs. When the phase reaches 2π, it is detected by the counter.
$\text{Phase} = \text{Prev. Phase} + D_{in} \cdot W \cdot \frac{2\pi}{10}$
During readout, the GRO phase (LSB) and the counter value (MSB) are summed with proper weightings: the counter value is multiplied by 10 and added to the phase-to-digital output. In the example, OUT = 15.
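The phase accumulation and readout can be traced with a small behavioral model. This is an idealized sketch with no noise or nonlinearity; it expresses the phase in units where one full 2π rotation equals 10, following the x10 weighting shown on the readout slide. The function and parameter names are hypothetical.

```python
def phase_domain_mac(dins, weights, units_per_cycle=10):
    """Idealized phase-domain MAC: accumulate Din*W as GRO phase, count full rotations."""
    phase = 0.0      # GRO phase; one full 2*pi rotation == units_per_cycle
    counter = 0      # number of completed 2*pi rotations
    for din, w in zip(dins, weights):
        # Steps 1-2: phase advances by Din * W * (2*pi / units_per_cycle).
        phase += din * w
        # The counter detects each time the phase passes 2*pi.
        while phase >= units_per_cycle:
            phase -= units_per_cycle
            counter += 1
        # Step 3: the phase is held (gated) until the next operation.
    # Readout: counter is the MSB part, residual phase is the LSB part.
    out = counter * units_per_cycle + phase
    return counter, phase, out


if __name__ == "__main__":
    # Example values from the slides: Din = 3, W = 1, then Din = 24, W = 0.5.
    print(phase_domain_mac([3, 24], [1, 0.5]))
```

With the slide's example values, the counter ends at 1, the residual phase at 5, and the readout gives 15, matching 3 x 1 + 24 x 0.5.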
within the analog computation sweet spot (INT3-6)
− Does not require high-precision ADC
• Cons:
− Only supports output-stationary dataflows
  • Cannot be adapted to in-memory architectures
− Only proven with a single MAC circuit
  • Efficiency of an entire analog accelerator is unknown
INT3-6 precision.
In this talk, we covered analog computing methods handling multi-bit operations:
− Charge-based computing
− Phase-based computing
• While proven to be more power efficient than digital, challenges remain
− Flexibility, reliability, noise issues, etc.
− We need to get together with the software guys!
  • Framework integration
  • Open-source