At high precision (>9-10b), analog energy increases exponentially due to kT/C noise
− Digital is efficient at binary precision; analog offers little advantage there
[Figure annotations: Binary; ~9-10b]
[Ref] B. Murmann, “Mixed-Signal Computing for Deep Neural Network Inference,” TVLSI 2021.
最適輸送研究会 (Optimal Transport Workshop) OT2023
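The exponential trend can be illustrated with the standard kT/C sampling-noise argument. The sketch below rests on assumptions that are not stated on the slide: the thermal noise power kT/C must stay at or below the quantization noise power Δ²/12 of a B-bit signal with full-scale voltage `v_fs`, and the energy per operation scales as C·V². The function names, the 1 V full scale, and the 300 K temperature are illustrative only.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def min_capacitance(bits, v_fs=1.0, temp_k=300.0):
    """Smallest C (farads) such that kT/C <= LSB^2 / 12 (assumed criterion)."""
    lsb = v_fs / (2 ** bits)
    quant_noise_power = lsb ** 2 / 12.0
    return K_BOLTZMANN * temp_k / quant_noise_power

def min_energy(bits, v_fs=1.0, temp_k=300.0):
    """Energy ~ C * V_FS^2 to charge that capacitor (assumed first-order model)."""
    return min_capacitance(bits, v_fs, temp_k) * v_fs ** 2

# Print the noise-limited capacitance and energy for several resolutions.
for b in (4, 6, 8, 10, 12):
    print(f"{b:2d} bits: C >= {min_capacitance(b):.3e} F, E ~ {min_energy(b):.3e} J")
```

Under this model each additional bit quadruples the required capacitance and energy, which is consistent with the steep rise above roughly 9-10b.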
Sweet spot is INT3-6, where analog is not limited by noise
− Ideally, analog MAC energy increases linearly in this region
[Figure annotation: Sweet spot, INT3-6]
[Ref] B. Murmann, “Mixed-Signal Computing for Deep Neural Network Inference,” TVLSI 2021.
that can cover the INT3-6 sweet spot:
− Charge-based computing
• Aims to replace the multiply-and-accumulate (MAC) circuit
[Figure labels: W[N], IN[N], +]
of vector N is done in the analog domain → realizes binary MAC
− Weight memory and processing can be integrated as “in-memory computing”
− Computation is performed in the charge domain (Q = ΣCV)
• Vector addition over 2000 elements is performed in one cycle
• Fewer circuit elements are required than in a digital circuit, achieving low power
[Figure labels: Inputs [1:N]; W[0], IN[0], W[N], IN[N]; Accumulate via charge; ADC]
[Ref] H. Valavi, “A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute,” JSSC 2019.
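Charge-domain accumulation (Q = ΣCV) can be modeled behaviorally as charge sharing among N equal unit capacitors. The sketch below is an idealized model and not the circuit of the cited paper: it assumes 1-bit inputs and weights combined with AND, identical capacitors, no noise or parasitics, and an ideal ADC. All names and values are hypothetical.

```python
def charge_domain_binary_mac(inputs, weights, v_ref=1.0, c_unit=1e-15):
    """Return the shared-node voltage after all unit capacitors are shorted together."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each cell charges its capacitor to v_ref when the 1-bit product is 1, else to 0 V.
    cell_voltages = [v_ref * (x & w) for x, w in zip(inputs, weights)]
    total_charge = sum(c_unit * v for v in cell_voltages)  # Q = sum(C * V)
    total_cap = c_unit * len(cell_voltages)
    return total_charge / total_cap                         # V_out = Q / C_total

def ideal_adc(v, n_levels, v_ref=1.0):
    """Quantize v in [0, v_ref] to an integer code in [0, n_levels]."""
    return round(v / v_ref * n_levels)

inputs = [1, 0, 1, 1, 0, 1, 1, 0]
weights = [1, 1, 0, 1, 0, 1, 0, 0]
v_out = charge_domain_binary_mac(inputs, weights)
code = ideal_adc(v_out, n_levels=len(inputs))
print(v_out, code)
```

The whole vector is accumulated in a single charge-sharing step, so the ADC code equals the number of positions where both the input and the weight are 1.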
− Binary computation can be extended to arbitrary precision by “bit-serial” processing
• Example: 1010 × 0101 = 1010 + (0000 << 1) + (1010 << 2) + (0000 << 3) = 110010
• A 4b × 4b multiplication is broken up into 16 binary multiply-and-adds
[Ref] C. Eckert, “Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks,” ISCA 2018.
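The bit-serial decomposition above can be sketched in a few lines. This is an illustrative software model for unsigned operands, not the scheme of the cited paper. The function name and the operation counter are hypothetical.

```python
def bit_serial_multiply(a, b, bits=4):
    """Multiply two unsigned `bits`-wide integers using only 1-bit ANDs, shifts, and adds."""
    if not (0 <= a < 2 ** bits and 0 <= b < 2 ** bits):
        raise ValueError("operands must fit in the given bit width")
    result = 0
    binary_ops = 0
    for i in range(bits):          # bit i of b selects one partial product
        b_bit = (b >> i) & 1
        for j in range(bits):      # bit j of a
            a_bit = (a >> j) & 1
            result += (a_bit & b_bit) << (i + j)  # 1-bit multiply, then weighted add
            binary_ops += 1
    return result, binary_ops

product, ops = bit_serial_multiply(0b1010, 0b0101)
print(bin(product), ops)  # 0b110010 16
```

Each of the 16 inner-loop steps is one binary multiply-and-add, so a binary MAC array can process one pair of bit positions per cycle and combine the partial results with shifts.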
• Target low-area and 8-bit MAC resolution
− Realize analog computation for wide application at low cost
Time domain approach [Miyashita, ASSCC2017]
→ Accumulates pulse length
→ Multiple DTCs required (DTC: digital-to-time converter)
Proposed phase domain approach [Yoshioka, VLSI2018][Toyama, ASSCC2018]
→ Accumulates phase
→ Only a single DTC + Gated Ring Oscillator (GRO)
→ Requires digital cells only; small area and scalable
[Figure labels: IN, Weight, DTC, GRO, Output]
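Phase-domain accumulation can be sketched behaviorally under several assumptions that are not taken from the slide. The DTC is modeled as producing a pulse whose width is proportional to the unsigned product IN × Weight. The GRO advances its phase at a constant frequency only while gated on and holds it otherwise. The output is the accumulated phase in whole oscillator cycles. The parameters `t_lsb` and `f_osc` and all class and function names are hypothetical.

```python
def dtc_pulse_width(x, w, t_lsb=1e-9):
    """Assumed DTC model: pulse width in seconds proportional to the product x * w."""
    return x * w * t_lsb

class GatedRingOscillator:
    """Idealized GRO: phase advances only while enabled and is held otherwise."""

    def __init__(self, f_osc=1e9):
        self.f_osc = f_osc        # oscillation frequency in Hz (hypothetical value)
        self.phase_cycles = 0.0   # accumulated phase, in oscillator cycles

    def gate(self, duration_s):
        """Enable the oscillator for duration_s seconds and accumulate phase."""
        self.phase_cycles += self.f_osc * duration_s

def phase_domain_mac(inputs, weights):
    gro = GatedRingOscillator()
    for x, w in zip(inputs, weights):  # a single DTC is reused for every MAC step
        gro.gate(dtc_pulse_width(x, w))
    return round(gro.phase_cycles)     # ideal phase readout

result = phase_domain_mac([3, 1, 4, 2], [2, 5, 1, 3])
print(result)
```

Because the oscillator holds its phase between pulses, the products accumulate one after another in a single element, which is why this model needs only one DTC but processes the vector sequentially.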
• Target low-area and 8-bit MAC resolution
[Table; column headings not recovered in the source]
                  1~8 bit    1~64 bit
Norm. Area/Bit    1.2        1
Norm. Power       0.125      1
within the analog computation sweet spot (INT3-6)
− Does not require a high-precision ADC
• Cons:
− Only supports output-stationary dataflows
• Cannot adopt in-memory architectures
− Only proven with a single MAC circuit
• Efficiency of an entire analog accelerator is unknown
− Supports sequential operation only; parallel computation is not possible, so throughput is lower than that of the charge-based approach
Transformers pose a compute accuracy challenge. Both ADC resolution and noise must be addressed.
[Figure: CIFAR-10 accuracy vs. compute accuracy (CSNR) [dB], axis ticks 10 to 30; annotations: Conv. analog CIMs [2-5], Transformer (ViT-tiny)]
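The slide does not define CSNR. The sketch below assumes a common reading of it as a compute signal-to-noise ratio: the power of the ideal MAC outputs divided by the power of the error between ideal and measured outputs, expressed in dB. The function name and the sample values are hypothetical.

```python
import math

def csnr_db(ideal, measured):
    """Assumed CSNR definition: 10*log10(ideal output power / error power)."""
    if len(ideal) != len(measured):
        raise ValueError("ideal and measured must have the same length")
    signal_power = sum(v * v for v in ideal)
    error_power = sum((m - v) ** 2 for v, m in zip(ideal, measured))
    if error_power == 0:
        return math.inf  # no computation error at all
    return 10.0 * math.log10(signal_power / error_power)

ideal = [10.0, -10.0, 10.0, -10.0]
measured = [11.0, -9.0, 9.0, -11.0]
print(csnr_db(ideal, measured))
```

Under this definition both ADC quantization error and analog noise contribute to the error term, so either one alone can cap the achievable CSNR.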