Slide 1

Slide 1 text

FLEXIBILITY, HARDWARE REUSE AND POWER CONSUMPTION ISSUES IN THE DIGITAL FRONT-END OF MULTISTANDARD SDR HANDSETS
Navin Michael
School of Computer Engineering, Nanyang Technological University, Singapore
SCEE, Supelec, Rennes, France

Slide 2

Slide 2 text

FLEXIBILITY VS. ENERGY EFFICIENCY IN 4G
• Higher throughput requirement
• Energy efficiencies of ~1 TOPS/W
• Flexibility needed for seamless mobility
• Flexible hardware is less energy efficient than custom hardware
Need to reduce the area and power penalty of the flexible portions of the radio.
[Figure: 4G core IP network connecting LTE, Wi-Fi and WiMAX; seamless mobility through standard-agnostic terminals enabled by software defined radio]

Slide 3

Slide 3 text

ENERGY EFFICIENCY OF DIGITAL PLATFORMS
• ASIC: 100 to 1000 Gops/W
• FPGA: 10 to 100 Gops/W
• DSP: 1 to 10 Gops/W
• GPP: 0.1 to 1 Gops/W
[Figure: energy efficiency (Mops/mW) plotted against flexibility for the four platform classes]
[1] J. M. Rabaey, "Wireless beyond the third generation-facing the energy challenge," International Symposium on Low Power Electronics and Design, pp. 1-3, 2001.

Slide 4

Slide 4 text

SDR TERMINAL RECEIVER CHAIN: DIGITAL FRONT-END
Programmable Analog Front End → Programmable ΣΔ ADC → Flexible Digital Front-end → SDR Baseband
The computational load of the digital front-end is strongly dependent on the design of the analog front-end and ADC.

Slide 5

Slide 5 text

SDR TERMINAL RECEIVER CHAIN: DIGITAL FRONT-END
Programmable Analog Front End → Programmable ΣΔ ADC → Flexible Digital Front-end → SDR Baseband
A “fixed digitization bandwidth” relaxes the flexibility requirement of the analog front-end in multistandard radios.

Slide 6

Slide 6 text

FIXED DIGITIZATION BANDWIDTH
[Figure: coarse band selected by the analog front-end, shown for Standard 1, Standard 2 and Standard 3; in each case the band contains the channel of interest together with interferers]

Slide 7

Slide 7 text

SDR TERMINAL RECEIVER CHAIN: DIGITAL FRONT-END
Programmable Analog Front End → Programmable ΣΔ ADC → Flexible Digital Front-end → SDR Baseband
The flexible digital front-end:
• Is highly computationally intensive
• Operates on the highly oversampled ADC output
• Needs to be implemented using a flexible ASIC HW accelerator

Slide 8

Slide 8 text

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions
• Channel Selection
• Sample Rate Conversion
• Interferer Attenuation
• Pulse Shaped Filtering

Slide 9

Slide 9 text

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions
• Channel Selection: variable channel bandwidths and band-edge specifications
• Sample Rate Conversion
• Interferer Attenuation
• Pulse Shaped Filtering

Slide 10

Slide 10 text

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions
• Channel Selection
• Sample Rate Conversion: variable SRC factors (integral or rational)
• Interferer Attenuation
• Pulse Shaped Filtering

Slide 11

Slide 11 text

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions
• Channel Selection
• Sample Rate Conversion
• Interferer Attenuation: variable interferer location and power levels
• Pulse Shaped Filtering

Slide 12

Slide 12 text

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions
• Channel Selection
• Sample Rate Conversion
• Interferer Attenuation
• Pulse Shaped Filtering: variable roll-off factor

Slide 13

Slide 13 text

DESIGN SPACE OF A FLEXIBLE HW ACCELERATOR
[Figure: area vs. power design space, showing the single-mode HW accelerator]

Slide 14

Slide 14 text

DESIGN SPACE OF A FLEXIBLE HW ACCELERATOR
[Figure: area vs. power design space, showing the single-mode HW accelerator]
Behavioral optimizations
• Constant propagation
• Common subexpression elimination
• Operator strength reduction

Slide 15

Slide 15 text

DESIGN SPACE OF A FLEXIBLE HW ACCELERATOR
[Figure: design space with axes area, power and reconfiguration latency; the flexibility penalty separates the flexible HW accelerator from the single-mode HW accelerator]

Slide 16

Slide 16 text

THE FLEXIBILITY PENALTY
• Penalty dimensions: area, power, reconfiguration latency
• Related concerns: limited silicon area; scalability; battery life and usability; leakage power in nanoscale CMOS; increased ops/s in emerging standards; seamless mobility; vertical handover

Slide 17

Slide 17 text

MULTISTANDARD ACCELERATOR PARADIGMS: VELCRO APPROACH
[Figure: ADC feeding Accelerator 1, Accelerator 2 and Accelerator 3, together with a configuration register]

Slide 18

Slide 18 text

MULTISTANDARD ACCELERATOR PARADIGMS: MULTIMODE ASIC / CONFIGURABLE DATAPATHS
[Figure: data-flow graphs (DFGs) of three single constant integer multipliers, 19 * Y(n), 41 * Y(n) and 10 * Y(n), and the fused DFG that combines them]
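The idea of sharing operators between constant multipliers can be illustrated in software. The following is a minimal Python sketch, assuming the decompositions 19 = 16 + 2 + 1, 41 = 32 + 8 + 1 and 10 = 8 + 2 (one possible choice; the function name and the table of shared shifts are illustrative and not taken from the slides):

```python
def fused_constant_multiply(y):
    """Return (19*y, 41*y, 10*y) using only shifts and adds."""
    # Shifted copies of y are computed once and shared by all three products,
    # loosely mimicking how a fused data-flow graph reuses operators.
    shifted = {k: y << k for k in (1, 3, 4, 5)}  # 2y, 8y, 16y, 32y
    p19 = shifted[4] + shifted[1] + y  # 16y + 2y + y
    p41 = shifted[5] + shifted[3] + y  # 32y + 8y + y
    p10 = shifted[3] + shifted[1]      # 8y + 2y
    return p19, p41, p10
```

A real multimode datapath would select one product at a time through multiplexers; the sketch computes all three only to show which partial terms are shared.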

Slide 19

Slide 19 text

MULTISTANDARD ACCELERATOR PARADIGMS: FILTER COPROCESSOR
[Figure: filter coprocessor; a control unit drives a data address bus and a coefficient address bus, data and coefficient register files feed multiply-add units separated by z^-1 delays, with input x(n) and output y(n)]

Slide 20

Slide 20 text

MULTISTANDARD ACCELERATOR PARADIGMS: FINE-GRAINED RECONFIGURABLE FABRIC
[Figure: fine-grained reconfigurable fabric built from LUTs and interconnect, with a configuration access port]

Slide 21

Slide 21 text

GRANULARITY OF HARDWARE REUSE IN DIFFERENT MULTIMODE HARDWARE ACCELERATORS
In order of increasingly fine granularity of reuse:
• Velcro based multimode accelerator: no reuse
• Multimode ASIC: reuse of coarse-grained datapath operators
• Filter coprocessor: reuse of coarse-grained datapath operators
• Fine-grained reconfigurable fabric: reuse of fine-grained bit-level operators

Slide 22

Slide 22 text

GRANULARITY OF HARDWARE REUSE VS. FLEXIBILITY
In order of increasingly fine granularity of reuse:
• Velcro based multimode accelerator: limited flexibility
• Multimode ASIC: limited flexibility
• Filter coprocessor: flexible
• Fine-grained reconfigurable fabric: highly flexible

Slide 23

Slide 23 text

GRANULARITY OF HARDWARE REUSE VS. RECONFIGURATION DATA
In order of increasingly fine granularity of reuse:
• Velcro based multimode accelerator: low reconfiguration data
• Multimode ASIC: low reconfiguration data
• Filter coprocessor: reconfiguration data is a function of the filter length
• Fine-grained reconfigurable fabric: large amount of reconfiguration data

Slide 24

Slide 24 text

GRANULARITY OF HARDWARE REUSE VS. POWER CONSUMPTION
In order of increasingly fine granularity of reuse:
• Velcro based multimode accelerator: lowest dynamic power consumption
• Multimode ASIC: low dynamic power consumption
• Filter coprocessor: high dynamic power consumption
• Fine-grained reconfigurable fabric: high dynamic power consumption

Slide 25

Slide 25 text

DESIGN STRATEGY FOR MULTISTANDARD DIGITAL FRONT-END
• The area, power and reconfiguration latency overheads need to be minimized without compromising on the flexibility to support a new specification.
• Identify opportunities for reusing hardware at coarser levels of granularity across multiple standards, with low parameterization overheads.
• The reused hardwired functional blocks should not be a bottleneck for supporting a new standard.
• Use area/power optimizations to minimize the overheads associated with functional blocks which demand a high degree of flexibility.

Slide 26

Slide 26 text

REUSE OF ‘FILTER STAGES’
• The coprocessor approach and the multimode ASIC approach reuse coarse-grained datapath operators: adders, multipliers, registers, MAC units.
• All the functionally different channelization tasks of filtering, sample rate conversion, interference attenuation and pulse shaping can be performed simultaneously by a multistage decimation filter.
• The filter stages in a multistage decimation filter represent a coarser granularity level for investigating hardware reuse than simple datapath operators.

Slide 27

Slide 27 text

DESIGN OF A FILTER STAGE IN A MULTISTAGE DECIMATION FILTER
• Consider an arbitrary factorization of the SRC factor $M_j$: $M_j = m_1^j \, m_2^j \, m_3^j \cdots m_{n_j}^j$
[Figure: multistage decimation filter for the jth standard; the ΣΔ ADC output at sample rate $M_j F_j$ passes through stages $H_1^j(z)$, $H_2^j(z)$, ..., $H_{n_j}^j(z)$, which decimate by $m_1^j$, $m_2^j$, ..., $m_{n_j}^j$ down to the symbol rate $F_j$]

Slide 28

Slide 28 text

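DESIGN OF A FILTER STAGE IN A MULTISTAGE DECIMATION FILTER
[Figure: within the chain from sample rate $M_j F_j$ to symbol rate $F_j$, the filter stage $H_k^j(z)$ receives its input at rate $pqF_j$ and decimates by p to an output rate of $qF_j$]
Filter stage for decimation by ‘p’, at an oversampling rate of ‘pq’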
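The (p, q) pair seen by each stage follows directly from the chosen factorization. A minimal Python sketch, assuming integer stage factors listed from the ADC side to the baseband side (the function name and the example factors are illustrative):

```python
def stage_parameters(factors):
    """For decimation factors [m1, ..., mn], return (p, q) for each stage.

    Stage k decimates by p = m_k while running at an oversampling ratio of
    p*q, where q is the product of the factors of all later stages.
    """
    params = []
    q = 1
    # Walk from the last stage (output at the symbol rate, q = 1) backwards.
    for m in reversed(factors):
        params.append((m, q))
        q *= m
    return list(reversed(params))
```

For a hypothetical factorization 16 = 4 x 2 x 2, `stage_parameters([4, 2, 2])` gives a first stage with p = 4 at an OSR of 16, a second with p = 2 at an OSR of 4, and a last with p = 2 at an OSR of 2.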

Slide 29

Slide 29 text

IDEAL DECIMATION FILTER
[Figure: magnitude response of the ideal filter for decimation by p]

Slide 30

Slide 30 text

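PRACTICALLY REALIZABLE DECIMATION FILTER
[Figure: magnitude response of a practically realizable filter for decimation by p, annotated with the passband edge $\mathit{fpass}_k^j$ and the stopband attenuation $A_k^j$]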

Slide 31

Slide 31 text

STANDARD DEPENDENT PARAMETERS
• Passband edge in a raised cosine pulse shaping system: $\mathit{Fpass}^j = \frac{(1+\alpha_j) F_j}{2}$, $0 \le \alpha_j \le 1$, where $\alpha_j$ is the roll-off factor
• Passband edge of the stage, normalized to its input rate $pqF_j$: $\mathit{fpass}_k^j = \frac{2\pi \, \mathit{Fpass}^j}{pq F_j} = \frac{(1+\alpha_j)\pi}{pq}$
• Stopband edge: $\mathit{fstop}_k^j = \frac{2\pi}{p} - \frac{(1+\alpha_j)\pi}{pq}$
• Stopband attenuation: $A_k^j$
The band edges and attenuation depend on the standard-specific parameters $\alpha_j$ and $A_k^j$.

Slide 32

Slide 32 text

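ELIMINATING STANDARD SPECIFIC DEPENDENCIES
[Figure: the standard-specific filter mask, with band edges set by $\frac{(1+\alpha_j)\pi}{pq}$ and stopband attenuation $A_k^j$, is replaced by a standard-independent mask with fixed band edges and stopband attenuation $A_{\max} = \max_j A_k^j$]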

Slide 33

Slide 33 text

OBSERVATION
• The above stage can be hardwired and reused by the decimation filter of any standard which needs:
  • A decimation-by-‘p’ stage at an OSR of ‘pq’
  • A required stopband attenuation for that stage less than or equal to $A_{\max}$
• Can we manipulate the factorization of the SRC factor to exploit the above observation?
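The reuse condition amounts to a simple compatibility check. A minimal Python sketch, in which the class, the field names and the attenuation figures are all hypothetical and only the two conditions come from the observation above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HardwiredStage:
    p: int            # decimation factor of the stage
    osr: int          # oversampling ratio at the stage input (p*q)
    a_max_db: float   # stopband attenuation the stage was designed for

def can_reuse(stage, p, osr, required_attenuation_db):
    """True if a standard needing decimation by p at the given OSR, with the
    given stopband attenuation, can use the hardwired stage unchanged."""
    return (stage.p == p
            and stage.osr == osr
            and required_attenuation_db <= stage.a_max_db)
```

For example, a stage built as `HardwiredStage(p=2, osr=4, a_max_db=80.0)` serves a standard that needs 60 dB at the same p and OSR, but not one that needs 90 dB.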

Slide 34

Slide 34 text

FIXED FACTORIZATION METHOD
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$

Slide 35

Slide 35 text

FIXED FACTORIZATION METHOD - SUMMARY
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$
$K_j$: standard dependent rational factor
• Integral load: CIC filter
• Fractional load: transpose Farrow filter
• Weakly parameterizable

Slide 36

Slide 36 text

FIXED FACTORIZATION METHOD - SUMMARY
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$
$m_1, \ldots, m_{n-1}$: fixed integral factors, common to all standards
• Hardwired FIR filter stages
• No reconfiguration overheads

Slide 37

Slide 37 text

FIXED FACTORIZATION METHOD - SUMMARY
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$
$m_n$: fixed integral factor, common to all standards
• Programmable FIR filter
• Incurs reconfiguration latency, area and power penalties

Slide 38

Slide 38 text

EXPERIMENTAL SYNTHESIS RESULTS
Decimation factor assigned to each block (CIC and transpose Farrow: weakly parameterizable; halfband: fixed; FIR: fully programmable)

Standard        SRC factor*   CIC   Transpose Farrow   Fixed halfband   Programmable FIR
GSM             118.154       16    1.8461             2                2
W-CDMA          16.667        4     1.0461             2                2
IEEE 802.11a    13.333        2     1.6667             2                2
WiMax           9.578         2     1.1972             2                2

* A. Rusu, et al., “Reconfigurable ADCs enable smart radios for 4G wireless connectivity,” IEEE Circuits Devices Mag., vol. 22, no. 3, pp. 6-11, 2006.
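The split of an SRC factor across the blocks can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: it assumes the fixed stages contribute 2 x 2 as in the table, and that the CIC takes the largest power of two not exceeding the remaining factor $K_j$, a rule chosen here only because it reproduces the CIC column. The function name is hypothetical.

```python
def split_src_factor(src_factor, fixed_factors=(2, 2)):
    """Split an SRC factor into (CIC factor, Farrow factor) after removing
    the fixed stage factors. Assumes a power-of-two CIC factor."""
    fixed = 1
    for m in fixed_factors:
        fixed *= m
    k_j = src_factor / fixed          # standard dependent rational factor
    if k_j < 1.0:
        raise ValueError("SRC factor too small for the fixed stages")
    cic = 1
    while cic * 2 <= k_j:             # largest power of two <= k_j (assumption)
        cic *= 2
    farrow = k_j / cic                # fractional load left for the Farrow filter
    return cic, farrow
```

For GSM, `split_src_factor(118.154)` returns a CIC factor of 16 and a Farrow factor of about 1.846.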

Slide 39

Slide 39 text

CHANNELIZATION ACCELERATOR AREA COMPARISON

Standard                             Standard cell area
GSM                                  265985
IEEE 802.11a                         154812
WCDMA                                232752
WiMax                                210656
Velcro approach                      864205
Proposed multistandard accelerator   299701

• Synthesis results obtained from an implementation using a TSMC 0.18 μm process

Slide 40

Slide 40 text

OBSERVATIONS
• Nearly 65% reduction in area compared to a Velcro approach for 4 standards.
• The percentage area reduction can be expected to increase with an increasing number of supported standards.
• The fixed and weakly parameterizable portions of the architecture need to be designed for the worst-case attenuation requirements.
• The paradigm is scalable to an arbitrary number of standards with low reconfiguration overheads.
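The area figures can be checked with a few lines of Python, using only the standard cell areas reported in the area comparison (the variable names are illustrative):

```python
# Standard cell areas of the single-mode accelerators, from the area comparison.
single_mode_area = {
    "GSM": 265985,
    "IEEE 802.11a": 154812,
    "WCDMA": 232752,
    "WiMax": 210656,
}
proposed_area = 299701

# A Velcro design instantiates every single-mode accelerator side by side.
velcro_area = sum(single_mode_area.values())
reduction_percent = 100.0 * (1.0 - proposed_area / velcro_area)

print(f"Velcro area: {velcro_area}")
print(f"Area reduction: {reduction_percent:.1f}%")
```

The sum of the four single-mode areas matches the reported Velcro figure of 864205, and the reduction works out to a little over 65%.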

Slide 41

Slide 41 text

REDUCING THE AREA/POWER PENALTY OF THE LAST STAGE FILTER
• Programmability in the last stage filter necessitates the use of generic MAC units.
• The last stage filter can be implemented as a time-shared MAC FIR filter.
• Power reduction strategies for time-shared MAC based FIR filters have generally focused on reducing the switching activity.
• In nanoscale CMOS technologies, the leakage power also needs to be taken into account.

Slide 42

Slide 42 text

NANOSCALE CMOS POWER CONSUMPTION COMPONENTS
Dependence on supply voltage (VDD) and threshold voltage (Vth):
• Subthreshold leakage power: increases exponentially with reduced Vth.
• Gate leakage power: increases exponentially with increased VDD.
• Dynamic power: increases quadratically with increased VDD.

Slide 43

Slide 43 text

EFFECT OF PARALLELISM ON OPERATING VOLTAGES
• Increasing the number of fixed MAC units results in a reduced operating frequency while maintaining the same throughput.
• Reduced operating frequency translates to increased timing slack in the critical paths.
• Timing slack can be exploited for increasing Vth or reducing VDD.
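The effect on dynamic power alone can be sketched with the usual first-order model, in which dynamic power is proportional to C x VDD^2 x f. The Python sketch below assumes that switched capacitance grows linearly with the number of MAC units and that the clock frequency falls by the same factor; leakage and control overheads are ignored, and the voltages in the example are hypothetical.

```python
def relative_dynamic_power(parallelism, vdd_scaled, vdd_nominal):
    """Dynamic power of a design with `parallelism` times as many MAC units,
    relative to the original design, under P_dyn proportional to C * VDD^2 * f."""
    cap_ratio = float(parallelism)    # assumed: capacitance scales with MAC count
    freq_ratio = 1.0 / parallelism    # same throughput at a lower clock
    return cap_ratio * freq_ratio * (vdd_scaled / vdd_nominal) ** 2
```

With two-fold parallelism and a supply lowered from 1.8 V to 1.2 V, `relative_dynamic_power(2, 1.2, 1.8)` is about 0.44. With an unchanged supply the ratio is 1, so in this model the saving comes from spending the timing slack on a lower VDD, not from parallelism as such.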

Slide 44

Slide 44 text

EFFECT OF REDUCED FREQUENCY ON OPERATING VOLTAGES
[Figure: locus of permissible (VDD, Vth) points for different frequency constraints]
Effect of reduced frequency constraints on the permissible (VDD, Vth) points of a 16-bit adder (TSMC 0.18 μm CMOS process)

Slide 45

Slide 45 text

NANOSCALE CMOS POWER CONSUMPTION COMPONENTS
Dependence on area:
• Subthreshold leakage power: has a linear dependence on total gate width.
• Gate leakage power: has a linear dependence on total gate width.
• Dynamic power: has a linear dependence on the total physical capacitance.
Total gate width and total physical capacitance are strongly correlated with the total circuit area.

Slide 46

Slide 46 text

PARALLELISM AND AREA-SLACK EFFICIENCY
• Parallelism trades increased area for a lower operating frequency and increased timing slack.
• Increased timing slack can be traded for lower VDD and increased Vth, and hence reduced total power consumption.
• The increased area penalty of parallelism lowers the possible reduction in total power consumption.
• Area-slack efficiency = (amount of timing slack increment) / (amount of area increment)

Slide 47

Slide 47 text

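FULL PARALLEL DIRECT FORM FILTER OF LENGTH N
[Figure: full-parallel direct form FIR filter; the input x(n/fs) passes through a chain of z^-1 delays, the taps are multiplied by the coefficients h0, h1, h2, h3, ..., hN-1 and summed to give y(n/fs)]
Throughput rate = fs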

Slide 48

Slide 48 text

M-MAC BASED TIME-SHARED DIRECT FORM FILTER OF LENGTH N
[Figure: a control unit clocked at N·fs/M drives the data address bus and the coefficient address bus; data and coefficient register files feed MAC-1, MAC-2, ..., MAC-M, whose outputs are summed; input x(n/fs), output y(n/fs)]

Slide 49

Slide 49 text

AREA-SLACK EFFICIENCY OF A TIME-SHARED DIRECT FORM FILTER
• Cycle period of an M-MAC based time-shared direct form filter of length N: $T_M = \frac{M}{N f_s}$
• Extra timing slack obtained by adding P MAC units, each of area $A_m$: $T_{M+P} - T_M = \frac{P}{N f_s}$
• Area-slack efficiency of a time-shared direct form filter: $E_{DF} = \frac{P}{N f_s} \cdot \frac{1}{P A_m} = \frac{1}{N f_s A_m}$
Can we design filter structures that have a higher area-slack efficiency?
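The direct form expressions can be evaluated numerically. A minimal Python sketch, assuming a cycle period of M/(N·fs) for M MAC units and an area increment of P·Am for P added units (function names, units and the example numbers are illustrative):

```python
import math

def cycle_period(n_taps, fs, n_mac):
    """Cycle period of a time-shared direct form FIR filter with n_mac MACs."""
    return n_mac / (n_taps * fs)

def area_slack_efficiency_df(n_taps, fs, mac_area, m, p):
    """Timing slack gained per unit of added area when p MAC units of area
    mac_area are added to an m-MAC time-shared direct form filter."""
    slack = cycle_period(n_taps, fs, m + p) - cycle_period(n_taps, fs, m)
    return slack / (p * mac_area)
```

For a hypothetical 64-tap filter at fs = 1 MHz with a MAC area of 1000 units, the result equals 1/(N·fs·Am) whether 2 units are added to 4 or 5 units are added to 1. The efficiency of the direct form structure is therefore fixed by N, fs and Am alone.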

Slide 50

Slide 50 text

FAST FILTER ALGORITHMS (FFA) STRUCTURES
• Algorithmic strength reduction: FFA structures have a lower number of expensive MAC operations, at the cost of increased add operations.
• FFA structures can be derived by exploiting the redundancies in the FIR subfilters of a K-parallel FIR filter.
[Figure: 2-parallel FFA structure, with x(2k+1) among its inputs; each FIR subfilter is of length N/2]
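A 2-parallel FFA can be sketched in software to show where the saving comes from. The Python sketch below uses the standard 2-parallel fast FIR decomposition, in which three half-length subfilters (H0, H1 and H0 + H1) replace the four needed by a plain 2-parallel polyphase filter; the helper names are illustrative, and the block processing and even lengths are simplifying assumptions.

```python
def convolve(a, b):
    """Plain linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def ffa_2parallel(x, h):
    """Filter x with h using three half-length subfilters instead of four."""
    if len(x) % 2 or len(h) % 2:
        raise ValueError("this sketch assumes even-length input and filter")
    x0, x1 = x[0::2], x[1::2]          # even and odd input phases
    h0, h1 = h[0::2], h[1::2]          # even and odd coefficient phases
    a = convolve(h0, x0)               # subfilter 1: H0 * X0
    b = convolve(h1, x1)               # subfilter 2: H1 * X1
    c = convolve([u + v for u, v in zip(h0, h1)],
                 [u + v for u, v in zip(x0, x1)])  # subfilter 3
    # Even outputs: H0*X0 plus H1*X1 delayed by one block sample.
    y_even = [u + v for u, v in zip(a + [0.0], [0.0] + b)]
    # Odd outputs: H0*X1 + H1*X0, recovered with adders only.
    y_odd = [w - u - v for w, u, v in zip(c, a, b)]
    y = [0.0] * (len(x) + len(h) - 1)
    y[0::2] = y_even
    y[1::2] = y_odd
    return y
```

The result matches a direct convolution of x and h, while the multiplications are confined to three subfilters of length N/2; the remaining work is the pre- and post-processing additions.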

Slide 51

Slide 51 text

FFA BASED TIME-SHARED FILTERS
• Time-shared FFA structures can be obtained by implementing each of the FIR subfilters as a time-shared FIR filter, while implementing the irregular addition network in parallel.
• A K×K FFA structure of an N-tap filter has $S_K$ subfilters of length N/K, and $A_K$ preprocessing/postprocessing adders.
• The notation K×K|L indicates a structure in which each of the $S_K$ subfilters is multiplexed onto L MAC units.

Slide 52

Slide 52 text

AREA-SLACK EFFICIENCY OF FFA BASED TIME-SHARED FILTERS
• Input sample rate of the subfilters in a K×K FFA is $f_s/K$
• Cycle period of the MAC units in a K×K|L FFA: $T_{K \times K|L} = \frac{K^2 L}{N f_s}$
• Extra timing slack obtained by adding P MAC units in each of the $S_K$ subfilters: $T_{K \times K|(L+P)} - T_{K \times K|L} = \frac{K^2 P}{N f_s}$
• Area-slack efficiency of a K×K FFA structure: $E_{FFA} = \frac{K^2 P}{N f_s} \cdot \frac{1}{S_K P A_m} = \frac{K^2}{S_K} \cdot \frac{1}{N f_s A_m} = \frac{K^2}{S_K} E_{DF}$

Slide 53

Slide 53 text

FFA PARAMETERS

K    S_K   A_K   K^2   K^2/S_K
2    3     4     4     1.33
3    6     10    9     1.5
4    9     20    16    1.78
5    12    40    25    2.08
6    18    42    36    2
8    27    76    64    2.37

$E_{FFA} = \frac{K^2}{S_K} E_{DF}$
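The last column of the table, and the corresponding MAC cycle period, can be reproduced with a short Python sketch. The subfilter counts are taken from the table; the function names are illustrative, and the cycle period assumes the expression K²·L/(N·fs) for a K×K|L structure.

```python
# Number of subfilters S_K for each parallelism level K, from the FFA table.
FFA_SUBFILTERS = {2: 3, 3: 6, 4: 9, 5: 12, 6: 18, 8: 27}

def ffa_efficiency_gain(k):
    """Ratio E_FFA / E_DF = K^2 / S_K for a KxK FFA structure."""
    return k * k / FFA_SUBFILTERS[k]

def ffa_cycle_period(n_taps, fs, k, l):
    """Cycle period of the MAC units in a KxK|L time-shared FFA structure."""
    return k * k * l / (n_taps * fs)

for k in sorted(FFA_SUBFILTERS):
    print(k, round(ffa_efficiency_gain(k), 2))
```

The gain grows from about 1.33 at K = 2 to about 2.37 at K = 8, with a dip at K = 6 where the subfilter count rises to 18.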

Slide 54

Slide 54 text

OBSERVATIONS
• MAC units in the FFA based time-shared structures have a greater timing slack than those in a time-shared direct form filter with the same number of MAC units.
• Adding MAC units to a time-shared FFA structure offers a greater timing slack increment than adding the same number of MAC units to a time-shared direct form structure.

Slide 55

Slide 55 text

CONCLUSION
• Proposed a design strategy for efficient implementation of a channelization accelerator in a flexible mobile radio.
• The design has a high degree of hardware reuse across multiple standards.
• The proposed design strategy is scalable for supporting an arbitrary number of standards.
• We have investigated strategies for reducing the area/power penalty of the last stage programmable filter.

Slide 56

Slide 56 text

PUBLICATIONS
International Journals
[1] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Flexibility and reusability in the digital front-end of cognitive radio terminals,” Circuits, Systems and Signal Processing Journal, Springer, accepted in August 2010.
[2] Navin Michael, Christophe Moy, A. P. Vinod and Jacques Palicot, “Area-Power tradeoffs for flexible filtering in green radios,” Journal of Communications and Networks, vol. 12, no. 2, pp. 158-167, April 2010.
International Conferences
[1] Christophe Moy, Wassim Jouini, Navin Michael, “Cognitive Radio Equipments Supporting Spectrum Agility,” International Workshop on Cognitive Radio and Advanced Spectrum Management (CogART 2010), Italy, 7-10 November 2010.
[2] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Low power, flexible FIR filters in the digital front-end of green radios,” Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Istanbul, Turkey, September 2010.
[3] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Area-efficient time-shared FIR filters in nanoscale CMOS,” Proceedings of IEEE International Conference on Green Circuits and Systems, Shanghai, China, June 2010.
[4] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Design paradigm for standard agnostic channelization in flexible mobile radios,” Proceedings of IEEE International Symposium on Circuits and Systems, Paris, France, May-June 2010.
[5] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Design of low power multimode time-shared filters,” Proceedings of 7th IEEE International Conference on Information, Communications and Signal Processing, pp. 1-5, Macau, December 2009.
[6] Navin Michael and A. P. Vinod, “Reconfigurable architecture for arbitrary sample rate conversion in software defined radios,” Proceedings of 19th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1-6, Cannes, France, September 2008.

Slide 57

Slide 57 text

THANK YOU