Navin Michael - Flexibility, Hardware Reuse and Power Consumption Issues in the Digital Front-end of Multistandard Software Defined Radio Handsets

FLEXIBILITY, HARDWARE REUSE AND POWER CONSUMPTION ISSUES IN THE DIGITAL FRONT-END OF MULTISTANDARD SDR HANDSETS
Navin Michael
School of Computer Engineering, Nanyang Technological University, Singapore
SCEE, Supelec, Rennes, France

FLEXIBILITY VS. ENERGY EFFICIENCY IN 4G
• Higher throughput requirement
• Energy efficiencies ~1 TOPS/W
• Flexibility needed for seamless mobility
• Flexible hardware is less energy efficient than custom hardware
• Need to reduce the area and power penalty of the flexible portions of the radio
[Figure: 4G core IP network connecting LTE, Wi-Fi and WiMax; seamless mobility through standard agnostic terminals enabled by software defined radio]

ENERGY EFFICIENCY OF DIGITAL PLATFORMS
[Figure: energy efficiency (Mops/mW) versus flexibility]
• ASIC: 100 – 1000 Gops/W
• FPGA: 10 – 100 Gops/W
• DSP: 1 – 10 Gops/W
• GPP: 0.1 – 1 Gops/W
[1] Rabaey, J.M., "Wireless beyond the third generation-facing the energy challenge," Low Power Electronics and Design, International Symposium on, pp. 1-3, 2001.

SDR TERMINAL RECEIVER CHAIN : DIGITAL FRONT-END
[Figure: Programmable Analog Front-End → Programmable ΣΔ ADC → Flexible Digital Front-end → SDR Baseband]
Computational load is strongly dependent on the design of the analog front-end and ADC.

A “fixed digitization bandwidth” relaxes the flexibility requirement of the analog front-end in multistandard radios.

FIXED DIGITIZATION BANDWIDTH
[Figure: coarse band selected by the analog front-end for Standard-1, Standard-2 and Standard-3, each showing the channel of interest and the interferers]

Flexible Digital Front-end
• Highly computationally intensive
• Operates on highly oversampled ADC output
• Needs to be implemented using a flexible ASIC HW accelerator

MULTISTANDARD CHANNELIZATION ACCELERATOR
Functions, and what varies across standards for each:
• Channel Selection: variable channel bandwidths and band-edge specifications
• Sample Rate Conversion: variable SRC factors (integral or rational)
• Interferer Attenuation: variable interferer location and power levels
• Pulse Shaped Filtering: variable roll-off factor

DESIGN SPACE OF A FLEXIBLE HW ACCELERATOR
[Figure: design space with axes Area, Power and Reconfiguration latency; the flexible HW accelerator is separated from the single mode HW accelerator by the flexibility penalty]
Behavioral optimizations (single mode HW accelerator):
• Constant propagation
• Common subexpression elimination
• Operator strength reduction

THE FLEXIBILITY PENALTY
• Area: limited silicon area; scalability
• Power: battery life and usability; leakage power in nanoscale CMOS; increased ops/s in emerging standards
• Reconfiguration latency: seamless mobility; vertical handover

MULTISTANDARD ACCELERATOR PARADIGMS: VELCRO APPROACH
[Figure: ADC output feeding Accelerator 1, Accelerator 2 and Accelerator 3, with a configuration register]

MULTISTANDARD ACCELERATOR PARADIGMS: MULTIMODE ASIC / CONFIGURABLE DATAPATHS
[Figure: DFGs of three single constant integer multipliers, 19 * Y(n), 41 * Y(n) and 10 * Y(n), built from shifts, adders and z-1 delays, and the fused DFG that combines them]
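The constant multipliers named on this slide (19, 41 and 10) can be illustrated with a minimal shift-and-add sketch. The decomposition below is only one possible choice and may differ from the adder graph drawn in the figure; the names `SHIFT_TERMS` and `shift_add_multiply` are hypothetical. A single routine whose shift pattern is selected per constant loosely mirrors the idea of a fused, configurable datapath.

```python
# Hypothetical shift patterns: each constant is written as a sum of powers of two.
# 19 = 16 + 2 + 1, 41 = 32 + 8 + 1, 10 = 8 + 2 (one possible decomposition).
SHIFT_TERMS = {
    19: (4, 1, 0),
    41: (5, 3, 0),
    10: (3, 1),
}


def shift_add_multiply(y: int, constant: int) -> int:
    """Multiply the integer sample y by a supported constant using shifts and adds only."""
    if constant not in SHIFT_TERMS:
        raise ValueError(f"unsupported constant: {constant}")
    # The same adder structure is reused; only the shift amounts change per mode.
    return sum(y << shift for shift in SHIFT_TERMS[constant])


if __name__ == "__main__":
    for c in SHIFT_TERMS:
        print(c, shift_add_multiply(7, c))
```

In hardware the shifts are free wiring, so the cost of each mode is the number of adders, which is why sharing adders across modes is attractive.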

MULTISTANDARD ACCELERATOR PARADIGMS: FILTER COPROCESSOR
[Figure: control unit driving a data address bus and a coefficient address bus; data and coefficient register files feed multiply-accumulate datapaths (multiplier, adder, z-1) from input x(n), and their outputs are summed to give y(n)]

MULTISTANDARD ACCELERATOR PARADIGMS: FINE-GRAINED RECONFIGURABLE FABRIC
[Figure: LUTs and interconnect with a configuration access port]

GRANULARITY OF HARDWARE REUSE IN DIFFERENT MULTIMODE HARDWARE ACCELERATORS
(listed from finest to coarsest granularity of reuse)
• Fine-grained reconfigurable fabric: reuse of fine-grained bit-level operators
• Filter coprocessor: reuse of coarse-grained datapath operators
• Multimode ASIC: reuse of coarse-grained datapath operators
• Velcro based multimode accelerator: no reuse

GRANULARITY OF HARDWARE REUSE VS. FLEXIBILITY
(listed from finest to coarsest granularity of reuse)
• Fine-grained reconfigurable fabric: highly flexible
• Filter coprocessor: flexible
• Multimode ASIC: limited flexibility
• Velcro based multimode accelerator: limited flexibility

GRANULARITY OF HARDWARE REUSE VS. RECONFIGURATION DATA
(listed from finest to coarsest granularity of reuse)
• Fine-grained reconfigurable fabric: large amount of reconfiguration data
• Filter coprocessor: reconfiguration data is a function of the filter length
• Multimode ASIC: low reconfiguration data
• Velcro based multimode accelerator: low reconfiguration data

GRANULARITY OF HARDWARE REUSE VS. POWER CONSUMPTION
(listed from finest to coarsest granularity of reuse)
• Fine-grained reconfigurable fabric: high dynamic power consumption
• Filter coprocessor: high dynamic power consumption
• Multimode ASIC: low dynamic power consumption
• Velcro based multimode accelerator: lowest dynamic power consumption

DESIGN STRATEGY FOR MULTISTANDARD DIGITAL FRONT-END
• The area, power and reconfiguration latency overheads need to be minimized without compromising on the flexibility to support a new specification.
• Identify opportunities for reusing hardware at coarser levels of granularity across multiple standards, with low parameterization overheads.
• The reused hardwired functional blocks should not be a bottleneck for supporting a new standard.
• Use area/power optimizations to minimize the overheads associated with functional blocks which demand a high degree of flexibility.

REUSE OF ‘FILTER STAGES’
• The coprocessor approach and the multimode ASIC approach reuse coarse-grained datapath operators: adders, multipliers, registers, MAC units.
• All the functionally different channelization tasks of filtering, sample rate conversion, interference attenuation and pulse shaping can be simultaneously performed by a multistage decimation filter.
• The filter stages in a multistage decimation filter represent a coarser granularity level for investigating hardware reuse than simple datapath operators.

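DESIGN OF A FILTER STAGE IN A MULTISTAGE DECIMATION FILTER
• Consider an arbitrary factorization of the SRC factor, $M_j$:
$M_j = m_1^j \times m_2^j \times m_3^j \times \cdots \times m_{n_j}^j$
[Figure: Multistage decimation filter for the jth standard. The ΣΔ ADC output at sample rate $M_j F_j$ passes through the stages $H_1^j(z)$, $H_2^j(z)$, ..., $H_{n_j}^j(z)$, which decimate by $m_1^j$, $m_2^j$, ..., $m_{n_j}^j$, down to the symbol rate $F_j$.]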

DESIGN OF A FILTER STAGE IN A MULTISTAGE DECIMATION FILTER
[Figure: Filter stage $H_k^j(z)$ for decimation by ‘p’, at an oversampling rate of ‘pq’. Input rate $pq F_j$, output rate $q F_j$.]

IDEAL DECIMATION FILTER
[Figure: ideal magnitude response; frequency axis marked at 0, $\pi/p$ and $2\pi/p$]

PRACTICALLY REALIZABLE DECIMATION FILTER
[Figure: realizable magnitude response marking the passband edge $fpass_k^j$ and the stopband attenuation $A_k^j$; frequency axis marked at 0, $\pi/p$ and $2\pi/p$]

STANDARD DEPENDENT PARAMETERS
• Passband edge, in a raised cosine pulse shaping system, can be given by:
$Fpass^j = \frac{F_j (1 + \alpha_j)}{2}, \quad 0 \le \alpha_j \le 1$ ($\alpha_j$: roll-off factor)
• Normalized to the stage input rate $pq F_j$:
$fpass_k^j = \frac{2\pi \, Fpass^j}{pq F_j} = \frac{\pi (1 + \alpha_j)}{pq}$
• Stopband edge:
$fstop_k^j = \frac{2\pi}{p} - \frac{\pi (1 + \alpha_j)}{pq}$
• Stopband attenuation: $A_k^j$
• The band edges and $A_k^j$ depend on standard specific parameters, in particular $\alpha_j$.
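As a rough illustration of the band-edge expressions on this slide, the sketch below evaluates the normalized passband and stopband edges of a stage that decimates by p at an oversampling rate of pq. It assumes angular frequency normalized so that the stage input rate maps to 2π, with passband edge π(1 + α)/(pq) and stopband edge 2π/p minus the passband edge. The function name `stage_band_edges` and the example values are hypothetical.

```python
import math


def stage_band_edges(p: int, q: int, roll_off: float) -> tuple[float, float]:
    """Return (f_pass, f_stop) in rad/sample for a decimate-by-p stage at OSR p*q.

    Assumes a raised cosine pulse with roll-off factor 0 <= roll_off <= 1.
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    if not 0.0 <= roll_off <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    f_pass = math.pi * (1.0 + roll_off) / (p * q)
    # The first alias band is centred on 2*pi/p, so the stopband starts f_pass below it.
    f_stop = 2.0 * math.pi / p - f_pass
    return f_pass, f_stop


if __name__ == "__main__":
    # Illustrative values only: decimation by 2 at an OSR of 4, roll-off 0.22.
    print(stage_band_edges(p=2, q=2, roll_off=0.22))
```

A larger roll-off factor widens the passband and narrows the transition band, which is why the worst-case roll-off drives the filter order of a reused stage.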

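ELIMINATING STANDARD SPECIFIC DEPENDENCIES
[Figure: filter stage specification in which the standard specific band edge $\pi(1 + \alpha_j)/(pq)$ and stopband attenuation $A^j$ are replaced by standard independent values, with the stopband attenuation set to $A_{max}$; frequency axis marked at 0, $\pi/p$ and $2\pi/p$]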

OBSERVATION
• The above stage can be hardwired and reused by the decimation filter of any standard which needs:
  • A decimation by ‘p’ stage at an OSR of ‘pq’
  • A required stopband attenuation for the above stage that is less than or equal to Amax
• Can we manipulate the factorization of the SRC factor to exploit the above observation?
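The reuse condition stated in this observation can be written as a small predicate. This is a minimal sketch under the assumption that a stage is fully described by its decimation factor, its OSR and the attenuation it was designed for; the class and field names are hypothetical and the numbers in the usage example are illustrative, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwiredStage:
    p: int                # decimation factor of the hardwired stage
    osr: int              # oversampling rate at its input (p * q)
    a_max_db: float       # stopband attenuation it was designed for (Amax)


@dataclass(frozen=True)
class StageRequirement:
    p: int                # decimation factor the standard needs
    osr: int              # oversampling rate at which it is needed
    attenuation_db: float  # stopband attenuation the standard requires


def can_reuse(stage: HardwiredStage, req: StageRequirement) -> bool:
    """True if the hardwired stage satisfies the standard's requirement."""
    return (
        stage.p == req.p
        and stage.osr == req.osr
        and req.attenuation_db <= stage.a_max_db
    )


if __name__ == "__main__":
    halfband = HardwiredStage(p=2, osr=4, a_max_db=80.0)
    print(can_reuse(halfband, StageRequirement(p=2, osr=4, attenuation_db=60.0)))
```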

FIXED FACTORIZATION METHOD
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$

FIXED FACTORIZATION METHOD - SUMMARY
• The fixed factorization method factorizes the SRC factor in a manner which maximizes the number of filter stages that run at the same OSR and decimate by the same factor for different standards.
$M_j = K_j \times m_1 \times m_2 \times \cdots \times m_{n-1} \times m_n$
• $K_j$: standard dependent rational factor. Integral load: CIC filter. Fractional load: transpose Farrow filter. Weakly parameterizable.
• $m_1, \ldots, m_{n-1}$: fixed integral factors, common to all standards. Hardwired FIR filter stages. No reconfiguration overheads.
• $m_n$: fixed integral factor, common to all standards. Programmable FIR filter. Incurs reconfiguration latency, area and power penalties.
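The split of an SRC factor into a standard dependent part and fixed stages can be sketched as follows. The rule used here to divide the standard dependent factor between the CIC filter and the Farrow filter (largest power of two for the CIC, remainder in [1, 2) for the Farrow) is an assumption made for illustration, not something the slides state; the function name `split_src_factor` is hypothetical.

```python
import math


def split_src_factor(src_factor: float, fixed_factors: tuple[int, ...]) -> tuple[int, float]:
    """Split M_j into (CIC factor, Farrow factor) given the fixed integral factors.

    K_j = M_j / prod(fixed_factors) is the standard dependent rational factor.
    Assumption: the CIC takes the largest power of two not exceeding K_j and the
    Farrow filter takes the remaining fractional factor.
    """
    k_j = src_factor / math.prod(fixed_factors)
    if k_j < 1.0:
        raise ValueError("fixed stages already exceed the required SRC factor")
    cic_factor = 2 ** int(math.floor(math.log2(k_j)))
    farrow_factor = k_j / cic_factor
    return cic_factor, farrow_factor


if __name__ == "__main__":
    # Illustrative call: an SRC factor of 118.154 with two fixed decimate-by-2 stages.
    cic, farrow = split_src_factor(118.154, (2, 2))
    print(cic, round(farrow, 4))
```

With this rule only the CIC decimation ratio and the Farrow resampling ratio change between standards, while the fixed stages stay untouched.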

EXPERIMENTAL SYNTHESIS RESULTS
Decimation factor of each filter stage, per standard:

                              Weakly parameterizable     Fixed      Fully programmable
Standard       SRC Factor*    CIC    Transpose Farrow    Halfband   FIR
GSM            118.154        16     1.8461              2          2
W-CDMA         16.667         4      1.0461              2          2
IEEE 802.11a   13.333         2      1.6667              2          2
WiMax          9.578          2      1.1972              2          2

* A. Rusu, et al., “Reconfigurable ADCs enable smart radios for 4G wireless connectivity,” IEEE Circuits Devices Mag., vol. 22, no. 3, pp. 6-11, 2006.

CHANNELIZATION ACCELERATOR AREA COMPARISON

Design                               Standard Cell Area
GSM                                  265985
IEEE 802.11a                         154812
WCDMA                                232752
WiMax                                210656
Velcro Approach                      864205
Proposed Multistandard Accelerator   299701

• Synthesis results obtained from implementation using a TSMC 0.18 μm process

OBSERVATIONS
• Nearly 65% reduction in area, compared to a Velcro approach for 4 standards.
• The percentage area reduction can be expected to increase with an increasing number of supported standards.
• The fixed and weakly parameterizable portions of the architecture need to be designed for the worst case attenuation requirements.
• The paradigm is scalable for an arbitrary number of standards with low reconfiguration overheads.
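The area figures reported in the comparison can be checked with a few lines of arithmetic. The sketch below only uses the standard cell areas quoted in the deck; the variable names are hypothetical.

```python
# Standard cell areas of the single mode accelerators, as quoted in the deck.
single_mode_area = {
    "GSM": 265985,
    "IEEE 802.11a": 154812,
    "WCDMA": 232752,
    "WiMax": 210656,
}

# A Velcro design instantiates one accelerator per standard, so its area is the sum.
velcro_area = sum(single_mode_area.values())

# Area quoted for the proposed multistandard accelerator.
proposed_area = 299701

reduction_percent = 100.0 * (1.0 - proposed_area / velcro_area)

if __name__ == "__main__":
    print(f"Velcro area: {velcro_area}")
    print(f"Area reduction: {reduction_percent:.1f}%")
```

The sum of the four single mode areas matches the quoted Velcro figure, and the proposed accelerator is only modestly larger than the largest single mode design.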

REDUCING THE AREA/POWER PENALTY OF THE LAST STAGE FILTER
• Programmability in the last stage filter necessitates the use of generic MAC units.
• The last stage filter can be implemented as a time-shared MAC FIR filter.
• Power reduction strategies for time-shared MAC based FIR filters have generally focused on reducing the switching activity.
• In nanoscale CMOS technologies, the leakage power also needs to be taken into account.

NANOSCALE CMOS POWER CONSUMPTION COMPONENTS
• Dependence on supply voltage (VDD) and threshold voltage (Vth)
  • Subthreshold leakage power: increases exponentially with reduced Vth.
  • Gate leakage power: increases exponentially with increased VDD.
  • Dynamic power: increases quadratically with increased VDD.

EFFECT OF PARALLELISM ON OPERATING VOLTAGES
• Increasing the number of fixed MAC units results in a reduced operating frequency while maintaining the same throughput.
• Reduced operating frequency translates to increased timing slack in the critical paths.
• Timing slack can be exploited for increasing Vth or reducing VDD.

EFFECT OF REDUCED FREQUENCY ON OPERATING VOLTAGES
[Figure: locus of permissible (VDD, Vth) points for different frequency constraints. Effect of reduced frequency constraints on the permissible (VDD, Vth) points of a 16-bit adder (TSMC 0.18 μm CMOS process).]

NANOSCALE CMOS POWER CONSUMPTION COMPONENTS
• Dependence on area
  • Subthreshold leakage power: has a linear dependence on total gate width.
  • Gate leakage power: has a linear dependence on total gate width.
  • Dynamic power: has a linear dependence on the total physical capacitance.
Total gate width and total physical capacitance are strongly correlated to the total circuit area.

PARALLELISM AND AREA-SLACK EFFICIENCY
• Parallelism trades increased area for a lower operating frequency and increased timing slack.
• Increased timing slack can be traded for lower VDD and increased Vth, and hence reduced total power consumption.
• The increased area penalty of parallelism lowers the possible reduction in total power consumption.
• Area-slack efficiency = (amount of timing slack increment) / (amount of area increment)

FULL PARALLEL DIRECT FORM FILTER OF LENGTH N
[Figure: direct form FIR filter with a z-1 delay line on the input x(n/fs), coefficient multipliers h0, h1, h2, h3, ..., hN-1 and an adder chain producing y(n/fs)]
Throughput rate = fs

M-MAC BASED TIME-SHARED DIRECT FORM FILTER OF LENGTH N
[Figure: control unit with a clock of Nfs/M driving a data address bus and a coefficient address bus; data and coefficient register files feed MAC-1, MAC-2, ..., MAC-M from input x(n/fs), and the MAC outputs are summed to give y(n/fs)]

AREA-SLACK EFFICIENCY OF A TIME-SHARED DIRECT FORM FILTER
• Cycle period of an M-MAC based time-shared direct form filter of length N:
$T_M = \frac{M}{N f_s}$
• Extra timing slack obtained by adding P MAC units, each of area $A_m$:
$T_{M+P} - T_M = \frac{P}{N f_s}$
• Area-slack efficiency of a time-shared direct form filter:
$E_{DF} = \frac{P}{N f_s} \cdot \frac{1}{P A_m} = \frac{1}{N f_s A_m}$
• Can we design filter structures that have a higher area-slack efficiency?
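The direct form expressions on this slide can be evaluated numerically. The sketch below assumes the relations T_M = M/(N·fs) for the cycle period and efficiency = slack increment / area increment; the function names and the numeric values in the usage example are hypothetical.

```python
def cycle_period(num_macs: int, filter_length: int, sample_rate_hz: float) -> float:
    """Cycle period T_M of an M-MAC time-shared direct form filter of length N."""
    return num_macs / (filter_length * sample_rate_hz)


def area_slack_efficiency_df(
    num_macs: int,
    extra_macs: int,
    mac_area: float,
    filter_length: int,
    sample_rate_hz: float,
) -> float:
    """Timing slack gained per unit of area when P extra MAC units are added."""
    slack = cycle_period(num_macs + extra_macs, filter_length, sample_rate_hz) - cycle_period(
        num_macs, filter_length, sample_rate_hz
    )
    return slack / (extra_macs * mac_area)


if __name__ == "__main__":
    # Illustrative values: 64 taps, 1 MHz sample rate, MAC area of 100 (arbitrary units).
    print(area_slack_efficiency_df(num_macs=4, extra_macs=2, mac_area=100.0,
                                   filter_length=64, sample_rate_hz=1e6))
```

The result does not depend on M or P, which matches the closed form 1/(N·fs·Am): every extra MAC unit in a direct form structure buys the same amount of slack.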

FAST FILTER ALGORITHMS (FFA) STRUCTURES
• Algorithmic strength reduction: FFA structures have a lower number of expensive MAC operations at the cost of increased add operations.
• FFA structures can be derived by exploiting the redundancies in the FIR subfilters of a K-parallel FIR filter.
[Figure: parallel FIR filter with input x(2k+1) labelled; each FIR subfilter is of length N/2]
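To make the strength reduction concrete, here is a minimal sketch of the commonly used 2-parallel (2x2) fast FIR decomposition, checked against a direct convolution. It assumes even-length finite sequences and processes a whole block at once rather than streaming samples; the function names are hypothetical. Three half-length subfilters replace the four that a plain 2-parallel polyphase filter would need, at the cost of extra additions.

```python
def convolve(a: list[float], b: list[float]) -> list[float]:
    """Direct linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ffa_2x2(h: list[float], x: list[float]) -> list[float]:
    """2-parallel fast FIR filtering of x by h using three half-length subfilters."""
    if len(h) < 2 or len(x) < 2 or len(h) % 2 or len(x) % 2:
        raise ValueError("this sketch assumes non-empty, even-length h and x")
    h0, h1 = h[0::2], h[1::2]   # even and odd polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]   # even and odd polyphase components of the input

    # Three subfilters of length N/2 (instead of four).
    a = convolve(h0, x0)
    b = convolve(h1, x1)
    c = convolve([u + v for u, v in zip(h0, h1)], [u + v for u, v in zip(x0, x1)])

    # Post-processing additions: odd outputs reuse a and b.
    y_odd = [ci - ai - bi for ai, bi, ci in zip(a, b, c)]
    # Even outputs: a plus b delayed by one sample at the decimated rate.
    y_even = [0.0] * (len(a) + 1)
    for k, v in enumerate(a):
        y_even[k] += v
    for k, v in enumerate(b):
        y_even[k + 1] += v

    y = [0.0] * (len(h) + len(x) - 1)
    y[0::2] = y_even
    y[1::2] = y_odd
    return y


if __name__ == "__main__":
    print(ffa_2x2([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 2.0, 0.5, 3.0]))
```

The sum `h0 + h1` can be precomputed as a coefficient set, so at run time the structure needs one pre-addition and three post-additions per output pair.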

FFA BASED TIME-SHARED FILTERS
• Time-shared FFA structures can be obtained by implementing each of the FIR subfilters as a time-shared FIR filter, while implementing the irregular addition network in parallel.
• A KxK FFA structure of an N-tap filter has Sk subfilters of length N/K, and Ak postprocessing/preprocessing adders.
• The notation KxK|L is used to indicate a structure in which each of the Sk subfilters is multiplexed onto L MAC units.

AREA-SLACK EFFICIENCY OF FFA BASED TIME-SHARED FILTERS
• Input sample rate of the subfilters in a KxK FFA is $f_s/K$
• Cycle period of the MAC units in a KxK|L FFA:
$T_{K \times K|L} = \frac{K^2 L}{N f_s}$
• Extra timing slack obtained by adding P MAC units in each of the $S_K$ subfilters:
$T_{K \times K|(L+P)} - T_{K \times K|L} = \frac{K^2 P}{N f_s}$
• Area-slack efficiency of a KxK FFA structure:
$E_{FFA} = \frac{K^2 P}{N f_s} \cdot \frac{1}{S_K P A_m} = \left(\frac{K^2}{S_K}\right) \frac{1}{N f_s A_m} = \left(\frac{K^2}{S_K}\right) E_{DF}$

FFA PARAMETERS

K    Sk   Ak   K^2   K^2/Sk
2    3    4    4     1.33
3    6    10   9     1.5
4    9    20   16    1.78
5    12   40   25    2.08
6    18   42   36    2
8    27   76   64    2.37

$E_{FFA} = \left(\frac{K^2}{S_K}\right) E_{DF}$
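The last column of this table is simply K squared divided by the subfilter count, which is the factor by which the FFA area-slack efficiency exceeds the direct form value. A short sketch that recomputes it from the tabulated Sk values (the names `FFA_PARAMETERS` and `efficiency_gain` are hypothetical):

```python
# K -> (Sk, Ak): subfilter count and pre/post-processing adder count from the table.
FFA_PARAMETERS = {
    2: (3, 4),
    3: (6, 10),
    4: (9, 20),
    5: (12, 40),
    6: (18, 42),
    8: (27, 76),
}


def efficiency_gain(k: int) -> float:
    """Ratio E_FFA / E_DF = K^2 / Sk for a KxK FFA structure."""
    subfilters, _adders = FFA_PARAMETERS[k]
    return k * k / subfilters


if __name__ == "__main__":
    for k in FFA_PARAMETERS:
        print(k, round(efficiency_gain(k), 2))
```

Note that this ratio only accounts for MAC area; the Ak adders of the pre/post-processing network are extra area that the ratio does not include.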

OBSERVATIONS  MAC units in the FFA based time shared
structures have a  MAC units in the FFA based time-shared structures have a greater timing slack than a time-shared direct form filter with the same number of MAC units  Adding MAC units to a time-shared FFA structure offers greater timing slack increment than adding the same number greater timing slack increment, than adding the same number of MAC units to a time-shared direct form structure.

CONCLUSION
• Proposed a design strategy for efficient implementation of a channelization accelerator for a flexible mobile radio.
• The design has a high degree of hardware reuse across multiple standards.
• The proposed design strategy is scalable for supporting an arbitrary number of standards.
• We have investigated strategies for reducing the area/power penalty of the last stage programmable filter.

PUBLICATIONS
International Journals
[1] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Flexibility and reusability in the digital front-end of cognitive radio terminals,” Circuits, Systems and Signal Processing Journal, Springer, Accepted in August 2010.
[2] Navin Michael, Christophe Moy, A. P. Vinod and Jacques Palicot, “Area-Power tradeoffs for flexible filtering in green radios,” Journal of Communications and Networks, vol. 12, no. 2, pp. 158-167, April 2010.
International Conferences
[1] Christophe Moy, Wassim Jouini, Navin Michael, “Cognitive Radio Equipments Supporting Spectrum Agility,” International Workshop on Cognitive Radio and Advanced Spectrum Management (CogART 2010), Italy, 7-10 November 2010.
[2] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Low power, flexible FIR filters in the digital front-end of green radios,” Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Istanbul, Turkey, September 2010.
[3] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Area-efficient time-shared FIR filters in nanoscale CMOS,” Proceedings of IEEE International Conference on Green Circuits and Systems, Shanghai, China, June 2010.
[4] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Design paradigm for standard agnostic channelization in flexible mobile radios,” Proceedings of IEEE International Symposium on Circuits and Systems, Paris, France, May-June 2010.
[5] Navin Michael, A. P. Vinod, Christophe Moy and Jacques Palicot, “Design of low power multimode time-shared filters,” Proceedings of 7th IEEE International Conference on Information, Communications and Signal Processing, pp. 1-5, Macau, December 2009.
[6] Navin Michael and A. P. Vinod, “Reconfigurable architecture for arbitrary sample rate conversion in software defined radios,” Proceedings of 19th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1-6, Cannes, France, September 2008.

THANK YOU
