[Figure: amplitude versus normalized frequency for Bank 0 through Bank 15]
[Figure: amplitude versus normalized frequency for Bank 16 through Bank 31]
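$$X_k = \sum_{n=0}^{2N-1} x[n] \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(n + \frac{1}{2} + \frac{N}{2}\right)\right] \tag{9}$$

IMDCT (Inverse MDCT)

$$y[n] = \frac{2}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(n + \frac{1}{2} + \frac{N}{2}\right)\right] \tag{10}$$

▶ A transform from 2N time-domain points to N frequency-domain points (k = 0, ..., N − 1)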
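As a rough illustration of (9) and (10), the following sketch evaluates both sums directly. The function names `mdct` and `imdct` are chosen here for illustration, and the O(N²) loops are for clarity only. No window is applied, which is an assumption of this sketch and not something the slide states.

```python
import math

def mdct(x):
    """Eq. (9): map 2N time-domain samples to N coefficients X_k."""
    N = len(x) // 2
    return [
        sum(x[n] * math.cos(math.pi / N * (k + 0.5) * (n + 0.5 + N / 2))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(X):
    """Eq. (10): map N coefficients back to 2N (still aliased) samples."""
    N = len(X)
    return [
        2.0 / N * sum(X[k] * math.cos(math.pi / N * (k + 0.5) * (n + 0.5 + N / 2))
                      for k in range(N))
        for n in range(2 * N)
    ]

def overlap_add(signal, N):
    """Overlap-add two frames of length 2N that are shifted by N samples.

    Returns the N samples of the overlapped region (signal[N:2N]).
    """
    frame_a = imdct(mdct(signal[0:2 * N]))
    frame_b = imdct(mdct(signal[N:3 * N]))
    return [frame_a[N + n] + frame_b[n] for n in range(N)]
```

A single `imdct(mdct(x))` does not return `x`, since 2N samples cannot be recovered from N coefficients. With the scaling of (10) and no window, the overlapped region returned by `overlap_add` should come out as twice the input samples, because the aliasing terms of the two frames cancel.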
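$$\cdots\; \mathrm{ca}_i X_{18k+i+1} \tag{20}$$

$$X_{18k+i+1} \leftarrow \mathrm{ca}_i X_{18k-i} + \mathrm{cs}_i X_{18k+i+1} \tag{21}$$

$$k = 1, \dots, 31, \quad i = 0, \dots, 7$$

▶ Definition of cs_i and ca_i

$$\mathrm{cs}_i := \frac{1}{\sqrt{1 + c_i^2}}, \quad \mathrm{ca}_i := \frac{c_i}{\sqrt{1 + c_i^2}} \tag{22}$$

The c_i (i = 0, ..., 7) are fixed by the standard.
Quoted from [6]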
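The butterfly can be sketched as below, under several assumptions that the slide does not confirm. First, the c_i values are the ones tabulated in the MPEG-1 Layer III specification. Second, the slide's indices are taken to be 1-based, so X_{18k−i} and X_{18k+i+1} become `18*k - 1 - i` and `18*k + i` in a 0-based array of 576 MDCT lines. Third, the first update, which is only partly visible as (20), is assumed to be the rotation partner of (21), that is cs_i X_{18k−i} − ca_i X_{18k+i+1}. The name `alias_butterfly` is hypothetical.

```python
import math

# c_i as tabulated in the MPEG-1 Layer III specification (assumption of this sketch).
C = [-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037]
CS = [1.0 / math.sqrt(1.0 + c * c) for c in C]  # cs_i, eq. (22)
CA = [c / math.sqrt(1.0 + c * c) for c in C]    # ca_i, eq. (22)

def alias_butterfly(X):
    """Apply 8 butterflies at each of the 31 subband boundaries (576 lines)."""
    X = list(X)
    for k in range(1, 32):
        for i in range(8):
            lo = 18 * k - 1 - i  # X_{18k-i} in the slide's (assumed 1-based) indexing
            hi = 18 * k + i      # X_{18k+i+1}
            a, b = X[lo], X[hi]
            X[lo] = CS[i] * a - CA[i] * b  # ASSUMED form of eq. (20)
            X[hi] = CA[i] * a + CS[i] * b  # eq. (21)
    return X
```

Since cs_i² + ca_i² = 1, each butterfly in this assumed form is a plane rotation, so the total energy of the 576 lines is unchanged.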
… the amplitude at which the bands cross moves about 2 dB lower (an improvement).
[Figure: "Aliasing reduction butterfly effect for center bands"; amplitude (dB) versus frequency bin (105 to 150) for Bands 14 to 17, each shown with and without the butterfly]
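… $\hat{X}_i$ ∗2

$$X^q_i = \mathrm{round}\left(\mathrm{sign}(X_i)\left|X_i (G^l_i)^{-1}\right|^{\frac{3}{4}}\right) \tag{23}$$

$$\hat{X}_i = \mathrm{sign}(X^q_i)\,|X^q_i|^{\frac{4}{3}}\, G^l_i \tag{24}$$

$$G^l_i = 2^{\frac{1}{4} g}\; 2^{-\frac{1}{2}(1 + \mathrm{scale})(sl_i + p_i)} \tag{25}$$

▶ g: global gain
▶ scale: scale of the scale factors (always 0 in the dist10 encoder)
▶ sl_i: scale factor
▶ p_i: pre-emphasis amplification (always 0 in the dist10 encoder)

∗2 Apart from the round, dequantization restores the original value:

$$\hat{X}_i = \mathrm{sign}(X_i)\,|X^q_i|^{\frac{4}{3}}\, G^l_i = \mathrm{sign}(X_i)\left(\left|X_i (G^l_i)^{-1}\right|^{\frac{3}{4}}\right)^{\frac{4}{3}} G^l_i = X_i (G^l_i)^{-1} G^l_i = X_i$$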
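A minimal sketch of (23) to (25) follows. The function names are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for the slide's round, which may behave differently on ties.

```python
import math

def step_gain(g, sl, p=0, scale=0):
    """Eq. (25): G = 2^(g/4) * 2^(-(1/2)(1 + scale)(sl + p))."""
    return 2.0 ** (g / 4.0) * 2.0 ** (-0.5 * (1 + scale) * (sl + p))

def quantize(x, gain):
    """Eq. (23): power-law (3/4) compression followed by rounding."""
    return round(math.copysign(abs(x / gain) ** 0.75, x))

def dequantize(xq, gain):
    """Eq. (24): inverse power law (4/3) and rescaling by the gain."""
    return math.copysign(abs(xq) ** (4.0 / 3.0), xq) * gain
```

For example, with a gain of 1, the value 16.0 quantizes to 8 and dequantizes back to approximately 16.0. Values whose compressed magnitude is not an integer are recovered only up to the rounding error, as the footnote points out.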
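Writing the long and short partitions b as $P^l_b$ and $P^s_b$, we have

$$\mathrm{ebl}[b] = \sum_{n \in P^l_b} |w_l[n]|^2 \tag{34}$$

$$\mathrm{cb}[b] = \sum_{n \in P^l_b} \mathrm{cw}[n]\,|w_l[n]|^2 \tag{35}$$

$$\mathrm{ebs}[b] = \sum_{n \in P^s_b} |w_s[n]|^2 \tag{36}$$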
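The partition sums (34) to (36) can be sketched as follows. The representation of each partition as a list of spectral-line indices, and the function name `partition_energies`, are assumptions made for illustration.

```python
def partition_energies(w, cw, partitions):
    """Sum |w[n]|^2 and cw[n] * |w[n]|^2 over each partition.

    w          : spectral lines (real or complex)
    cw         : per-line weights cw[n]
    partitions : list of index lists, one per partition b (the sets P_b)
    Returns (eb, cb) as in eqs. (34) and (35); eq. (36) is the same
    energy sum applied to the short-block spectrum and partitions.
    """
    eb = [sum(abs(w[n]) ** 2 for n in part) for part in partitions]
    cb = [sum(cw[n] * abs(w[n]) ** 2 for n in part) for part in partitions]
    return eb, cb
```

A toy call such as `partition_energies([1 + 0j, 2, 3j, 4], [0.5, 0.5, 1, 0], [[0, 1], [2, 3]])` groups four lines into two partitions.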
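ecbl, ecbs: ebl and ebs convolved with the spreading functions $SF^l_b$ and $SF^s_b$
▶ ctb: cb convolved with $SF^l_b$

$$\mathrm{ecbl}[b] = \sum_k SF^l_b[k]\,\mathrm{ebl}[k] \tag{37}$$

$$\mathrm{ctb}[b] = \sum_k SF^l_b[k]\,\mathrm{cb}[k] \tag{38}$$

$$\mathrm{ecbs}[b] = \sum_k SF^s_b[k]\,\mathrm{ebs}[k] \tag{39}$$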
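Equations (37) to (39) share one form: each output partition is a weighted sum of all input partitions. A small sketch follows, assuming the spreading function is stored as a table `sf[b][k]`. The name `spread` is hypothetical, and the table in the usage note is a toy, not the one from the standard.

```python
def spread(sf, e):
    """Compute out[b] = sum_k sf[b][k] * e[k] for every partition b.

    sf : spreading-function table, sf[b][k] weights input partition k
         in output partition b
    e  : per-partition values (ebl, cb or ebs)
    """
    return [sum(sf[b][k] * e[k] for k in range(len(e))) for b in range(len(sf))]
```

With the toy table `[[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]` and energies `[4, 0, 2]`, the empty middle partition picks up energy from both neighbours, which is the smearing that the convolution is meant to model.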
… edition. Tech I. CQ Publishing (CQ出版社), 2001.
[2] Hitoshi Kiya (貴家仁志). マルチレート信号処理 [Multirate Signal Processing]. Digital Signal Processing Series. Shokodo (昭晃堂), 1995.
[3] P. P. Vaidyanathan et al. マルチレート信号処理とフィルタバンク [Multirate Signal Processing and Filter Banks]. Digital Signal Processing and Image Processing Series. Kagaku Gijutsu Shuppan (科学技術出版), 2002.
[4] John Princen and Alan Bradley. “Analysis/synthesis filter bank design based on time domain aliasing cancellation”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 34.5 (1986), pp. 1153–1161.
[5] Bernd Edler. “Aliasing reduction in sub-bands of cascaded filter banks with decimation”. In: Electronics Letters 28.12 (1992), pp. 1104–1106.
[6] Rassol Raissi. “The theory behind MP3”. In: MP3’ Tech (2002).
[7] Manfred R. Schroeder, Bishnu S. Atal, and J. L. Hall. “Optimizing digital speech coders by exploiting masking properties of the human ear”. In: The Journal of the Acoustical Society of America 66.6 (1979), pp. 1647–1652.
[8] Hiroshi Yasuda (安田浩). MPEG/マルチメディア符号化の国際標準 [International Standards for MPEG/Multimedia Coding]. Maruzen (丸善), 1994.
28-Jul-2024]. 2015.
[18] Chi-Min Liu and Wen-Chieh Lee. “The design of a hybrid filter bank for the psychoacoustic model in ISO/MPEG phases 1, 2 audio encoder”. In: IEEE Transactions on Consumer Electronics 43.3 (1997), pp. 586–592.
[19] James D. Johnston. “Estimation of perceptual entropy using noise masking criteria”. In: ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing. IEEE. 1988, pp. 2524–2527.
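… $\sum_{k=0} t_{k,i}\, t_{k,j}$, and

$$\begin{aligned}
t_{k,i}\, t_{k,j} &= 4 \cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)\left(i - \frac{L-1}{2}\right) + (-1)^k \frac{\pi}{4}\right] \cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)\left(j - \frac{L-1}{2}\right) + (-1)^k \frac{\pi}{4}\right] \\
&= 2\left\{ \cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)\{i + j - (L-1)\} + (-1)^k \frac{\pi}{2}\right] + \cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)(i - j)\right] \right\}
\end{aligned}$$

Setting $i + j - (L - 1) = A$,

$$\begin{aligned}
\cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)A + (-1)^k \frac{\pi}{2}\right]
&= \cos\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)A\right] \cos\left[(-1)^k \frac{\pi}{2}\right] - \sin\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)A\right] \sin\left[(-1)^k \frac{\pi}{2}\right] \\
&= -(-1)^k \sin\left[\frac{\pi}{M}\left(k + \frac{1}{2}\right)A\right]
\end{aligned}$$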
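The two steps above can be spot-checked numerically. The sketch assumes that each factor has the form t_{k,i} = 2 cos[(π/M)(k + 1/2)(i − (L−1)/2) + (−1)^k π/4], which is inferred from the product shown here and is not stated on this slide. The function names are hypothetical.

```python
import math

def t(k, i, M, L):
    """Assumed form of t_{k,i}, inferred from the product on the slide."""
    return 2.0 * math.cos(
        math.pi / M * (k + 0.5) * (i - (L - 1) / 2) + (-1) ** k * math.pi / 4
    )

def product_expanded(k, i, j, M, L):
    """Right-hand side after the product-to-sum step and the A substitution."""
    A = i + j - (L - 1)
    phase = math.pi / M * (k + 0.5)
    return 2.0 * (-((-1) ** k) * math.sin(phase * A) + math.cos(phase * (i - j)))
```

Comparing `t(k, i, M, L) * t(k, j, M, L)` with `product_expanded(k, i, j, M, L)` over a grid of small k, i and j gives a quick check of the signs in the derivation.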
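$$\begin{aligned}
&\cdots \sum_{k=0}^{N-1} \sum_{m=0}^{2N-1} x[m] \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(m + \frac{1}{2} + \frac{N}{2}\right)\right] \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(n + \frac{1}{2} + \frac{N}{2}\right)\right] \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \sum_{m=0}^{2N-1} x[m] \left\{ \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)(n + m + 1 + N)\right] + \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)(n - m)\right] \right\} \\
&= \frac{1}{N} \sum_{m=0}^{2N-1} x[m] \sum_{k=0}^{N-1} \left\{ \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)(n + m + 1 + N)\right] + \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right)(n - m)\right] \right\}
\end{aligned}$$

Here we go on to compute

$$I_n := \sum_{k=0}^{N-1} \cos\left[\frac{\pi}{N}\left(k + \frac{1}{2}\right) n\right] \tag{72}$$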
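Before working through the closed form, the sum in (72) can be tabulated numerically. The sketch below evaluates the definition directly; the function name `i_sum` is chosen here for illustration.

```python
import math

def i_sum(n, N):
    """Direct evaluation of I_n from eq. (72)."""
    return sum(math.cos(math.pi / N * (k + 0.5) * n) for k in range(N))
```

For small N, `i_sum(0, N)` equals N, while the values for 0 < |n| < 2N come out numerically zero. The argument n + m + 1 + N in the first cosine can reach multiples of 2N, and those cases need separate treatment (for instance, `i_sum(2 * N, N)` evaluates to −N).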