Factor Analysis-Methodology Factor Analysis - Simulation cases Application Conclusions References
An alternative estimator for the number of factors for high-dimensional time series. A robust approach.
Valdério A. Reisen
DEST-CCE/PPGEA/PPGECON - Federal University of Espírito Santo, Brazil
valderioanselmoreisen@gmail.com
CentraleSupélec, Gif-sur-Yvette
V. A. Reisen Factor Analysis on Time Series
Summary
1 Introduction
2 Introduction
3 Time Domain: A robust estimator of the ACF
4 Factor Analysis-Methodology
5 Factor Analysis - Simulation cases
6 Application
7 Conclusions
Abstract
This paper considers factor modeling for high-dimensional time series with short- and long-memory properties in the presence of additive outliers. The factor model studied by Lam and Yao (2012) is extended to the case of additive outliers. The estimators of the number of factors are obtained from a robust covariance matrix. The methodology is analyzed in terms of the convergence rate of the estimator of the number of factors and by means of Monte Carlo simulations. The method is applied to reduce the dimensionality of a real data set: the pollutant PM10 in the Greater Vitória region (ES, Brazil).
Introduction: Main topics of this talk
- Factor Analysis (FA) for multivariate time series with long-memory and short-memory properties and outliers;
- A robust dimension-reduction estimator for the number of components in FA is proposed (an extension of Lam and Yao (2012));
- A simulation study showing the performance of the method for time series under additive outliers;
- The application of the suggested estimator to a real data set: the pollutant PM10 in the Greater Vitória region (ES, Brazil).
Introduction
The air pollution data from the AAQMN of the Greater Vitória Region show some high concentration levels and long-range dependence.
High levels of concentration → health impact.
High levels of concentration → may be outliers.
Introduction
Lemma 1 (Fajardo et al. (2009)). Suppose that $y_1, y_2, \ldots, y_n$ is a set of time series observations contaminated by $m$ additive outliers, and let $\hat\rho_y(h) = \hat\gamma_y(h)/\hat\gamma_y(0)$ be the sample autocorrelation at lag $h$. Then:
i. For $m = 1$ (one outlier of magnitude $\omega$),
$$\lim_{n\to\infty} \lim_{\omega\to\infty} \hat\rho_y(h) = 0.$$
ii. For $m = 2$ (outliers of magnitudes $\omega_1$ and $\omega_2$ at times $T_1$ and $T_2$) and $T_2 = T_1 + l$, such that $h < T_1 < T_2 < n - h$,
$$\lim_{n\to\infty} \lim_{\omega_1\to\infty,\ \omega_2\to\pm\infty} \hat\rho_y(h) = \begin{cases} 0, & \text{if } h \neq l, \\ \pm 0.5, & \text{if } h = l. \end{cases}$$
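The effect described in Lemma 1 can be checked numerically. The sketch below is only an illustration under assumed settings that are not taken from the talk: an AR(1) series with coefficient 0.8, n = 2000, and outliers of magnitude 10^4 placed at arbitrary positions (lag separation l = 3 in the two-outlier case). The helper names `sample_acf` and `simulate_ar1` are hypothetical.

```python
import numpy as np

def sample_acf(y, h):
    """Classical sample autocorrelation at lag h >= 1."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    return float(np.sum(yc[:-h] * yc[h:]) / np.sum(yc ** 2))

def simulate_ar1(n, phi, rng, burn=200):
    """Gaussian AR(1) series of length n (illustrative model, not from the slides)."""
    e = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = phi * y[t - 1] + e[t]
    return y[burn:]

rng = np.random.default_rng(0)
y = simulate_ar1(2000, 0.8, rng)

# Case i: a single large additive outlier.
one = y.copy()
one[1000] += 1e4

# Case ii: two large additive outliers separated by l = 3 time points.
two = y.copy()
two[1000] += 1e4
two[1003] += 1e4

acf_clean = sample_acf(y, 1)        # close to the AR coefficient
acf_one = sample_acf(one, 1)        # pushed towards 0
acf_two_l = sample_acf(two, 3)      # h = l: pushed towards 0.5
acf_two_other = sample_acf(two, 1)  # h != l: pushed towards 0
```

With these settings a single extreme observation is enough to wipe out the lag-1 autocorrelation of a strongly dependent series, which is the motivation for a robust ACF estimator.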
Introduction
Proposition 1 (Cotta et al. (2017)). Suppose that $Z_{1,t}, Z_{2,t}, \ldots, Z_{n,t}$ is a set of $k$-dimensional time series observations of Model (15) and $m$ is the expected number of additive outliers as stated in (15). Let $\hat\rho^Z_{ij}(h) = \hat\gamma^Z_{ij}(h)/\sqrt{\hat\gamma^Z_{ii}(0)\,\hat\gamma^Z_{jj}(0)}$, for all $i, j = 1, \ldots, k$. Then:
a. For $m = 1$ (one outlier occurring only at $Z_{i,t}$),
$$\lim_{n\to\infty} \operatorname*{plim}_{\omega_i\to\infty} \hat\rho^Z_{ij}(h) = 0.$$
Introduction
Proposition 1 (cont.)
b. For $m = 2$ (two outliers occurring at $Z_{i,t}$ and/or at $Z_{j,t}$) and assuming that $\hat\gamma^Z_{ij}(h) \neq 0$ for $Z_{i,t}$ and $Z_{j,t}$, it follows that
$$\lim_{n\to\infty} \operatorname*{plim}_{\omega_i\to\infty \text{ and/or } \omega_j\to\infty} \hat\rho^Z_{ij}(h) = 0.$$
In (a.) and (b.), $\omega_i$ and $\omega_j$ are the magnitudes of the additive outliers occurring at positions $i$ and $j$.
A robust estimator of ACF
Rousseeuw & Croux (1993) proposed a robust scale estimator based on the $k$th order statistic of the $\binom{n}{2}$ distances $\{|y_i - y_j|,\ i < j\}$, which can be written as
$$Q_n(y) = c \times \{|y_i - y_j|;\ i < j\}_{(k)}, \tag{1}$$
where $y = (y_1, y_2, \ldots, y_n)'$, $c$ is a constant used to guarantee consistency ($c = 2.2191$ for the normal distribution), and $k = \left\lfloor \left(\binom{n}{2} + 2\right)/4 \right\rfloor + 1$.
Rousseeuw & Croux (1993) showed that the asymptotic breakdown point of $Q_n(\cdot)$ is 50%, which means that up to half of the observations of the time series can be outliers and $Q_n(\cdot)$ will still yield sensible estimates.
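Equation (1) can be sketched directly in code. The version below is a naive O(n^2) implementation intended only to make the definition concrete; the function name `qn_scale`, the sample size, and the contamination pattern (10% of the points shifted by 100) are assumptions for illustration and are not taken from the talk.

```python
import numpy as np
from math import comb

def qn_scale(y, c=2.2191):
    """Qn scale estimate: c times the k-th smallest pairwise distance |y_i - y_j|, i < j."""
    y = np.asarray(y, dtype=float)
    n = y.size
    i, j = np.triu_indices(n, k=1)           # all index pairs with i < j
    dists = np.abs(y[i] - y[j])
    kth = (comb(n, 2) + 2) // 4 + 1          # order statistic index from Equation (1)
    return c * np.partition(dists, kth - 1)[kth - 1]

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)                # true scale is 1

# Contaminate 10% of the observations with a large shift (illustrative choice).
y_cont = y.copy()
idx = rng.choice(y.size, size=100, replace=False)
y_cont[idx] += 100.0

q_clean = qn_scale(y)
q_cont = qn_scale(y_cont)
sd_cont = np.std(y_cont)
```

On the contaminated sample the standard deviation is inflated by more than an order of magnitude, while `qn_scale` stays near the scale of the clean data. For long series a faster algorithm than the full pairwise enumeration would be preferable.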
Robust Sample autocovariance
Ma & Genton (2000) suggested the following robust sample autocovariance function:
$$\hat\gamma_Q(h) = \frac{1}{4}\left[Q^2_{n-h}(u + v) - Q^2_{n-h}(u - v)\right], \tag{2}$$
where $u$ and $v$ are vectors containing the initial $n - h$ and the final $n - h$ observations, respectively.
Note that the above ACF is not necessarily positive semi-definite.
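A minimal sketch of Equation (2), built on a naive Qn implementation, is given below. The AR(1) test series (coefficient 0.5), the contamination settings (probability 0.05, magnitude 15) and the function names are illustrative assumptions, not part of the talk.

```python
import numpy as np
from math import comb

def qn_scale(y, c=2.2191):
    """Naive O(n^2) Qn scale estimate (Rousseeuw & Croux, 1993)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    i, j = np.triu_indices(n, k=1)
    dists = np.abs(y[i] - y[j])
    kth = (comb(n, 2) + 2) // 4 + 1
    return c * np.partition(dists, kth - 1)[kth - 1]

def robust_autocov(y, h):
    """Robust autocovariance at lag h >= 0, following Equation (2)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    u, v = y[: n - h], y[h:]                 # first and last n - h observations
    return 0.25 * (qn_scale(u + v) ** 2 - qn_scale(u - v) ** 2)

def classical_acf(y, h):
    yc = y - y.mean()
    return float(np.sum(yc[:-h] * yc[h:]) / np.sum(yc ** 2))

rng = np.random.default_rng(1)
n, phi = 1500, 0.5
e = rng.standard_normal(n + 200)
y = np.zeros(n + 200)
for t in range(1, n + 200):
    y[t] = phi * y[t - 1] + e[t]
y = y[200:]

# Additive outliers: +/- 15 with total probability 0.05 (illustrative values).
delta = rng.choice([-1, 0, 1], size=n, p=[0.025, 0.95, 0.025])
y_cont = y + 15.0 * delta

gamma1_clean = robust_autocov(y, 1)          # theoretical value: phi / (1 - phi^2)
rho1_classical = classical_acf(y_cont, 1)
rho1_robust = robust_autocov(y_cont, 1) / robust_autocov(y_cont, 0)
```

At lag 0 the formula reduces to the squared Qn scale of the series, so the ratio in the last line is a robust autocorrelation. Under this contamination the classical lag-1 autocorrelation collapses towards zero while the robust one stays in the neighbourhood of the true value.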
Robust Sample autocovariance
The references below are background works which present theoretical and applied results related to the robust ACF $\hat\gamma_Q(h)$:
Molinares, F. A. M., Reisen, V. A., Cribari-Neto, F. Robust estimation in long-memory processes under additive outliers. Journal of Statistical Planning and Inference, 139, 2511-2525, 2009.
Lévy-Leduc, C., Boistard, H., Moulines, E., Taqqu, M. S., Reisen, V. A. Robust estimation of the scale and the autocovariance functions in short and long-range dependence. Journal of Time Series Analysis, 32(2), 135-156, 2011.
Souza, I., Reisen, V. A., Franco, G., Bondon, P. The estimation and testing of the cointegration order based on the frequency domain. Journal of Business & Economic Statistics, 2017.
Reisen, V. A., Monte, E. Z., Franco, G. C., Sgrancio, A. M., Molinares, F., Bondon, P., Ziegelmann, F. A., Abraham, B. Fractional seasonal process with outliers to model and forecast daily average SO2 concentrations. Mathematics and Computers in Simulation, 2017.
Cotta, H., Reisen, V., Bondon, P., Stummer, W. Robust estimation of covariance and correlation functions of a stationary multivariate process. EUSIPCO, 2017.
Factor Analysis - Model
Let $z_t$, $t \in \mathbb{Z}$, be a $k$-dimensional zero-mean vector of an observed time series. Also, let $x_t$ be an unobserved $r$-dimensional vector of common factors. It is assumed that these series are generated by $r$ ($r \le k$) factors, $x_t$, plus a measurement error $\varepsilon_t$, as
$$z_t = P x_t + \varepsilon_t, \tag{3}$$
where $P$ is a $k \times r$ matrix of parameters of rank $r$ (the factor loading matrix), and $\varepsilon_t$ is a $k$-dimensional white-noise sequence with full-rank covariance matrix $\Sigma_\varepsilon$. Thus, all the common dynamic structure comes through the common factors $x_t$. It is assumed that $P'P = I$.
When $r$ is much smaller than $k$, an effective dimension reduction is achieved: $z_t$ is driven by a much lower-dimensional process $x_t$.
Factor Analysis - Model Estimation
The key to the inference for the model in Equation (3) is to determine the number of factors $r$ and to estimate the $k \times r$ factor loading matrix $P$, or more precisely the factor loading space $\Omega(P)$. Once an estimator $\hat P$ is obtained, a natural estimator for the factor process is
$$\hat x_t = \hat P' z_t, \tag{4}$$
and the resulting residuals are
$$\hat\varepsilon_t = (I_k - \hat P \hat P') z_t, \tag{5}$$
where $I_k$ is the $k \times k$ identity matrix. The estimation of $P$ follows the suggestion of Lam & Yao (2012).
Factor Analysis - Model
Suppose that the vector of common factors $x_t = (x_{1,t}, \ldots, x_{r,t})'$, $t \in \mathbb{Z}$, follows a zero-mean $r$-dimensional stationary vector fractional autoregressive moving average process (VARFIMA($p_x$, $d_x$, $q_x$)) given by
$$\phi_x(B)\, D_x[(1 - B)^d]\, x_t = \theta_x(B)\, a_t, \tag{6}$$
where
$$\phi_x(B) = I - \phi_1 B - \cdots - \phi_p B^p, \qquad \theta_x(B) = I + \theta_1 B + \cdots + \theta_q B^q$$
are matrix polynomials in the backshift operator $B$, the $\phi$'s and the $\theta$'s are $r \times r$ matrices, and the roots of the determinant polynomials $|\phi_x(B)|$ and $|\theta_x(B)|$ are all outside the unit circle. In addition, $a_t$ is a sequence of $r$-dimensional Gaussian white noise vectors with zero mean and covariance matrix $\Sigma_a$.
Factor Analysis - Model
Remark 1. The VARFIMA process $x_t$, $t \in \mathbb{Z}$, can be written as a second-order stationary infinite moving average, as follows:
$$x_t = \sum_{j=0}^{\infty} \Psi_j a_{t-j}, \tag{7}$$
where the innovations $a_t = [a_{1,t}, \ldots, a_{r,t}]'$ are $r$-dimensional martingale differences with respect to an increasing sequence of $\sigma$-fields $\{\mathcal{F}_t\}$ such that, for some $\lambda > 0$, $\sup_t E(|a_{i,t}|^{2+\lambda} \mid \mathcal{F}_{t-1}) < \infty$ a.s., for all $i = 1, \ldots, r$. Let $E(a_t a_t' \mid \mathcal{F}_{t-1}) = \Sigma_a$ a.s. The $r \times r$ matrix coefficients $\Psi_j$ are often referred to as impulse responses.
The main characterization of the process $x_t$ considered in this paper is that the impulse responses $\Psi_j$ converge at slow hyperbolic rates as $j \to \infty$. More precisely, there are $r$ memory parameters $d_1, d_2, \ldots, d_r$, whose values lie in $(0, 0.5)$, such that the impulse responses $\Psi_j$ can be approximated as in Equation (8).
Factor Analysis - Model
Remark 1 (cont.)
$$\Psi_j \sim D\!\left[\frac{j^{d-1}}{\Gamma(d)}\right] \Pi, \quad \text{as } j \to \infty, \tag{8}$$
where $\Gamma(\cdot)$ is the gamma function and $\Pi$ is a nonsingular $r \times r$ matrix of constants that are independent of $j$ and may be functions of a smaller set of unknown parameters. The notation $D[j^{d-1}/\Gamma(d)]$ represents an $r \times r$ diagonal matrix with $j^{d_1-1}/\Gamma(d_1), \ldots, j^{d_r-1}/\Gamma(d_r)$ on the diagonal. In fact, for any univariate function $f$ of a single variable, the notation $D[f(d)]$ represents the $r \times r$ diagonal matrix with $f(d_1), \ldots, f(d_r)$ on the diagonal.
Also, the notation $\sim$ is defined as follows: given two sequences of matrices $U_j$ and $V_j$, $U_j \sim V_j$ means that $u_{i,r,j}/v_{i,r,j} \to 1$ as $j \to \infty$, for all $i$ and $r$, where $u_{i,r,j}$ and $v_{i,r,j}$ are the $(i, r)$th elements of $U_j$ and of $V_j$, respectively.
Let $\psi_{i,j}'$ and $\pi_i'$ be the $i$th rows of $\Psi_j$ and $\Pi$, respectively; then Equation (8) implies that $\psi_{i,j}' \sim j^{d_i-1}\,\Gamma(d_i)^{-1}\,\pi_i'$, as $j \to \infty$, for all $i = 1, \ldots, r$. Note that the conditions on the memory parameters, $d_i \in (0, 0.5)$ for $i = 1, \ldots, r$, ensure that the impulse responses are square-summable and that the infinite sum in Equation (7) exists.
Factor Analysis - Model
Remark 2. The autocovariances $\Gamma(j) \equiv \operatorname{Cov}(x_t, x_{t+j})$ of the process $x_t$ also decay at hyperbolic rates, as follows (Chung (2002)):
$$\Gamma(j) \sim D(j^{d-0.5})\, A\, D(j^{d-0.5}), \quad \text{as } j \to \infty, \tag{9}$$
where the $(i, r)$th element of the $r \times r$ matrix $A$ is
$$\frac{\Gamma(1 - d_i - d_r)}{\Gamma(d_r)\,\Gamma(1 - d_r)}\; \pi_i' \Sigma_a \pi_r.$$
Factor Analysis - Model
Remark 2 (cont.) Hence, as $j \to \infty$, not only do the autocovariances $\gamma_{i,i}(j)$ of each $x_{i,t}$ die out slowly at a hyperbolic rate, i.e.,
$$\gamma_{i,i}(j) \sim j^{2d_i-1}\, \frac{\Gamma(1 - 2d_i)}{\Gamma(d_i)\,\Gamma(1 - d_i)}\; \pi_i' \Sigma_a \pi_i,$$
but the covariances $\gamma_{i,r}(j)$ between the current $x_{i,t}$ and the future $x_{r,t+j}$, for $i \neq r$, also vanish at hyperbolic rates, i.e.,
$$\gamma_{i,r}(j) \sim j^{d_i+d_r-1}\, \frac{\Gamma(1 - d_i - d_r)}{\Gamma(d_r)\,\Gamma(1 - d_r)}\; \pi_i' \Sigma_a \pi_r.$$
Hosking (1996) presents the result for the univariate case.
Factor Analysis - Model
Lemma 2. Let $z_t = P x_t + \varepsilon_t$ as defined in Equations (3) and (6), and assume that $\Gamma_z(h) = E[z_{t-h} z_t']$ are the covariance matrices of the process $z_t$ and $\Gamma_x(h) = E[x_{t-h} x_t']$ are the covariance matrices of the generating vector $x_t$. Then
$$\Gamma_z(0) = P\,\Gamma_x(0)\,P' + \Sigma_\varepsilon, \tag{10}$$
$$\Gamma_z(h) = P\,\Gamma_x(h)\,P', \quad h \ge 1, \tag{11}$$
where $\operatorname{rank}(\Gamma_z(h)) = \operatorname{rank}(\Gamma_x(h))$ for $h \ge 1$.
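A short derivation of Lemma 2 may help to see where Equations (10) and (11) come from. It is a sketch under one assumption that the slide does not state explicitly: the white noise $\varepsilon_t$ is uncorrelated with the factors $x_s$ at all time points $s$ and $t$.

```latex
\begin{align*}
\Gamma_z(h) &= E\big[(P x_{t-h} + \varepsilon_{t-h})(P x_t + \varepsilon_t)'\big] \\
            &= P\,E[x_{t-h} x_t']\,P' + P\,E[x_{t-h}\varepsilon_t']
               + E[\varepsilon_{t-h} x_t']\,P' + E[\varepsilon_{t-h}\varepsilon_t'] \\
            &= P\,\Gamma_x(h)\,P' + \mathbf{1}\{h = 0\}\,\Sigma_\varepsilon .
\end{align*}
% Rank statement for h >= 1: P has full column rank r, so P'P is invertible and
\begin{equation*}
\Gamma_x(h) = (P'P)^{-1} P'\,\Gamma_z(h)\,P\,(P'P)^{-1},
\end{equation*}
% hence rank(Gamma_x(h)) <= rank(Gamma_z(h)) <= rank(Gamma_x(h)).
```

The two cross terms vanish by the uncorrelatedness assumption, and the last term is zero for $h \ge 1$ because $\varepsilon_t$ is white noise. This is why the lagged covariance matrices of $z_t$ carry information about the factors only.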
Factor Analysis - Model
Lemma 3. If the factors are independent for all lags and the matrix $\Sigma_\varepsilon$ is diagonal, then
a) All of the covariance matrices $\Gamma_x(h)$ are diagonal;
b) The matrices $\Gamma_z(h)$ are symmetric for $h \ge 1$;
c) By spectral decomposition, the columns of $P$ are eigenvectors of $\Gamma_z(h)$ with eigenvalues $\gamma_i(h)$, where $\gamma_i(h)$ are the diagonal elements of $\Gamma_x(h)$.
Factor Analysis - Model
Proposition 2. Suppose $z_t = P x_t + \varepsilon_t$, where $x_t$ is an $r$-dimensional VARFIMA($p_x$, $d_x$, $q_x$) process, $P$ is a $k \times r$ matrix ($k \ge r$) of rank $r$, and $\varepsilon_t$ is a $k$-dimensional white noise sequence with covariance $\Sigma_\varepsilon$. Then $z_t$ follows a $k$-dimensional VARFIMA($p_z$, $d_z$, $q_z$) with $p_z = p_x$, $d_z = d_x$ and $q_z = \max(p_x, q_x)$.
Factor Analysis - Model
Remark 3. In Equation (6), consider $y_t = D_x[(1 - B)^d]\, x_t$. Then, if the long-memory parameters $d_i$ are all equal to zero, the model presented in Equation (6) becomes the short-memory model described by Peña & Box (1987), i.e., a VARMA($p$, $q$). Thus,
$$\phi_x(B)\, D_x[(1 - B)^d]\, x_t = \theta_x(B)\, a_t$$
becomes
$$\phi_y(B)\, y_t = \theta_y(B)\, a_t.$$
Factor Analysis - Model Estimation and test
Following the same line as Lam & Yao (2012), the estimation of $P$ is performed by an eigenanalysis of
$$\hat M = \sum_{h=1}^{h_0} \hat\Gamma_z(h)\, \hat\Gamma_z(h)', \tag{12}$$
where $h_0$ is a prescribed integer and $\hat\Gamma_z(h)$ denotes the sample covariance matrix of $z_t$ at lag $h$.
Ratio-based estimator for $r$. The estimator for the number of factors $r$ is given by
$$\hat r = \operatorname*{arg\,min}_{1 \le i \le R} \hat\lambda_{i+1}/\hat\lambda_i, \tag{13}$$
where $\hat\lambda_1 \ge \ldots \ge \hat\lambda_k$ are the eigenvalues of $\hat M$, and $r < R < k$ is a constant. As suggested in Lam and Yao (2012), in practice $R = k/2$ can be the starting point (written $p/2$ in Lam and Yao (2012), where $p$ denotes the dimension of the series).
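Equations (12) and (13) translate into a few lines of linear algebra. The sketch below is an illustration, not the authors' code: the function names are hypothetical, and the toy data (n = 2000, k = 20, three independent AR(1) factors with coefficients 0.8, -0.7 and 0.6, unit-variance noise, h0 = 1) are assumed values chosen to give clearly separated factors rather than the simulation design of the talk.

```python
import numpy as np

def lagged_cov(z, h):
    """Sample estimate of E[z_{t-h} z_t'] for an (n, k) data array z."""
    n = z.shape[0]
    zc = z - z.mean(axis=0)
    return zc[: n - h].T @ zc[h:] / n

def ratio_estimator(z, h0=1, R=None):
    """Eigenanalysis of M-hat (Equation 12) and ratio estimator (Equation 13)."""
    k = z.shape[1]
    R = k // 2 if R is None else R
    M = np.zeros((k, k))
    for h in range(1, h0 + 1):
        G = lagged_cov(z, h)
        M += G @ G.T
    lam, vec = np.linalg.eigh(M)             # eigh returns ascending eigenvalues
    lam, vec = lam[::-1], vec[:, ::-1]       # reorder to descending
    ratios = lam[1 : R + 1] / lam[:R]        # lambda_{i+1} / lambda_i, i = 1..R
    r_hat = int(np.argmin(ratios)) + 1
    return r_hat, vec[:, :r_hat], lam

# Toy data from z_t = P x_t + eps_t with r = 3 factors (illustrative settings).
rng = np.random.default_rng(0)
n, k, r, burn = 2000, 20, 3, 100
phi = np.array([0.8, -0.7, 0.6])
a = rng.standard_normal((n + burn, r))
x = np.zeros((n + burn, r))
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + a[t]
x = x[burn:]
P = rng.uniform(-1, 1, size=(k, r))
z = x @ P.T + rng.standard_normal((n, k))

r_hat, P_hat, lam = ratio_estimator(z, h0=1)
x_hat = z @ P_hat                            # estimated factors, as in Equation (4)
```

The columns of `P_hat` are orthonormal eigenvectors, so they estimate the factor loading space rather than `P` itself.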
ROBUST Estimation for P and r
Based on Equation (12) and on the robust ACF estimator, the following robust estimator of $M$ is suggested:
$$\hat M_{Q_n} = \sum_{h=1}^{h_0} \hat\Gamma_{z,Q_n}(h)\, \hat\Gamma_{z,Q_n}(h)', \tag{14}$$
where $\hat\Gamma_{z,Q_n}(h)$ denotes the robust sample covariance matrix of $z_t$ at lag $h$. The estimator $\hat r_{Q_n}$ of the number of factors is then obtained in the same way from Equation (13).
Note that $\hat M_{Q_n}$ is positive semi-definite.
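A possible implementation of Equation (14) is sketched below. It assumes that each entry of the robust lagged covariance matrix is computed by applying the Ma & Genton (2000) formula to a pair of component series; this entrywise construction, the function names and the toy data are assumptions made for illustration only.

```python
import numpy as np
from math import comb

def qn_scale(y, c=2.2191):
    """Naive O(n^2) Qn scale estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    i, j = np.triu_indices(n, k=1)
    dists = np.abs(y[i] - y[j])
    kth = (comb(n, 2) + 2) // 4 + 1
    return c * np.partition(dists, kth - 1)[kth - 1]

def robust_cross_cov(zi, zj, h):
    """Robust covariance between z_{i,t-h} and z_{j,t} (assumed entrywise form)."""
    u, v = zi[: len(zi) - h], zj[h:]
    return 0.25 * (qn_scale(u + v) ** 2 - qn_scale(u - v) ** 2)

def robust_lag_cov(z, h):
    """k x k matrix of robust lag-h covariances for an (n, k) array z."""
    k = z.shape[1]
    G = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            G[i, j] = robust_cross_cov(z[:, i], z[:, j], h)
    return G

def robust_M(z, h0=1):
    """Robust analogue of M-hat, as in Equation (14)."""
    k = z.shape[1]
    M = np.zeros((k, k))
    for h in range(1, h0 + 1):
        G = robust_lag_cov(z, h)
        M += G @ G.T
    return M

rng = np.random.default_rng(1)
z = rng.standard_normal((200, 4))            # small toy data set
M_q = robust_M(z, h0=2)

lam = np.sort(np.linalg.eigvalsh(M_q))[::-1] # descending eigenvalues
R = z.shape[1] // 2
r_hat_q = int(np.argmin(lam[1 : R + 1] / lam[:R])) + 1
```

Each term in the sum has the form G G', which is why the resulting matrix is symmetric and positive semi-definite even though the individual robust covariance matrices need not be.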
Theoretical results
Proposition 3. Let $h$ be a fixed positive integer and $(\Gamma_Q(h))_{1\le i,j\le p} = (\gamma^Q_{i,j}(h))_{1\le i,j\le p}$, where $\gamma^Q_{i,j}(h)$ is the robust ACF function defined previously. Assume that the process is short-memory. Then
$$\sqrt{n}\, \sup_{1\le j\le p} \left|\lambda^Q_j - \lambda_j\right| = O_p(1), \quad \text{as } n \to \infty,$$
where $(\lambda^Q_j)_{1\le j\le p}$ and $(\lambda_j)_{1\le j\le p}$ denote the eigenvalues of $\sum_{h=1}^{h_0} \Gamma_Q(h)\Gamma_Q(h)'$ and $\sum_{h=1}^{h_0} \Gamma(h)\Gamma(h)'$, respectively, where $(\Gamma(h))_{1\le i,j\le p} = (\gamma_{i,j}(h))_{1\le i,j\le p}$ and $h_0$ is a fixed integer larger than 1.
Theoretical results
Proposition 4. Let $h$ be a fixed positive integer and $(\Gamma_Q(h))_{1\le i,j\le p} = (\gamma^Q_{i,j}(h))_{1\le i,j\le p}$, where $\gamma^Q_{i,j}(h)$ is defined previously. Assume that the vector process has the long-memory property. Then:
(i) If, for all $i$ in $\{1, \ldots, k\}$, $D_i > 1/2$ ($d_i < 1/4$),
$$\sqrt{n}\, \sup_{1\le j\le p} \left|\lambda^Q_j - \lambda_j\right| = O_p(1), \quad \text{as } n \to \infty.$$
(ii) If there exists $i_0$ in $\{1, \ldots, k\}$ such that $D_{i_0} < 1/2$ ($d_{i_0} > 1/4$),
$$n^{D_{i_0}}\, \sup_{1\le j\le p} \left|\lambda^Q_j - \lambda_j\right| = O_p(1), \quad \text{as } n \to \infty.$$
Base references (background):
Sgrancio, A., Reisen, V. A., Lévy-Leduc, C., Pascal, B., Ziegelmann, F. A., Zambom, E., Cotta, H. Robust Factor Modeling for High-Dimensional Time Series: an Application to Air Pollution Data. Submitted, 2016.
Cotta, H., Reisen, V. A., Bondon, P., Stummer, W. Robust principal component analysis with air pollution data. EUSIPCO, 2017.
Reisen, V. A., Lévy-Leduc, C., Zambon, E. Robust Factor Model for long-memory multivariate time series with application to stock market. In revision.
Bottoni, J., Reisen, V. A., Spanny, Franco, G. E., Pascal, B. Generalized additive model with principal component analysis: An application to time series of respiratory disease and air pollution data. JRSS, 2017.
Zamprogno, B., Reisen, V. A., Cotta, Bondon, P. The use of PCA in time correlated processes with short and long-memory property. An application to air pollution data. Submitted, 2017.
Vanhatalo, E. and Kulahci, M. Impact of Autocorrelation on Principal Components and Their Use in Statistical Process Control. Quality and Reliability Engineering International, 2015.
Simulation: Monte Carlo studies - Models
Monte Carlo experiments were conducted to analyze the effect of additive outliers on the factor modeling of high-dimensional time series with long- and short-memory dependence. The empirical study is divided into two cases of $x_t$ (Equation (6)), which follows a VARFIMA model with $r = 3$: (1) a short-memory process, that is, $x_t$ is a VARMA process; (2) a long-memory process with $d = (d_1, d_2, d_3)'$, where at least one of $d_1, d_2, d_3$ lies in $(0, 0.5)$. The VARFIMA model was generated with independent $a_t$ from $N(0, I)$ and the $\Phi$ coefficients displayed in Table 1. The sample sizes are $n = 50, 100, 200, 400, 800$ and $1600$, with $k = 0.2n, 0.5n, 0.8n$.
Only one case is presented here. All $k \times r$ elements of the matrix $P$ were generated as independent observations from the uniform distribution on the interval $[-1, 1]$ (the simulation method follows Lam and Yao (2012)).
Factor Analysis - Simulations
The empirical study is divided into two cases of $x_t$ (Equation (6)), which follows a VARFIMA model with $r = 3$: (1) a short-memory process where $d = (0, 0, 0)'$, that is, $x_t$ is a VARMA process; (2) a long-memory process where $d = (d_1, d_2, d_3)'$, with at least one of $d_1, d_2, d_3$ in $(0, 0.5)$. The VARFIMA model was generated with independent $a_t$ from $N(0, I)$ and the $\Phi$ coefficients displayed in Table 1. The sample sizes are $n = 50, 100, 200, 400, 800$ and $1600$, with $k = 0.2n, 0.5n, 0.8n$ and $1.2n$.
Table 1: Φ matrices for the VARFI(1, d) process.
Φ1 (Model 1)          Φ1 (Model 2)           Φ1 (Model 3)
0.6    0.0    0.0     0.6    0.35   0.1      0.2    0.0    0.6
0.0   -0.5    0.0     0.05  -0.5    0.65     0.0    0.3    0.0
0.0    0.0    0.3     0.8    0.0    0.3      0.2    0.0    0.5
Factor Analysis - Simulations
Let $\{z_t\}$, $t \in \mathbb{Z}$, be a vector process contaminated by additive outliers, defined as follows:
$$z_t = P x_t + \omega \circ \delta_t, \tag{15}$$
where "$\circ$" is the Hadamard product, $\omega = [\omega_1, \ldots, \omega_k]'$ is a vector of magnitudes of the additive outliers, and $\delta_t = [\delta_{1t}, \ldots, \delta_{kt}]'$ is a random vector indicating the occurrence of an outlier at time $t$ in variable $k$, such that $P(\delta_{k,t} = -1) = P(\delta_{k,t} = 1) = p/2$ and $P(\delta_{k,t} = 0) = 1 - p$, so that $E[\delta_{k,t}] = 0$ and $E[\delta^2_{k,t}] = \operatorname{Var}(\delta_{k,t}) = p$. The model described above assumes that $\{Z_t\}$ and $\{\delta_t\}$ are independent processes. It is also assumed that the elements of $\delta_t$ are mutually and temporally uncorrelated, i.e., $E(\delta_t \delta_t') = \Sigma_\delta = \operatorname{diag}(p, \ldots, p)$ and $E(\delta_t \delta_{t+h}') = 0$ for $h \neq 0$.
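The contamination mechanism in Equation (15) is easy to simulate. The following sketch adds the term ω ∘ δ_t to an arbitrary (n, k) data array; the function name `add_outliers`, the placeholder data and the values p = 0.05 and ω = 15 for every component are illustrative assumptions (the latter two mirror the settings reported in the simulation tables).

```python
import numpy as np

def add_outliers(z, omega, p, rng):
    """Return z + omega * delta, where delta has i.i.d. entries in {-1, 0, 1}
    with P(-1) = P(1) = p/2 and P(0) = 1 - p, plus the indicator array delta."""
    n, k = z.shape
    delta = rng.choice([-1, 0, 1], size=(n, k), p=[p / 2, 1 - p, p / 2])
    omega = np.asarray(omega, dtype=float)   # length-k vector, broadcast over rows
    return z + omega * delta, delta

rng = np.random.default_rng(0)
n, k, p = 20000, 5, 0.05
z_clean = rng.standard_normal((n, k))        # placeholder for P x_t
omega = np.full(k, 15.0)

z_cont, delta = add_outliers(z_clean, omega, p, rng)

mean_delta = delta.mean()                    # should be close to E[delta] = 0
var_delta = (delta ** 2).mean()              # should be close to Var(delta) = p
```

Multiplying the row vector `omega` by each row of `delta` is the Hadamard product applied at every time point.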
Factor Analysis - Simulations
Table 2: Relative frequency estimates of $\hat r = 3$ in the simulation with 200 replications - Model 1 (classical ACF, no outliers: δ = 0). Results are similar to those of Lam and Yao (2012).
n           50      100     200     400     800     1600
k = 0.2n    0.170   0.585   0.870   0.995   1       1
k = 0.5n    0.395   0.710   0.975   1       1       1
k = 0.8n    0.435   0.740   0.960   1       1       1
Simulation: Monte Carlo studies
Table 3: Relative frequency estimates for dimension reduction - d = [0.1, 0.2, 0.4] and Model 1's Φ coefficients, r = 3, n = 100 in all blocks.
            Γz, p = 0                Γz, p = 0.05, ω = 15     Γz,Qn, p = 0             Γz,Qn, p = 0.05, ω = 15
            r=1     r=2     r=3      r=1     r=2     r=3      rQn=1   rQn=2   rQn=3    rQn=1   rQn=2   rQn=3
k = 0.2n    0.110   0.260   0.630    0.310   0.270   0.260    0.150   0.310   0.540    0.160   0.320   0.520
k = 0.5n    0.080   0.110   0.810    0.100   0.200   0.320    0.140   0.110   0.750    0.160   0.130   0.710
k = 0.8n    0.010   0.150   0.840    0.140   0.160   0.280    0.020   0.160   0.820    0.020   0.200   0.780
The third block gives the simulation results using Γz,Qn when p = 0. As one can see, the estimates of r based on Γz,Qn are similar to those based on Γz when p = 0. Both methods have the same asymptotic properties.
In the classical method, the presence of atypical observations in the data reduces the relative frequency of the estimate r = 3 for all values of k. This does not occur when the robust estimator is used: its results remain quite close to those in the first block.
The RAMQAr - The data
The application of the proposed methodology consists of applying the PCA to cluster stations with the same behavior of measured pollutants. This section presents an application of the methodology to PM10 concentrations measured at the Air Quality Automatic Monitoring Network (AQAMN) of the Greater Vitória Region (GVR). The application was divided into two parts: 1) reduction of the dimensions, and 2) forecasting.
The data are:
- PM10 measured in µg/m³;
- Daily average;
- The period: January 2005 to December 2009;
- All eight RAMQAr stations.
The RAMQAr - Time series
[Figure 3: PM10 concentrations at the RAMQAr stations. Time series panels (x-axis: Time): Laranjeiras, Carapina, Camburi, Sua, VixCentro, Ibes, VVCentro, Cariacica.]
The RAMQAr - Descriptive statistics
[Figure 4: Boxplot of PM10 concentrations (µg/m³) at the RAMQAr stations. Labelled stations: Laranjeiras, Camburi, Sua, VixCentro, Ibes, VVCentro.]
The RAMQAr - The Robust ACF X ACF - IBIS
[Figure: classical ACF and robust ACF. Panels: Classical ACF, Robust ACF.]
The RAMQAr - FA's Classical ACF
[Figure 6: Plots of the estimated eigenvalues $\hat\lambda_i$ (a) and of the ratios of estimated eigenvalues $\hat\lambda_{i+1}/\hat\lambda_i$ of $\hat M$ (b).]
The test indicated 1 factor.
The RAMQAr - FA's ROBUST
[Figure 7: Plots of the estimated eigenvalues $\hat\lambda_i$ (a) and of the ratios of estimated eigenvalues $\hat\lambda_{i+1}/\hat\lambda_i$ of $\hat M_Q$ (b).]
The robust estimator indicated 2 factors.
The RAMQAr - THE FINAL ESTIMATED MODEL
The FA analysis suggests the following model for $Z_t$ (the daily PM10 concentrations of the stations):
$$Z_t = p_1 w_t + p_2 v_t + \varepsilon_t,$$
where $w_t$ denotes the first factor, $v_t$ is the second factor of the solution $\hat Z = \hat P \hat X$, and $\varepsilon_t$ is a vector white-noise process.
Main Conclusions
- A robust factor model for high-dimensional time series with short and long memory and additive outliers is proposed.
- This study considered the effects of different correlation structures and additive outliers on a vector linear process, and their implications for the analysis and interpretation of Factor Analysis calculated from the correlation matrix of this process;
- It was shown that the existence of outliers destroys the correlation and cross-correlation of a vector time series;
- The proposed methodology was applied to identify pollution behavior for the pollutant PM10 in the Greater Vitória Region, to enable better management of the local monitoring network.
- The results will hopefully stimulate further research on using robust estimation methods and long-memory models to represent and forecast environmental time series.
Thanks!
valderioanselmoreisen@gmail.com
References
Chung, C. (2002), 'Sample means, sample autocovariances, and linear regression of stationary multivariate long memory processes', Econometric Theory 18, 51-78.
Cotta, H. H. A., Reisen, V. A., Bondon, P., Stummer, W. & Lévy-Leduc, C. (2017), Robust estimation of covariance and correlation functions of a stationary multivariate process, to appear in 'Signal Processing Conference (EUSIPCO), 2017 25th European', IEEE.
Fajardo, F., Reisen, V. A. & Cribari-Neto, F. (2009), 'Robust estimation in long-memory processes under additive outliers', Journal of Statistical Planning and Inference 139, 2511-2525.
Hosking, J. R. (1996), 'Asymptotic distributions of the sample mean, autocovariances, and autocorrelations of long-memory time series', Journal of Econometrics 73(1), 261-284.
Lam, C. & Yao, Q. (2012), 'Factor modeling for high-dimensional time series: Inference for the number of factors', Ann. Statist. 40(2), 694-726. URL: http://dx.doi.org/10.1214/12-AOS970
Ma, Y. & Genton, M. G. (2000), 'Highly robust estimation of the autocovariance function', Journal of Time Series Analysis 21(6), 663-684.
Peña, D. & Box, G. E. P. (1987), 'Identifying a simplifying structure in time series', Journal of the American Statistical Association 82(399), pp. 836-843. URL: http://www.jstor.org/stable/2288794