Milena Machado

(Federal Institute of Science and Technology of Espírito Santo, Brazil - IFES)

Title — Time series, principal component analysis and logistic regression: An application to the association between annoyance and air pollution

Abstract — Several studies have applied regression models to quantify the relationship between annoyance from environmental stressors and measured levels of these stressors, such as odors (Blanes-Vidal, 2015), noise (Klaeboe et al., 2000), vibration (Klaeboe et al., 2003) and various air pollutants (Rotko et al., 2002; Llop et al., 2008; Amundsen et al., 2008; Nikolopoulou et al., 2011; Machado et al., 2018). Most of these authors used linear and logistic regression techniques to relate annoyance to concentration levels of air pollutants, with only one pollutant in the model as a single covariate. In this talk, we consider multiple pollutant covariates. These covariates are generally physically and statistically correlated, which implies multicollinearity and, therefore, inflated variance of the estimators and spurious results.
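
To make the variance-inflation point concrete, here is a minimal sketch (not taken from the talk) of the variance inflation factor (VIF). For standardized covariates, the VIFs are the diagonal entries of the inverse correlation matrix. The function name and the correlation value 0.8 are illustrative assumptions.

```python
import numpy as np

def variance_inflation_factors(corr):
    """Return the VIF of each covariate from their correlation matrix.

    For standardized covariates, the VIFs are the diagonal entries of the
    inverse correlation matrix.
    """
    corr = np.asarray(corr, dtype=float)
    return np.diag(np.linalg.inv(corr))

# Hypothetical example: two pollutants with correlation 0.8 (illustrative value).
corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
vif = variance_inflation_factors(corr)
print(vif)  # with two covariates, each VIF equals 1 / (1 - 0.8**2)
```

A VIF well above 1 means the variance of that coefficient estimate is inflated relative to the uncorrelated case, which is the problem the talk addresses with PCA.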

The objective of this work is to propose a combination of a time series model (VAR(1)), principal component analysis (PCA), and binary logistic regression to estimate the annoyance caused by particulate matter using more than one pollutant (PM10, TSP, and SP) in the same model. As Zamprogno (2020) pointed out, the PCA technique requires variables that are not correlated in time, i.e., serially independent. In practice, air pollutant concentrations can hardly be assumed uncorrelated or stationary.

This work uses a time series model to transform the original air pollutant data into temporally uncorrelated data (white noise) before applying the PCA technique. Combining the VAR(1) filter with PCA yields components (linear combinations of the air pollutants) that are uncorrelated both in time and with one another. Applying multiple logistic regression to these components then allows us to calculate the relative risk (RR) of annoyance for each pollutant in the model. The relative risk estimates confirm that, in general, an increase in air pollutant concentrations significantly increases the probability of being annoyed.
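
The filtering step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the VAR(1) model is fitted by ordinary least squares, the helper names `fit_var1` and `lag1_autocorr` are hypothetical, and the data are synthetic stand-ins for the pollutant series.

```python
import numpy as np

def fit_var1(x):
    """Fit x_t = c + A x_{t-1} + e_t by least squares.

    x has shape (n_times, n_series). Returns (c, A, residuals).
    """
    y = x[1:]                                            # x_t
    z = np.column_stack([np.ones(len(x) - 1), x[:-1]])   # [1, x_{t-1}]
    coef, *_ = np.linalg.lstsq(z, y, rcond=None)
    c, A = coef[0], coef[1:].T
    residuals = y - z @ coef                             # filtered series
    return c, A, residuals

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a 1-D series."""
    s = series - series.mean()
    return float(np.sum(s[1:] * s[:-1]) / np.sum(s * s))

# Synthetic stand-in for two pollutant series (assumed values, not the Vitoria data).
rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.2],
                   [0.1, 0.5]])
n = 2000
x = np.zeros((n, 2))
for t in range(1, n):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(2)

c_hat, A_hat, resid = fit_var1(x)
before = [lag1_autocorr(x[:, j]) for j in range(2)]
after = [lag1_autocorr(resid[:, j]) for j in range(2)]
print("lag-1 autocorrelation before filtering:", np.round(before, 2))
print("lag-1 autocorrelation after filtering: ", np.round(after, 2))
```

The residuals play the role of the filtered series: their serial correlation should be close to zero, while any contemporaneous correlation between series remains and is left for PCA to handle.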

These results provide evidence of a significant correlation between perceived annoyance and groups of particulate matter. This study offers scientific arguments for policymakers to force some industries to reduce the emission of pollutants that generate nuisance and health risks.

Biography — I am an Associate Professor at the Federal Institute of Science and Technology of ES (IFES). I am a collaborating researcher at NQUALIAR (Group for Air Quality Studies) and at NUMES (Stochastics Modeling Study Group). I am currently a visiting researcher at the L2S, supervised by Prof. Pascal Bondon. My work and interests cover multivariate statistical techniques such as Time Series Analysis, Principal Component Analysis, Multiple Correspondence Analysis, and Logistic Regression Models, among others. The main application areas are Environmental Engineering (air quality) and health science (epidemiology), with a special interest in atmospheric pollution problems.

S³ Seminar

March 16, 2023

Transcript

  1. Mar 2023
    Profª. MACHADO Milena
    Associate Professor, IFES, Brazil
    Visiting researcher
    Time series, principal component analysis and logistic regression: An application to the association between annoyance and air pollution

  2. Summary
    • Introduction
    • Objective
    • Methodology
    • Results
    • Conclusions
    • Further works

  3. Espírito Santo
    Brazil

  4. The Vitoria region (Brazil)
    Metropolitan area: 1,689,714 inhabitants
    Density: 1,490 inhabitants/km²
    3rd largest port system in Latin America
    Industrial sites: steel plant, iron ore pellet mill, stone quarrying, cement and food industry, asphalt plant, etc.
    [Figure: map of the region showing the prevailing wind direction]

  5. Objective
    The aim of this study is to investigate the annoyance caused by air pollution.

  6. Contribution
    ETE Barueri (SP)
    Regression models: only one pollutant
    Time series analysis + PCA + Logistic regression
    Regression models: more than one pollutant
    Relative risk
    References for this talk:
    • Melo M.M. et al. “STUDY OF A SPATIAL AND TEMPORAL ANALYSIS FOR PARTICULATE MATTER” (Best oral presentation award, Dust Conference, Italy, 2014).
    • Melo M.M. et al., Santos J., Reisen V. (2018) “A new methodology to derive settleable particulate matter guidelines to assist policy-makers on reducing public nuisance” (Journal of Atmospheric Environment).
    • Machado M., Reisen V.A., Santos J.M., Reis N.C., Frère S., Bondon P., Ispány M., Cotta H.H.A. (2020) “Use of multivariate time series techniques to estimate the impact of particulate matter on the perceived annoyance” (Journal of Atmospheric Environment).

  7. Methodology
    • Vitoria (Brazil)
    Face-to-face survey (n = 2638)
    Panel by phone (n = 519) from 2011 to 2014
    Measurement of air pollutants: Settled Particles (SP = PM2.5, PM10 and TSP)
    Time series analysis, Principal component analysis, Logistic regression, Relative risk

  8. Time series analysis
    [Figure: monthly time series, Jan 2011 to Feb 2015 (x-axis: months). Panels: Settled particles flux (SP); PM10 (mean 30 days); PM10 (max 30 days); TSP (mean 30 days); TSP (max 30 days).]

  9. RESULTS
    Question: Thinking about this month, how annoyed do you feel by dust, on a scale from 1 to 10, where 1 is not annoyed and 10 is extremely annoyed?
    1-2-3-4-5-6-7-8-9-10
    n = 2638
    Levels of annoyance    Percentage of respondents
    0 to 2                 8%
    3 to 4                 8%
    5 to 6                 18%
    7 to 8                 32%
    9 to 10                34%
    RMGV: -3.32, 0.449, 1.566; P(x = 5) = 25%, P(x = 10) = 76%, P(x = 14) = 95%
    [Figure: probability of being annoyed (0% to 100%) plotted for x from 1 to 21]
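
The probabilities in the RMGV row can be checked with a short sketch. The column labels of the first three RMGV values did not survive in the transcript, so this assumes that -3.32 and 0.449 are the intercept and slope of a single-covariate logistic model; the function name is hypothetical.

```python
import math

def logistic_probability(x, intercept, slope):
    """P(annoyed | x) under a one-covariate logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Assumption: the first two unlabeled RMGV values are the intercept and slope.
intercept, slope = -3.32, 0.449

probs = {x: logistic_probability(x, intercept, slope) for x in (5, 10, 14)}
for x, p in probs.items():
    print(f"P(x = {x}) = {p:.0%}")

# Under the same assumption, exp(slope) can be compared with the third value, 1.566.
odds_ratio = math.exp(slope)
print(f"exp(slope) = {odds_ratio:.3f}")
```

Under this reading, the three probabilities round to the percentages reported on the slide, which supports interpreting the third value as the odds ratio per unit increase in x.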

  10. RESULTS
    CORRELATION MATRIX FOR THE ORIGINAL VARIABLES (BEFORE TIME SERIES ANALYSIS: VAR(1))
    Variables     SP       PM10 (mean)  TSP (mean)  PM10 (max)  TSP (max)
    SP            1
    PM10 (mean)   0.424**  1
    TSP (mean)    0.278    0.764**      1
    PM10 (max)    0.409**  0.681**      0.654**     1
    TSP (max)     0.342*   0.701**      0.754**     0.772**     1
    ** p-value = 0.01; * p-value = 0.05
    According to Zamprogno (2013), the PCA technique requires variables that are not correlated in time (serially independent) and time series that are stationary. A Vector Autoregressive model is therefore applied as a filter to remove the temporal correlation and avoid spurious results.

  11. Autocorrelation function and partial autocorrelation function
    [Figure: for each original series, a time plot (2011 to 2014), the ACF and the partial ACF against lag. Series: SP (settled particles, deposition rate per 30 days); PM10 monthly mean; PM10 monthly maximum; TSP monthly mean; TSP monthly maximum.]

  12. After VAR(1) filter
    [Figure: ACF and partial ACF (lags 0 to 30) of each filtered series: SP (settled particles, 30 days); PM10 (mean 30 days); TSP (mean 30 days); PM10 (max 30 days); TSP (max 30 days).]

  13. RESULTS
    CORRELATION MATRIX FOR THE VARIABLES AFTER APPLYING THE FILTERING MODEL
    Variables     SP      PM10 (mean)  TSP (mean)  PM10 (max)  TSP (max)
    SP            1.000
    PM10 (mean)   0.214   1.000
    TSP (mean)    -0.004  0.573**      1.000
    PM10 (max)    0.234   0.428**      0.344*      1.000
    TSP (max)     0.378*  0.533**      0.337*      0.685**     1.000
    ** p-value = 0.01; * p-value = 0.05
    The PCA technique is then applied to the filtered series to deal with the cross-correlation (multicollinearity) among the variables.

  14. RESULTS
    RESULTS OF FACTOR LOADINGS STATISTICS AND APPLICATION OF PCA
                           PC1     PC2     PC3     PC4     PC5
    Eigenvalue             2.576   1.071   0.681   0.396   0.276
    Variability (%)        51.528  21.426  13.622  7.913   5.510
    Cumulative %           51.528  72.955  86.577  94.490  100.000
    SP (monthly rate)      0.267   0.733*  -0.554  -0.269  -0.112
    PM10 (monthly mean)    0.495*  -0.257  -0.365  0.674   -0.319
    TSP (monthly mean)     0.400*  -0.583  -0.318  -0.607  0.172
    PM10 (monthly max)     0.492*  0.104   0.611*  -0.254  -0.557
    TSP (monthly max)      0.531*  0.214   0.293   0.200   0.739
    The components PC1, PC2 and PC3 explain about 86.6% of the total variability of the original data.
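
The PCA step can be sketched as an eigendecomposition of the correlation matrix of the filtered series. The slides do not say how the PCA was computed, so treating it as a correlation-matrix PCA is an assumption; the matrix entries are the ones reported in the talk for the filtered variables.

```python
import numpy as np

# Correlation matrix of the filtered series as reported in the talk
# (order: SP, PM10 mean, TSP mean, PM10 max, TSP max).
R = np.array([
    [ 1.000, 0.214, -0.004, 0.234, 0.378],
    [ 0.214, 1.000,  0.573, 0.428, 0.533],
    [-0.004, 0.573,  1.000, 0.344, 0.337],
    [ 0.234, 0.428,  0.344, 1.000, 0.685],
    [ 0.378, 0.533,  0.337, 0.685, 1.000],
])

# eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained = 100 * eigenvalues / eigenvalues.sum()
print("eigenvalues:   ", np.round(eigenvalues, 3))
print("variability %: ", np.round(explained, 1))
print("cumulative %:  ", np.round(np.cumsum(explained), 1))

# Eigenvector signs are arbitrary; flip PC1 so that its entries sum to a positive value.
pc1 = eigenvectors[:, 0] * np.sign(eigenvectors[:, 0].sum())
print("PC1 loadings:  ", np.round(pc1, 3))
```

The printed eigenvalues and PC1 loadings can be compared with the Eigenvalue row and the PC1 column of the table; small differences are expected because the correlations are rounded to three decimals.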

  15. RESULTS
    THE RELATIVE RISK ESTIMATED BY THE VAR-PCA-LOGISTIC REGRESSION MODEL
    Pollutants             RR     CI (95%)        CI width
    SP                     1.462  (1.070; 1.854)  0.784
    PM10 (monthly mean)    1.649  (1.061; 2.237)  1.176
    TSP (monthly mean)     2.181  (1.471; 2.891)  1.42
    PM10 (monthly max)     2.411  (1.401; 3.421)  2.02
    TSP (monthly max)      1.822  (0.592; 3.052)  2.46
    The estimated relative risk corresponds to an increase in the probability of annoyance by a factor of about 1.5 for an interquartile variation of 2 g/m² per 30 days.
    Parameters estimated by the multiple logistic model
                 β̃       Standard error
    PC1          0.053   0.202
    PC2          0.058   0.309
    PC3          -0.245  0.390
    Intercept    0.204   0.320

    $RR^{*}_{x_i} \approx e^{x_i \beta_i^{*}}$
    The RR measures the association between exposure to a risk factor and the occurrence of an effect (here, annoyance).
    y = 1 if degree ≥ 7 (extremely annoyed); y = 0 if degree < 7 (not/little annoyed)
    X = SP, TSP, PM10
    $P(Y = 1) = \pi(\mathbf{X}) = \dfrac{e^{\beta_0 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \cdots + \beta_p x_p}}$
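
As a rough check of how the relative risk formula connects to the tables, the sketch below recomputes the RR for SP. It assumes, as in standard principal component regression, that the coefficient of a pollutant on the original scale is the loading-weighted sum of the principal-component coefficients; the slides do not state this step, so it is an assumption. The loadings, coefficients and interquartile variation are the values reported in the talk.

```python
import math

# Values reported in the talk.
sp_loadings = {"PC1": 0.267, "PC2": 0.733, "PC3": -0.554}      # SP row of the PCA table
pc_coefficients = {"PC1": 0.053, "PC2": 0.058, "PC3": -0.245}  # logistic model estimates
iqr_sp = 2.0  # interquartile variation of SP, in g/m² per 30 days

# Assumption: the SP coefficient on the original scale is the loading-weighted
# sum of the principal-component coefficients.
beta_sp = sum(sp_loadings[pc] * pc_coefficients[pc] for pc in pc_coefficients)

# RR ≈ exp(x * beta), evaluated at x = interquartile variation.
rr_sp = math.exp(iqr_sp * beta_sp)
print(f"beta*_SP = {beta_sp:.4f}, RR_SP = {rr_sp:.3f}  (reported RR for SP: 1.462)")
```

The result lands close to the reported RR for SP; the small gap is consistent with the rounding of the published loadings and coefficients. The other pollutants cannot be checked the same way because their interquartile variations are not given in the transcript.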

  16. Conclusions
    • Combining the VAR-PCA-LOG statistical techniques is proposed as a useful tool for considering a group of pollutants in the same model.
    • This study provides evidence of a significant correlation between particulate matter and perceived annoyance levels, indicating that, at least for particulate matter, perceived annoyance is related not to a single pollutant but to a group of pollutants.
    • The estimated relative risks show that, in general, an increase in air pollutant concentrations (i.e., the particulate matter metrics examined here: TSP, PM10 and SP) contributes significantly to increasing the probability of being annoyed.

  17. Further work…
    1. Use the bootstrap technique, or other methods, to estimate more accurate confidence intervals for the results.
    2. Add other pollutants to the multiple model.
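
As an illustration of the first item, here is a minimal percentile-bootstrap sketch. It is not the authors' planned procedure: the function name is hypothetical, the binary responses are synthetic, and the statistic (a simple proportion) stands in for the relative risk.

```python
import numpy as np

def percentile_bootstrap_ci(data, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        sample = data[rng.integers(0, n, size=n)]  # resample with replacement
        stats[b] = statistic(sample)
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, [alpha, 1.0 - alpha])

# Synthetic binary responses (1 = annoyed); these are not the survey data.
rng = np.random.default_rng(1)
annoyed = rng.binomial(1, 0.6, size=500)

low, high = percentile_bootstrap_ci(annoyed, np.mean)
print(f"proportion annoyed: {annoyed.mean():.3f}, 95% CI: ({low:.3f}, {high:.3f})")
```

This version resamples observations independently, so it suits survey responses; serially correlated series such as the pollutant data would call for a dependence-aware variant (for example, a block bootstrap or resampling the VAR residuals).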
    Papers published:
    MACHADO, M.; SANTOS, J. M.; FRERE, S.; CHAGNON, P.; REISEN, V. A.; BONDON, P.; ISPÁNY, M.;
    MAVROIDIS, I.; REIS JR, N. C. Deconstruction of annoyance due to air pollution by multiple correspondence
    analyses. Environmental Science and Pollution Research, v. 28, n. 35, p. 47904-47920, 2021.
    MACHADO, M. ; SANTOS, J. M.; REISEN, V. A.; PEGO E SILVA, A. F.; REIS JUNIOR, N. C.; BONDON, P.;
    MAVROIDIS, I.; PREZOTTI FILHO, P. R.; FRERE, S.; LIMA, A. T. Parameters influencing population
    annoyance pertaining to air pollution. Journal of Environmental Management, v. 323, p. 115955, 2022.

  18. Contact Information:
    Milena Machado de Melo
    ([email protected])
    Thank you!
