
Irregular Time Series and How To Whip Them

Aileen Nielsen

May 11, 2016

Transcript

  1. OUTLINE

    i. Why you already care ii. Why you should worry iii. Assessing periodic behavior iv. Grasping for causality
  2. Where aren’t irregular time series in human behavior?

    Anywhere you have an underlying, continuous, ongoing truth but irregular measurement of that truth, you’ve got an irregular time series. Examples: health reporting, financial markets, parliamentary election cycles.
  3. p.s. It’s really hard to get any human responses, let alone temporally regular responses

    Researchers trying to continue a landmark study published in Science found they couldn’t come close to matching the original study’s response rate.
  4. Where aren’t irregular time series in the natural sciences?

    For exogenous and endogenous reasons, the natural sciences will always face irregularity of measurement.
    • Biology experiments (qua human behavior experiments)
    • Astronomical data (technical failures, weather, galaxies blocking other galaxies)
    • Paleo-climate proxy data (rock and ice samples of relevant provenance can be hard to find)
  5. OUTLINE

    i. Why you already care ii. Why you should worry iii. Assessing periodic behavior iv. Grasping for causality
  6. Existing solutions

    More common (strategy: Python tool)
    • Resample and carry on: pandas.DataFrame.resample
    • Impute data and carry on: sklearn.preprocessing.Imputer (sklearn.impute.SimpleImputer in current scikit-learn releases)
    • Do something sensible without imputing or losing data: to be continued…

    Less common (strategy: Python tool)
    • Analyze each non-gapped bit of the series: use pandas.DataFrame.shift to compute a lag vector, then ‘cut’ your series into pieces. But have you got enough data?
    • Feature analysis rather than temporal analysis: sometimes feature extraction is more appropriate than time series analysis; see e.g. pyeeg
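
The "resample" and "cut into non-gapped pieces" strategies can be illustrated with a minimal sketch. The helper name `split_at_gaps`, the toy dates, and the two-day gap threshold are assumptions made for this example only, not something taken from the talk.

```python
import pandas as pd

def split_at_gaps(series, max_gap):
    """Split a time-indexed Series wherever consecutive timestamps
    are more than max_gap apart (hypothetical helper)."""
    timestamps = series.index.to_series()
    # Lag vector: each timestamp minus the previous one (NaT for the first)
    gaps = timestamps - timestamps.shift(1)
    # A new segment starts at every gap larger than the threshold
    segment_id = (gaps > max_gap).cumsum().to_numpy()
    return [piece for _, piece in series.groupby(segment_id)]

# Toy irregular series: three daily readings, a week-long hole, two more readings
times = pd.to_datetime(
    ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-10", "2016-01-11"]
)
obs = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=times)

# Strategy 1: resample onto a regular daily grid; the hole becomes NaN rows
daily = obs.resample("D").mean()

# Strategy 2: analyze each non-gapped piece separately
pieces = split_at_gaps(obs, pd.Timedelta(days=2))
```

Resampling keeps one series but forces a decision about the NaN rows; cutting avoids that decision at the cost of shorter pieces, which is where the "have you got enough data?" question bites.
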
  7. Pitfalls of imputing data

    • ‘Patching’ using ‘gap closure’ shifts spectral features as a result of the inevitable distortion of temporal relationships in the case of a Lorenz series.
    • Random-time sampling of a sinusoid time series effectively adds white noise.
    Source: David M. Kreindler and Charles J. Lumsden, The Effects of the Irregular Sample and Missing Data in Time Series Analysis, Nonlinear Dynamics, Psychology, and Life Sciences, Vol. 10, No. 2, pp. 187-214.
  8. Pitfalls in general

    • Error varies with the skewness of the distribution of gaps in the data and by method.
    • Error also varies depending on the underlying time series behavior.
    • In some circumstances a particular method can be disastrous.
    Source: Rehfeld et al., Comparison of correlation analysis techniques for irregularly sampled time series, Nonlin. Processes Geophys., 18, 389-404, 2011
  9. Know your data before you start your analysis

    What’s the distribution of gaps in your time series?
    • Skewness
    • Distribution of missing data
    • % of missing data

    What: How
    • Skewness: pandas.DataFrame.skew
    • Distribution of missing data: pandas.DataFrame.shift to compute a lag, then histogram the lags; scipy.stats.kstest to test your empirical distribution against common distributions (normal, gamma)
    • % missing data: numpy.linspace, list comprehensions
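
The three diagnostics above can be sketched on synthetic data. Everything here is an assumption for illustration: the expected sampling grid, the 30% drop rate, and the use of NumPy and `scipy.stats.skew` in place of the pandas calls named on the slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical expected sampling grid: 1000 samples at unit spacing
expected = np.linspace(0.0, 999.0, 1000)
# Simulate irregularity by dropping roughly 30% of the samples at random
keep = rng.random(expected.size) > 0.3
observed = expected[keep]

# Lags between consecutive observed timestamps
gaps = np.diff(observed)

# Skewness of the gap distribution
gap_skew = stats.skew(gaps)

# Percentage of expected samples that are missing
pct_missing = 100.0 * (1.0 - observed.size / expected.size)

# Histogram of the lags (counts per bin)
counts, bin_edges = np.histogram(gaps, bins=10)

# KS test of the gaps against a normal distribution with matched moments.
# Estimating the parameters from the same data makes the p-value approximate.
ks = stats.kstest(gaps, "norm", args=(gaps.mean(), gaps.std(ddof=1)))
```

A strongly positive skew, as randomly dropped samples tend to produce, means a few long holes dominate the irregularity, which is the situation where the choice of method matters most.
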
  10. Know your data before you start your analysis

    What kind of noise do you have in your data set?
  11. OUTLINE

    i. Why you already care ii. Why you should worry iii. Assessing periodic behavior iv. Grasping for causality
  12. Seeking cycles in irregularly sampled series

    The period of pulsation and absolute magnitude of RR Lyrae make them good standard candles for nearby astronomical structure. Beyond the Milky Way, they are difficult to detect due to low luminosity.
  14. Seeking cycles in irregularly sampled series

    Temporally-periodic gene expression during development may explain how repeated segments are formed during organismal growth.
    Source: E. Glynn, Using Lomb-Scargle Periodograms to Identify Periodic Genes in Somitogenesis, Stowers Institute for Medical Research, 2006
  16. Lomb-Scargle Periodograms a.k.a. Least-squares Spectral Analysis

    • A data vector, $\phi$, is represented as a weighted sum of sinusoidal basis functions, collected in a matrix $A$ and evaluated at the same times as the data, with weight vector $w$: $\phi \approx A w$
    • Using standard linear regression, this leads to the closed-form solution for the weights: $w = (A^{T} A)^{-1} A^{T} \phi$
    • $A$ can be based on any set of functions that are mutually independent when evaluated at the sample times
    • Usually use sines and cosines equally distributed over the frequency range
    • The Discrete Fourier Transform is a special case (orthogonal basis functions)
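
The regression view of the periodogram can be sketched directly in NumPy. The synthetic signal, the frequency grid, and the function name `lssa_weights` are assumptions for illustration; `numpy.linalg.lstsq` is used in place of the explicit inverse because it solves the same least-squares problem more stably.

```python
import numpy as np

def lssa_weights(t, phi, freqs):
    """Fit phi ~ A w, where the columns of A are cosines and sines
    at the trial frequencies, evaluated at the irregular sample times t."""
    arg = 2.0 * np.pi * np.outer(t, freqs)      # shape (n_samples, n_freqs)
    A = np.hstack([np.cos(arg), np.sin(arg)])   # shape (n_samples, 2 * n_freqs)
    w, *_ = np.linalg.lstsq(A, phi, rcond=None)
    return w

rng = np.random.default_rng(1)

# Irregular sample times and a noisy sinusoid at frequency 0.1 (synthetic)
t = np.sort(rng.uniform(0.0, 100.0, 200))
phi = 2.0 + np.sin(2.0 * np.pi * 0.1 * t) + 0.3 * rng.normal(size=t.size)
phi = phi - phi.mean()                          # center the data first

freqs = np.linspace(0.02, 0.40, 20)             # trial frequencies, step 0.02
w = lssa_weights(t, phi, freqs)

# Amplitude per frequency combines the cosine and sine weights
k = freqs.size
amplitude = np.hypot(w[:k], w[k:])
best_freq = freqs[np.argmax(amplitude)]
```

No resampling or imputation happens anywhere: the basis functions are simply evaluated wherever the measurements were actually taken.
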
  17. This might all be sounding quite familiar…

    This probably sounds a lot like a Discrete Fourier Transform… because it is. Key differences:
    • Basis functions: ‘independent at the times measured’ vs ‘orthogonal’
    • Lomb-Scargle weights data points rather than frequency intervals
    • Lomb-Scargle input data can be unevenly sampled
    • No data imputation
    • ‘p’ value
    • Any number of data points, rather than exactly 2^N
    Source: http://research.stowers.org/efg/Report/LombScargle-Somitogenesis.pdf
  18. Seeking cycles in irregularly sampled series

    Sunspot count has a known period and has been irregularly measured since at least the 1600s. We can see that the Lomb-Scargle periodogram does a better job of finding the true period than a naively applied DFT.
  19. How wonderful…and Python already does it

    • SciPy
      • O(N^2) implementation
      • You must normalize your data to a mean of 0
    • astroML
      • O(N^2) implementation
      • Floating mean periodogram: no need to normalize your data
    • gatspy
      • O(N^2) and O(N log N) implementations
      • For O(N log N) implementations, must use a regular grid of frequencies
      • Fast ‘trick’ uses the fact that computations for one sine tell you something about others
      • Floating mean periodogram: no need to normalize your data
    • Roll your own… look it up in Numerical Recipes
    • This info is as of June 2015... double-check if it matters to you
    Source: J. VanderPlas, Fast Lomb-Scargle Periodograms in Python, June 2015, blog post
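
For the SciPy route, a minimal sketch with `scipy.signal.lombscargle` might look like the following. The data are synthetic (an 11-unit period chosen loosely to echo the sunspot example) and the period grid is an assumption; note that this function expects angular frequencies and that the mean is removed by hand here, as the slide advises.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)

# Synthetic, irregularly sampled signal with a period of 11 time units
t = np.sort(rng.uniform(0.0, 100.0, 300))
y = 5.0 + np.sin(2.0 * np.pi * t / 11.0) + 0.5 * rng.normal(size=t.size)

# Trial periods, converted to the angular frequencies SciPy expects
periods = np.linspace(5.0, 20.0, 600)
omega = 2.0 * np.pi / periods

# Center the data manually before computing the periodogram
power = lombscargle(t, y - y.mean(), omega)

best_period = periods[np.argmax(power)]
```

Searching over periods rather than frequencies is only a convenience for reading off the answer; any grid of positive angular frequencies works.
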
  20. OUTLINE

    i. Why you already care ii. Why you should worry iii. Assessing periodic behavior iv. Grasping for causality
  22. Non-periodic behavior in time series

    • ‘Learning temporal causal structures between time series is one of the key tools for analyzing time series data.’
    • Irregularity in sampling violates basic assumptions behind many models for structure learning.
    • Seek a way to sensibly implement Granger Causality for irregular time series
    • What’s a good way to test whether an input matters? Lasso regression
    • (Also, check out statsmodels.tsa.stattools.grangercausalitytests)
    Source: M. Bahadori and Y. Liu, Granger Causality Analysis in Irregular Time Series, Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
  23. The irregular bit

    • Predictions of areas with more regular data points should count more overall
    • Predictions of each point in dense areas should be less weighted because information is nearly redundant
    Source: M. Bahadori and Y. Liu, Granger Causality Analysis in Irregular Time Series, Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
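
One simplified way to combine a Lasso-based Granger test with irregular timestamps is to estimate each lagged value with a kernel-weighted average of nearby observations and then run a sparse regression on those estimates. The sketch below is only inspired by that idea and is not the algorithm from the Bahadori and Liu paper; in particular it does not reproduce their weighting of predictions. The function names, the Gaussian kernel, the bandwidth, the penalty, and the synthetic signals are all assumptions.

```python
import numpy as np

def kernel_lag_features(t_target, t_obs, x_obs, lags, bandwidth):
    """Estimate x at each (t_target - lag) as a Gaussian-kernel-weighted
    mean of the irregularly timed observations (hypothetical helper)."""
    cols = []
    for lag in lags:
        dist = (t_target - lag)[:, None] - t_obs[None, :]
        k = np.exp(-0.5 * (dist / bandwidth) ** 2)
        cols.append((k @ x_obs) / k.sum(axis=1))
    return np.column_stack(cols)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/2n)||Xw - y||^2 + lam * ||w||_1
    by iterative soft-thresholding (ISTA)."""
    n = len(y)
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y)) / n
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def x_true(t):
    return np.sin(0.7 * t) + 0.5 * np.sin(1.9 * t + 1.0)

def z_true(t):
    return np.sin(1.1 * t + 2.0) + 0.5 * np.sin(0.4 * t)

rng = np.random.default_rng(3)

# Each series is observed at its own irregular times (synthetic)
t_x = np.sort(rng.uniform(0.0, 200.0, 400))
t_z = np.sort(rng.uniform(0.0, 200.0, 400))
t_y = np.sort(rng.uniform(5.0, 200.0, 300))

# y depends on x one time unit earlier; z is unrelated to y
y = 0.8 * x_true(t_y - 1.0) + 0.1 * rng.normal(size=t_y.size)

lags = [1.0, 2.0, 3.0]
X = np.hstack([
    kernel_lag_features(t_y, t_x, x_true(t_x), lags, bandwidth=0.3),
    kernel_lag_features(t_y, t_z, z_true(t_z), lags, bandwidth=0.3),
])

w = lasso_ista(X, y, lam=0.05)
x_weight = np.abs(w[:3]).sum()   # total weight on lagged x
z_weight = np.abs(w[3:]).sum()   # total weight on lagged z
```

A lagged input that keeps a sizeable coefficient under the L1 penalty is a candidate Granger cause; one whose coefficients are shrunk to (near) zero is not. Comparing `x_weight` with `z_weight` is the toy version of that test.
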
  24. M. Bahadori and Y. Liu, Granger Causality Analysis in Irregular Time Series, Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
  25. This talk was misnamed

    Increasing need should motivate future work. In the meantime, be careful with your data.
  26. Sources

    Images:
    • http://www.westernwhips.com/
    • http://www.momentology.com/3081-mobile-healthcare-consumers-ready-for-new-technologies/
    • http://financejakarta.com/tag/stock-exchange/
    • http://electronicscomponentsworld.com/magnetostrictive-sensors-help-bring-motion-to-worlds-largest-radio-telescope/
    • http://www-scf.usc.edu/~mohammab/sdm2012.pdf
    • http://sas.lau.edu.lb/natural-sciences/facilities/biology-labs-byblos.php
    • https://en.wikipedia.org/wiki/Colors_of_noise
    • https://en.wikipedia.org/wiki/Solar_cycle
    • http://www.r-bloggers.com/lomb-scargle-periodogram-for-unevenly-sampled-time-series/
    • http://spiff.rit.edu/classes/phys230/lectures/mw_size/mw_size.html
    • http://research.stowers.org/efg/Report/LombScargle-Somitogenesis.pdf
    • http://scidavis.sourceforge.net/manual/c4166.html

    Recommended Papers and Web Pages:
    • http://faculty.washington.edu/dbp/PDFFILES/Mond08a.pdf
    • http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Fullpapers/01696.pdf
    • https://www.researchgate.net/publication/7257956_Effects_of_the_irregular_sample_and_missing_data_in_time_series_analysis
    • http://www-scf.usc.edu/~mohammab/codes/iLasso.m
    • http://www.r-bloggers.com/lomb-scargle-periodogram-for-unevenly-sampled-time-series/
    • http://research.stowers.org/efg/Report/LombScargle-Somitogenesis.pdf
    • http://www-scf.usc.edu/~mohammab/sdm2012.pdf
    • http://exoplanetarchive.ipac.caltech.edu/applications/Periodogram/docs/Algorithms.html