PyConJP 2016: Time Series Data Processing with pandas (pandas による 時系列データ処理)

Sinhrks
September 22, 2016


  1. Time Series Data Processing with pandas
    @PyConJP 2016
    sinhrks

  2. About Me
    • @sinhrks
    • Work: data analysis
    • OSS activities:
      • PyData Development Team (pandas)
      • Dask Development Team (Dask)
    • GitHub: https://github.com/sinhrks

  3. Goals
    • Learn efficient ways to process data for time series analysis
    • Get an introduction to time series models

  4. Contents
    • What is pandas?
    • Processing time series data
    • Statistical models for time series data
    • Bonus:
      • Development roadmap

  5. What is pandas?

  6. What is pandas?
    • Provides data structures for data analysis, plus convenient functions and methods for preprocessing and aggregating data
    • R's "data.frame", plus more
    • Author: Wes McKinney
    • License: BSD
    • Name: PANel DAta System
    • GitHub: 7000+ ⭐️

  7. Benefits of using pandas
    • Handles real-world (messy) data
    • Intuitive operations
    • Fast
    • Reference: "pandas internals" @PyConJP 2015

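  8. pandas data structures
    • One structure per data dimensionality:
      • Series: 1-D
      • DataFrame: 2-D
      • Panel: 3-D
    (In the slide's figure, the colored cells are labels.)
    Data structures with three or more dimensions are slated for deprecation.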

  9. DataFrame
    • A 2-D data structure:
      • Both rows (index) and columns (columns) carry labels
      • Each column has its own dtype
    (Figure labels: Columns, Index, int dtype, object dtype)

  10. DataFrame
    import pandas as pd
    df = pd.read_csv('adult.csv')
    df    # displays the loaded DataFrame
    Adult Data Set, taken from the UCI ML Repository
    (Lichman, M. UCI Machine Learning Repository)

  11. DataFrame
    df[['age', 'marital-status']]    # column selection
    df.groupby('income')['hours-per-week'].mean()
    # groupby('income'): grouping; ['hours-per-week']: column selection; .mean(): aggregation (mean)

  12. pandas features
    • Vectorized computation
    • Grouping → aggregation (split-apply-combine)
    • Reshaping (merge, join, concat, …)
    • Diverse I/O (SQL, CSV, Excel, …)
    • Flexible time series processing
    • Visualization

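The split-apply-combine pattern named in slide 12 can be spelled out by hand. A minimal sketch, assuming a tiny hypothetical stand-in for the adult dataset's `income` and `hours-per-week` columns (the values are made up): the dictionary steps split, apply and combine explicitly, and the last line is the single `groupby` call from slide 11 doing the same work.

```python
import pandas as pd

# Hypothetical toy rows standing in for the adult dataset (values are made up)
df = pd.DataFrame({
    "income": [">50K", "<=50K", ">50K", "<=50K"],
    "hours-per-week": [50, 40, 60, 20],
})

# split: one sub-frame per income group
pieces = {key: sub for key, sub in df.groupby("income")}
# apply: reduce each piece to its mean working hours
means = {key: sub["hours-per-week"].mean() for key, sub in pieces.items()}
# combine: gather the per-group results into one Series
manual = pd.Series(means).sort_index()

# the one-liner from slide 11 performs all three steps
builtin = df.groupby("income")["hours-per-week"].mean()
```

Both give 30.0 hours for `<=50K` and 55.0 for `>50K`; the built-in version is the one to use in practice, the manual one only shows what it does.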
  13. Environment
    • Versions:
      • Python 3.5.2
      • pandas 0.19.0rc1
      • statsmodels 0.8.0rc1
    • Namespaces:
    import datetime    # used by the datetime.datetime(...) examples below
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

  14. Time Series Data Processing with pandas

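  15. What is time series data?
    "A sequence of values obtained by observing how some phenomenon changes over time, either continuously or discontinuously at fixed intervals." (from Wikipedia)
    • But what does real-world data look like?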

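  16. Real-world data
    • The frequency you need differs from the one you have
      • e.g. you want to analyze daily data on a monthly basis
    • The data is not periodic
      • e.g. you want to analyze logs recorded each time an event occurs
    • The data is not labeled by time
      • e.g. you want to aggregate raw data that holds datetimes in a column, period by period
    → Apply some preprocessing to turn it into an easy-to-handle form: raw data → time series data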

  17. Preparing time series data
    # A list of datetimes, used as the labels of 1-D data (Series)
    values = [datetime.datetime(2001, 1, 1),
              datetime.datetime(2001, 2, 1),
              datetime.datetime(2001, 3, 1)]
    s = pd.Series(np.arange(3), index=values)
    s
    2001-01-01    0
    2001-02-01    1
    2001-03-01    2
    dtype: int64

    # The same datetimes as labels of 2-D data (DataFrame); 商品A / 商品B = Product A / Product B
    df = pd.DataFrame({'商品A': [25, 27, 30],
                       '商品B': [10, 15, 17]},
                      index=values)
    df

  18. Time series labels
    df.index    # the labels (index) have a datetime type: DatetimeIndex
    DatetimeIndex(['2001-01-01', '2001-02-01', '2001-03-01'],
                  dtype='datetime64[ns]', freq=None)
    df
    df['商品A']    # column selection
    2001-01-01    25
    2001-02-01    27
    2001-03-01    30
    Name: 商品A, dtype: int64

  19. Preparing the data
    • What we want to do:
      1. Attach datetime labels to data
      2. Parse arbitrary datetime formats

  20. Generating sequences (pd.date_range)
    s = pd.Series(np.arange(3))    # data without datetime labels
    s
    0    0
    1    1
    2    2
    dtype: int64
    # create 3 monthly (month-end) timestamps starting from 2001-01-01
    pd.date_range('2001-01-01', freq='M', periods=3)
    DatetimeIndex(['2001-01-31', '2001-02-28', '2001-03-31'],
                  dtype='datetime64[ns]', freq='M')
    s.index = pd.date_range('2001-01-01', freq='M', periods=3)    # overwrite the labels (index)
    s
    2001-01-31    0
    2001-02-28    1
    2001-03-31    2
    Freq: M, dtype: int64

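`pd.date_range` can also be bounded by an end date instead of a count. This sketch uses `'MS'` (month start) rather than the slide's `'M'`, because recent pandas releases deprecate `'M'` in favor of `'ME'`; the labels therefore land on the first of each month rather than the last.

```python
import numpy as np
import pandas as pd

# start + number of periods
by_count = pd.date_range("2001-01-01", periods=3, freq="MS")
# start + end: pandas works out how many steps fit in the range
by_range = pd.date_range("2001-01-01", "2001-03-01", freq="MS")

s = pd.Series(np.arange(3))
s.index = by_range  # replace the default 0..2 labels with dates
```

The new index must have the same length as the data; assigning a 3-element index to a longer Series raises a length-mismatch error.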
  21. Frequency strings
    • Specify the frequency of the generated time series
    • 25 kinds in total, including the ones below
    Frequency string   Meaning
    A                  year end
    M                  month end
    W                  week
    D                  day
    H                  hour
    T                  minute
    S                  second

  22. Frequency strings
    pd.date_range('2016-01-01', freq='M', periods=3)        # M: month end
    DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31'],
                  dtype='datetime64[ns]', freq='M')
    pd.date_range('2016-01-01', freq='MS', periods=3)       # MS: month start
    DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01'],
                  dtype='datetime64[ns]', freq='MS')
    pd.date_range('2016-01-01', freq='W', periods=3)        # W: weekly (anchored on Sunday by default)
    DatetimeIndex(['2016-01-03', '2016-01-10', '2016-01-17'],
                  dtype='datetime64[ns]', freq='W-SUN')
    pd.date_range('2016-01-01', freq='W-TUE', periods=3)    # W-TUE: weekly, starting on Tuesday
    DatetimeIndex(['2016-01-05', '2016-01-12', '2016-01-19'],
                  dtype='datetime64[ns]', freq='W-TUE')

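Frequency strings are shorthand for offset objects in `pd.offsets`. A sketch, assuming only that `'W-TUE'` corresponds to `pd.offsets.Week(weekday=1)` (weekday counts from Monday = 0): both spellings should produce the same Tuesday-anchored dates, rolling forward from 2016-01-01 (a Friday) to the first Tuesday.

```python
import pandas as pd

by_alias = pd.date_range("2016-01-01", freq="W-TUE", periods=3)
# Week(weekday=1): weekday counts from Monday = 0, so 1 is Tuesday
by_offset = pd.date_range("2016-01-01", freq=pd.offsets.Week(weekday=1), periods=3)
```

The offset form is handy when the anchor is computed at runtime rather than written as a literal string.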
  23. Parsing datetimes (pd.to_datetime)
    • Parses datetime strings quickly
    • Tries a C parser → regular expressions → dateutil, in that order
    pd.to_datetime(['2016-09-22', '2016-09-23'])
    DatetimeIndex(['2016-09-22', '2016-09-23'],
                  dtype='datetime64[ns]', freq=None)
    pd.to_datetime(['September 22nd, 2016', 'September 22nd, 2016'])
    DatetimeIndex(['2016-09-22', '2016-09-22'],
                  dtype='datetime64[ns]', freq=None)
    pd.to_datetime(['22 Sep 2016', '23 Sep 2016'])
    DatetimeIndex(['2016-09-22', '2016-09-23'],
                  dtype='datetime64[ns]', freq=None)

  24. Parsing datetimes (pd.to_datetime)
    • Flexible parsing is also possible by specifying a format
    pd.to_datetime(['2016年9月22日', '2016年9月23日'])
    ValueError: Unknown string format
    pd.to_datetime(['2016年9月22日', '2016年9月23日'], format='%Y年%m月%d日')
    DatetimeIndex(['2016-09-22', '2016-09-23'],
                  dtype='datetime64[ns]', freq=None)

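When a column mixes parseable and broken strings, `errors='coerce'` can be combined with `format`. In this sketch the third string is a made-up invalid entry ("不明", "unknown"); it becomes `NaT` instead of raising.

```python
import pandas as pd

raw = ["2016年9月22日", "2016年9月23日", "不明"]  # last entry is deliberately unparseable

# errors="coerce": strings that do not match the format become NaT
parsed = pd.to_datetime(raw, format="%Y年%m月%d日", errors="coerce")
```

Counting the `NaT` values afterwards (`parsed.isna().sum()`) is a quick check on how much of the input failed to parse.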
  25. Selecting data
    • What we want to do:
      1. Select a specific datetime
      2. Select a period
      3. Select datetimes that satisfy a condition

  26. Selecting data
    idx = pd.date_range('2016-01-01', freq='D', periods=366)
    df = pd.DataFrame({'商品A': np.random.randint(100, size=366),
                       '商品B': np.random.randint(100, size=366)},
                      index=idx)
    df
    df.loc[datetime.datetime(2016, 1, 2)]    # select the row for 2016-01-02; the result is a Series
    商品A    12
    商品B    64
    Name: 2016-01-02 00:00:00, dtype: int64
    df.loc['2016-01-02']    # a string is interpreted as a datetime
    商品A    12
    商品B    64
    Name: 2016-01-02 00:00:00, dtype: int64

  27. Selection by slice
    df
    df.loc['2016-09-22':]                     # select 2016-09-22 onward
    df.loc['2016-09-01':'2016-09-30':2]       # from 2016-09-01 to 2016-09-30, every other day

  28. Selection by partial string
    df
    df.loc['2016-03']                 # the string has no day, so all of March is selected
    df.loc['2016-03':'2016-05']       # select the data from March through May

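Partial-string selection and label slicing on a `DatetimeIndex` include the whole end period. A sketch on the same kind of daily 2016 index used above; a simple counter column replaces the random data so the row counts are easy to check.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", freq="D", periods=366)
# a counter instead of random numbers, so the results are easy to check
df = pd.DataFrame({"商品A": np.arange(366)}, index=idx)

march = df.loc["2016-03"]                          # every day in March 2016
spring = df.loc["2016-03":"2016-05"]               # March 1 through May 31: the end month is included
every_other = df.loc["2016-09-01":"2016-09-30":2]  # Sep 1, 3, ..., 29
```

Unlike integer slicing, label slicing includes its end point, so `'2016-05'` as the stop keeps all of May.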
  29. Selection by condition
    df.index.month    # vectorized property access
    array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
           ...
           12, 12, 12, 12, 12, 12, 12, 12, 12], dtype=int32)
    (df.index.month == 1) | (df.index.month == 3)
    array([ True,  True,  True,  True,  True,  True,  True,
           ...
           False, False, False, False, False, False], dtype=bool)
    df.loc[(df.index.month == 1) | (df.index.month == 3)]    # rows in January or March

  30. Property access
    • Processing based on datetime attributes is easy to write
    Available properties:
    year           date
    month          time
    day            dayofyear
    hour           weekofyear
    minute         week
    second         dayofweek
    microsecond    weekday
    nanosecond     weekday_name
    quarter

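These properties combine into boolean masks like any other array. A sketch, assuming a daily 2016 index as above: `dayofweek` counts Monday as 0, so `>= 5` picks weekends, and `quarter` narrows that to the first quarter.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", freq="D", periods=366)
df = pd.DataFrame({"商品A": np.ones(366)}, index=idx)

# dayofweek: Monday=0 ... Sunday=6, so >= 5 selects Saturdays and Sundays
weekend = df.loc[idx.dayofweek >= 5]
# masks from different properties combine with & and |
q1_weekend = df.loc[(idx.dayofweek >= 5) & (idx.quarter == 1)]
```

2016 starts on a Friday and has 366 days, so Saturday occurs 53 times and Sunday 52 times, giving 105 weekend days; the first quarter is exactly 13 weeks, giving 26.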
  31. Preprocessing datetime data
    • What we want to do:
      1. Change the frequency
      2. Fill in missing values
      3. Compare with / compute from neighboring values

  32. Resampling (.resample)
    • Sample data:
      • Cumulative sum of normal random numbers (a random walk)
    idx = pd.date_range('2016-09-22', freq='H', periods=50)
    df = pd.DataFrame({'val': np.random.randn(50)}, index=idx)
    df = df.cumsum()
    df

  33. Resampling (.resample)
    df.resample('6H').mean()            # downsampling: hourly → 6-hour means
    df.resample('30T').interpolate()    # upsampling: hourly → 30-minute steps, filled by interpolation

  34. Resampling (.resample)
    • Many aggregations are available:
    Aggregation methods:
    ffill, backfill, pad, fillna, interpolate, count, nunique, first, last, max,
    median, min, ohlc, prod, size, sem, std, sum, var

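A deterministic sketch of downsampling: one unit per day over 2016, bucketed by month. It uses `'MS'` (month-start labels) to avoid the month-end alias that newer pandas renames, and `.agg` with a list of method names from the table above to apply several aggregations to the same buckets.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", freq="D", periods=366)
daily = pd.Series(np.ones(366), index=idx)  # one unit per day

# downsampling: bucket the days by month (labelled at month start) and sum each bucket
per_month = daily.resample("MS").sum()
# several aggregation methods applied to the same buckets at once
summary = daily.resample("MS").agg(["sum", "mean", "count"])
```

Summing ones counts the days per month, so February 2016 shows 29.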
  35. Interpolation (.interpolate)
    • Sample data:
      • The values contain missing entries (NaN)
    indexer = np.random.randint(4, size=50) == 1    # randomly mark about 1 in 4 rows
    df.loc[indexer] = np.nan                         # set those rows to missing
    df

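  36. Interpolation (.interpolate)
    • Fills in missing values
    • Several methods (e.g. 'spline', 'pchip') are delegated to scipy.interpolate internally
    df.interpolate()    # default: linear interpolation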

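The default `interpolate()` treats rows as equally spaced. When timestamps are uneven, `method='time'` weights by the actual gaps instead. A small sketch with made-up values where the two results differ: the missing point sits 1 day after a 0 and 3 days before a 4.

```python
import numpy as np
import pandas as pd

# unevenly spaced dates: one day, then three days between observations
idx = pd.DatetimeIndex(["2016-01-01", "2016-01-02", "2016-01-05"])
s = pd.Series([0.0, np.nan, 4.0], index=idx)

by_position = s.interpolate()             # default 'linear': rows treated as equally spaced
by_time = s.interpolate(method="time")    # weights by the real time gaps
```

Positional interpolation gives 2.0 (halfway between neighbors); time-weighted interpolation gives 1.0 (a quarter of the way across the four-day gap).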
  37. Window functions (.rolling)
    • As with .resample, aggregation methods can be chained
    df.rolling(3).mean()    # mean over a window of the last 3 rows

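`rolling` also accepts an offset string, which defines the window by time rather than by row count. A sketch with made-up values: with a row-count window the first results are `NaN` until the window fills, whereas for offset-based windows `min_periods` defaults to 1, so partial windows at the start still produce a mean.

```python
import pandas as pd

idx = pd.date_range("2016-09-22", freq="D", periods=5)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

by_count = s.rolling(3).mean()    # last 3 rows; NaN until 3 rows are available
by_time = s.rolling("3D").mean()  # rows within the last 3 days; min_periods defaults to 1
```

On regularly spaced data both agree once the window is full; time-based windows matter when observations are irregular.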
  38. Shift (.shift)
    • Shifts values by the specified number of periods
    • Handy for comparing with, or computing from, neighboring values
    df.shift(periods=1)

  39. Difference (.diff)
    • Takes the difference from the value the specified number of periods earlier
    • Equivalent to df - df.shift()
    df.diff(periods=1)

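The equivalence stated in slide 39 can be checked directly, and the same `shift` building block gives relative change. A sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 15.0, 15.0])

diff = s.diff(periods=1)
manual = s - s.shift(periods=1)     # the same computation spelled out
ratio = s / s.shift(periods=1) - 1  # relative change, also built from shift
```

The first element of each result is `NaN` because there is no earlier value to compare with.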
  40. Using shift (example)
    idx = pd.date_range('2016-09-22 10:00', freq='T', periods=50)
    df = pd.DataFrame({'val': np.repeat([0, 1, 0, 1], [10, 20, 10, 10])},
                      index=idx)
    df
    # timestamps where the value changes (the first row always matches, since shift() puts NaN there)
    df.index[df['val'] != df['val'].shift()]
    DatetimeIndex(['2016-09-22 10:00:00', '2016-09-22 10:10:00',
                   '2016-09-22 10:30:00', '2016-09-22 10:40:00'],
                  dtype='datetime64[ns]', freq=None)

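The change-point mask from slide 40 extends naturally to run lengths: a cumulative sum of the mask gives every run of equal values its own id, which can then be grouped on. A sketch on the slide's data; the names `run_id`, `run_values` and `run_lengths` are hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-09-22 10:00", freq="min", periods=50)
df = pd.DataFrame({"val": np.repeat([0, 1, 0, 1], [10, 20, 10, 10])}, index=idx)

# True on the first row of every run of equal values (row 0 compares against NaN)
starts = df["val"] != df["val"].shift()
# a running count of run starts gives each run its own integer id: 1, 2, 3, ...
run_id = starts.cumsum()

grouped = df["val"].groupby(run_id)
run_values = grouped.first()          # the value held during each run
run_lengths = grouped.size()          # how many minutes each run lasted
run_starts = df.index[starts.values]  # the same timestamps as the slide's output
```

This turns a per-minute state log into one row per state change, which is often the form needed for duration statistics.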
  41. Aggregation
    • What we want to do:
      • Aggregate raw data that contains datetimes

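  42. Aggregating datetime data
    • Sample data:
      • Product order data (数量 = quantity, 商品名 = product name, 発注日 = order date)
    idx = pd.date_range('2016-01-01', freq='D', periods=366)    # daily dates for 2016, as in slide 26
    df = pd.DataFrame({'数量': np.random.randint(100, size=1000),
                       '商品名': np.random.choice(list('ABC'), 1000),
                       '発注日': np.random.choice(idx, 1000)})
    df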

  43. Aggregating datetime data
    • pd.Grouper
      • Groups by a column name plus a frequency
    df.groupby([pd.Grouper(key='発注日', freq='M'), '商品名']).sum()    # monthly totals per product

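A deterministic version of slide 43 on a tiny hypothetical order log with the same column names. It uses `'MS'` so that each month is labelled by its first day, and selects `'数量'` explicitly so only the quantity column is summed.

```python
import pandas as pd

# small hypothetical order log, same column names as the slide
df = pd.DataFrame({
    "発注日": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-03", "2016-02-10"]),
    "商品名": ["A", "B", "A", "A"],
    "数量": [10, 5, 7, 3],
})

# month buckets on the date column, then split by product
monthly = df.groupby([pd.Grouper(key="発注日", freq="MS"), "商品名"])["数量"].sum()
```

The result has a two-level index (month, product); product A's two February orders (7 and 3) add up to 10.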
  44. Property access (.dt)
    df['発注日'].dt.weekday    # the .dt accessor gives access to datetime properties
    0      6
    1      4
    2      6
          ..
    997    6
    998    5
    999    2
    Name: 発注日, dtype: int64
    df.groupby(df['発注日'].dt.weekday)['数量'].sum()    # total quantity per weekday (Monday=0)

  45. Cross tabulation (pd.pivot_table)
    • pd.pivot_table +
      • pd.Grouper
      • Property access
    pd.pivot_table(df, index=pd.Grouper(key='発注日', freq='M'),
                   columns='商品名', values='数量', aggfunc='sum')

  46. Calendars
    • pandas.tseries.offsets.CustomBusinessDay
      • Holiday-aware processing
    • japandas (https://github.com/sinhrks/japandas)
      • JapaneseHolidayCalendar
    from pandas.tseries.offsets import CustomBusinessDay
    import japandas
    cal = japandas.JapaneseHolidayCalendar()
    cbd = CustomBusinessDay(calendar=cal)    # business days that skip Japanese public holidays
    idx = pd.DatetimeIndex(['2016-09-20', '2016-09-21', '2016-09-22'])
    idx + cbd    # advance each date by one business day
    DatetimeIndex(['2016-09-21', '2016-09-23', '2016-09-23'],
                  dtype='datetime64[ns]', freq=None)
    (2016-09-22 is a public holiday.)

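Without japandas, `CustomBusinessDay` can take an explicit list of holidays. A sketch that hard-codes only 2016-09-22 (an assumption standing in for a full calendar); it should reproduce the slide's result and can also drive `pd.date_range`.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# holidays passed explicitly; only 2016-09-22 is listed here
cbd = CustomBusinessDay(holidays=["2016-09-22"])

idx = pd.DatetimeIndex(["2016-09-20", "2016-09-21", "2016-09-22"])
shifted = idx + cbd  # may emit a PerformanceWarning: the offset is applied element by element

# business days in the week of 2016-09-19, skipping the holiday
week = pd.date_range("2016-09-19", "2016-09-23", freq=cbd)
```

Adding one business day to the holiday itself lands on the next business day (2016-09-23), the same way adding one business day to a Saturday gives the following Monday.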
  47. Visualization
    • Plots time series with the axis automatically adjusted to the data's frequency
    idx1 = pd.date_range('2016-09-01', freq='D', periods=50)
    df1 = pd.DataFrame({'val1': np.random.randn(50)}, index=idx1)
    df1.plot()

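  48. Visualization
    • Plots time series with the axis automatically adjusted to the data's frequency
    • Also adjusts automatically when the series have different frequencies
    idx2 = pd.date_range('2016-09-01', freq='M', periods=3)
    df2 = pd.DataFrame({'val1': np.random.randn(3)}, index=idx2)
    ax = df1.plot()
    df2.plot(ax=ax)    # daily df1 and monthly df2 on the same axes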

  49. Statistical models for time series data

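  50. Statistical models for time series data
    • Goals:
      • Investigate relationships between time series
      • Forecast the future
      • Detect change points / anomalies
      • …
    • Things to watch for in time series data:
      • Is there influence from data before a given point in time?
      • Are there trends or seasonality?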

  51. Python packages with time series models
    • Choose a package to match the model you want to use
    • Use R where needed (rpy2, pypeR)
                          StatsModels     PyFlux
    Statistical tests     ✅
    ARIMA                 ✅              ✅
    VAR                   ✅              ✅
    GARCH                 (sandbox)
    GAS                                   ✅
    State space           (rc)

  52. Sample data
    • AirPassengers
      • Monthly international airline passengers (thousands)
      • Univariate, with trend and seasonality
    df = pd.read_csv('airpassengers.csv', index_col=0, parse_dates=[0])
    df

  53. Time series decomposition
    • Decomposes a time series into trend, seasonality, and residuals
    res = sm.tsa.seasonal_decompose(df)
    fig = res.plot();
    (Figure panels: original data, trend, seasonality, residuals)

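  54. Statistics
    • Sample autocorrelation (ACF)
      • The covariance between different points in time, standardized
    • Sample partial autocorrelation (PACF)
      • The correlation at lag k after removing the effect of the intermediate lags
    fig, axes = plt.subplots(1, 2)
    sm.tsa.graphics.plot_acf(df, ax=axes[0]);
    sm.tsa.graphics.plot_pacf(df, ax=axes[1]);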

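The ACF described in slide 54 is the lag-k autocovariance divided by the variance. A minimal NumPy sketch of the usual sample definition, r_k = Σ_{t=k+1..n} (y_t - ȳ)(y_{t-k} - ȳ) / Σ_{t=1..n} (y_t - ȳ)², which uses the full-sample mean and the same denominator for every lag; the helper name `sample_acf` is hypothetical.

```python
import numpy as np


def sample_acf(y, nlags):
    """Sample autocorrelation r_k = c_k / c_0 for k = 0..nlags.

    c_k is the lag-k autocovariance computed with the full-sample mean
    and divided by n (the biased estimator).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    c0 = np.dot(dev, dev) / n
    return np.array([np.dot(dev[k:], dev[:n - k]) / n / c0 for k in range(nlags + 1)])


r = sample_acf([1, 2, 3, 4, 5], nlags=2)
```

For the trending series 1..5 the lag-1 autocorrelation is 0.4 and lag 2 is -0.1; r_0 is always 1 by construction.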
  55. Statistics
    • For normal random numbers (white noise):
      • There is no correlation with past values
    wn = pd.Series(np.random.randn(100))
    fig, axes = plt.subplots(1, 2)
    sm.tsa.graphics.plot_acf(wn, ax=axes[0]);
    sm.tsa.graphics.plot_pacf(wn, ax=axes[1]);

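  56. SARIMA model
    • Seasonal autoregressive integrated moving average model:
      • Autoregressive integrated moving average model (ARIMA)
      • + seasonal variation (also modeled with ARIMA)
    • ARIMA(p, d, q): the series y_t obtained by taking d-th order differences
      • is (weakly) stationary (its mean and autocovariance do not depend on time), and
      • follows the process
        $y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$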

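The ARIMA recursion in slide 56 can be run by hand for p = q = 1. A sketch, assuming pre-sample values y_{-1} = ε_{-1} = 0 and made-up coefficients; the output plays the role of the differenced series y_t, and for d = 1 a cumulative sum undoes the differencing. The helper `simulate_arma11` is hypothetical.

```python
import numpy as np


def simulate_arma11(c, phi, theta, eps):
    """Run y_t = c + phi * y_{t-1} + eps_t + theta * eps_{t-1}.

    Pre-sample values y_{-1} and eps_{-1} are assumed to be 0.
    """
    y = np.zeros(len(eps))
    y_prev, eps_prev = 0.0, 0.0
    for t, e in enumerate(eps):
        y[t] = c + phi * y_prev + e + theta * eps_prev
        y_prev, eps_prev = y[t], e
    return y


# a single unit shock at t = 0, then no noise
y = simulate_arma11(c=0.5, phi=0.6, theta=0.3, eps=[1.0, 0.0, 0.0])
# d = 1: the differenced series is integrated back into levels
levels = np.cumsum(y)
```

By hand: y_0 = 0.5 + 1 = 1.5, y_1 = 0.5 + 0.6·1.5 + 0.3·1 = 1.7, y_2 = 0.5 + 0.6·1.7 = 1.52.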
  57. Removing the trend
    df.plot()
    df.diff().plot()    # first-order differencing
    (After differencing, the variance still grows over time.)

  58. Log transform
    ldf = np.log(df)    # log transform
    ldf.plot()
    ldf.diff().plot()

  59. Removing seasonality
    res = sm.tsa.seasonal_decompose(ldf)
    seasonal_adjust = (ldf - res.seasonal)    # subtract the seasonal component
    seasonal_adjust.plot()

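Slides 57-59 log the series and subtract the seasonal component. A different technique, seasonal differencing (`diff(12)` on monthly data, the operation behind D in a seasonal order (P, D, Q, s)), removes a repeating pattern too. A sketch on a hypothetical series with constant 2% monthly growth times a fixed 12-month pattern: after the log, the lag-12 difference is exactly twelve months of growth.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("1949-01-01", freq="MS", periods=36)
# hypothetical series: 2% growth per month times a repeating 12-month pattern
pattern = np.tile(np.linspace(0.8, 1.2, 12), 3)
s = pd.Series(100 * 1.02 ** np.arange(36) * pattern, index=idx)

log_s = np.log(s)  # multiplicative growth and seasonality become additive
# lag-12 difference: the pattern cancels, leaving 12 months of growth
seasonal_diff = log_s.diff(12)
```

Every non-missing value equals 12·log(1.02); the first 12 entries are `NaN` because they have no value a year earlier.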
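  60. Unit root test
    • Augmented Dickey-Fuller test (null hypothesis: the series has a unit root, so a small p-value suggests stationarity; element [1] of the result is the p-value)
    sm.tsa.adfuller(df['Air passengers'])[1]    # original data
    0.99188024343764114
    sm.tsa.adfuller(ldf['Air passengers'])[1]    # log
    0.42236677477038415
    sm.tsa.adfuller(ldf['Air passengers'].diff().dropna())[1]    # log, differenced
    0.071120548150854057
    sm.tsa.adfuller(seasonal_adjust['Air passengers'].diff().dropna())[1]    # log, seasonally adjusted, differenced
    8.0990048658604878e-09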

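  61. Estimating a SARIMA model
    # order=(p, d, q): ARIMA parameters; seasonal_order=(P, D, Q, s): seasonal-component parameters
    mod_seasonal = sm.tsa.SARIMAX(ldf, trend='c', order=(1, 1, 1),
                                  seasonal_order=(0, 1, 2, 12))
    res_seasonal = mod_seasonal.fit()
    res_seasonal.summary()
    …
    (Requires the statsmodels 0.8 release candidate.)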

  62. Forecasting with the model
    pred = res_seasonal.forecast(36)    # forecast 36 periods (months) ahead
    pred
    1961-01-01    6.110548
    1961-02-01    6.052912
    1961-03-01    6.174690
    ...
    1963-10-01    6.388955
    1963-11-01    6.242262
    1963-12-01    6.345214
    Freq: MS, dtype: float64
    ax = ldf.plot()
    pred.plot(ax=ax)    # plot the original (log) data and the forecasts together

  63. Summary
    • How to handle time series processing with pandas
    • A first taste of working with time series models in Python

  64. Development roadmap
    • Plan:
      • Release 0.19 (currently rc) → 0.20 → 1.0
    • pandas 1.0:
      • API freeze
      • Long-term support
    • pandas 2.0 (under discussion):
      • Support Python 3.x only
      • Specialize in data of two dimensions or fewer
      • Move the backend to C++ (Apache Arrow)