How to Use In-Memory Streams

HayaoSuzuki
August 29, 2020

PyCon JP 2020

Transcript

  1. In-Memory Stream Techniques: How to Use In-Memory Streams
     Hayao Suzuki
     PyCon JP 2020, August 29, 2020
  2. About this talk
     The materials are on GitHub
     › https://github.com/HayaoSuzuki/pyconjp2020
     Twitter hashtag
     › #pyconjp_1
     PyCon JP Fellow Slack
     › #jp-2020-track-1
  3. Who am I?
     Name: Hayao Suzuki (鈴木 駿)
     Twitter: @CardinalXaro
     Work: Python Programmer at iRidge, Inc.
  4. Who am I? Technical Reviewer
     › Effective Python, 2nd Edition (O'Reilly Japan)
     › 動かして学ぶ量子コンピュータプログラミング (O'Reilly Japan)
     A full list is available at https://xaro.hatenablog.jp/.
  5. Who am I? Selected Talks
     › Modernizing a Legacy Django Application (DjangoCongress JP 2018)
     › Symbolic Computation with SymPy (PyCon JP 2018)
     › Enjoying Elementary Number Theory with Python (PyCon mini Hiroshima 2019)
     › Do You Know cmath? (PyCon mini Shizuoka 2020)
     A full list is available at https://xaro.hatenablog.jp/.
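  6. Today's goal: the problem to solve
     › Fetch several GB of data over the internet and process it into a CSV file
     › Implement it as an addition to an existing system built on the cloud
     › Run it every day
     Cloud services are billed by usage, so the job should finish as quickly as possible!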
  7. Today's goal: processing flow
     › Fetch several GB of data over the internet
     › Process the data into a CSV file
     › Compress the CSV file into a ZIP archive
     › Upload the ZIP archive to cloud storage
     Analysis
     › The data is large
     › The processing itself is simple
  8. Today's goal: where is the bottleneck?
     › ZIP compression is not that expensive
     › The data processing is simple
     › The bottleneck is most likely in I/O
     We want the I/O to be as fast as we can make it!
  9. Today's Theme: In-Memory Streams

  10. Stream? What is a stream in the first place?
      A stream is a file object.

  11. File Object? What is a file object?
      › An object that has methods such as read() and write()
      › It can exchange data with files on disk, storage located elsewhere, and input/output devices
  12. File Object? Kinds of file objects
      › Raw binary files
      › Buffered binary files
      › Text files
  13. Usage
      Text file
          f = open("myfile.txt", "r")
      Buffered binary file
          f = open("myfile.jpg", "rb")
  14. Behind the open function
      What does open do? It calls the OS system call API.
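  15. Behind the open function
      Example: writing data out as CSV
          import csv

          # events is assumed to be a list of dicts, such as an API result
          with open("events.csv", "w", newline="") as csv_file:
              fieldnames = ["title", "started_at", "ended_at"]
              # extrasaction="ignore" drops keys not in fieldnames (the default raises ValueError)
              writer = csv.DictWriter(csv_file, fieldnames, extrasaction="ignore")
              writer.writeheader()
              writer.writerows(events)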
  16. Behind the open function
      Example: Windows
      › CreateFile (obtain access to the file)
      › QueryAllInformationFile (get file information)
      › WriteFile (write to the file)
      › CloseFile (close the file)
      Confirmed with Process Monitor.
  17. Behind the open function
      Example: Ubuntu on WSL
      › openat (open the file)
      › fstat (get file information)
      › ioctl (device control)
      › lseek (seek within the file)
      › write (write to the file)
      › close (close the file)
      Confirmed with strace.
  18. Who laughs last? Where does the final artifact go?
      › Saving the file locally is not the goal
      › The file should end up in external storage such as AWS S3
      We do not want to write the file to a local device!
  19. Today's Theme: In-Memory Streams

  20. In-memory streams: what is an in-memory stream?
      › It lets you treat str and bytes like file objects
      › It is readable and writable, and supports random access
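The two properties on this slide can be checked directly. The following is a minimal sketch that uses only the standard io module; the sample strings and byte values are made up for illustration.

```python
import io

# Text stream: behaves like a file opened in text mode.
text_stream = io.StringIO()
text_stream.write("title,started_at\n")
text_stream.write("PyCon JP 2020,2020-08-28\n")

# Writing leaves the position at the end, so rewind before reading.
text_stream.seek(0)
first_line = text_stream.readline()

# getvalue() returns the whole buffer regardless of the current position.
whole_text = text_stream.getvalue()

# Binary stream: random access with seek() and tell().
byte_stream = io.BytesIO(b"0123456789")
byte_stream.seek(4)            # jump to offset 4
middle = byte_stream.read(3)   # reads the bytes at offsets 4, 5 and 6
position = byte_stream.tell()  # position after the read

# Overwrite bytes in place at an arbitrary offset.
byte_stream.seek(0)
byte_stream.write(b"AB")
patched = byte_stream.getvalue()
```

Note that read() and readline() start from the current position, while getvalue() always returns the entire contents.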
  21. StringIO
      An in-memory stream for text files
      Example: handling CSV with StringIO
          import csv
          import io

          with io.StringIO() as csv_file:
              fieldnames = ["title", "started_at", "ended_at"]
              # extrasaction="ignore" drops keys not in fieldnames (the default raises ValueError)
              writer = csv.DictWriter(csv_file, fieldnames, extrasaction="ignore")
              writer.writeheader()
              writer.writerows(events)
  22. BytesIO
      An in-memory stream for buffered binary files
      Example: handling a PNG with BytesIO
          import io

          # png_bytes is assumed to hold the raw bytes of a PNG file
          with io.BytesIO(png_bytes) as f:
              png_header = f.read(8)
              print(png_header)  # b'\x89PNG\r\n\x1a\n'
  23. Review: today's goal, processing flow
      › Fetch several GB of data over the internet
      › Process the data into a CSV file
      › Compress the CSV file into a ZIP archive
      › Upload the ZIP archive to cloud storage
  24. Fetching data over the internet
      Example: calling the connpass API
          import json
          import urllib.request

          with urllib.request.urlopen(url) as response:
              events = json.load(response)["events"]
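json.load only needs an object with a read() method, so the same parsing code can be exercised without network access by substituting an in-memory stream for the HTTP response. The following is a minimal sketch; the payload is hypothetical, only its "events" key mirrors the slide, and the field values are not real API output.

```python
import io
import json

# Hypothetical payload; only the "events" key mirrors the slide.
payload = json.dumps(
    {"events": [{"title": "Example meetup", "started_at": "2020-08-28T10:00:00+09:00"}]}
).encode("utf-8")

# BytesIO stands in for the response object returned by urlopen().
with io.BytesIO(payload) as fake_response:
    events = json.load(fake_response)["events"]
```

This substitution is handy in unit tests, where a real request would be slow or unavailable.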
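  25. Processing the data
      Example: turning the API result into CSV
          import csv
          import io

          with io.StringIO() as ts:
              header = ["title", "started_at", "ended_at"]
              # extrasaction="ignore" drops keys not in header (the default raises ValueError)
              writer = csv.DictWriter(ts, fieldnames=header, extrasaction="ignore")
              writer.writeheader()
              writer.writerows(events)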
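  26. Compressing and uploading the data
      Example: compressing to ZIP and uploading to AWS S3
          import io
          import zipfile

          # Runs while the StringIO stream ts is still open;
          # s3 is assumed to be a boto3 S3 client
          with io.BytesIO() as bs:
              with zipfile.ZipFile(bs, "w") as zf:
                  zf.writestr("events.csv", ts.getvalue())
              bs.seek(0)  # rewinding the stream is the key point
              s3.upload_fileobj(bs, "bucket", "events.zip")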
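The role of the seek(0) call can be seen without an S3 bucket. The following is a minimal sketch in which a hypothetical fake_upload function stands in for the upload call and simply reads from the current position, which is how a consumer of a file object typically behaves. The sample events are made up, and ZIP_DEFLATED is passed explicitly because ZipFile defaults to ZIP_STORED, which stores entries without compressing them.

```python
import csv
import io
import zipfile

# Made-up sample data in place of the API result.
events = [
    {"title": "Example meetup", "started_at": "2020-08-28", "ended_at": "2020-08-29"},
]


def fake_upload(fileobj):
    """Hypothetical stand-in for an upload call: reads from the current position."""
    return fileobj.read()


with io.StringIO() as ts, io.BytesIO() as bs:
    writer = csv.DictWriter(ts, fieldnames=["title", "started_at", "ended_at"])
    writer.writeheader()
    writer.writerows(events)

    # Closing the ZipFile writes the central directory into bs.
    with zipfile.ZipFile(bs, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("events.csv", ts.getvalue())

    # Without rewinding, the position is at the end and nothing is read.
    uploaded_without_seek = fake_upload(bs)

    bs.seek(0)
    uploaded = fake_upload(bs)

# Read the uploaded bytes back to confirm they form a valid ZIP archive.
with zipfile.ZipFile(io.BytesIO(uploaded)) as zf:
    names = zf.namelist()
    csv_text = zf.read("events.csv").decode("utf-8")
```

Forgetting the rewind produces an empty upload rather than an error, which makes the mistake easy to miss.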
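  27. Conclusion
      › The io module includes in-memory streams.
      › They let you treat str and bytes like file objects.
      › Unlike a regular open, no file I/O system calls are made.
      › They are best suited to situations where you want to reduce disk I/O, or where disk I/O is not possible.
      Please add the io module to your toolbox!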