Manage Large Data Sets with Streams

Working with streams sounds scary and complicated, but we’ll show you how to use them to process large data imports without having to sell your house to buy RAM. A stream presents data as a linear sequence that you read or write a piece at a time, and many streams also let you seek to an arbitrary position. We’ll cover the tips and tricks we’ve learned along the way to help you dive deep into processing streams.

Streams have been in PHP since the 4.x days, yet we continually see developers try to iterate over huge data sets and run out of memory. We’ll show you a better solution than ini_set('memory_limit', '16G');

Joe Ferguson

May 23, 2019

Transcript

  1. Manage Large Data Sets with Streams
    Joe Ferguson

  2. Who Am I?
    Joe Ferguson
    Senior Full Stack Developer @ Preteckt
    Twitter: @JoePFerguson
    OSMI Board Member
    The Joindin Foundation & Joindin
    Leadership Team

  3. Agenda
    Streams: What they are and why you shouldn’t cross them
    Searching a 5 million line CSV
    Guzzling Streams with Guzzle

  4. https://www.php.net/manual/en/intro.stream.php
    “a stream is a resource object which exhibits streamable behavior”

  5. Linear Data
    [Chart: Age, on an axis from 0 to 70, plotted for Record 1, Record 2, Record 3, Record 4]

  6. scheme://target
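
The slide's code is not part of this transcript, so what follows is only a minimal sketch of the scheme://target idea. It assumes a throwaway temp file and the built-in file:// and php://memory wrappers; the variable names are made up for illustration.

```php
<?php
// The scheme before "://" selects the wrapper; the target is what it opens.
$path = tempnam(sys_get_temp_dir(), 'stream');
file_put_contents($path, "hello from disk\n");

// file:// is the default wrapper, so these two calls read the same file.
$viaScheme = file_get_contents('file://' . $path);
$viaPlainPath = file_get_contents($path);

// php://memory is a read/write buffer that never touches the disk.
$memory = fopen('php://memory', 'w+');
fwrite($memory, 'hello from memory');
rewind($memory); // move the pointer back to the start before reading
$fromMemory = stream_get_contents($memory);
fclose($memory);

unlink($path);
```

The same fopen(), fwrite() and file_get_contents() calls work in both cases; only the scheme changes.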

  7. You’re already using them
    file()
    fopen()
    fwrite()
    fclose()
    file_get_contents()
    file_put_contents()

  8. Stream Transports

  9. Stream Wrappers
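
To see which wrappers (schemes) a given PHP build can use, one option is stream_get_wrappers(). This is a small sketch rather than the slide's own content; the exact list depends on the extensions installed.

```php
<?php
// List every wrapper registered in this PHP build.
$wrappers = stream_get_wrappers();

// "file" and "php" are built in; others such as "https" depend on extensions.
$hasFile = in_array('file', $wrappers, true);
$hasPhp = in_array('php', $wrappers, true);

echo implode(', ', $wrappers), PHP_EOL;
```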

  10. Stream Filters
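
A filter transforms data as it passes through a stream. The sketch below is an illustration, not the slide's code: it attaches the built-in string.toupper filter to the read side of an in-memory stream, and the sample rows are invented.

```php
<?php
$handle = fopen('php://memory', 'w+');
fwrite($handle, "alice,30\nbob,41\n");
rewind($handle);

// Every byte read from this point on passes through the filter first.
stream_filter_append($handle, 'string.toupper', STREAM_FILTER_READ);

$lines = [];
while (($line = fgets($handle)) !== false) {
    $lines[] = rtrim($line, "\n");
}
fclose($handle);
```

The data stored in the stream is unchanged; only what the reader sees is upper-cased.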

  11. Stream Context
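
A context bundles the options a wrapper should use when a stream is opened. As a rough sketch (the option values and the commented-out URL are placeholders, not taken from the talk), an HTTP context can be built and inspected without making a request:

```php
<?php
// Options are grouped by wrapper name; these apply to http:// and https://.
$context = stream_context_create([
    'http' => [
        'method'  => 'GET',
        'header'  => "Accept: application/json\r\n",
        'timeout' => 5, // seconds
    ],
]);

// Read the options back to confirm what the context holds.
$options = stream_context_get_options($context);

// The context is passed as the fourth argument when opening a stream, e.g.:
// $handle = fopen('https://example.com/data.json', 'r', false, $context);
```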

  12. Size Matters

  13. Size Matters

  14. “I know how to fix it”

  15. “Job’s Done Boss!”

  16. Wait a minute…
    [Slide graphic: the label “2G” repeated six times]

  17. What’s Burning?

  18. Save the memory…

  19. Now I have this ISO

  20. Streaming People
    (Don’t worry, they’re not real)

  21. file_get_contents()

  22. Reading from pointers

  23. What’s a pointer?

  24. Reading from pointers
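
The code from these slides is not in the transcript. A minimal sketch of reading through a file pointer, assuming a small invented CSV and a hypothetical helper named countMatchingLines(), might look like this:

```php
<?php
// Count lines containing $needle while holding only one line in memory.
function countMatchingLines(string $path, string $needle): int
{
    $handle = fopen($path, 'r'); // the handle tracks our position in the file
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    $matches = 0;
    // fgets() returns the next line and advances the pointer past it.
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $matches++;
        }
    }
    fclose($handle);

    return $matches;
}

// Demo data: a tiny stand-in for a much larger file.
$path = tempnam(sys_get_temp_dir(), 'people');
file_put_contents($path, "id,name\n1,Ada\n2,Grace\n3,Ada\n");
$adaCount = countMatchingLines($path, 'Ada');
unlink($path);
```

Unlike file_get_contents(), the loop never builds a string the size of the whole file.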

  25. Memory Usage

  26. Memory Usage
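
The figures from the slides are not reproduced here. One way to measure the difference yourself is sketched below; the generated file, its size, and the variable names are assumptions made for the example, and the exact numbers will vary by PHP version and platform.

```php
<?php
// Build a sample file of 50,000 made-up rows (a few megabytes).
$path = tempnam(sys_get_temp_dir(), 'mem');
$out = fopen($path, 'w');
for ($i = 0; $i < 50000; $i++) {
    fwrite($out, "$i,Example Person,example$i@example.test\n");
}
fclose($out);

// Approach 1: load the whole file into a single string.
$before = memory_get_usage();
$all = file_get_contents($path);
$loadDelta = memory_get_usage() - $before;
unset($all);

// Approach 2: read one line at a time and track the highest usage seen.
$before = memory_get_usage();
$streamPeak = $before;
$handle = fopen($path, 'r');
while (($line = fgets($handle)) !== false) {
    $streamPeak = max($streamPeak, memory_get_usage());
}
fclose($handle);
$streamDelta = $streamPeak - $before;

unlink($path);
printf("load: %d bytes, stream: %d bytes\n", $loadDelta, $streamDelta);
```

The load figure grows with the file; the streaming figure should stay roughly flat no matter how many rows are added.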

  27. Double Down

  28. Memory Usage

  29. Rewinding Streams

  30. Rewinding Streams

  31. Rewinding Streams

  32. Rewinding Streams

  33. Rewinding Streams

  34. Rewinding Streams
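
Once a stream has been read to the end, the pointer stays there until it is moved. A small sketch of rewinding, using php://temp and invented rows in place of the deck's own example:

```php
<?php
$handle = fopen('php://temp', 'w+');
fwrite($handle, "name,age\nAda,36\nGrace,45\n");
rewind($handle); // writing left the pointer at the end

// First pass: read the header, then count the remaining rows.
$header = fgets($handle);
$firstPassRows = 0;
while (fgets($handle) !== false) {
    $firstPassRows++;
}
$atEnd = feof($handle); // nothing left to read from here

// Second pass: rewind() puts the pointer back at byte 0.
rewind($handle);
$headerAgain = fgets($handle);
fclose($handle);
```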

  35. Seeking Around Streams

  36. Seeking Around Streams
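
Seekable streams let you jump straight to a byte offset instead of reading everything before it. The sketch below uses fseek() and ftell() on a 20-byte sample string chosen for the example:

```php
<?php
$handle = fopen('php://temp', 'w+');
fwrite($handle, '0123456789abcdefghij'); // 20 bytes

// Jump to an absolute offset (SEEK_SET is the default).
fseek($handle, 10);
$fromOffset = fread($handle, 3);
$position = ftell($handle); // where the pointer sits after the read

// Jump relative to the end of the stream.
fseek($handle, -4, SEEK_END);
$tail = fread($handle, 4);

fclose($handle);
```

Not every stream supports this; a network response, for instance, usually cannot be rewound or seeked.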

  37. Guzzling Streams

  38. Guzzling Streams

  39. Guzzling Streams

  40. Testing Our Response

  41. Guzzling Streams

  42. Read 100 bytes

  43. Read 100 bytes
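
The Guzzle code for these slides is not in the transcript. As a stand-in, the sketch below shows the same idea with PHP's native functions: pull a fixed number of bytes at a time until the stream is exhausted. The readChunks() helper and the 250-byte sample are made up for the example. On a Guzzle response, the PSR-7 body's read() and eof() methods play the roles of fread() and feof().

```php
<?php
// Yield successive chunks of at most $chunkSize bytes from an open stream.
function readChunks($stream, int $chunkSize = 100): Generator
{
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left
        }
        yield $chunk;
    }
}

// Demo: 250 bytes in an in-memory stream.
$stream = fopen('php://temp', 'w+');
fwrite($stream, str_repeat('x', 250));
rewind($stream);

$sizes = [];
foreach (readChunks($stream) as $chunk) {
    $sizes[] = strlen($chunk);
}
fclose($stream);
```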

  44. Content Type

  45. Content Type

  46. Content Type

  47. Drilling Down

  48. Drilling Down

  49. Inspecting a Post

  50. Interesting!

  51. Interesting!

  52. When to use Streams
    Reading files that may not fit in memory
    Downloading files from a remote system
    Fetching data from APIs
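
For the first case in the list above, a file that may not fit in memory, one possible shape is a generator that streams a CSV row by row and yields only the matches. The searchCsv() helper, the column index, and the sample rows are all assumptions for illustration, not the talk's code.

```php
<?php
// Lazily yield rows whose value in $column equals $needle.
function searchCsv(string $path, int $column, string $needle): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open $path");
    }

    try {
        // Only the current row is held in memory at any time.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if (isset($row[$column]) && $row[$column] === $needle) {
                yield $row;
            }
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}

// Demo: a four-line file standing in for a multi-million-line one.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id,name,age\n1,Ada,36\n2,Grace,45\n3,Ada,28\n");
$matches = iterator_to_array(searchCsv($path, 1, 'Ada'), false);
unlink($path);
```

Because the function is a generator, a caller can stop after the first hit without reading the rest of the file.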

  53. Resources

  54. Contact Info:
    Joe Ferguson
    Twitter: @JoePFerguson
    Email: [email protected]
