Today’s goal
- to Numpy & Numba
- A brief look at Modin for in-RAM faster Pandas ops
- What does Covid 19 do to the (UK) economy?
By [ian]@ianozsvald[.com] Ian Ozsvald
Modin
- fallback
- Young project, drop-in replacement
- Uses Ray for parallel computation
- Easy to experiment with
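The "drop-in replacement" point can be sketched as follows. This is a minimal illustration, not taken from the talk: it assumes Modin is installed with a working engine such as Ray, and falls back to plain Pandas when the import fails. The column names and values are made up for the example.

```python
# Assumption: Modin (with an engine such as Ray) is installed.
# If it is not, fall back to plain Pandas; the rest of the code is unchanged.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame(
    {
        "region": ["UK", "UK", "EU", "EU"],
        "sales": [1.0, 2.0, 3.0, 4.0],
    }
)

# The same Pandas-style call works under either import.
totals = df.groupby("region")["sales"].sum()
total_by_region = {key: float(value) for key, value in totals.items()}
print(total_by_region)
```

Only the import line changes between the two libraries, which is what makes Modin easy to try on an existing script.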
Summary
- Swifter multicore
- See blog for my classes, also Thoughts & Jobs email list
- I’d love a postcard if you learned something new
Vaex
- & lazy computation
- New string dtype (RAM efficient)
- See article (single laptop, billions of samples): https://towardsdatascience.com/ml-impossible-train-a-1-billion-sample-model-in-20-minutes-with-vaex-and-scikit-learn-on-your-9e2968e6f385
When does Pandas get smelly?
- RAM
- Probably only single core, built for in-RAM computation
- Complex 10yr codebase, hard to optimise
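Because Pandas is built for in-RAM computation, a quick first check is how much memory a DataFrame actually occupies. The sketch below is not from the talk; the column names and sizes are hypothetical and it uses only `DataFrame.memory_usage(deep=True)`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 100_000

# Hypothetical data: one numeric column and one column of short strings.
df = pd.DataFrame(
    {
        "value": rng.random(n_rows),                   # float64, 8 bytes per row
        "label": rng.choice(["a", "b", "c"], n_rows),  # short strings
    }
)

# deep=True also counts the memory held by the string values themselves.
bytes_per_column = df.memory_usage(deep=True)
total_mb = bytes_per_column.sum() / 1e6

print(bytes_per_column)
print(f"total: {total_mb:.1f} MB")
```

Comparing this total with the RAM available on the machine gives a rough sense of how close a workload is to the point where Pandas becomes uncomfortable.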
Dask Distributed DataFrame
- for Pandas – row blocks, not cols
https://dask.readthedocs.io/en/latest/dataframe.html
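The "row blocks, not cols" idea can be sketched in plain Pandas. This is an illustration of the partitioning concept only, not Dask's implementation: the data and the block count are made up, and each block is processed in a simple loop where Dask would schedule the work across workers. With Dask installed, `dask.dataframe.from_pandas(df, npartitions=3)` would create a comparable row-wise split.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})

# Split into row blocks: each block keeps every column but only some rows.
n_blocks = 3
bounds = np.linspace(0, len(df), n_blocks + 1, dtype=int)
blocks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Compute a partial result per block, then combine the partial results.
partial_sums = [block["y"].sum() for block in blocks]
total = sum(partial_sums)

print([len(block) for block in blocks])
print(total)
```

Because every block carries all columns, a row-wise operation such as a sum or a filter can run on each block independently before the results are combined.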
Dask Distributed DataFrame
- Lots of docs & help on StackOverflow
- Great for 1 or n machines for bigger-than-RAM tasks
- Give Workers lots of RAM (else they die!)