August 20, 2016

2016 - Dillon Niederhut - What to do when your data is large, but not big

Description
This talk will present strategies in Python for handling data that is too large to fit in memory and/or too slow to process in one thread, but small enough to still fit in one machine.

Abstract
Unless you work at a large internet company, you probably don't have BIG data, but you might have LARGE data. Large data consume an unacceptable amount of time and memory when medium strategies are used, but also incur unnecessary financial and latency costs when big strategies are used. Two basic strategies for handling large data, chunking and parallelization, will be discussed with live coded examples in Python.

Bio
I'm a research scientist currently living in the Bay Area and working in neuroethology, human evolution, and natural language processing. I currently work at D-Lab, where I help researchers apply advances in computation to their research paradigms.

https://youtu.be/g-YCaX3ml2Q

PyBay

Transcript

Large data in python Dillon Niederhut Introduction Motivation Strategies Closing
What to do when your data are large but not big
Dillon Niederhut
PyBay – the San Francisco Bay Area Python Conference
20 August 2016
about this talk
• data at github.com/deniederhut/pybay 2016
• python libraries: celery, h5py, numpy, pandas, pymongo
• other libraries: mongodb, rabbitmq, sqlite
about me
• dlab.berkeley.edu
• @DLabAtBerkeley
size concerns [1]
[1] from xkcd
time concerns [2]
[2] always relevant
code concerns [3]
[3] thanks Randall!
sequential processing
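The slide carries no code of its own, so the following is only a minimal sketch of the chunking strategy named in the abstract: read a file a fixed number of rows at a time and keep just a running total in memory. It uses the standard library rather than the libraries listed earlier in the talk; the helper names `iter_chunks` and `chunked_mean`, the column name `value`, and the in-memory sample data are all hypothetical.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(lines, column, chunk_size=1000):
    """Mean of one CSV column, holding only one chunk of rows at a time."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        # Only the running total and count survive between chunks.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical stand-in for a large file on disk; an open file handle
# could be passed to chunked_mean in the same way.
sample = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))
result = chunked_mean(sample, "value", chunk_size=3)
```

With pandas, the same pattern is available by passing `chunksize` to `pandas.read_csv`, which returns an iterator of DataFrames instead of one large frame.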
parallel processing
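The slide carries no code of its own either. As a rough sketch of the parallelization strategy, the example below splits the data into chunks, maps a function over them with a pool of workers, and combines the partial results. It uses the standard library's `concurrent.futures` rather than celery, which the talk lists; the function names and the sum-of-squares workload are hypothetical placeholders for real per-chunk work.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(chunk):
    """CPU-bound work on one chunk; module-level so worker processes can pickle it."""
    return sum(x * x for x in chunk)


def split(data, chunk_size):
    """Cut a sequence into consecutive slices of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_sum_of_squares(data, chunk_size=1000, workers=2,
                            executor_cls=ProcessPoolExecutor):
    """Map sum_of_squares over chunks with a pool of workers, then combine."""
    chunks = split(data, chunk_size)
    with executor_cls(max_workers=workers) as pool:
        # pool.map preserves chunk order, although a sum does not depend on it.
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

When the default process pool is used, the call is best made from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which tends to suit I/O-bound work better than CPU-bound work in CPython.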
contact
• dillon.niederhut.us
• @dillonniederhut