Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
big data
Search
ngarneau
March 09, 2012
Programming
5
400
big data
big data keynote at the opencode quebec, introducing cassandra, hadoop and pig.
ngarneau
March 09, 2012
Tweet
Share
More Decks by ngarneau
See All by ngarneau
Introduction au machine learning avec Scitkit-learn
ngarneau
0
46
Mocks, stubs & seams
ngarneau
0
110
Other Decks in Programming
See All in Programming
例外処理とどう使い分ける?Result型を使ったエラー設計 #burikaigi
kajitack
16
5.3k
gunshi
kazupon
1
140
Kotlin Multiplatform Meetup - Compose Multiplatform 외부 의존성 아키텍처 설계부터 운영까지
wisemuji
0
170
クラウドに依存しないS3を使った開発術
simesaba80
0
220
Data-Centric Kaggle
isax1015
1
250
Deno Tunnel を使ってみた話
kamekyame
0
320
QAフローを最適化し、品質水準を満たしながらリリースまでの期間を最短化する #RSGT2026
shibayu36
0
2k
AI Agent Tool のためのバックエンドアーキテクチャを考える #encraft
izumin5210
6
1.6k
Python札幌 LT資料
t3tra
7
1.1k
Grafana:建立系統全知視角的捷徑
blueswen
0
280
AI時代を生き抜く 新卒エンジニアの生きる道
coconala_engineer
1
530
Spinner 軸ズレ現象を調べたらレンダリング深淵に飲まれた #レバテックMeetup
bengo4com
1
220
Featured
See All Featured
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
Tell your own story through comics
letsgokoyo
1
780
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
1
360
Crafting Experiences
bethany
0
29
Test your architecture with Archunit
thirion
1
2.1k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
0
1k
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.8k
[SF Ruby Conf 2025] Rails X
palkan
0
710
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
Making the Leap to Tech Lead
cromwellryan
135
9.7k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.8k
Transcript
big data
cassandra - Facebook - 2007. - Apache - 2008. -
Netflix, Digg, Twitter, Rackspace...
cassandra - non-relationnal - schema-less - open-source - horizontally scalable
- easy replication - large datasets
cassandra - datacenters - «no single point of failure».
cassandra data model - no joins (maybe joints, we don’t
know as of version 1.0.9..) - denormalization
cassandra data model - keyspace - column family - row
key - super column - column / value
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } }
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } } keyspace
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } } keyspace column family
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } } keyspace column family row key
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } } keyspace column family row key column
cassandra data model application = { users = { ‘ngarneau’:
{ ‘first_name’: ‘nicolas’, ‘last_name’: ‘garneau’ } } } keyspace column family row key column value
cassandra keep in mind memory disk memtable commit log
cassandra keep in mind memory disk memtable commit log
cassandra keep in mind memory disk memtable commit log
cassandra keep in mind memory disk memtable commit log memtable
cassandra keep in mind memory disk memtable commit log memtable
memtable
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables SSTables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables SSTables SSTables SSTables
cassandra keep in mind memory disk memtable commit log memtable
memtable memtables SSTables SSTables SSTables SSTables SSTables SSTables
hadoop - Yahoo! - 2006. - Apache - 2008.
hadoop - mapreduce - hadoop distributed filesystem
hadoop mapreduce - map - reduce
hadoop mapreduce
hadoop mapreduce
hadoop mapreduce
hadoop HDFS
hadoop HDFS data
hadoop HDFS hadoop data
hadoop HDFS hadoop data
hadoop HDFS hadoop data hadoop
hadoop HDFS hadoop data hadoop hadoop
hadoop HDFS hadoop data hadoop hadoop data
hadoop HDFS hadoop data hadoop hadoop data data
hadoop HDFS hadoop data hadoop hadoop data data data
hadoop HDFS hadoop data hadoop hadoop data data data
hadoop HDFS hadoop data hadoop hadoop data data data
hadoop HDFS hadoop data hadoop hadoop data data data cassandra
hadoop HDFS hadoop data hadoop hadoop data data data cassandra
cassandra
hadoop HDFS hadoop data hadoop hadoop data data data cassandra
cassandra cassandra
hadoop keep in mind - business intelligence - machine learning
- collective intelligence
pig - Yahoo! - 2007. - Apache - 2008.
pig - pigs eat anything. - pigs live anywhere. -
pigs are domestic. - pigs fly.
pig keep in mind
let’s play! https://
[email protected]
/ngarneau/opencode.git
let’s play! dataset Salons = { ’1’: { ‘id’: 1,
‘attendants’: 47, ‘name’: ‘Salon Laval’, ‘year’: 2010 } } Commandes = { ’1’: { ‘amount’: 799, ‘salon’: 1 } }
let’s play! we want to know what is the correlation
between the number of attendants and the total revenues by salon.