How the Web Works: Lecture 9
Abhinav Sharma
January 09, 2014
This talk was designed for a class (98-135) taught at Carnegie Mellon University in Spring 2010.
Transcript
Lecture 9: Distributed Computing & Scaling
Homeworks
Overall, I failed =( Should’ve done it in winter
Get 3 Points to Pass
Hopefully, 10 by the end
I don’t want to fail anyone
Zeliveau Please Start Soon
Rankmaniac http://scienceoftheweb.org/15-396/assignments/hw6.pdf
“Essentially, using nofollow causes us to drop the target links from our overall graph of the web”
SSL/TLS That HTTPS business...
Visible to Wireless Network, ISP, Server LAN
Let’s encrypt with a key!
K = “Shift One Alphabet”
ENC(K, “MES”) = “NFT” | DEC(K, “NFT”) = “MES”
But how do we share the key?
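The shift-by-one key can be sketched in a few lines. This is only an illustration: `enc` and `dec` mirror the slide's ENC/DEC, and representing K as the number of letters to shift by is an assumption.

```javascript
// Toy shift cipher for upper-case letters.
// Assumption: the key K is the number of letters to shift by (K = 1 on the slide).
const A = "A".charCodeAt(0);

function shift(text, k) {
  return text.replace(/[A-Z]/g, (ch) => {
    const offset = (((ch.charCodeAt(0) - A + k) % 26) + 26) % 26; // wrap Z -> A
    return String.fromCharCode(A + offset);
  });
}

const enc = (k, message) => shift(message, k);
const dec = (k, cipher) => shift(cipher, -k);

console.log(enc(1, "MES")); // NFT
console.log(dec(1, "NFT")); // MES
```

Both sides need the same K, which is exactly the key-sharing problem the slide raises.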
Public Key Encryption: Insanely Awesomely Brilliant
n = p * q
Given these (p and q): easy to compute n
Given this (n): possible to recover p and q, but...
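A rough way to feel the asymmetry: multiplying is one operation, while going back means searching for a divisor. The primes 61 and 53 and the helper `factor` are assumptions chosen for illustration; real keys use primes hundreds of digits long, where this search is hopeless.

```javascript
// Easy direction: one multiplication.
const p = 61;
const q = 53;
const n = p * q; // 3233

// Hard direction: trial division, trying every candidate up to sqrt(n).
function factor(n) {
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return [d, n / d];
  }
  return null; // no divisor found
}

console.log(n);         // 3233
console.log(factor(n)); // [ 53, 61 ]
```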
RSA: Rivest, Shamir, Adleman
Public Key Encryption
Create an algorithm that...
uses n to encrypt
but needs p & q to decrypt
Publish n as public key
Keep p & q private
Heard of PGP?
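A toy version of the idea, as a sketch only. The slide simplifies a little: in RSA the public key is n together with an exponent e, and the private exponent d is derived from p and q. The values p = 61, q = 53 and e = 17 are common textbook choices assumed here, not taken from the lecture, and are far too small to be secure.

```javascript
// Toy RSA with BigInt (assumed textbook values, not secure).
const p = 61n;
const q = 53n;
const n = p * q;                 // public: 3233
const e = 17n;                   // public exponent (assumption, not on the slide)
const phi = (p - 1n) * (q - 1n); // computing this requires p and q

// Private exponent d satisfies (e * d) % phi === 1. Brute force is fine at this size.
function inverse(e, phi) {
  for (let d = 1n; d < phi; d++) {
    if ((e * d) % phi === 1n) return d;
  }
  throw new Error("e has no inverse modulo phi");
}
const d = inverse(e, phi);

// Square-and-multiply modular exponentiation.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const encrypt = (m) => modPow(m, e, n); // needs only the public values
const decrypt = (c) => modPow(c, d, n); // needs d, which came from p and q

const cipher = encrypt(65n);
console.log(cipher, decrypt(cipher)); // 2790n 65n
```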
Who are you anyway?
Aha, an Imposter!
The Verification Problem
Browsers Preinstalled with some CAs
Install if you trust CMU
P2P ... and the indexing problem
Client-Server Model: I can haz music!
P2P Model: Who has my file?
More Generally: Distributed Hash Table
Given a key, get the value
Stored across computers
Google’s Index (GFS)
So, how do you find a file?
Ask Everyone: What Not to Do!
Computers (N)
Files (K)
2^m > max{N, K}
Example
Label nodes between 0 and 2^m: 1 3 4 5 7 9 12 15
Label keys between 0 and 2^m: 1 2 4 5 6 8 9 11 12 13 15 16
Assignment
Assign key K to node K
If node K doesn’t exist ... assign to the next node
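The assignment rule can be sketched with the node and key labels from the slides. The function name `assign` is hypothetical, and wrapping from the highest node back to the lowest is assumed from the ring layout.

```javascript
// Node and key labels as they appear on the slides.
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];
const keys = [1, 2, 4, 5, 6, 8, 9, 11, 12, 13, 15, 16];

// Key k goes to node k if it exists, otherwise to the next node on the ring.
// Past the highest node we wrap around to the lowest one (assumption).
function assign(key, sortedNodes) {
  const next = sortedNodes.find((node) => node >= key);
  return next === undefined ? sortedNodes[0] : next;
}

for (const key of keys) {
  console.log(`key ${key} -> node ${assign(key, nodes)}`);
}
```

For example, key 6 has no node 6, so it lands on node 7, and key 16 wraps around to node 1.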
Searching
For key K ~ for node K
Linear search: start at machine 1, go to next ... and so on until found!
But wait: they’re sorted, seems familiar?
Binary Search
Is 8 in the list? What position is it?
1 3 5 6 8 9
6 8 9
8 9
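For reference, a minimal binary search over the slide's list. The function name is an assumption and positions are zero-based.

```javascript
// Returns the index of target in a sorted array, or -1 if it is absent.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1; // discard the left half
    else hi = mid - 1;                      // discard the right half
  }
  return -1;
}

console.log(binarySearch([1, 3, 5, 6, 8, 9], 8)); // 4
```

Each comparison throws away half of what is left, which is the behaviour the ring lookup tries to imitate.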
Each machine stores the addresses of some others!
Finger Table
Total Machines (2^m) = 8
Machine N7 stores:
addr(N7 + 1)
addr(N7 + 2)
addr(N7 + 4) = addr(N7 + 2^(m-1))
Can take short-cuts!
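A sketch of how N7's table could be built. Two assumptions are made to fit the slides together: each machine keeps m = 3 entries (from 2^m = 8), and labels wrap at 16, the largest label in the ring example. `successor` and `fingerTable` are hypothetical names.

```javascript
// Assumptions: m = 3 entries per machine, labels wrap at 16.
const RING_SIZE = 16;
const M = 3;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

// First machine at or after id, wrapping around the ring.
function successor(id) {
  const next = nodes.find((node) => node >= id % RING_SIZE);
  return next === undefined ? nodes[0] : next;
}

// Entry i points at the machine responsible for label (n + 2^i).
function fingerTable(n) {
  const table = [];
  for (let i = 0; i < M; i++) {
    const target = (n + 2 ** i) % RING_SIZE;
    table.push({ target, node: successor(target) });
  }
  return table;
}

console.log(fingerTable(7));
```

For N7 the targets are 8, 9 and 11, which resolve to machines 9, 9 and 12.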
Make the biggest jump | Too low | Use N7’s table
Halve the remaining ring
Done!
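The walk around the ring can be simulated in one process. This is a simplified sketch, not the Chord protocol itself: it assumes labels wrap at 16 with m = 4 entries per machine (so the longest jump covers half the ring), and it computes the owning machine up front so the loop knows when to stop. All function names are hypothetical.

```javascript
// Assumptions: labels wrap at 16, so m = 4 and the longest jump is 8.
const RING_SIZE = 16;
const M = 4;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

// Clockwise distance from a to b on the ring.
const dist = (a, b) => (b - a + RING_SIZE) % RING_SIZE;

// First machine at or after id, wrapping around the ring.
function successor(id) {
  const next = nodes.find((node) => node >= id % RING_SIZE);
  return next === undefined ? nodes[0] : next;
}

// Machines that n knows about: successor(n + 1), successor(n + 2), ...
function fingers(n) {
  return Array.from({ length: M }, (_, i) => successor(n + 2 ** i));
}

// Route from start to the machine holding key, one finger-table hop at a time.
function lookup(start, key) {
  const owner = successor(key); // simplification: known in advance
  const path = [start];
  let current = start;
  while (current !== owner) {
    const remaining = dist(current, owner);
    let best = null; // biggest jump that does not overshoot the owner
    for (const f of fingers(current)) {
      const d = dist(current, f);
      if (d > 0 && d <= remaining && (best === null || d > dist(current, best))) {
        best = f;
      }
    }
    current = best;
    path.push(current);
  }
  return path;
}

console.log(lookup(1, 13)); // [ 1, 9, 15 ]
```

Starting at machine 1 and looking for key 13, the first hop jumps half the ring to machine 9, and machine 9's own table then reaches machine 15, which holds the key.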
Chord Protocol
log(N) performance
Fully decentralizes P2P
Napster was centralized ... hence closed down!
RIAA/MPAA: Oh Noes!
http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
MapReduce But first, an OCD Programmer
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");
alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");

function Cook(i1, i2, f) {
  alert("get the " + i1);
  f(i1);
  f(i2);
}
Cook("lobster", "water", PutInPot);
Cook("chicken", "coconut", BoomBoom);
Map
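var a = [1, 2, 3];
for (var i = 0; i < a.length; i++) {
  a[i] = a[i] * 2;
}
for (var i = 0; i < a.length; i++) {
  alert(a[i]);
}

// map applies fmap to every element, overwriting the array in place
function map(fmap, a) {
  for (var i = 0; i < a.length; i++) {
    a[i] = fmap(a[i]);
  }
}
map(function (x) { return x * 2; }, a);
map(alert, a); // alert returns undefined, so this call also clears a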
Reduce
function sum(a) {
  var s = 0;
  for (var i = 0; i < a.length; i++)
    s += a[i];
  return s;
}
function join(a) {
  var s = "";
  for (var i = 0; i < a.length; i++)
    s += a[i];
  return s;
}

// reduce folds the array into one value, starting from init
function reduce(fred, a, init) {
  var s = init;
  for (var i = 0; i < a.length; i++)
    s = fred(s, a[i]);
  return s;
}

function sum(a) {
  return reduce(function (a, b) { return a + b; }, a, 0);
}
function join(a) {
  return reduce(function (a, b) { return a + b; }, a, "");
}
Map
[1, 2, 3, 4, 5] → [2, 4, 6, 8, 10]
[1, 2, 3, 4, 5] → [One, Two, Three, Four, Five]
Reduce
[1, 1, 1, 1, 1] → 5
[1, 1, 1, 1, 1] → “11111”
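The same examples using JavaScript's built-in `Array.prototype.map` and `Array.prototype.reduce` instead of the hand-written versions. Unlike the slide's `map`, the built-in returns a new array and leaves the input alone. The `names` lookup is an assumption about how the number-to-word mapping might be done.

```javascript
const numbers = [1, 2, 3, 4, 5];
const names = ["One", "Two", "Three", "Four", "Five"]; // assumed lookup table

// Map: apply a function to every element, producing a new array.
const doubled = numbers.map((x) => x * 2);      // [2, 4, 6, 8, 10]
const named = numbers.map((x) => names[x - 1]); // ["One", "Two", "Three", "Four", "Five"]

// Reduce: fold an array into a single value, starting from an initial value.
const ones = [1, 1, 1, 1, 1];
const total = ones.reduce((acc, x) => acc + x, 0);   // 5
const joined = ones.reduce((acc, x) => acc + x, ""); // "11111"

console.log(doubled, named, total, joined);
```

The only difference between the two reductions is the initial value: starting from 0 adds numbers, starting from "" concatenates strings.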
“Without understanding functional programming, you can't invent MapReduce. The very fact that Google invented MapReduce, and Microsoft didn't, says something about why Microsoft is still playing catch up” - Joel Spolsky
Pop Quiz
[1, 2, 3, 4, 5]
[“odd”, “even”, “odd”, “even”, “odd”]
“oddevenoddevenodd”
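One possible answer to the quiz, chaining a map into a reduce. Five inputs give five words, so the joined string ends in "odd".

```javascript
// Map each number to its parity, then reduce the words into one string.
const parity = [1, 2, 3, 4, 5].map((x) => (x % 2 === 0 ? "even" : "odd"));
const answer = parity.reduce((acc, word) => acc + word, "");

console.log(parity); // [ 'odd', 'even', 'odd', 'even', 'odd' ]
console.log(answer); // oddevenoddevenodd
```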
How is that useful?
Word Count: given a document, find the # occurrences of each word
Let’s try the intuitive way...
bigFile is too big? Have two files!
Let’s see that again
Map
Reduce: reduce(union, [d1, d2], [])
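The slides for this step were images and did not survive in the transcript, so the following is a guess at the idea rather than the original code: count the words in each file separately (map), then merge the partial counts (reduce). `countWords`, the file contents, and the merging behaviour of `union` are all assumptions.

```javascript
// Count the words in one document.
function countWords(text) {
  const counts = new Map();
  for (const word of text.split(/\s+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Merge two partial counts. Assumption: "union" adds the counts of shared words.
function union(a, b) {
  const merged = new Map(a);
  for (const [word, n] of b) {
    merged.set(word, (merged.get(word) || 0) + n);
  }
  return merged;
}

// bigFile split into two hypothetical halves.
const files = ["foo foo baz bar", "gor baz goo bar"];
const [d1, d2] = files.map((text) => countWords(text)); // Map: one count per file
const total = [d1, d2].reduce(union, new Map());        // Reduce: merge the counts

console.log(total);
```

Because each file is counted independently, the two `countWords` calls could run on different machines.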
Input: foo foo baz bar gor baz goo bar
Split: foo foo baz bar | gor baz goo bar
Mapper | Mapper
foo 1, foo 1, baz 1, bar 1 | gor 1, baz 1, goo 1, bar 1
Bucket by Key: [foo, foo] [baz, baz] [bar, bar] [goo] [gor]
Reducers (one per bucket): foo 2, baz 2, bar 2, goo 1, gor 1
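The whole flow can be imitated in a single process. This is a sketch under stated assumptions, not Hadoop's API: `mapper`, `bucketByKey`, `reducer` and `wordCount` are hypothetical names, and everything here runs sequentially where a real cluster would run mappers and reducers in parallel.

```javascript
// Mapper: emit a [word, 1] pair for every word in one split.
function mapper(split) {
  return split.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Bucket by key: gather all values emitted for the same word.
function bucketByKey(pairs) {
  const buckets = new Map();
  for (const [key, value] of pairs) {
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  }
  return buckets;
}

// Reducer: collapse one bucket into a single count.
function reducer(values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Driver: stands in for the framework's splitting and bucketing.
function wordCount(splits) {
  const pairs = splits.flatMap((split) => mapper(split));
  const result = new Map();
  for (const [word, values] of bucketByKey(pairs)) {
    result.set(word, reducer(values));
  }
  return result;
}

console.log(wordCount(["foo foo baz bar", "gor baz goo bar"]));
```

Only `mapper` and `reducer` are specific to word counting; the bucketing step would be the same for any other job.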
Who Does What?
User: Write Mapper and Reducer
Hadoop: Splitting, Bucketing
Cons: Restricted Paradigm
Pros: Generalized, Safe
Implementing can be tricky!
Abstract, Complex? But you know what...
The Point
People talk about scaling ... but now you know it!
Distributing Files
Distributing Computation
Higher Level Point
This isn’t a CS class... but I’m a CS major =P
It’s not all HTML/CSS
There’s some serious CS here!
http://hadoop.apache.org/
http://www.cloudera.com/resources/?type=Training
Poor Man’s Scaling
Redundancy
Replicate across computers
Main server balances load
Other servers serve content
Also useful for data backups
Usually host managed
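One simple way a main server might spread requests across replicas is round robin. The slide does not say which strategy is meant, so this is only one possibility, and the replica names are hypothetical.

```javascript
// Returns a function that hands out servers in rotation.
function makeBalancer(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the first replica
    return server;
  };
}

// Hypothetical replica names.
const pick = makeBalancer(["replica-a", "replica-b", "replica-c"]);
console.log(pick(), pick(), pick(), pick()); // replica-a replica-b replica-c replica-a
```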
Caching
PHP is dynamic ... usually unnecessarily
Calculate, cache, re-serve
Memoization
PHP/memcached
http://en.wikipedia.org/wiki/Memcached
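Memoization in miniature: compute once, keep the result, serve it again on the next request. This in-process sketch only illustrates the idea; memcached itself is a separate cache server shared between processes. `memoize` and `slowSquare` are hypothetical names.

```javascript
// Wrap a one-argument function so each distinct input is computed only once.
function memoize(fn) {
  const cache = new Map();
  return function (arg) {
    if (!cache.has(arg)) cache.set(arg, fn(arg)); // calculate, cache...
    return cache.get(arg);                        // ...re-serve
  };
}

let calls = 0;
const slowSquare = (x) => {
  calls += 1; // stand-in for expensive work such as building a page
  return x * x;
};
const fastSquare = memoize(slowSquare);

fastSquare(12);
fastSquare(12);
console.log(calls); // 1
```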
Bottlenecks
Content Bandwidth
Databases
External APIs
Script busy computing
etc...
S3
EC2
http://www.youtube.com/watch?v=Iaxu-NLecm4
http://www.youtube.com/watch?v=bBajLxeKqoY
Homework 7 is out
No Class Next Week
Photo Credits
http://mi9.com/datawallpapers/data/12/993/1217993797/eye-with-black-background_1280x1024.jpg
http://www.aemmp.org/site/wp-content/uploads/2009/10/imgname-riaa_training_video_leaked_more_stupid_than_expected-50226711-RIAA.jpg
http://jasonjeffrey.files.wordpress.com/2007/09/drm.jpg
http://www.sdtimes.com/blog/post/2009/image.axd?picture=2009%2F7%2Fhadoopephant.jpg