How the Web Works: Lecture 9
Abhinav Sharma
January 09, 2014
Education
This talk was designed for a class (98-135) taught at Carnegie Mellon University in Spring 2010.
Transcript
Lecture 9: Distributed Computing & Scaling
Homeworks
Overall, I failed =(
Should’ve done it in winter
Get 3 Points to Pass
Hopefully, 10 by the end
I don’t want to fail anyone
Zeliveau Please Start Soon
Rankmaniac http://scienceoftheweb.org/15-396/assignments/hw6.pdf
“Essentially, using nofollow causes us to drop the target links from our overall graph of the web”
SSL/TLS: That HTTPS business...
Visible to Wireless Network, ISP, Server LAN
Let’s encrypt with a key!
ENC(K, “MES”) = “NFT”
DEC(K, “NFT”) = “MES”
K = “Shift One Alphabet”
But how do we share the key?
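The shift-by-one key from the slide can be sketched in a few lines. This is only an illustration: the helper names `shift`, `enc` and `dec` are made up here, and the sketch assumes uppercase A-Z input.

```javascript
// Shift cipher: the key K is the number of alphabet positions to shift.
// Assumption: only uppercase A-Z is shifted; other characters pass through.
function shift(text, k) {
  const A = "A".charCodeAt(0);
  return text.replace(/[A-Z]/g, function (ch) {
    // ((x % 26) + 26) % 26 keeps the position in 0..25 even for negative k.
    const pos = (((ch.charCodeAt(0) - A + k) % 26) + 26) % 26;
    return String.fromCharCode(A + pos);
  });
}

function enc(k, message) { return shift(message, k); }
function dec(k, cipher) { return shift(cipher, -k); }

const K = 1; // "Shift One Alphabet"
console.log(enc(K, "MES")); // NFT
console.log(dec(K, "NFT")); // MES
```

Both sides need the same K, which is exactly the key-sharing problem the slide ends on.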
Public Key Encryption: Insanely Awesomely Brilliant
n = p * q
Given these (p and q): easy to compute n
Given this (n): possible to recover p and q, but...
RSA: Rivest, Shamir, Adleman
Public Key Encryption
Create an algorithm that...
uses n to encrypt
but needs p & q to decrypt
Publish n as public key
Keep p & q private
Heard of PGP?
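A toy version of the idea, with numbers small enough to check by hand. The primes 61 and 53 and the exponent 17 are illustrative values chosen for this sketch, not taken from the slides. In RSA as usually described, the public key is n together with an exponent e, and the private exponent d can only be derived by someone who knows p and q.

```javascript
// Toy RSA with tiny primes. Illustration only: real keys use primes that are
// hundreds of digits long, plus padding schemes that are omitted here.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod; // multiply in this bit
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via extended Euclid: returns d with (e * d) % phi === 1.
function modInverse(e, phi) {
  let [oldR, r] = [e, phi];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const quotient = oldR / r;
    [oldR, r] = [r, oldR - quotient * r];
    [oldS, s] = [s, oldS - quotient * s];
  }
  return ((oldS % phi) + phi) % phi;
}

const p = 61n, q = 53n;           // kept private (assumed toy values)
const n = p * q;                  // 3233, published
const e = 17n;                    // public exponent (assumed), published with n
const phi = (p - 1n) * (q - 1n);  // needs p and q
const d = modInverse(e, phi);     // private exponent: 2753

const message = 65n;
const cipher = modPow(message, e, n);    // anyone can do this with n and e
const decrypted = modPow(cipher, d, n);  // only the holder of d can undo it
console.log(cipher, decrypted);          // 2790n 65n
```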
Who are you anyway? Aha, an Imposter! The Verification Problem
Browsers Preinstalled with some CAs
Install if you trust CMU
P2P ... and the indexing problem
Client-Server Model: I can haz music!
P2P Model: Who has my file?
More Generally: Distributed Hash Table
Given a key, get the value
Stored across computers
Google’s Index (GFS)
So, how do you find a file?
Ask Everyone: What Not to Do!
Computers (N), Files (K): 2^m > max{N,K}
Example: N = 8, K = 12, 2^m = 16
Label nodes between 0 and 2^m: 1 3 4 5 7 9 12 15
Label keys between 0 and 2^m: 1 6 12 2 8 13 4 9 15 5 11 16
Assignment: Assign key K to node K. If node K doesn’t exist ... assign to the next node.
[Figure: keys 1 6 12 2 8 13 4 9 15 5 11 16 placed around the ring of nodes 1 3 4 5 7 9 12 15]
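The assignment rule can be tried out on the example ring. This is a single-process sketch, not a networked implementation. The function name `successor` and the choice to wrap IDs modulo 2^m (so key 16 lands on slot 0 and goes to node 1) are assumptions made for illustration.

```javascript
// Example ring: 8 nodes, 12 keys, m = 4, so there are 2^m = 16 slots.
const M = 4;
const RING = 2 ** M;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15]; // sorted node IDs
const keys = [1, 6, 12, 2, 8, 13, 4, 9, 15, 5, 11, 16];

// successor(id): the first node at or after id, wrapping past the top of the ring.
function successor(id) {
  const slot = id % RING; // assumption: IDs wrap modulo 2^m, so 16 becomes 0
  const found = nodes.find(function (node) { return node >= slot; });
  return found !== undefined ? found : nodes[0];
}

// Group the keys by the node that ends up storing them.
const assignment = {};
keys.forEach(function (key) {
  const owner = successor(key);
  if (!assignment[owner]) assignment[owner] = [];
  assignment[owner].push(key);
});
console.log(assignment);
// node 1 holds [1, 16], node 7 holds [6], node 9 holds [8, 9], node 12 holds [12, 11], ...
```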
Searching: For key K ~ for node K
Linear search: start at Machine 1, go to the next ... and so on until found!
But wait: they’re sorted, seems familiar?
Binary Search: Is 8 in the list? What position is it?
1 3 5 6 8 9
6 8 9
8 9
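A minimal sketch of the search on the slide's list. The function name `binarySearch` is chosen here, and the exact halves it inspects may differ from the ones drawn on the slide, depending on how the midpoint is rounded.

```javascript
// Returns the index of target in a sorted array, or -1 if it is absent.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1; // target can only be in the upper half
    else hi = mid - 1;                      // target can only be in the lower half
  }
  return -1;
}

console.log(binarySearch([1, 3, 5, 6, 8, 9], 8)); // 4 (zero-based position)
console.log(binarySearch([1, 3, 5, 6, 8, 9], 7)); // -1
```

Each step discards half of what is left, so a list of N items needs about log2(N) comparisons instead of N.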
Each machine stores the addresses of some others!
Finger Table
Total Machines (2^m) = 8
Machine N7 stores: addr(N7 + 1), addr(N7 + 2), addr(N7 + 4) = addr(N7 + 2^(m-1))
Can take short-cuts!
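A sketch of how such a table could be built. It assumes the 16-slot ring (m = 4) and the node list from the earlier example, so N7 gets four entries (+1, +2, +4, +8) rather than the three in the slide's 2^m = 8 illustration. `successor` and `fingerTable` are names made up for this sketch.

```javascript
const M = 4;                // assumption: 16-slot ring, so each node has 4 fingers
const RING = 2 ** M;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

// The first node at or after id, wrapping around the ring.
function successor(id) {
  const slot = ((id % RING) + RING) % RING;
  const found = nodes.find(function (node) { return node >= slot; });
  return found !== undefined ? found : nodes[0];
}

// Finger i of node n points at the node responsible for n + 2^i.
function fingerTable(n) {
  const fingers = [];
  for (let i = 0; i < M; i++) {
    fingers.push(successor(n + 2 ** i));
  }
  return fingers;
}

console.log(fingerTable(7));  // [ 9, 9, 12, 15 ]
console.log(fingerTable(15)); // [ 1, 1, 3, 7 ]
```

The entries double in distance, which is what makes the short-cuts possible: the last finger reaches halfway around the ring in one hop.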
[Figure: a lookup hopping around the ring of nodes 1 3 4 5 7 9 12 15]
Make Biggest Jump | Too Low | Use N7’s table
Halve the remaining ring
Done!
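The hop-by-hop lookup can be simulated in one process. This is a sketch under stated assumptions, not the protocol as deployed: it uses the 16-slot example ring, rebuilds each finger table from a global node list (a real node would only know its own table), and the names `lookup`, `distance` and `fingerTable` are invented here.

```javascript
const M = 4;
const RING = 2 ** M;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

function successor(id) {
  const slot = ((id % RING) + RING) % RING;
  const found = nodes.find(function (node) { return node >= slot; });
  return found !== undefined ? found : nodes[0];
}

// Clockwise distance from a to b; a full lap when they coincide.
function distance(a, b) {
  return ((b - a + RING) % RING) || RING;
}

function fingerTable(n) {
  const fingers = [];
  for (let i = 0; i < M; i++) fingers.push(successor(n + 2 ** i));
  return fingers;
}

// Returns the nodes visited while looking up key, ending at the node that owns it.
function lookup(start, key) {
  const target = key % RING;
  const path = [start];
  let n = start;
  for (;;) {
    const next = successor(n + 1);
    // Key lies after n and no further than the next node: the next node owns it.
    if (distance(n, target) <= distance(n, next)) {
      if (next !== n) path.push(next);
      return path;
    }
    // Otherwise make the biggest finger jump that stays short of the key.
    const fingers = fingerTable(n);
    let jump = next;
    for (let i = M - 1; i >= 0; i--) {
      if (distance(n, fingers[i]) < distance(n, target)) { jump = fingers[i]; break; }
    }
    n = jump;
    path.push(n);
  }
}

console.log(lookup(1, 13)); // [ 1, 9, 12, 15 ]
```

Starting at node 1, the lookup for key 13 jumps to 9 (the biggest finger that does not overshoot), then uses node 9's table to reach 12, whose next node 15 owns the key.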
Chord Protocol
log(N) performance
Fully decentralizes P2P
Napster was centralized ... hence closed down!
RIAA/MPAA: Oh Noes!
http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
MapReduce: But first, an OCD Programmer
    alert("get the lobster");
    PutInPot("lobster");
    PutInPot("water");
    alert("get the chicken");
    BoomBoom("chicken");
    BoomBoom("coconut");
    function Cook(i1, i2, f) {
        alert("get the " + i1);
        f(i1);
        f(i2);
    }
    Cook("lobster", "water", PutInPot);
    Cook("chicken", "coconut", BoomBoom);
Map
    var a = [1, 2, 3];
    // Double every element in place.
    for (var i = 0; i < a.length; i++) {
        a[i] = a[i] * 2;
    }
    // Show every element.
    for (var i = 0; i < a.length; i++) {
        alert(a[i]);
    }
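    // Apply fmap to every element of a, in place.
    function map(fmap, a) {
        for (var i = 0; i < a.length; i++) {
            a[i] = fmap(a[i]);
        }
    }
    map(function (x) { return x * 2; }, a);
    // Note: alert returns undefined, so this call also overwrites a's elements.
    map(alert, a);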
Reduce
    function sum(a) {
        var s = 0;
        for (var i = 0; i < a.length; i++)
            s += a[i];
        return s;
    }
    function join(a) {
        var s = "";
        for (var i = 0; i < a.length; i++)
            s += a[i];
        return s;
    }
    // Fold a into a single value, starting from init.
    function reduce(fred, a, init) {
        var s = init;
        for (var i = 0; i < a.length; i++)
            s = fred(s, a[i]);
        return s;
    }
    function sum(a) {
        return reduce(function (a, b) { return a + b; }, a, 0);
    }
    function join(a) {
        return reduce(function (a, b) { return a + b; }, a, "");
    }
Map: [1, 2, 3, 4, 5] -> [2, 4, 6, 8, 10] or [One, Two, Three, Four, Five]
Reduce: [1, 1, 1, 1, 1] -> 5 or “11111”
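The same results can be reproduced with JavaScript's built-in `Array.prototype.map` and `Array.prototype.reduce`, which return new values instead of modifying the array in place. The variable names and the `names` lookup table are made up for this sketch.

```javascript
const numbers = [1, 2, 3, 4, 5];

// Map: one output element per input element.
const doubled = numbers.map(function (x) { return x * 2; });
const names = ["One", "Two", "Three", "Four", "Five"];
const named = numbers.map(function (x) { return names[x - 1]; });

// Reduce: fold the whole list into one value; the starting value picks the type.
const ones = [1, 1, 1, 1, 1];
const total = ones.reduce(function (a, b) { return a + b; }, 0);   // number addition
const joined = ones.reduce(function (a, b) { return a + b; }, ""); // string concatenation

console.log(doubled, named); // [ 2, 4, 6, 8, 10 ] [ 'One', 'Two', 'Three', 'Four', 'Five' ]
console.log(total, joined);  // 5 11111
```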
“Without understanding functional programming, you can't invent MapReduce. The very fact that Google invented MapReduce, and Microsoft didn't, says something about why Microsoft is still playing catch up” - Joel Spolsky
Pop Quiz: [1, 2, 3, 4, 5] -> [“odd”, “even”, “odd”, “even”, “odd”] -> “oddevenoddevenodd”
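One possible answer to the quiz, sketched with the built-in array methods: a map that labels each number, followed by a reduce that concatenates the labels. Five inputs produce five labels, so the joined string ends in "odd".

```javascript
const input = [1, 2, 3, 4, 5];

// Map step: label each number by its parity.
const parity = input.map(function (x) { return x % 2 === 0 ? "even" : "odd"; });

// Reduce step: concatenate the labels, starting from the empty string.
const joined = parity.reduce(function (a, b) { return a + b; }, "");

console.log(parity); // [ 'odd', 'even', 'odd', 'even', 'odd' ]
console.log(joined); // oddevenoddevenodd
```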
How is that useful?
Word Count: Given a document, find the # of occurrences of each word
Let’s try the intuitive way...
bigFile is too big? Have two files!
Let’s See that Again
Map: BoomBoom("chicken"); BoomBoom("coconut");
Reduce: reduce(union, [d1, d2], [])
Input: foo foo baz bar gor baz goo bar
Split: “foo foo baz bar” | “gor baz goo bar”
Mappers: foo 1, foo 1, baz 1, bar 1 | gor 1, baz 1, goo 1, bar 1
Bucket by Key: [foo, foo] [baz, baz] [bar, bar] [goo] [gor]
One reducer per bucket: 2 2 2 1 1
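The whole pipeline fits in a small single-process sketch. A framework such as Hadoop would run the mappers and reducers on different machines and do the splitting and bucketing itself. The function names `mapper`, `bucketByKey` and `reducer` are chosen here for illustration and are not a Hadoop API.

```javascript
// Mapper: emit a [word, 1] pair for every word in one chunk of the input.
function mapper(chunk) {
  return chunk.split(/\s+/).filter(Boolean).map(function (word) { return [word, 1]; });
}

// Shuffle: bucket the emitted values by key.
function bucketByKey(pairs) {
  const buckets = new Map();
  pairs.forEach(function (pair) {
    if (!buckets.has(pair[0])) buckets.set(pair[0], []);
    buckets.get(pair[0]).push(pair[1]);
  });
  return buckets;
}

// Reducer: sum the values collected in one bucket.
function reducer(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The input document, already split into two chunks.
const chunks = ["foo foo baz bar", "gor baz goo bar"];
const pairs = chunks.map(mapper).reduce(function (all, part) { return all.concat(part); }, []);

const counts = {};
bucketByKey(pairs).forEach(function (values, word) { counts[word] = reducer(values); });
console.log(counts); // { foo: 2, baz: 2, bar: 2, gor: 1, goo: 1 }
```

Because each mapper only sees its own chunk and each reducer only sees its own bucket, both stages can run in parallel.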
Who Does What?
User: write Mapper and Reducer
Hadoop: splitting, bucketing
Cons: restricted paradigm
Pros: generalized, safe
Implementing can be tricky!
Abstract, Complex? But you know what...
The Point: People talk about scaling ... but now you know it!
Distributing Files
Distributing Computation
Higher Level Point: This isn’t a CS class... but I’m a CS major =P
It’s not all HTML/CSS
There’s some serious CS here!
http://hadoop.apache.org/ http://www.cloudera.com/resources/?type=Training
Poor Man’s Scaling
Redundancy
Replicate across computers
Main server balances load
Other servers serve content
Also useful for data backups
Usually host managed
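One simple way a main server could spread requests over replicas is round-robin, sketched below. The server names are hypothetical, and a real balancer would also track health and load. Nothing in the slides says which strategy a given host uses.

```javascript
// Round-robin: hand out the replicas in turn, wrapping back to the first.
function makeRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

// Hypothetical replica names, for illustration only.
const pick = makeRoundRobin(["web1", "web2", "web3"]);
const order = [pick(), pick(), pick(), pick()];
console.log(order); // [ 'web1', 'web2', 'web3', 'web1' ]
```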
Caching
PHP is dynamic ... usually unnecessarily
Calculate, cache, re-serve
Memoization
PHP/memcached: http://en.wikipedia.org/wiki/Memcached
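Memoization in miniature: compute a result once, keep it, and serve the stored copy on later requests. This sketch keeps the cache in process memory and handles single-argument functions only. memcached applies the same idea across machines. The names `memoize`, `slowSquare` and `fastSquare` are made up here.

```javascript
// Wrap fn so each distinct argument is only computed once.
function memoize(fn) {
  const cache = new Map();
  return function (arg) {
    if (!cache.has(arg)) cache.set(arg, fn(arg)); // calculate and cache
    return cache.get(arg);                        // re-serve
  };
}

let calls = 0;
function slowSquare(x) {
  calls++; // stands in for an expensive page render or database query
  return x * x;
}

const fastSquare = memoize(slowSquare);
console.log(fastSquare(12), fastSquare(12)); // 144 144
console.log(calls);                          // 1
```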
Bottlenecks: content, bandwidth, databases, external APIs, script busy computing, etc...
S3 EC2 http://www.youtube.com/watch?v=Iaxu-NLecm4 http://www.youtube.com/watch?v=bBajLxeKqoY
Homework 7 is out. No class next week.
Photo Credits
http://mi9.com/datawallpapers/data/12/993/1217993797/eye-with-black-background_1280x1024.jpg
http://www.aemmp.org/site/wp-content/uploads/2009/10/imgname-riaa_training_video_leaked_more_stupid_than_expected-50226711-RIAA.jpg
http://jasonjeffrey.files.wordpress.com/2007/09/drm.jpg
http://www.sdtimes.com/blog/post/2009/image.axd?picture=2009%2F7%2Fhadoopephant.jpg