How the Web Works: Lecture 9
Abhinav Sharma
January 09, 2014
This talk was designed for a class (98-135) taught at Carnegie Mellon University in Spring 2010.
Transcript
Lecture 9: Distributed Computing & Scaling
Homeworks: Overall, I failed =( Should've done it in winter. Get 3 points to pass; hopefully, 10 by the end. I don't want to fail anyone.
Zeliveau Please Start Soon
Rankmaniac http://scienceoftheweb.org/15-396/assignments/hw6.pdf
“Essentially, using nofollow causes us to drop the target links from our overall graph of the web”
SSL/TLS: That HTTPS business...
Visible to Wireless Network, ISP, Server LAN
Let's encrypt with a key! ENC(K, “MES”) = “NFT” | DEC(K, “NFT”) = “MES”, where K = “Shift One Alphabet”. But how do we share the key?
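The shift cipher on the slide fits in a few lines of JavaScript. This is a minimal sketch, assuming messages use only uppercase A-Z (anything else passes through unchanged); the names `enc` and `dec` simply mirror the slide's ENC/DEC notation.

```javascript
// Toy shift ("Caesar") cipher: the key k is how many letters to shift.
// Assumption: only uppercase A-Z is shifted; other characters are left alone.
function shiftChar(ch, k) {
  const code = ch.charCodeAt(0);
  if (code < 65 || code > 90) return ch; // not A-Z
  // Shift within the 26-letter alphabet, wrapping Z around to A.
  return String.fromCharCode((((code - 65 + k) % 26) + 26) % 26 + 65);
}

function enc(k, msg) {
  return Array.from(msg, (ch) => shiftChar(ch, k)).join("");
}

function dec(k, msg) {
  return enc(-k, msg); // decrypting is just shifting back
}

const K = 1; // "Shift One Alphabet"
console.log(enc(K, "MES")); // NFT
console.log(dec(K, "NFT")); // MES
```

Both sides must hold the same K, which is exactly the key-sharing problem the slide raises.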
Public Key Encryption Insanely Awesomely Brilliant
n = p * q. Given these (p and q): easy to compute n. Given this (n): finding p and q is possible but...
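To make the asymmetry concrete, here is a small sketch: multiplying two primes is one operation, while recovering them from n by trial division (the simplest factoring method, assumed here for illustration; much faster algorithms exist, but even those are far too slow for the key sizes used in practice) needs roughly √n attempts.

```javascript
// Easy direction: one multiplication.
function multiply(p, q) {
  return p * q;
}

// Hard direction (by trial division): try every candidate divisor up to sqrt(n).
// Returns the smallest factor pair and how many candidates were tried.
function factor(n) {
  let tries = 0;
  for (let d = 2; d * d <= n; d++) {
    tries++;
    if (n % d === 0) return { p: d, q: n / d, tries };
  }
  return { p: n, q: 1, tries }; // no divisor found: n is prime
}

const n = multiply(10007, 10009); // two 5-digit primes
const result = factor(n);
console.log(n, result); // 100160063 { p: 10007, q: 10009, tries: 10006 }
```

Each extra digit in p and q multiplies the trial-division work by about ten, while the multiplication stays essentially free.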
RSA: Rivest, Shamir, Adleman
Public Key Encryption: Create an algorithm that uses n to encrypt but needs p & q to decrypt. Publish n as the public key (in RSA, together with a public exponent e). Keep p & q secret. Heard of PGP?
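A toy instance of this idea is textbook RSA. The sketch below uses the small primes p = 61 and q = 53 that appear in many tutorials; as in standard RSA, the pair (n, e) is published, while only the holder of p and q can compute the private exponent d. It uses JavaScript `BigInt` for exact arithmetic and is insecure (tiny keys, no padding), meant only to show the mechanics.

```javascript
// Square-and-multiply: computes base^exp mod m.
function modPow(base, exp, m) {
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % m;
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Extended Euclid: returns x with (a * x) % m === 1n.
function modInverse(a, m) {
  let [oldR, r] = [a, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const quot = oldR / r; // BigInt division truncates
    [oldR, r] = [r, oldR - quot * r];
    [oldS, s] = [s, oldS - quot * s];
  }
  if (oldR !== 1n) throw new Error("a has no inverse mod m");
  return ((oldS % m) + m) % m;
}

// Key generation.
const p = 61n, q = 53n;            // secret primes
const n = p * q;                   // 3233, public
const e = 17n;                     // public exponent, published with n
const phi = (p - 1n) * (q - 1n);   // 3120, computing this needs p and q
const d = modInverse(e, phi);      // 2753, private

const encrypt = (msg) => modPow(msg, e, n); // anyone with (n, e) can do this
const decrypt = (c) => modPow(c, d, n);     // needs d, i.e. needs p and q

const c = encrypt(65n);
console.log(c, decrypt(c)); // 2790n 65n
```

An eavesdropper who sees n = 3233 could factor it instantly; with primes hundreds of digits long, that step becomes the "possible but..." problem.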
Who are you anyway? Aha, an Imposter! The Verification Problem
Browsers Preinstalled with some CAs
Install if you trust CMU
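The browser's trust decision can be modeled, very roughly, as walking a certificate chain up to a root it already trusts. This is a toy sketch under heavy assumptions: certificates are reduced to `{ subject, issuer }` pairs, signature, expiry, and hostname checks are all omitted, and every CA name is hypothetical.

```javascript
// Toy chain check: chain[0] is the server's certificate, and each following
// entry is the certificate of the CA that issued the previous one.
// Real TLS validation also verifies every signature, validity dates,
// hostnames, and revocation; none of that is modeled here.
function chainIsTrusted(chain, trustedRoots) {
  if (chain.length === 0) return false;
  for (let i = 0; i + 1 < chain.length; i++) {
    if (chain[i].issuer !== chain[i + 1].subject) return false; // broken link
  }
  const top = chain[chain.length - 1];
  return trustedRoots.has(top.issuer); // must end at an installed root
}

// Hypothetical names for illustration only.
const chain = [
  { subject: "www.example.edu", issuer: "Example University CA" },
  { subject: "Example University CA", issuer: "Example University Root" },
];
const browserRoots = new Set(["Some Preinstalled Root"]);
console.log(chainIsTrusted(chain, browserRoots)); // false: root not installed
browserRoots.add("Example University Root");     // "Install if you trust ..."
console.log(chainIsTrusted(chain, browserRoots)); // true
```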
P2P ... and the indexing problem
Client-Server Model: I can haz music!
P2P Model: Who has my file?
More generally: a Distributed Hash Table. Given a key, get the value. Stored across computers. Google's Index (GFS). So, how do you find a file?
Ask Everyone: What Not to Do!
Computers (N), Files (K): pick m so that 2^m > max{N, K}
Example: N = 8 computers, K = 12 files, 2^m = 16
Label nodes between 0 and 2^m: 1 3 4 5 7 9 12 15
Label keys between 0 and 2^m: 1 6 12 2 8 13 4 9 15 5 11 16
Assignment: Assign key K to node K. If node K doesn't exist ... assign it to the next node.
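The assignment rule (key K goes to node K, or else to the next node around the ring) can be sketched with the example's numbers. Assumption: labels are taken modulo 2^m = 16, so key 16 is treated as 0 and wraps around to node 1; `successor` is an illustrative helper name.

```javascript
// Key-to-node assignment on a ring: a key belongs to the first node whose
// label is >= the key, wrapping around past the largest label.
const RING = 16;                           // 2^m with m = 4
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];  // sorted node labels
const keys = [1, 6, 12, 2, 8, 13, 4, 9, 15, 5, 11, 16];

function successor(id) {
  const k = ((id % RING) + RING) % RING;   // wrap onto the ring (16 -> 0)
  for (const node of nodes) {
    if (node >= k) return node;
  }
  return nodes[0];                         // past the last node: wrap to the first
}

for (const key of keys) {
  console.log(`key ${key} -> node ${successor(key)}`);
}
```

For instance, key 6 lands on node 7 and key 13 on node 15, because nodes 6, 13, and 14 don't exist.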
Searching: looking for key K ~ looking for node K. Linear search: start at machine 1, go to the next ... and so on until found!
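A sketch of the linear search, assuming each machine knows only the next machine on the ring; it records which machines a query for key 13 visits when it starts at machine 1. The helper names (`nextNode`, `dist`, `linearLookup`) are illustrative.

```javascript
// Linear lookup: hop to the next machine until the key falls between the
// current machine and its successor.
const RING = 16;
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

// The machine that follows `node` on the ring.
function nextNode(node) {
  const i = nodes.indexOf(node);
  return nodes[(i + 1) % nodes.length];
}

// Clockwise distance from a to b on the ring.
function dist(a, b) {
  return (((b - a) % RING) + RING) % RING;
}

function linearLookup(start, key) {
  key = ((key % RING) + RING) % RING;
  const path = [start];
  let node = start;
  while (true) {
    if (dist(node, key) === 0) return { owner: node, path }; // exact match
    const next = nextNode(node);
    // Key lies in (node, next]: the next machine owns it.
    if (dist(node, key) <= dist(node, next)) return { owner: next, path };
    node = next;
    path.push(node);
  }
}

console.log(linearLookup(1, 13)); // owner 15, after visiting 1, 3, 4, 5, 7, 9, 12
```

With N machines this can take up to N hops.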
But wait: they're sorted. Seems familiar?
Binary Search: Is 8 in the list? What position is it?
1 3 5 6 8 9 → 6 8 9 → 8 9
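For reference, here is binary search as a small JavaScript function (positions are 0-based, so 8 is at position 4). Each comparison discards half of the remaining list.

```javascript
// Binary search over a sorted array: returns the index of target, or -1.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2); // look at the middle element
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1; // discard the lower half
    else hi = mid - 1;                      // discard the upper half
  }
  return -1; // not in the list
}

console.log(binarySearch([1, 3, 5, 6, 8, 9], 8)); // 4
```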
Each machine stores the addresses of some others!
Finger Table: 8 machines on a ring of size 2^m = 16 (m = 4). Machine N7 stores: addr(N7 + 1), addr(N7 + 2), addr(N7 + 4), addr(N7 + 8) = addr(N7 + 2^(m-1)). Can take short-cuts!
Lookup: Make the biggest jump | Too low | Use N7's table
Halve the remaining ring
Done!
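The finger-table shortcut can be sketched as follows, loosely modeled on Chord's lookup: at each step, jump to the farthest finger that does not pass the key, then continue from that machine's table. This is a simplified sketch, assuming a ring of size 16 (m = 4) and finger i pointing at the successor of n + 2^i; tables are computed here from a global node list, whereas a real Chord deployment maintains them by exchanging messages between peers.

```javascript
const M = 4;
const RING = 2 ** M; // 16
const nodes = [1, 3, 4, 5, 7, 9, 12, 15];

const dist = (a, b) => (((b - a) % RING) + RING) % RING; // clockwise distance

// First node at or after id, wrapping around the ring.
function successor(id) {
  const k = ((id % RING) + RING) % RING;
  return nodes.find((n) => n >= k) ?? nodes[0];
}

// Finger i of node n points at successor(n + 2^i), for i = 0..m-1.
function fingerTable(n) {
  const table = [];
  for (let i = 0; i < M; i++) table.push(successor(n + 2 ** i));
  return table;
}

function fingerLookup(start, key) {
  key = ((key % RING) + RING) % RING;
  const path = [start];
  let node = start;
  while (true) {
    if (dist(node, key) === 0) return { owner: node, path };
    const fingers = fingerTable(node);
    const succ = fingers[0]; // immediate successor
    if (dist(node, key) <= dist(node, succ)) return { owner: succ, path };
    // Biggest jump that stays short of the key (the successor always qualifies).
    node = [...fingers].reverse().find((f) => {
      const d = dist(node, f);
      return d > 0 && d < dist(node, key);
    });
    path.push(node);
  }
}

console.log(fingerTable(7));      // [ 9, 9, 12, 15 ]
console.log(fingerLookup(1, 13)); // owner 15 via 1 -> 9 -> 12
```

Compared with the one-machine-at-a-time walk, each hop here covers a large fraction of the remaining distance, which is where the logarithmic hop count comes from.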
Chord Protocol: log(N) lookup performance. Fully decentralizes P2P. Napster was centralized ... hence closed down! RIAA/MPAA: Oh Noes! http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
MapReduce. But first, an OCD programmer...
alert("get the lobster");
PutInPot("lobster");
PutInPot("water");

alert("get the chicken");
BoomBoom("chicken");
BoomBoom("coconut");
function Cook(i1, i2, f) {
    alert("get the " + i1);
    f(i1);
    f(i2);
}

Cook("lobster", "water", PutInPot);
Cook("chicken", "coconut", BoomBoom);
Map
var a = [1, 2, 3];
// double every element in place, then show each one
for (let i = 0; i < a.length; i++) {
    a[i] = a[i] * 2;
}
for (let i = 0; i < a.length; i++) {
    alert(a[i]);
}
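// returns a new array; writing back into a would let map(alert, a) wipe a with undefined
function map(fmap, a) {
    const out = [];
    for (let i = 0; i < a.length; i++) out.push(fmap(a[i]));
    return out;
}

a = map(function (x) { return x * 2; }, a);
map(alert, a);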
Reduce
function sum(a) {
    let s = 0;
    for (let i = 0; i < a.length; i++) s += a[i];
    return s;
}

function join(a) {
    let s = "";
    for (let i = 0; i < a.length; i++) s += a[i];
    return s;
}
// fold the elements of a into one value, starting from init
function reduce(fred, a, init) {
    let s = init;
    for (let i = 0; i < a.length; i++) s = fred(s, a[i]);
    return s;
}
function sum(a) { return reduce(function (s, x) { return s + x; }, a, 0); }
function join(a) { return reduce(function (s, x) { return s + x; }, a, ""); }
Map Reduce
Map: [1, 2, 3, 4, 5] → [2, 4, 6, 8, 10] and [One, Two, Three, Four, Five]
Reduce: [1, 1, 1, 1, 1] → 5 (sum) and “11111” (join)
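The same transformations can be written with JavaScript's built-in `Array.prototype.map` and `Array.prototype.reduce`, which return new values instead of modifying the input array. The `NAMES` lookup table is a hypothetical helper for the number-to-word map.

```javascript
const nums = [1, 2, 3, 4, 5];

// Map: apply one function to every element independently.
const doubled = nums.map((x) => x * 2); // [2, 4, 6, 8, 10]
const NAMES = ["Zero", "One", "Two", "Three", "Four", "Five"]; // hypothetical table
const named = nums.map((x) => NAMES[x]); // ["One", "Two", "Three", "Four", "Five"]
const ones = nums.map(() => 1);          // [1, 1, 1, 1, 1]

// Reduce: fold all elements into one value with a combining function.
const count = ones.reduce((s, x) => s + x, 0);   // 5 (sum)
const joined = ones.reduce((s, x) => s + x, ""); // "11111" (join)

console.log(doubled, named, count, joined);
```

The only difference between the sum and the join is the starting value: `0` makes `+` add numbers, `""` makes it concatenate strings.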
“Without understanding functional programming, you can't invent MapReduce. The very fact that Google invented MapReduce, and Microsoft didn't, says something about why Microsoft is still playing catch up” - Joel Spolsky
Pop Quiz: [1, 2, 3, 4, 5] → [“odd”, “even”, “odd”, “even”, “odd”] → “oddevenoddevenodd”
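One possible answer to the pop quiz: map each number to its parity, then reduce by joining the strings.

```javascript
const nums = [1, 2, 3, 4, 5];
const parity = nums.map((x) => (x % 2 === 0 ? "even" : "odd")); // map
const answer = parity.reduce((s, word) => s + word, "");          // reduce (join)
console.log(answer); // oddevenoddevenodd
```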
How is that useful?
Word Count: given a document, count the # of occurrences of each word
Let’s try the intuitive way...
bigFile is too big? Have two files!
Let's See That Again
BoomBoom("chicken"); BoomBoom("coconut"); → Map
reduce(union, [d1, d2], []) → Reduce
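The two-file idea can be sketched as: run the same counting function on each file (a map), then merge the partial results (a reduce). This is a minimal sketch; `countWords` and `mergeCounts` are hypothetical helpers, the slide's `union` is assumed to mean merging count tables by adding counts, and the file contents are borrowed from the word-count example in these slides.

```javascript
// Count words in one file's text.
function countWords(text) {
  const counts = new Map();
  for (const w of text.split(/\s+/).filter(Boolean)) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }
  return counts;
}

// Merge two count tables by adding the counts for each word.
function mergeCounts(c1, c2) {
  const out = new Map(c1);
  for (const [w, n] of c2) out.set(w, (out.get(w) || 0) + n);
  return out;
}

const files = ["foo foo baz bar", "gor baz goo bar"];
const partial = files.map(countWords);                  // map: one count per file
const total = partial.reduce(mergeCounts, new Map());   // reduce: merge them
console.log(total); // foo 2, baz 2, bar 2, gor 1, goo 1
```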
Input: foo foo baz bar gor baz goo bar
Split: foo foo baz bar | gor baz goo bar
Mapper, Mapper: foo 1, foo 1, baz 1, bar 1 | gor 1, baz 1, goo 1, bar 1
Bucket by key: [foo, foo] [baz, baz] [bar, bar] [goo] [gor]
One reducer per bucket: 2 2 2 1 1
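The pipeline can be simulated in memory to show which piece does what. This is a sketch, not Hadoop code: `mapper`, `bucketByKey`, and `reducer` are hypothetical names, and everything runs in one process rather than across machines.

```javascript
// Mapper: emit a [word, 1] pair for every word in its chunk.
function mapper(chunk) {
  return chunk.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Bucket by key (the "shuffle"): group all emitted values by their key.
function bucketByKey(pairs) {
  const buckets = new Map();
  for (const [key, value] of pairs) {
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  }
  return buckets;
}

// Reducer: combine all values for one key; for word count, add them up.
function reducer(key, values) {
  return [key, values.reduce((s, v) => s + v, 0)];
}

const chunks = ["foo foo baz bar", "gor baz goo bar"]; // one chunk per mapper
const emitted = chunks.flatMap(mapper);                 // run every mapper
const buckets = bucketByKey(emitted);
const counts = new Map([...buckets].map(([k, vs]) => reducer(k, vs)));

console.log(JSON.stringify(Object.fromEntries(counts)));
// {"foo":2,"baz":2,"bar":2,"gor":1,"goo":1}
```

The user only writes `mapper` and `reducer`; splitting the input and bucketing are the framework's job.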
Who Does What?
User: write the Mapper and Reducer
Hadoop: splitting, bucketing
Cons: restricted paradigm
Pros: generalized, safe
Implementing can be tricky!
Abstract, Complex? But you know what...
The Point: People talk about scaling ... but now you know it! Distributing files. Distributing computation.
Higher Level Point: This isn't a CS class... but I'm a CS major =P It's not all HTML/CSS. There's some serious CS here!
http://hadoop.apache.org/ http://www.cloudera.com/resources/?type=Training
Poor Man’s Scaling
Redundancy: Replicate across computers. The main server balances load; other servers serve content. Also useful for data backups. Usually host-managed.
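The slide does not say how the main server spreads requests; round-robin, assumed here, is one common policy (others pick the least-loaded server or hash the client address). A minimal sketch with hypothetical replica names:

```javascript
// Round-robin load balancing: hand each request to the next replica in turn.
class RoundRobinBalancer {
  constructor(replicas) {
    if (replicas.length === 0) throw new Error("need at least one replica");
    this.replicas = replicas;
    this.next = 0;
  }

  // Return the replica that should serve the current request.
  pick() {
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const lb = new RoundRobinBalancer(["app1", "app2", "app3"]); // hypothetical names
const assigned = [];
for (let i = 0; i < 5; i++) assigned.push(lb.pick());
console.log(assigned.join(" ")); // app1 app2 app3 app1 app2
```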
Caching: PHP is dynamic ... usually unnecessarily. Calculate, cache, re-serve.
Memoization: PHP/memcached http://en.wikipedia.org/wiki/Memcached
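Memoization (calculate, cache, re-serve) fits in a few lines. This is a sketch under stated assumptions: an in-process `Map` stands in for a shared cache like memcached (which is a separate networked server, not shown), and `renderPage` is a hypothetical expensive function that counts how often it actually runs.

```javascript
// Wrap fn so repeated calls with the same argument reuse the first result.
function memoize(fn) {
  const cache = new Map();
  return function (arg) {
    if (cache.has(arg)) return cache.get(arg); // hit: re-serve
    const result = fn(arg);                    // miss: calculate
    cache.set(arg, result);                    // ... and cache
    return result;
  };
}

// Hypothetical "expensive" page renderer that counts real executions.
let renders = 0;
function renderPage(name) {
  renders++;
  return `Hello, ${name}`;
}

const cachedRender = memoize(renderPage);
cachedRender("CMU");
cachedRender("CMU");
console.log(renders); // 1
```

This only pays off when the output depends solely on the input; pages that change per request or over time need a way to expire entries.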
Bottlenecks: content bandwidth, databases, external APIs, scripts busy computing, etc.
S3 EC2 http://www.youtube.com/watch?v=Iaxu-NLecm4 http://www.youtube.com/watch?v=bBajLxeKqoY
Homework 7 is out. No class next week.