Web Architectures - Lecture 2 - Web Technologies (1019888BNR)

This lecture forms part of the course Web Technologies given at the Vrije Universiteit Brussel.

Beat Signer

October 04, 2022

Transcript

  1. 2 December 2005
    Web Technologies
    Web Architectures
    Prof. Beat Signer
    Department of Computer Science
    Vrije Universiteit Brussel
    beatsigner.com

  2. Beat Signer - Department of Computer Science - [email protected] 2
    October 4, 2022
    Basic Client-Server Web Architecture
    ▪ Effect of typing http://www.vub.be in the browser bar
    (1) use the Domain Name System (DNS) to get the IP address for
    www.vub.be (answer 134.184.0.178)
    (2) create a TCP connection to 134.184.0.178
    (3) send an HTTP request message over the TCP connection
    (4) visualise the received HTTP response message in the browser
    [Figure: client and server connected via the Internet; the client sends an
    HTTP request and the server returns an HTTP response]
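
    The four steps above can be sketched in Java as follows. This is only a
    minimal sketch under stated assumptions: the class BasicHttpRoundTrip and
    its methods are hypothetical names, and instead of contacting www.vub.be
    it starts a throwaway local server (the JDK's built-in
    com.sun.net.httpserver.HttpServer) so that it runs without external
    network access. Step (4) is reduced to reading the response start line.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the lecture material.
final class BasicHttpRoundTrip {

  /** Performs steps (1) to (3) and returns the start line of the response. */
  static String fetchStatusLine(String host, int port, String path) throws IOException {
    // (1) resolve the host name to an IP address
    InetAddress address = InetAddress.getByName(host);
    // (2) create a TCP connection to that address
    try (Socket socket = new Socket(address, port)) {
      // (3) send an HTTP request message over the TCP connection
      String request = "GET " + path + " HTTP/1.1\r\n"
          + "Host: " + host + "\r\n"
          + "Connection: close\r\n"
          + "\r\n";
      OutputStream out = socket.getOutputStream();
      out.write(request.getBytes(StandardCharsets.US_ASCII));
      out.flush();
      // (4) a browser would now render the response; here only the start line is read
      BufferedReader in = new BufferedReader(
          new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
      return in.readLine();
    }
  }

  /** Assumption: a local stand-in server replaces the real web server. */
  static String demo() {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
      server.createContext("/", exchange -> {
        byte[] body = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().add("Content-Type", "text/html; charset=UTF-8");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream os = exchange.getResponseBody()) {
          os.write(body);
        }
      });
      server.start();
      try {
        return fetchStatusLine("localhost", server.getAddress().getPort(), "/");
      } finally {
        server.stop(0);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

    Replacing "localhost" and the port by a real host name and port 80 would
    send the same request to a real web server.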

  3. Web Server
    ▪ Tasks of a web server
    (1) set up connection
    (2) receive and process HTTP request
    (3) fetch resource
    (4) create and send HTTP response
    (5) logging
    ▪ The most prominent web servers are nginx and
    the Apache HTTP Server
    ▪ A lot of devices have an embedded web server
    ▪ printers, WLAN routers, TVs, ...
    [Figure: Worldwide Web Servers, https://news.netcraft.com]

  4. Example HTTP Request Message
    GET / HTTP/1.1
    Host: www.vub.be
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-GB,en;q=0.5
    Accept-Encoding: gzip, deflate, br
    Connection: keep-alive
    Upgrade-Insecure-Requests: 1
    ...

  5. Example HTTP Response Message
    HTTP/1.1 200 OK
    Date: Mon, 03 Oct 2022 14:39:43 GMT
    Content-Encoding: gzip
    Content-Language: en
    Content-Type: text/html; charset=UTF-8
    Cache-Control: public, max-age=10900
    Keep-Alive: timeout=15, max=100
    Etag: "1633600847-1"
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    ...

    ...
    Vrije Universiteit Brussel | Redelijk eigenzinnig
    ...

  6. HTTP Protocol
    ▪ Request/response communication model
    ▪ HTTP Request
    ▪ HTTP Response
    ▪ Communication always has to be initiated by the client
    ▪ Stateless protocol (no sessions)
    ▪ HTTP can be used on top of various reliable protocols
    ▪ TCP is by far the most commonly used one
    ▪ runs on TCP port 80 by default
    ▪ HTTPS scheme used for encrypted connections (TCP port 443 by default)

  7. Uniform Resource Identifier (URI)
    ▪ A Uniform Resource Identifier (URI) uniquely
    identifies a resource
    ▪ There are two types of URIs
    ▪ Uniform Resource Locator (URL)
    - contains information about the exact location of a resource
    - consists of a scheme, a host and the path (resource name)
    - e.g. https://vub.academia.edu/BeatSigner
    - problem: the URL changes if the resource is moved!
    • Persistent Uniform Resource Locators (PURLs) [https://archive.org/services/purl/]
    ▪ Uniform Resource Name (URN)
    - unique and location independent name for a resource
    - consists of a scheme name, a namespace identifier and a
    namespace-specific string (separated by colons)
    - e.g. urn:ISBN:3837027139
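
    The difference between the two URI types can be made visible with the
    standard java.net.URI class. This is a minimal sketch; the class UriParts
    and its describe method are hypothetical names chosen for illustration,
    and the two inputs are the examples from the slide.

```java
import java.net.URI;

// Hypothetical helper class, not part of the lecture material.
final class UriParts {

  /** Returns the components of a URL (scheme, host, path) or URN (scheme, namespace, name). */
  static String describe(String text) {
    URI uri = URI.create(text);
    if (uri.isOpaque()) {
      // URN-like URI: scheme ":" namespace identifier ":" namespace-specific string
      String[] parts = uri.getSchemeSpecificPart().split(":", 2);
      String name = parts.length > 1 ? parts[1] : "";
      return "scheme=" + uri.getScheme() + " namespace=" + parts[0] + " name=" + name;
    }
    // URL: scheme, host and path (resource name)
    return "scheme=" + uri.getScheme() + " host=" + uri.getHost() + " path=" + uri.getPath();
  }

  public static void main(String[] args) {
    System.out.println(describe("https://vub.academia.edu/BeatSigner"));
    System.out.println(describe("urn:ISBN:3837027139"));
  }
}
```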

  8. HTTP Message Format
    ▪ Request and response messages have the same format
    HTTP_message = start_line , {header} , "CRLF" , [body];
    ▪ Example response message
    HTTP/1.1 200 OK                        (start line)
    Date: Mon, 03 Oct 2022 16:59:21 GMT    (header field(s))
    Server: Apache/2.4.54 (Ubuntu)
    X-Powered-By: PHP/8.0.23
    Transfer-Encoding: chunked
    Content-Type: text/html
                                           (blank line, CRLF)
    ...                                    (message body, optional)
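
    The message format can be illustrated by a small parser that splits a raw
    message into start line, header fields and body. This is a simplified
    sketch, not a complete HTTP parser: the class ParsedHttpMessage is a
    hypothetical name, header names are kept exactly as received (real
    implementations compare them case-insensitively), and chunked transfer
    encoding is not decoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the lecture material.
final class ParsedHttpMessage {
  final String startLine;
  final Map<String, String> headers;
  final String body;

  private ParsedHttpMessage(String startLine, Map<String, String> headers, String body) {
    this.startLine = startLine;
    this.headers = headers;
    this.body = body;
  }

  static ParsedHttpMessage parse(String raw) {
    // The first blank line (CRLF CRLF) separates the header section from the body.
    int split = raw.indexOf("\r\n\r\n");
    String head = split >= 0 ? raw.substring(0, split) : raw;
    String body = split >= 0 ? raw.substring(split + 4) : "";

    String[] lines = head.split("\r\n");
    Map<String, String> headers = new LinkedHashMap<>();
    // Line 0 is the start line; every further line is "name: value".
    for (int i = 1; i < lines.length; i++) {
      int colon = lines[i].indexOf(':');
      if (colon > 0) {
        headers.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
      }
    }
    return new ParsedHttpMessage(lines[0], headers, body);
  }

  public static void main(String[] args) {
    ParsedHttpMessage m = parse("HTTP/1.1 200 OK\r\n"
        + "Date: Mon, 03 Oct 2022 16:59:21 GMT\r\n"
        + "Content-Type: text/html\r\n"
        + "\r\n"
        + "<html></html>");
    System.out.println(m.startLine);
    System.out.println(m.headers);
    System.out.println(m.body);
  }
}
```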

  9. HTTP Request Message
    ▪ Request-specific start line
    start_line = method , " " , resource , " " , version;
    method = "GET" | "HEAD" | "POST" | "PUT" | "TRACE" |
    "OPTIONS" | "DELETE";
    resource = complete_URL | path;
    version = "HTTP/" , major_version , "." , minor_version;
    ▪ Methods
    ▪ GET : get a resource from the server
    ▪ HEAD : get the header only (no body)
    ▪ POST : send data (in the body) to the server
    ▪ PUT : store request body on server
    ▪ TRACE : get the "final" request (after it has potentially been modified by proxies)
    ▪ OPTIONS : get a list of methods supported by the server
    ▪ DELETE : delete a resource on the server
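
    As a rough illustration of the start line grammar, the following sketch
    splits a request start line into method, resource and version and checks
    the method against the list from the slide. The class RequestLine is a
    hypothetical name, and the validation is deliberately simpler than what a
    real server does.

```java
import java.util.Set;

// Hypothetical helper class, not part of the lecture material.
final class RequestLine {
  // The methods listed on the slide.
  static final Set<String> METHODS =
      Set.of("GET", "HEAD", "POST", "PUT", "TRACE", "OPTIONS", "DELETE");

  final String method;
  final String resource;
  final String version;

  private RequestLine(String method, String resource, String version) {
    this.method = method;
    this.resource = resource;
    this.version = version;
  }

  /** Parses: method " " resource " " version */
  static RequestLine parse(String line) {
    String[] parts = line.split(" ");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected: method resource version");
    }
    if (!METHODS.contains(parts[0])) {
      throw new IllegalArgumentException("unknown method: " + parts[0]);
    }
    // version = "HTTP/" major_version "." minor_version
    if (!parts[2].matches("HTTP/\\d+\\.\\d+")) {
      throw new IllegalArgumentException("malformed version: " + parts[2]);
    }
    return new RequestLine(parts[0], parts[1], parts[2]);
  }

  public static void main(String[] args) {
    RequestLine r = parse("GET /beat-signer HTTP/1.1");
    System.out.println(r.method + " | " + r.resource + " | " + r.version);
  }
}
```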

  10. HTTP Response Message
    ▪ Response-specific start line
    start_line = version , " " , status_code , " " , reason;
    version = "HTTP/" , major_version , "." , minor_version;
    status_code = digit , digit , digit;
    reason = string_phrase;
    ▪ Status codes
    ▪ 100-199 : informational
    ▪ 200-299 : success (e.g. 200 for 'OK')
    ▪ 300-399 : redirection
    ▪ 400-499 : client error (e.g. 404 for 'Not Found')
    ▪ 500-599 : server error (e.g. 503 for 'Service Unavailable')
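
    The status code ranges can be turned into a small lookup. This is a
    minimal sketch; the class StatusCodes and its methods are hypothetical
    names, and the category labels are the ones used on the slide.

```java
// Hypothetical helper class, not part of the lecture material.
final class StatusCodes {

  /** Extracts the three-digit status code from a response start line. */
  static int statusCodeOf(String startLine) {
    // version " " status_code " " reason (the reason phrase may contain spaces)
    String[] parts = startLine.split(" ", 3);
    return Integer.parseInt(parts[1]);
  }

  /** Maps a status code to its category via the first digit. */
  static String category(int code) {
    switch (code / 100) {
      case 1: return "informational";
      case 2: return "success";
      case 3: return "redirection";
      case 4: return "client error";
      case 5: return "server error";
      default: throw new IllegalArgumentException("not an HTTP status code: " + code);
    }
  }

  public static void main(String[] args) {
    int code = statusCodeOf("HTTP/1.1 404 Not Found");
    System.out.println(code + " is a " + category(code));
  }
}
```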

  11. HTTP Header Fields
    ▪ There exist general headers (for requests and
    responses), request headers, response headers, entity
    headers and extension headers
    ▪ Some important headers
    ▪ Accept
    - request header defining the media types (formerly MIME types) that
    the client will accept
    ▪ User-Agent
    - request header specifying the type of client
    ▪ Connection: Keep-Alive (HTTP/1.0) and persistent connections (default in HTTP/1.1)
    - general header helping to improve the performance since otherwise a new
    TCP connection has to be established for every single webpage element
    ▪ Content-Type
    - entity header specifying the body's media type

  12. HTTP Header Fields ...
    ▪ Some important headers ...
    ▪ If-Modified-Since
    - request header that is used in combination with a GET request
    (conditional GET); the resource is only returned if it has been modified since
    the specified date, otherwise the server answers with 304 'Not Modified'

  13. Media Types (MIME Types)
    ▪ The media type defines the request or response body's
    content (used for appropriate processing)
    mediaType = toplevel_type , "/" , subtype;
    ▪ 7 top-level media types were originally defined (text, image, audio,
    video, application, multipart and message); further ones such as model
    and font have been registered since
    ▪ Standard media types are registered with the Internet
    Assigned Numbers Authority (IANA) [RFC-6838]

    Media Type    Description
    text/plain    Human-readable text without formatting information
    text/html     HTML document
    image/jpeg    JPEG-encoded image
    ...           ...
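
    The mediaType rule can be illustrated with a few lines of Java that split
    a Content-Type value into its top-level type and subtype. This is a
    sketch only; the class MediaTypeParts is a hypothetical name, and
    optional parameters such as charset are simply dropped.

```java
import java.util.Locale;

// Hypothetical helper class, not part of the lecture material.
final class MediaTypeParts {
  final String topLevel;
  final String subtype;

  private MediaTypeParts(String topLevel, String subtype) {
    this.topLevel = topLevel;
    this.subtype = subtype;
  }

  /** Parses: toplevel_type "/" subtype, ignoring parameters after ";". */
  static MediaTypeParts parse(String contentType) {
    // Drop optional parameters such as "; charset=UTF-8".
    String essence = contentType.split(";", 2)[0].trim();
    int slash = essence.indexOf('/');
    if (slash <= 0 || slash == essence.length() - 1) {
      throw new IllegalArgumentException("not a media type: " + contentType);
    }
    // Type and subtype names are case-insensitive, so normalise to lower case.
    return new MediaTypeParts(
        essence.substring(0, slash).toLowerCase(Locale.ROOT),
        essence.substring(slash + 1).toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    MediaTypeParts m = parse("text/html; charset=UTF-8");
    System.out.println(m.topLevel + " / " + m.subtype);
  }
}
```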

  14. HTTP Message Information
    ▪ Various developer tools for HTTP message logging
    ▪ e.g. developer console window in Chrome (Ctrl+Shift+I)
    ▪ Simple telnet connection
    telnet wise.vub.ac.be 80 (press Enter)
    GET /beat-signer HTTP/1.1 (press Enter)
    Host: wise.vub.ac.be (press Enter 2 times)
    ▪ Until 1999, the W3C worked on HTTP Next Generation (HTTP-NG)
    as a replacement for HTTP/1.1
    ▪ never introduced
    ▪ HTTP/2 (RFC 7540) since May 2015
    ▪ inspired by Google’s development of SPDY

  15. Proxies
    ▪ A web proxy is situated between the client and the server
    ▪ acts as a server to the client and as a client to the server
    ▪ can for example be specified in the browser settings; used for
    - firewalls and content filters
    - transcoding (on the fly transformation of HTTP message body)
    - content router (e.g. select optimal server in content distribution networks)
    - anonymous browsing, ...
    [Figure: a proxy placed between the client and the server on the Internet]

  16. Caches
    ▪ A proxy cache is a special type of proxy server
    ▪ can reduce server load if multiple clients share the same cache
    ▪ often multi-level hierarchies of caches (e.g. continent, country
    and regional level) with communication between sibling and
    parent caches as defined by the Internet Cache Protocol (ICP)
    ▪ passive or active (prefetching) caches
    [Figure: Client 1 and Client 2 access the server over the Internet via a
    shared proxy cache; the requests are labelled 1 and 2]

  17. Caches ...
    ▪ Special HTTP cache control header fields
    ▪ Expires
    - expiration date after which the cached resource has to be refetched
    ▪ Cache-Control: max-age
    - maximum age of a document (in seconds) after it has been added to the cache
    ▪ Cache-Control: no-cache
    - response cannot be directly served from the cache (has to be revalidated first)
    ▪ ...
    ▪ Validators
    ▪ Last-modified time as validator
    - cache with resource that has been last modified at time t uses an
    If-Modified-Since t request for updates
    ▪ Entity tags (ETag)
    - changed by the publisher if content has changed; If-None-Match etag request
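
    How a server might answer a revalidation request from a cache can be
    sketched as follows. This is a simplified illustration under stated
    assumptions: the class ConditionalGet and its method are hypothetical
    names, entity tags are compared as plain strings (no lists, no weak
    comparison), and If-None-Match is given precedence over If-Modified-Since
    when both are present.

```java
import java.time.Instant;

// Hypothetical helper class, not part of the lecture material.
final class ConditionalGet {

  /**
   * Returns 304 (Not Modified) if the cached copy is still valid, otherwise 200.
   * Pass null for a validator that the request did not contain.
   */
  static int statusFor(String currentEtag, Instant lastModified,
                       String ifNoneMatch, Instant ifModifiedSince) {
    if (ifNoneMatch != null) {
      // Entity tag validator: unchanged tag means unchanged content.
      return ifNoneMatch.equals(currentEtag) ? 304 : 200;
    }
    if (ifModifiedSince != null) {
      // Last-modified validator: send the resource only if it changed after time t.
      return lastModified.isAfter(ifModifiedSince) ? 200 : 304;
    }
    // Unconditional request: always send the resource.
    return 200;
  }

  public static void main(String[] args) {
    Instant modified = Instant.parse("2022-10-03T14:39:43Z");
    System.out.println(statusFor("\"v2\"", modified, "\"v2\"", null));
    System.out.println(statusFor("\"v2\"", modified, null, modified.minusSeconds(60)));
  }
}
```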

  18. Caches ...
    ▪ Advantages
    ▪ reduces latency and used network bandwidth
    ▪ reduces server load (client and reverse proxy caches)
    ▪ transparent to client and server
    ▪ Disadvantages
    ▪ additional resources (hardware) required
    ▪ might get stale data out of the cache
    ▪ creates additional network traffic if we use an active caching
    approach (prefetching) but achieve a low cache hit rate
    ▪ server loses control (e.g. access statistics) since not all
    requests reach the server any longer

  19. Tunneling
    ▪ Transmit one protocol encapsulated inside another protocol
    ▪ e.g. HTTP as a carrier for SSL connections
    ▪ Often used to "open" a firewall to protocols that would
    otherwise be blocked
    ▪ e.g. tunneling of SSL connections through an open HTTP port
    [Figure: SSL traffic between an SSL client and an SSL server is carried
    across the Internet inside HTTP messages, shown as HTTP[SSL]]

  20. Gateways
    ▪ A gateway can act as a kind of "glue" between
    applications (client) and resources (server)
    ▪ translate between two protocols (e.g. from HTTP to FTP)
    ▪ security accelerator (e.g. HTTPS/HTTP on the server side)
    ▪ often the gateway and destination server are combined in a single
    application server (HTTP to server application translator)
    [Figure: an HTTP client talks HTTP to an HTTP/FTP gateway, which talks FTP
    to an FTP server]

  21. Session Management
    ▪ HTTP is a stateless protocol
    ▪ Session (state) tracking solutions
    ▪ use of IP address
    - problem: IP address is often not uniquely assigned to a single user
    ▪ browser login
    - use of special HTTP authentication headers (WWW-Authenticate, Authorization)
    - after a login the browser sends the user information in each request
    ▪ URL rewriting
    - add information to the URL in each request
    ▪ hidden form fields
    - similar to URL rewriting but information can also be in body (POST request)
    ▪ cookies
    - the server stores a piece of information on the client which is then sent back to
    the server with each request

  22. Cookies
    ▪ Introduced by Netscape in June 1994
    ▪ A cookie is a piece of information that is
    assigned to a client on their first visit
    ▪ list of name=value pairs
    ▪ often just a unique identifier
    ▪ sent via Set-Cookie HTTP response headers
    ▪ Browser stores the information in a "cookie database" and
    sends it back every time the same server is accessed
    ▪ Potential privacy issues
    ▪ persistent cookies with long lifetime
    ▪ third-party cookies for user tracking across different websites
    ▪ Cookies can be disabled in the browser settings
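
    The cookie exchange can be sketched in a few lines. This is only an
    illustration: the class CookieHeaders, its methods and the cookie names
    sessionId and theme are hypothetical, and only a small subset of the
    cookie attributes (Max-Age, Path) is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the lecture material.
final class CookieHeaders {

  /** Server side: builds the Set-Cookie response header for one cookie. */
  static String setCookie(String name, String value, long maxAgeSeconds) {
    return "Set-Cookie: " + name + "=" + value + "; Max-Age=" + maxAgeSeconds + "; Path=/";
  }

  /** Server side: parses the value of the Cookie request header sent back by the browser. */
  static Map<String, String> parseCookieHeader(String headerValue) {
    Map<String, String> cookies = new LinkedHashMap<>();
    // The browser sends all matching cookies as "name=value" pairs separated by ";".
    for (String pair : headerValue.split(";")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
      }
    }
    return cookies;
  }

  public static void main(String[] args) {
    System.out.println(setCookie("sessionId", "abc123", 3600));
    System.out.println(parseCookieHeader("sessionId=abc123; theme=dark"));
  }
}
```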

  23. Hypertext Markup Language (HTML)
    ▪ Dominant markup language for webpages
    ▪ If you have never heard of HTML, have a look at
    ▪ https://www.w3schools.com/html/
    ▪ More details in the exercise and in the next lecture

    Beat Signer: Cross-Media Technology, DataPhys, ...

    Beat Signer is Professor of Computer Science at the VUB
    and director of the WISE laboratory ...

  24. Dynamic Web Content
    ▪ Often it is not enough to serve static web pages but
    content should be changed on the client or server side
    ▪ Server-side processing
    ▪ Common Gateway Interface (CGI)
    ▪ Java Servlets
    ▪ JavaServer Pages (JSP)
    ▪ PHP: Hypertext Preprocessor (PHP)
    ▪ Node.js
    ▪ …
    ▪ Client-side processing
    ▪ JavaScript
    ▪ Java Applets
    ▪ ...

  25. Common Gateway Interface (CGI)
    ▪ CGI was the first server-side processing solution
    ▪ transparent to the user
    ▪ certain requests (e.g. /account.pl) are forwarded via CGI to a
    program by creating a new process
    ▪ program processes the request and creates an answer with
    optional HTTP response headers
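    [Figure: the client sends an HTTP request over the Internet to the server,
    which either serves HTML pages or forwards the request via CGI to a program
    written in Perl, Tcl, C, C++, Java, ...; the HTTP response is returned to
    the client]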

  26. Common Gateway Interface (CGI) ...
    ▪ CGI Problems
    ▪ a new process has to be started for each request
    ▪ if the CGI program for example acts as a gateway to a database,
    a new DB connection has to be established for each request,
    which results in very poor performance
    ▪ FastCGI solves some of the problems by introducing
    persistent processes and process pools
    ▪ CGI/FastCGI has increasingly been replaced by other
    technologies (e.g. Java Servlets)

  27. Java Servlets
    ▪ A Java servlet is a Java class that has to extend the
    abstract HttpServlet class
    ▪ The Java servlet class is loaded by a servlet container
    and relevant requests (based on a servlet binding) are
    forwarded to the servlet instance for further processing
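    [Figure: the client sends an HTTP request over the Internet to the web
    server, which either serves HTML pages or forwards the request to the
    servlets in a servlet container; the HTTP response is returned to the client]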

  28. Java Servlets ...
    ▪ Main HttpServlet methods
    doGet(HttpServletRequest req, HttpServletResponse resp)
    doPost(HttpServletRequest req, HttpServletResponse resp)
    init(ServletConfig config)
    destroy()
    ▪ Servlet life cycle
    ▪ a servlet is initialised once via the init() method
    ▪ the doGet() and doPost() methods may be executed multiple times (by
    different HTTP requests)
    ▪ finally the servlet container may unload a servlet (the destroy()
    method is called before that happens)
    ▪ Servlet container (e.g. Apache Tomcat) is either integrated
    with the web server or runs as a standalone component
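
    The life cycle can be imitated in plain Java without a servlet container.
    Note that everything in this sketch is a hypothetical stand-in:
    MiniServlet is NOT the javax.servlet API, and the run method only mimics
    what a container such as Apache Tomcat does (one init() call, one doGet()
    call per request, one destroy() call before unloading).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the lecture material.
final class LifecycleDemo {

  /** Hypothetical, simplified stand-in for a servlet. */
  interface MiniServlet {
    void init();
    String doGet(String path);
    void destroy();
  }

  /** Plays the role of the container and returns the order of the calls. */
  static List<String> run(List<String> requestPaths) {
    List<String> log = new ArrayList<>();
    MiniServlet servlet = new MiniServlet() {
      @Override public void init() { log.add("init"); }
      @Override public String doGet(String path) {
        log.add("doGet " + path);
        return "Hello from " + path;
      }
      @Override public void destroy() { log.add("destroy"); }
    };

    servlet.init();                      // once, when the servlet is loaded
    for (String path : requestPaths) {   // many times, once per HTTP request
      servlet.doGet(path);
    }
    servlet.destroy();                   // once, before the servlet is unloaded
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of("/a", "/b")));
  }
}
```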

  29. Java Servlet Example
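    package org.vub.wise;

    import java.io.*;
    import java.util.Date;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class HelloWorldServlet extends HttpServlet {
      // Called by the servlet container for every HTTP GET request
      @Override
      public void doGet(HttpServletRequest req, HttpServletResponse res)
          throws ServletException, IOException {
        // Tell the client that the body is an HTML document
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        // The whole HTML page has to be generated within the servlet
        out.println("<html><head><title>Hello World</title></head><body>");
        out.println("<h1>Hello World</h1>");
        out.println("<p>The time is " + new Date().toString() + "</p>");
        out.println("</body></html>");
        out.close();
      }
    }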

  30. Jakarta Server Pages (JSP)
    ▪ A "drawback" of Java servlets is that the whole page
    (e.g. HTML) has to be defined within the servlet
    ▪ not easy to share tasks between web designer and programmer
    ▪ Add program code through scriptlets and markup to
    existing HTML pages
    ▪ These JSP documents are then either interpreted on the
    fly (Apache Tomcat) or compiled into Java servlets
    ▪ The JSP approach is similar to PHP or Active Server
    Pages (ASP)
    ▪ Note that Java Servlets have increasingly become an
    enabling technology on which other technologies (such as JSP) build

  31. JavaScript
    ▪ Interpreted scripting language for client-side processing
    ▪ JavaScript functionality often embedded in HTML
    documents but can also be provided in separate files
    ▪ JavaScript often used to
    ▪ validate data (e.g. in a form)
    ▪ dynamically add content to a webpage
    ▪ process events (onLoad, onFocus, etc.)
    ▪ change parts of the original HTML document
    ▪ create cookies
    ▪ ...
    ▪ Note: Java and JavaScript are completely different
    languages!

  32. JavaScript Example
    ▪ More details about JavaScript in lecture 6 and in the
    exercise session


    <script>
      document.write("<h1>Hello World!</h1>");
    </script>


  33. Node.js
    ▪ Server-side JavaScript
    ▪ low-level, comparable to functionality offered by Servlets
    ▪ handling POST/GET requests, database access, sessions, …
    ▪ Write your entire app in one language
    ▪ however, server-side and client-side code still separated
    ▪ Built-in web server (no need for Apache, Tomcat, etc.)
    ▪ High modularity
    ▪ packages can be added for additional functionality (via npm)
    ▪ many available frameworks (Express, Passport, Sequelize,…)
    ▪ HTTP utility methods (sessions, routing, ...)
    ▪ template engines (Jade, EJS, …)

  34. Java Applets
    ▪ A Java applet is a program delivered to the client side in
    the form of Java bytecode and runs in a sandbox
    ▪ executed in the browser using a Java Virtual Machine (JVM)
    ▪ an applet has to extend the Applet or JApplet class
    ▪ Advantages
    ▪ the user automatically always has the most recent version
    ▪ high security for untrusted applets
    ▪ full Java API available for trusted signed applets
    ▪ Disadvantages
    ▪ requires a browser Java plug-in
    ▪ only signed applets can get more advanced functionality
    - e.g. network connections to machines other than the source machine

  35. Java Applets ...
    ▪ More recently Java Web Start (JavaWS) is replacing
    Java Applets
    ▪ program no longer runs within the browser
    - less problematic security restrictions
    - fewer browser compatibility issues
    ▪ Math and Physics Applet Examples
    ▪ http://www.falstad.com/mathphysics.html

  36. Web Application Frameworks
    ▪ There exist dozens of web application frameworks and
    we will present several of these frameworks in two weeks!
    "A web application framework is a software framework that
    is designed to support the development of dynamic websites,
    web applications, web services and web resources.
    The framework aims to alleviate the overhead associated
    with common activities performed in web development.
    For example, many frameworks provide libraries for
    database access, templating frameworks and session
    management, and they often promote code reuse."
    [https://en.wikipedia.org/wiki/Web_application_framework]

  37. Exercise 2
    ▪ Hands-on experience with the HTTP protocol

  38. References
    ▪ David Gourley et al., HTTP: The Definitive Guide, O'Reilly Media,
    September 2002
    ▪ M. Belshe et al., RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)
    ▪ http://www.faqs.org/rfcs/rfc7540.html
    ▪ N. Freed et al., RFC 6838 - Media Type Specifications and
    Registration Procedures
    ▪ http://www.faqs.org/rfcs/rfc6838.html
    ▪ HTML and JavaScript Tutorials
    ▪ https://www.w3schools.com

  39. References ...
    ▪ M. Knutson, HTTP: The Hypertext Transfer Protocol (refcardz #172)
    ▪ https://dzone.com/refcardz/http-hypertext-transfer-0
    ▪ W. Jason Gilmore, PHP 5.4 (refcardz #23)
    ▪ https://dzone.com/refcardz/php-54-scalable
    ▪ Java Servlet Tutorial
    ▪ https://www.tutorialspoint.com/servlets/

  40. 2 December 2005
    Next Lecture
    HTML5 and the Open Web Platform
