Creating A Python Web Server From Scratch

Nitya
July 15, 2015

My PyCon Proposal Draft

  1. WHAT IS A WEB SERVICE?

    A Web Service allows electronic devices to communicate over a network following open protocols. There are two kinds of Web services: (1) REST-compliant Web services, which manipulate representations of Web resources using a uniform set of stateless operations, and (2) arbitrary Web services, which expose an arbitrary set of operations.

    Different software systems use web services as a method of exchanging data with each other over the internet. The software system that requests data is called a service requester, and the software system that processes the request and provides the data is called a service provider. Because these systems may be built using different programming languages, they need a method of data exchange that doesn't depend on any particular language. Most types of software can interpret XML tags, so web services can use XML to exchange data such as programs, objects, messages, or documents. Thanks to these open protocols, applications written in different languages can communicate through web services.

    Web services can be activated by HTTP requests, which allows your code to function over the network. They allow applications to share data and services while following standardized protocols. Each web service involves a service provider, a service requester, and a service registry. Web services are built on top of open standards such as TCP/IP, HTTP, Java, HTML, and XML; of these, HTTP is simple, stable, and very popular.

    HTTP PROTOCOL: The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for hypermedia information systems, and it is the protocol the World Wide Web uses for data transfer. Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text, and HTTP is the protocol used to exchange or transfer it. HTTP is a request-response protocol for client-server transactions: clients send request messages, and servers return resources such as HTML files and other content, or perform functions on the client's behalf, along with a response message containing status information.

    Often, web browsers are the clients, and applications running on a computer that hosts a website act as servers. The Transmission Control Protocol (TCP) is usually the underlying transport-layer protocol. HTTP resources are identified and located on the network by Uniform Resource Identifiers (URIs), or more specifically Uniform Resource Locators (URLs), using the http or https URI schemes. In HTTP/1.1, a single connection can be reused to download several resources, such as images, scripts, and stylesheets, after the page has been delivered.
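    Since a client always starts from a URL, it can help to see how one maps onto the raw request message. The sketch below (Python 3, standard library only) uses urllib.parse.urlsplit() to pull out the scheme, host, port, and path, and formats a minimal HTTP/1.1 GET request as plain text with CRLF line endings. The helper name build_get_request is hypothetical, and a real client would normally send more headers than just Host.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a minimal HTTP/1.1 GET of *url*.

    Hypothetical helper for illustration: it sends only the Host header,
    which HTTP/1.1 requires on every request.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("not an HTTP URL: %r" % url)
    # Default ports depend on the scheme: 80 for http, 443 for https.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Include the port in the Host header only if the URL spells it out.
    host_header = parts.hostname
    if parts.port is not None:
        host_header = "%s:%d" % (parts.hostname, parts.port)
    # The request target is the path plus any query string ("/" if empty).
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = (
        "GET %s HTTP/1.1\r\n" % target   # request line: method, target, version
        + "Host: %s\r\n" % host_header   # which site on that server we want
        + "\r\n"                         # blank line ends the headers
    )
    return parts.hostname, port, request.encode("ascii")


host, port, request = build_get_request("http://www.example.com/index.html")
print(host, port)
print(request)
```

    For http://www.example.com/index.html this yields host www.example.com, port 80, and the bytes GET /index.html HTTP/1.1, a Host line, and the terminating blank line.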
  2. An HTTP client makes a TCP connection to a port on the server and sends a request. An HTTP server listening on that port receives the request and returns an HTTP status line, along with the requested resource or some other message. A sequence of such request-response transactions makes up an HTTP session.

    HTTP defines request methods that indicate the action to be performed on the identified resource. Any client can use any method, and the server can be configured to support any combination of methods. If a method is unknown to an intermediate, it will be treated as unsafe and non-idempotent. There is no limit to the number of methods that can be defined. The most common methods are:

    - GET: Requests a representation of the specified resource. It should only retrieve data.
    - HEAD: Asks for a response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers without having to transport the entire content.
    - POST: Requests that the server accept the data enclosed in the request for processing by the resource identified by the URI, for example a form submission or an update.
    - PUT: Requests that the enclosed object be stored under the supplied URI. If the URI already refers to an existing resource, that resource is modified.
    - DELETE: Requests that the specified resource be deleted.
    - TRACE: Echoes back the received request, so that the client can see what changes or additions, if any, have been made by intermediate servers.
    - OPTIONS: Returns the HTTP methods that the server supports for the specified URL.
    - CONNECT: Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.
    - PATCH: Applies partial modifications to a resource.

    All general-purpose HTTP servers are required to implement at least the GET and HEAD methods and, whenever possible, also the OPTIONS method. Methods such as HEAD, GET, OPTIONS, and TRACE are intended only for information retrieval and should not change the state of the server. By contrast, methods such as POST, PUT, DELETE, and PATCH are intended for actions that may cause side effects, either on the server or externally, such as financial transactions or the transmission of email.

    Despite the prescribed safety of GET requests, in practice nothing technically limits how a server handles them. Careless or deliberately irregular programming can therefore let a GET request cause non-trivial changes on the server. This is discouraged, because it causes problems for web caches, search engines, and other automated agents, which can then make unintended changes on the server.

    HTTP is a stateless protocol: the server is not required to retain information or status about each user across multiple requests. However, some web applications implement state, or server-side sessions, using for instance HTTP cookies or hidden variables within web forms.

    Each HTTP message consists of a start line and headers, followed by a blank line and an optional body. Below is a sample conversation between an HTTP client and an HTTP server running on www.example.com, port 80.

    Client request:

        GET /index.html HTTP/1.1
        Host: www.example.com

    Server response:

        HTTP/1.1 200 OK
        Date: Mon, 23 May 2005 22:38:34 GMT
        Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
        Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
        ETag: "3f80f-1b6-3e1cb03b"
        Content-Type: text/html; charset=UTF-8
        Content-Length: 138
        Accept-Ranges: bytes
        Connection: close

        <html>
        <head>
          <title>Example</title>
        </head>
        <body>
          Hello World
        </body>
        </html>

  3. COOKIES:

    An HTTP cookie (also called a web cookie or browser cookie) is a small piece of data sent from a website and stored in a user's web browser while the user is browsing that website. Every time the user loads the website, the browser sends the cookie back to the server to notify the website of the user's previous activity. Cookies were designed to be a reliable mechanism for websites to remember stateful information or to record the user's browsing activity.

    Other kinds of cookies are also used. For example, authentication cookies are used by web servers to know whether the user is logged in or not, and which account they are logged in with. Without such a mechanism, the site would not know whether to send a page containing sensitive information, or to require the user to authenticate by logging in. The security of an authentication cookie generally depends on the security of the issuing website and the user's web browser, and on whether the cookie data is encrypted. Security vulnerabilities may allow a cookie's data to be read by an attacker, used to gain access to user data, or used to gain access (with the user's credentials) to the website to which the cookie belongs (see cross-site scripting and cross-site request forgery for examples).

    Because HTTP is stateless, web pages have no memory of earlier visits, so session cookies are used to keep track of page visits and retain information sent to the site. You can proceed through many pages of a site without having to re-authenticate on every page you visit. Users are recognized by their unique cookies, so any changes or selections you make are remembered. Examples include online shopping cart applications, where cookies remember your selections and past activity on cart items. You can accept and adjust session cookies in your browser settings. Session cookies are temporary: they exist only while the browser session remains open, and store your browsing information for that session.

    Persistent cookies are stored in one of your browser's subfolders and stay alive until they are manually removed or until their set lifetime is over. Cookies are plain text, not compiled programs, so they cannot execute code or copy themselves, and they cannot browse your computer or hard disk on their own. They merely support website features such as smooth transitions between pages. Their primary purpose is to let a website store small pieces of information on your computer. The data in a cookie is set by the website that issued it, and the browser only sends a cookie back to the domain it belongs to. Any personal information stored in a cookie generally comes from information you have entered on a website.

    When a cookie does store personal information, well-designed sites encode or encrypt it so that third parties who access your cookie folder cannot read it, and only the server that created the cookie can decode it. Some sites go further and store only a unique, anonymous identifier in the local cookie, keeping the personal information on the site's server, where it is looked up using that identifier.

    Browsers have their own default settings for managing site cookies, which the user can change manually. You can view, block, delete, and manage cookies, and set the browser to suit your cookie preferences. Your browser saves cookies in its profile folders. Some websites track your movement or data input across websites for analytical records (for example, through third-party cookies), and at times this may be malicious. Today, newer versions of browsers such as Internet Explorer, Firefox, and Opera allow a finer degree of control over which sites can or cannot set cookies. Cookies are small: browsers typically limit each cookie to about 4 KB, and many contain only a unique session identifier. Because they are small, they need little bandwidth, and by remembering settings they can make pages load faster. Some sites use cookies to store the user's personalized settings for that site as encoded data.
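    The difference between session and persistent cookies is visible in the Set-Cookie header itself. A minimal sketch using Python 3's standard http.cookies module (the cookie names and values are made up for illustration): a cookie with neither Max-Age nor Expires is a session cookie, while one with Max-Age is kept for that many seconds. The HttpOnly flag, also shown, stops page scripts from reading the cookie, which limits the damage a cross-site scripting bug can do.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()

# A session cookie: no Max-Age or Expires, so the browser discards it
# when the browser session ends. HttpOnly hides it from page scripts.
cookie["session_id"] = "abc123"
cookie["session_id"]["httponly"] = True

# A persistent cookie: Max-Age (in seconds) keeps it for one hour,
# even across browser restarts. Path=/ makes it apply to the whole site.
cookie["theme"] = "dark"
cookie["theme"]["max-age"] = 3600
cookie["theme"]["path"] = "/"

# output() renders one Set-Cookie header per cookie, separated by CRLF.
print(cookie.output())
```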
  4. Cookies are used for state management: they are usually set by the server, stored by the client, and returned by the client on later requests.

    Here is a simple example, using Python's http.cookies module (called Cookie in Python 2):

        from http.cookies import SimpleCookie

        HTTP_COOKIE = r'integer=5; string_with_quotes="He said, \"Hello, World!\""'

        print('From constructor:')
        c = SimpleCookie(HTTP_COOKIE)
        print(c)

        print('From load():')
        c = SimpleCookie()
        c.load(HTTP_COOKIE)
        print(c)

    The output is a valid set of Set-Cookie headers, which can be passed to the client as part of the HTTP response:

        From constructor:
        Set-Cookie: integer=5
        Set-Cookie: string_with_quotes="He said, \"Hello, World!\""
        From load():
        Set-Cookie: integer=5
        Set-Cookie: string_with_quotes="He said, \"Hello, World!\""

    Once the client receives the Set-Cookie headers, it returns those cookies to the server on subsequent requests using the Cookie header. The incoming header will look like:

        Cookie: integer=5; string_with_quotes="He said, \"Hello, World!\""

    WHAT ARE SOCKETS?: Sockets are the endpoints of a bidirectional communication channel; a channel has a socket at each end. Sockets can communicate within a process, between processes on the same machine, or between processes on different machines, and they can be implemented over a number of channel types, such as TCP, UDP, and Unix domain sockets.

    At a low level, Python gives you access to the basic socket support in your operating system, so you can implement your own clients and servers. At a higher level, Python libraries provide an interface to specific application-level network protocols, such as HTTP and FTP.

    Sockets have their own vocabulary:

    - Domain: The family of protocols that is used as the transport mechanism. These values are constants such as AF_INET, PF_INET, PF_UNIX, PF_X25, and so on.
  5. Type: The type of communications between the two endpoints, typically SOCK_STREAM for connection-oriented protocols and SOCK_DGRAM for connectionless protocols.

    - Protocol: Typically zero, this may be used to identify a variant of a protocol within a domain and type.
    - Hostname: The identifier of a network interface. It can be:
      - a string, which can be a host name, a dotted-quad address, or an IPv6 address in colon (and possibly dot) notation;
      - the string "<broadcast>", which specifies an INADDR_BROADCAST address;
      - a zero-length string, which specifies INADDR_ANY; or
      - an integer, interpreted as a binary address in host byte order.
    - Port: Each server listens for clients calling on one or more ports. A port is usually given as an integer port number; functions such as socket.getaddrinfo() also accept a string containing a port number or the name of a service.

    THE SOCKET MODULE: To create a socket, use the socket.socket() function available in the socket module, which has the general syntax:

        s = socket.socket(socket_family, socket_type, protocol=0)

    Here is a description of the parameters:

    - socket_family: This is either AF_UNIX or AF_INET, as explained earlier.
    - socket_type: This is either SOCK_STREAM or SOCK_DGRAM.
    - protocol: This is usually left out, defaulting to 0.

    Once you have a socket object, you can use the required functions to create your client or server program. The functions required are:

    Server socket methods:
    - s.bind(): Binds an address (a hostname, port number pair) to the socket.
    - s.listen(): Sets up and starts the TCP listener.
    - s.accept(): Passively accepts a TCP client connection, waiting until a connection arrives (blocking).

    Client socket methods:
    - s.connect(): Actively initiates a TCP server connection.

    General socket methods:
    - s.recv(): Receives a TCP message. Variants: s.recv(bufsize[, flags]), s.recvfrom(bufsize[, flags]), s.recvfrom_into(buffer[, nbytes[, flags]]), s.recv_into(buffer[, nbytes[, flags]]).
    - s.send(): Transmits a TCP message. Variants: s.send(bytes[, flags]), s.sendall(bytes[, flags]), s.sendto(bytes[, flags], address).
  6. s.recvfrom(): Receives a UDP message.

    - s.sendto(): Transmits a UDP message.
    - s.close(): Closes the socket.
    - socket.gethostname(): Returns the hostname of the machine. (This is a function of the socket module rather than a method of the socket object.)

    Now we are ready to start building our Python server from scratch. It will be a network server running on our own computer, waiting for a client to send a request. When a request arrives, the server generates a response and sends it back to the client.

    A SIMPLE SERVER: To write Internet servers, we use the socket() function available in the socket module to create a socket object. The socket object is then used to call other functions to set up a socket server. Call bind((hostname, port)) to specify a port for your service on the given host, and listen() to start listening. Next, call the accept() method of the socket object. This method waits until a client connects to the port you specified, and then returns a connection object that represents the connection to that client, together with the client's address.

        import socket

        s = socket.socket()              # Create a socket object
        host = socket.gethostname()      # Get local machine name
        port = 12345                     # Reserve a port for your service
        s.bind((host, port))             # Bind to the port
        s.listen(5)                      # Now wait for client connection

        while True:
            c, addr = s.accept()         # Establish connection with client
            print('Got connection from', addr)
            c.send(b'Thank you for connecting')
            c.close()                    # Close the connection

    A SIMPLE CLIENT: Let us write a very simple client program that opens a connection to a given host on port 12345. It is very simple to create a socket client using Python's socket module. The s.connect((hostname, port)) method opens a TCP connection to hostname on the port. Once you have a socket open, you can read from it like any IO object. When done, remember to close it, as you would close a file. The following code is a very simple client that connects to a given host and port, reads any available data from the socket, and then exits.

        import socket

        s = socket.socket()              # Create a socket object
        host = socket.gethostname()      # Get local machine name
        port = 12345                     # Port the server is listening on
        s.connect((host, port))
        print(s.recv(1024).decode())
        s.close()                        # Close the socket when done

  7. Now run server.py in the background, and then run client.py to see the result:

        # Start the server in the background
        $ python server.py &

        # Once the server is running, run the client
        $ python client.py

    This produces output like the following (the server prints the first line; the client prints the second):

        Got connection from ('127.0.0.1', 48437)
        Thank you for connecting

    PYTHON INTERNET MODULES: A list of some important modules in Python network/Internet programming (Python 2 module names; in Python 3, httplib and xmlrpclib were renamed http.client and xmlrpc.client):

        Protocol  Common function     Port No  Python module
        HTTP      Web pages           80       httplib, urllib, xmlrpclib
        NNTP      Usenet news         119      nntplib
        FTP       File transfers      20, 21   ftplib, urllib
        SMTP      Sending email       25       smtplib
        POP3      Fetching email      110      poplib
        IMAP4     Fetching email      143      imaplib
        Telnet    Command lines       23       telnetlib
        Gopher    Document transfers  70       gopherlib, urllib

    USING A BROWSER AS A CLIENT: A browser speaks the HTTP protocol, so we can use it as the client for a small web server:

        import socket

        HOST, PORT = '', 8888

        listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow the port to be reused right after the server is restarted
        listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listen_socket.bind((HOST, PORT))
        listen_socket.listen(1)
        print('Serving HTTP on port %s ...' % PORT)

        while True:
            client_connection, client_address = listen_socket.accept()
            request = client_connection.recv(1024)
            print(request.decode('utf-8', errors='replace'))

            # Status line, blank line, then the body
            http_response = (
                b'HTTP/1.1 200 OK\r\n'
                b'\r\n'
                b'Hello, World!\r\n'
            )
            client_connection.sendall(http_response)
            client_connection.close()

  8. Running this code:

        $ python webserver.py
        Serving HTTP on port 8888 ...

    Now, if you open http://localhost:8888/hello in your browser, "Hello, World!" will be displayed. The web address http://localhost:8888/hello is called a URL. It contains the protocol (http), the host name (localhost), the port number (8888), and the path (/hello) leading to the page. Your browser uses the URL to find the web server to connect to and the path of the page on the server to retrieve. It establishes a TCP connection with the web server using sockets, sends an HTTP request over the TCP connection, waits for the server to send a response back, and then displays the page. To accept TCP connections with sockets, the server creates a socket, binds it to the port, and listens on it; it can then accept a connection from a client, after which the two sides can exchange data.

    An example of an HTTP request is GET /hello HTTP/1.1, where GET is the HTTP method asking the server to return data, /hello is the path, and HTTP/1.1 is the HTTP version. An example of an HTTP response you might receive is HTTP/1.1 200 OK, followed by a blank line and then the page content to be displayed, also known as the response body. 200 OK is the response status code.

    WORKING OF SOCKET: When you click on a link, your browser creates a socket to connect to the web server on a particular port. The socket, identified by an IP address and port, acts as an end-point for a network connection on a machine. When the connection is complete, the socket can send a request for the text of the page. The same socket then reads the reply and is destroyed, as client sockets are normally used for only one exchange, or a small set of sequential exchanges. A web server, on the other hand, first creates a server socket and binds it to the host and port. Next, it must allow the socket to listen. The argument to listen() tells the socket library how many connection requests to queue up (5 is a common value) before refusing outside connections.
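    The lifecycle described above (the server binds, listens, and accepts; the client connects, reads, and closes) can be run end to end in one script. A minimal sketch, assuming Python 3 and a working loopback interface: the server binds to 127.0.0.1 with port 0 so the operating system picks a free port, and the accept() side runs in a thread. The names serve_once and demo are hypothetical.

```python
import socket
import threading


def serve_once(listen_socket):
    """Accept a single client, send a greeting, and close that connection."""
    conn, addr = listen_socket.accept()   # blocks until a client connects
    with conn:
        conn.sendall(b"Thank you for connecting")


def demo():
    # Server side: create, bind, and listen before any client connects.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(5)                # backlog of up to 5 pending connections
    port = server.getsockname()[1]  # the port the OS actually chose

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client side: connect, read until the server closes, then clean up.
    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            chunk = client.recv(1024)
            if not chunk:           # b"" means the server closed the connection
                break
            chunks.append(chunk)

    worker.join()
    server.close()
    return b"".join(chunks)


print(demo().decode())
```

    Reading until recv() returns an empty bytes object, rather than calling it once, is what lets the client cope with a reply that arrives in several pieces.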
    Now we have a server socket up and running, and each call to accept() produces a new client socket for one connection. These sockets communicate through the channel established on the port, which is released when the channel is closed. We can use methods such as send and recv to communicate between sockets. However, send and recv operate on network buffers rather than on whole messages: they do not necessarily handle all the bytes you hand them (or expect from them). They return when the associated network buffers have been filled (send) or emptied (recv); send tells you how many bytes it handled, and recv returns whatever data it has received. Using a while loop, you can call them again and again until your data has been completely handled; this is how you handle larger payloads. When you call recv, you also specify the maximum number of bytes you wish to receive at a time.

    When you write both the client and the server, you also decide how messages are framed: they can be of a fixed length, delimited, or prefixed with their length. For fixed-length messages of MSGLEN bytes (a length both sides agree on), the following methods, taken from a class that keeps its connected socket in self.sock, send and receive a complete message:

        def mysend(self, msg):
            totalsent = 0
            while totalsent < MSGLEN:
                sent = self.sock.send(msg[totalsent:])
                if sent == 0:
                    raise RuntimeError("socket connection broken")
                totalsent = totalsent + sent

        def myreceive(self):
            chunks = []
            bytes_recd = 0
            while bytes_recd < MSGLEN:
                chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
                if chunk == b'':
                    raise RuntimeError("socket connection broken")
                chunks.append(chunk)
                bytes_recd = bytes_recd + len(chunk)
            return b''.join(chunks)

    Once you have determined the message format and length, you can use a loop like these to retrieve all of the data. If you decide to go the delimited route, you'll be receiving in some arbitrary chunk size (4096 or 8192 is frequently a good match for network buffer sizes) and scanning what you've received for a delimiter. If you prefix each message with its length, use two loops: the first to determine the length, and the second to get the data, as neither is guaranteed to be sent or received in one pass.
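    The two-loop idea for length-prefixed messages can be sketched as follows. Assumptions, none of which come from the original: Python 3, a 4-byte unsigned length in network byte order (struct format "!I") as the prefix, and socket.socketpair() so the example runs without a network. The function names are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def recv_exactly(sock, n):
    """Loop on recv() until exactly *n* bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 2048))
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    # sendall() repeats send() internally until every byte is written.
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    # First loop: read the fixed-size length prefix.
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    # Second loop: read exactly that many bytes of payload.
    return recv_exactly(sock, length)


a, b = socket.socketpair()
send_message(a, b"Hello")
send_message(a, b"x" * 5000)   # larger than one 2048-byte recv()
print(recv_message(b))
print(len(recv_message(b)))
a.close()
b.close()
```

    Because every message carries its own length, two messages sent back to back are never confused, even if one recv() call returns bytes from both.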
  9. When a recv returns nothing (an empty bytes object), it means the other side has closed the connection, and no more data will arrive on it.

    In the simplest form of HTTP, a socket is used for only one transfer: the client sends a request, the server sends the response, and the connection is closed. If the connection has not been broken, however, recv may wait forever, so messages must be fixed length, be delimited, indicate how long they are, or end by shutting down the connection; otherwise the connection could stay open indefinitely, waiting for data that never arrives. Also keep in mind that if several messages can be sent back to back and you call recv with an arbitrary chunk size, you may end up reading the start of a following message, which you must then put aside until it is needed. When you're done, close your socket so that the channel is closed on both sides.

    Now we can create a socket-based server and add cookies as well. We can process and use the data sent through the socket. The server's connection socket receives the client's request through the established connection: a request line containing the method, the path, and the HTTP version, followed by a series of request headers (for example Host, which carries the host name and port, as well as Accept and Cookie). You can extract this information yourself by parsing the request; I did this by writing a function that I pass the request data into. I can then act on the method, and map the requested path onto any directory I choose by prefixing it with a document root. I can also open files under that directory and send their contents in the response, such as JSON or HTML files. If you don't want to use files, you can put the data to be displayed directly into your own response template, in HTTP response format, and send it through the connection socket. Finally, you can send HTTP response status lines in the same manner.
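    A hand-built response needs a status line, headers, a blank line, and the body, with CRLF line endings; a Content-Length header tells the client exactly where the body ends. A minimal sketch in Python 3, where the helper name build_response and its parameters are hypothetical and each connection is assumed to carry a single request:

```python
def build_response(status, body, content_type="text/html; charset=utf-8",
                   extra_headers=None):
    """Assemble a complete HTTP/1.1 response as bytes.

    *status* is the status code and reason phrase, e.g. "404 Not Found".
    *body* may be str (encoded as UTF-8) or bytes.
    """
    if isinstance(body, str):
        body = body.encode("utf-8")
    headers = {
        "Content-Type": content_type,
        # Content-Length lets the client know exactly where the body ends.
        "Content-Length": str(len(body)),
        # This sketch handles one request per connection.
        "Connection": "close",
    }
    headers.update(extra_headers or {})
    head = "HTTP/1.1 %s\r\n" % status
    head += "".join("%s: %s\r\n" % item for item in headers.items())
    # A blank line separates the headers from the body.
    return head.encode("ascii") + b"\r\n" + body


print(build_response("200 OK", "Hello, World!",
                     extra_headers={"Set-Cookie": "HasVisited=1"}))
```

    Error statuses use the same helper, for example build_response("404 Not Found", "Not found"), so every reply the server sends is well formed.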
    Here is the example code:

        import logging
        import os.path
        import socket
        import sys

        # Status line, then headers, then a blank line, then the body.
        RESPONSE_TEMPLATE = 'HTTP/1.1 200 OK\r\n{headers}\r\n\r\n{content}'

        LOGGER = logging.getLogger(__name__)

        HTML_CONTENT_TYPES = ('text/html',)
        JSON_CONTENT_TYPES = ('application/json',)


        def parse_headers(request):
            """Return a dictionary in the form Header => Value for all headers in
            *request*."""
            headers = {}
            for line in request.split('\n')[1:]:
                # blank line separates headers from content
                if line == '\r':
                    break
                header_line = line.partition(':')
                headers[header_line[0].lower()] = header_line[2].strip()
            return headers


        def is_content_type_negotiable(accepts, extension):
            """Return True if we can reply with a content-type the client accepts."""
            # For now, just check if the extension is included somewhere in the
            # Accept header, or that Accept includes the wildcard "*/*"
            return extension in accepts or '*/*' in accepts


        def response_with_cookies(content):
            return RESPONSE_TEMPLATE.format(headers='Set-Cookie: HasVisited=1',
                                            content=content)


        def main():
            listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            listen_socket.bind(('', int(sys.argv[1])))
            listen_socket.listen(1)
            document_root = sys.argv[2]

            while True:
                connection, address = listen_socket.accept()
                request = connection.recv(1024).decode('iso-8859-1')
                if not request:  # client closed without sending anything
                    connection.close()
                    continue
                headers = parse_headers(request)

                cookies = None
                if 'cookie' in headers:
                    cookies = {}
                    for pair in headers['cookie'].split(';'):
                        name, _, value = pair.strip().partition('=')
                        cookies[name] = value
                    if 'HasVisited' in cookies:
                        print('User has already visited!')

                start_line = request.split('\n')[0]
                method, uri, version = start_line.split()
                path = document_root + uri
                extension = os.path.splitext(path)[1][1:]

                if 'accept' not in headers or not is_content_type_negotiable(
                        headers['accept'], extension):
                    connection.sendall(b'HTTP/1.1 406 Not Acceptable\r\n\r\n')
                elif not os.path.isfile(path):
                    connection.sendall(b'HTTP/1.1 404 Not Found\r\n\r\n')
                else:
                    with open(path) as file_handle:
                        file_contents = file_handle.read()
                    response = response_with_cookies(file_contents)
                    connection.sendall(response.encode('utf-8'))
                connection.close()


        if __name__ == '__main__':
            sys.exit(main())

  10. As shown, HTTP responses can be built directly in the code, or read from files under a specified path, such as a folder containing text or JSON files. The request variable is set to connection.recv(1024), which reads at most 1024 bytes; this call can be placed in a while loop so that the whole request is retrieved. The method can also be extracted and used to choose between handlers (for example with a dictionary that maps each method to a handler function, an if/elif chain, or, in Python 3.10 and later, a match statement), so that different pages are served depending on the request method being performed. The code also shows that HTTP response headers and status codes can be written manually, via syntax such as connection.sendall(b'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n').
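    Both suggestions above, reading the request in a loop and choosing a handler by method, can be combined in a short sketch. Assumptions: Python 3; the request head is read until the blank line ("\r\n\r\n") that ends the headers; a dictionary stands in for a switch statement; socket.socketpair() replaces a real network connection for the demo. The names recv_request_head, handle_get, handle_head, HANDLERS, and dispatch are hypothetical.

```python
import socket


def recv_request_head(sock, bufsize=1024, limit=65536):
    """Read from *sock* until the blank line that ends the HTTP headers.

    Returns the decoded head (request line + headers) and any bytes already
    read past it (the start of the body, if there is one).
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:  # the peer closed the connection early
            raise ConnectionError("connection closed before headers ended")
        data += chunk
        if len(data) > limit:
            raise ValueError("request headers too large")
    head, _, rest = data.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), rest


def handle_get(path):
    return "200 OK", "You asked for %s" % path


def handle_head(path):
    status, _ = handle_get(path)
    return status, ""  # same status as GET, but no body


# A dictionary stands in for a switch statement: method name -> handler.
HANDLERS = {"GET": handle_get, "HEAD": handle_head}


def dispatch(head):
    method, path, _version = head.split("\r\n", 1)[0].split()
    handler = HANDLERS.get(method)
    if handler is None:
        return "405 Method Not Allowed", ""
    return handler(path)


# Demo over a connected pair of sockets, so no real network is needed.
client, server = socket.socketpair()
# Send the request in two pieces to show that one recv() may not get it all.
client.sendall(b"GET /hello HTTP/1.1\r\nHost: loc")
client.sendall(b"alhost\r\n\r\n")
head, rest = recv_request_head(server, bufsize=8)
print(dispatch(head))
client.close()
server.close()
```

    Adding support for another method then means adding one entry to HANDLERS rather than another branch in the main loop.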