Wednesday, April 9, 2014

What happens after typing a URL into a browser

Problem

Explain what happens, step by step, after you type a URL into a browser. Use as much detail as possible.

Solution

Take the URL of my website as an example: k2code.blogspot.com.
There's no right, or even complete, answer to this question. It lets you go into as much detail as you're comfortable with. Here's a start, though:
Let's assume a simple HTTP request.

In an extremely rough and simplified sketch, assuming the simplest possible HTTP request, no proxies and IPv4 (this would work similarly for an IPv6-only client, but I have yet to see such a workstation):
  1. Browser checks its cache; if the requested object is in the cache and is fresh, skip to #8
  2. Browser asks your OS for the server's IP address; if the OS does not already know the address, go to #3
  3. OS makes a DNS (Domain Name System) lookup and returns the IP address to the browser
  4. Browser opens a TCP connection to the server at port 80 (this step is much more complex with HTTPS) and sends the HTTP request (generally HTTP/1.1) through the TCP connection
  5. Browser receives the HTTP response and may close the TCP connection, or reuse it for another request
  6. Browser checks if the response is a redirect (3xx status codes), an authorization request (401), an error (other 4xx and 5xx), etc.; these are handled differently from normal responses (2xx)
  7. If cacheable, the response is stored in the cache
  8. Browser decodes the response (e.g. if it's gzipped)
  9. Browser determines what to do with the response (e.g. is it an HTML page, an image, a sound clip?)
  10. Browser renders the response, or offers a download dialog for unrecognized types
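
Steps 2 to 8 can be sketched in a few lines of Python. This is only a rough illustration, not how a real browser is written: the function names (build_request, parse_response, classify, fetch) are made up for this post, there is no caching, no HTTPS, and bodies sent with chunked transfer encoding are returned undecoded.

```python
import gzip
import socket


def build_request(host, path="/"):
    """Step 4: the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close, so we can read to EOF
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Steps 5 and 8: split status line, headers and body, then undo gzip."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Simplification: chunked bodies are left as they arrived.
    chunked = "chunked" in headers.get("transfer-encoding", "").lower()
    if headers.get("content-encoding") == "gzip" and not chunked:
        body = gzip.decompress(body)
    return status, headers, body


def classify(status):
    """Step 6: decide which kind of response this is."""
    if 200 <= status < 300:
        return "normal"
    if 300 <= status < 400:
        return "redirect"
    if status == 401:
        return "authorization required"
    if status >= 400:
        return "error"
    return "other"


def fetch(host, path="/", port=80):
    """Steps 2 to 5 over a real socket (needs network access)."""
    ip = socket.gethostbyname(host)  # steps 2-3: the OS resolves the name
    with socket.create_connection((ip, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Calling fetch("k2code.blogspot.com") would run the whole sequence against the live site; the other three functions work on plain bytes, so they can be tried without a network connection.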

    Two of the most interesting steps are steps 2 and 3: domain name resolution. The web addresses we type are nothing but human-readable aliases for IP addresses. The mapping between domain names and their associated Internet Protocol (IP) addresses is managed by the Domain Name System (DNS), which is a distributed, hierarchical system.
    The DNS namespace is divided into zones. A single server may be responsible for knowing the host names and IP addresses of only a small part of that namespace, but DNS servers work together to map all domain names to their IP addresses. That means if one domain name server is unable to find the IP address of a requested domain, it requests the information from other domain name servers.
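
The way name servers hand a query on to each other can be illustrated with a toy model. Everything here is invented for the example: the server names, the SERVERS table, and the address 203.0.113.10 (taken from a range reserved for documentation). Real DNS uses network queries, record types and caching, none of which is modelled.

```python
# Hypothetical data: each "server" knows only its own zone and
# which other server to ask for names further down the hierarchy.
SERVERS = {
    "root": {
        "referrals": {"com": "com-server"},
        "addresses": {},
    },
    "com-server": {
        "referrals": {"blogspot.com": "blogspot-server"},
        "addresses": {},
    },
    "blogspot-server": {
        "referrals": {},
        "addresses": {"k2code.blogspot.com": "203.0.113.10"},
    },
}


def resolve(name, server="root", max_hops=10):
    """Follow referrals from server to server until one knows the address.

    Returns (ip_address, list_of_servers_asked).
    """
    asked = []
    for _ in range(max_hops):
        asked.append(server)
        zone = SERVERS[server]
        if name in zone["addresses"]:
            return zone["addresses"][name], asked
        # Pick the most specific zone that the name belongs to.
        matches = [
            suffix for suffix in zone["referrals"]
            if name == suffix or name.endswith("." + suffix)
        ]
        if not matches:
            raise LookupError(f"{server} cannot resolve {name}")
        server = zone["referrals"][max(matches, key=len)]
    raise LookupError(f"too many referrals while resolving {name}")
```

Here resolve("k2code.blogspot.com") asks the root server, then the server for com, then the server for blogspot.com, which finally knows the address. No single server holds the whole mapping.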

    For details on http request, you can refer here.
