How Does the Internet Work?

This article is a repost of the original.

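Contents
  1. Introduction
  2. Where to Begin? Internet Addresses
  3. Protocol Stacks and Packets
  4. Networking Infrastructure
  5. Internet Infrastructure
  6. The Internet Routing Hierarchy
  7. Domain Names and Address Resolution
  8. Internet Protocols Revisited
  9. Application Protocols: HTTP and the World Wide Web
  10. Application Protocols: SMTP and Electronic Mail
  11. Transmission Control Protocol
  12. Internet Protocol
  13. Wrap Up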


© 2002 Rus Shuler @ Pomeroy IT Solutions, all rights reserved

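Introduction

How does the Internet work? Good question! The Internet's growth has become explosive, and it seems impossible to escape the bombardment of www.com's seen constantly on television, heard on radio, and seen in magazines. Because the Internet has become such a large part of our lives, a good understanding of it is needed to use this new tool most effectively.

This article explains the underlying infrastructure and technologies that make the Internet work. It does not go into great depth, but it covers enough of each area to give a basic understanding of the concepts involved.

Original text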

How does the Internet work? Good question! The Internet’s growth has become explosive and it seems impossible to escape the bombardment of www.com's seen constantly on television, heard on radio, and seen in magazines. Because the Internet has become such a large part of our lives, a good understanding is needed to use this new tool most effectively.

This whitepaper explains the underlying infrastructure and technologies that make the Internet work. It does not go into great depth, but covers enough of each area to give a basic understanding of the concepts involved.

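Where to Begin? Internet Addresses

Because the Internet is a global network of computers, each computer connected to the Internet must have a unique address. Internet addresses are in the form nnn.nnn.nnn.nnn, where each nnn must be a number from 0 to 255. This address is known as an IP address. (IP stands for Internet Protocol.)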
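As a small illustration of the nnn.nnn.nnn.nnn format, the following Python sketch checks that a string has four parts in the range 0 to 255 and shows the single 32-bit number the address stands for. The function names are made up for this example; `ipaddress` is part of the Python standard library.

```python
import ipaddress


def is_valid_ipv4(text: str) -> bool:
    """Return True if text has four dot-separated decimal parts, each 0-255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    # Each part must be plain ASCII digits and fit in one byte.
    return all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)


def to_integer(text: str) -> int:
    """Return the 32-bit integer that the dotted address represents."""
    return int(ipaddress.IPv4Address(text))


print(is_valid_ipv4("1.2.3.4"))    # True
print(is_valid_ipv4("5.6.7.256"))  # False: 256 does not fit in 0-255
print(to_integer("1.2.3.4"))       # 16909060
```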

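The picture below illustrates two computers connected to the Internet: your computer, with IP address 1.2.3.4, and another computer, with IP address 5.6.7.8. The Internet is represented as an abstract object in between.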
network1.gif

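If you connect to the Internet through an Internet Service Provider (ISP), you are usually assigned a temporary IP address for the duration of your dial-in session. If you connect to the Internet from a local area network (LAN), your computer might have a permanent IP address, or it might obtain a temporary one from a DHCP (Dynamic Host Configuration Protocol) server. In any case, if you are connected to the Internet, your computer has a unique IP address.

The Ping Program

If you're using Microsoft Windows or Linux and have a connection to the Internet, there is a handy program to see whether a computer on the Internet is "alive". It's called ping, probably after the sound made by older submarine sonar systems. Open a command prompt and type ping www.yahoo.com. The ping program will send a "ping" (actually an ICMP (Internet Control Message Protocol) echo request message) to the named computer. The pinged computer will respond with a reply. The ping program will measure the time that elapses until the reply comes back (if it does). Also, if you enter a domain name (e.g. www.yahoo.com) instead of an IP address, ping will resolve the domain name and display the computer's IP address. More on domain names and address resolution later.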
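To make the "ICMP echo request" a little more concrete, here is a minimal Python sketch that builds the bytes of such a message: type 8, code 0, a checksum, an identifier, and a sequence number, followed by a payload. It only constructs the message. Actually sending it needs a raw socket and usually administrator rights, which is why the ping program exists. The function names and the payload are assumptions made for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    # The checksum is computed with the checksum field itself set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
print(packet.hex())
```

A receiver can verify the message by running the same checksum over the whole packet; a result of zero means the bytes arrived intact.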

Original text

Because the Internet is a global network of computers each computer connected to the Internet must have a unique address. Internet addresses are in the form nnn.nnn.nnn.nnn where nnn must be a number from 0 - 255. This address is known as an IP address. (IP stands for Internet Protocol; more on this later.)

The picture below illustrates two computers connected to the Internet; your computer with IP address 1.2.3.4 and another computer with IP address 5.6.7.8. The Internet is represented as an abstract object in-between. (As this paper progresses, the Internet portion of Diagram 1 will be explained and redrawn several times as the details of the Internet are exposed.)

If you connect to the Internet through an Internet Service Provider (ISP), you are usually assigned a temporary IP address for the duration of your dial-in session. If you connect to the Internet from a local area network (LAN) your computer might have a permanent IP address or it might obtain a temporary one from a DHCP (Dynamic Host Configuration Protocol) server. In any case, if you are connected to the Internet, your computer has a unique IP address.

Check It Out - The Ping Program

If you’re using Microsoft Windows or a flavor of Unix and have a connection to the Internet, there is a handy program to see if a computer on the Internet is alive. It’s called ping, probably after the sound made by older submarine sonar systems. If you are using Windows, start a command prompt window. If you’re using a flavor of Unix, get to a command prompt. Type ping www.yahoo.com. The ping program will send a ‘ping’ (actually an ICMP (Internet Control Message Protocol) echo request message) to the named computer. The pinged computer will respond with a reply. The ping program will count the time expired until the reply comes back (if it does). Also, if you enter a domain name (e.g. www.yahoo.com) instead of an IP address, ping will resolve the domain name and display the computer’s IP address. More on domain names and address resolution later.

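Protocol Stacks and Packets

So your computer is connected to the Internet and has a unique IP address. How does it "talk" to other computers connected to the Internet? An example should serve here:
Let's say your IP address is 1.2.3.4 and you want to send a message to the computer 5.6.7.8. The message you want to send is "Hello computer 5.6.7.8!". Obviously, the message must be transmitted over whatever kind of wire connects your computer to the Internet. Let's say you've dialed into your ISP from home, so the message must be transmitted over the phone line. The message must therefore be translated from alphabetic text into electronic signals, transmitted over the Internet, and then translated back into alphabetic text. How is this accomplished? Through the use of a protocol stack. Every computer needs one to communicate on the Internet, and it is usually built into the computer's operating system (e.g. Windows, Unix, etc.). The protocol stack used on the Internet is referred to as the TCP/IP protocol stack because of the two major communication protocols it uses. The TCP/IP stack looks like this:

Protocol Layer: Comments
Application Protocols Layer: Protocols specific to applications such as WWW, e-mail, FTP, etc.
Transmission Control Protocol Layer: TCP directs packets to a specific application on a computer using a port number.
Internet Protocol Layer: IP directs packets to a specific computer using an IP address.
Hardware Layer: Converts binary packet data to network signals and back (e.g. Ethernet network card, modem for phone lines, etc.).

If we were to follow the path that the message "Hello computer 5.6.7.8!" took from our computer to the computer with IP address 5.6.7.8, it would happen something like this: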

network2.gif

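  1. The message would start at the top of the protocol stack on your computer and work its way downward.
  2. If the message to be sent is long, each stack layer that the message passes through may break the message up into smaller chunks of data. This is because data sent over the Internet (and most computer networks) is sent in manageable chunks. On the Internet, these chunks of data are known as packets.
  3. The packets would go through the Application Layer and continue to the TCP layer. Each packet is assigned a port number. Ports will be explained later, but suffice it to say that many programs may be using the TCP/IP stack and sending messages. We need to know which program on the destination computer needs to receive the message, because it will be listening on a specific port.
  4. After going through the TCP layer, the packets proceed to the IP layer. This is where each packet receives its destination address, 5.6.7.8.
  5. Now that our message packets have a port number and an IP address, they are ready to be sent over the Internet. The hardware layer takes care of turning the packets containing the alphabetic text of our message into electronic signals and transmitting them over the phone line.
  6. On the other end of the phone line, your ISP has a direct connection to the Internet. The ISP's router examines the destination address in each packet and determines where to send it. Often, the packet's next stop is another router.
  7. Eventually, the packets reach computer 5.6.7.8. Here, the packets start at the bottom of the destination computer's TCP/IP stack and work upwards.
  8. As the packets go upwards through the stack, all routing data that the sending computer's stack added (such as IP address and port number) is stripped from the packets.
  9. When the data reaches the top of the stack, the packets have been re-assembled into their original form, "Hello computer 5.6.7.8!"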
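
The steps above can be sketched as a toy model in Python. This is only an illustration of the idea of chunking, labeling, and reassembly, not how a real TCP/IP stack is implemented. The `Packet` class, the chunk size of 8 bytes, and port 80 are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest_ip: str    # label added on the way down by the IP layer
    dest_port: int  # label added on the way down by the TCP layer
    seq: int        # position of this chunk within the message
    data: bytes     # a chunk of the application's message


def send_down_stack(message: bytes, dest_ip: str, dest_port: int,
                    chunk_size: int = 8) -> list:
    """Split the message into chunks and label each with a port and address."""
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)]
    return [Packet(dest_ip, dest_port, seq, chunk)
            for seq, chunk in enumerate(chunks)]


def receive_up_stack(packets) -> bytes:
    """Strip the labels and reassemble the chunks in their original order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.data for p in ordered)


packets = send_down_stack(b"Hello computer 5.6.7.8!", "5.6.7.8", 80)
print(len(packets))                         # 3
print(receive_up_stack(reversed(packets)))  # b'Hello computer 5.6.7.8!'
```

The packets are deliberately handed to the receiver in reverse order to show that the sequence number, not the arrival order, decides how the message is rebuilt.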
Original text

So your computer is connected to the Internet and has a unique address. How does it ‘talk’ to other computers connected to the Internet? An example should serve here: Let’s say your IP address is 1.2.3.4 and you want to send a message to the computer 5.6.7.8. The message you want to send is “Hello computer 5.6.7.8!”. Obviously, the message must be transmitted over whatever kind of wire connects your computer to the Internet. Let’s say you’ve dialed into your ISP from home and the message must be transmitted over the phone line. Therefore the message must be translated from alphabetic text into electronic signals, transmitted over the Internet, then translated back into alphabetic text. How is this accomplished? Through the use of a protocol stack. Every computer needs one to communicate on the Internet and it is usually built into the computer’s operating system (e.g. Windows, Unix, etc.). The protocol stack used on the Internet is referred to as the TCP/IP protocol stack because of the two major communication protocols used. The TCP/IP stack looks like this:

Protocol Layer: Comments
Application Protocols Layer: Protocols specific to applications such as WWW, e-mail, FTP, etc.
Transmission Control Protocol Layer: TCP directs packets to a specific application on a computer using a port number.
Internet Protocol Layer: IP directs packets to a specific computer using an IP address.
Hardware Layer: Converts binary packet data to network signals and back (e.g. Ethernet network card, modem for phone lines, etc.).

If we were to follow the path that the message “Hello computer 5.6.7.8!” took from our computer to the computer with IP address 5.6.7.8, it would happen something like this:

  1. The message would start at the top of the protocol stack on your computer and work its way downward.
  2. If the message to be sent is long, each stack layer that the message passes through may break the message up into smaller chunks of data. This is because data sent over the Internet (and most computer networks) are sent in manageable chunks. On the Internet, these chunks of data are known as packets.
  3. The packets would go through the Application Layer and continue to the TCP layer. Each packet is assigned a port number. Ports will be explained later, but suffice to say that many programs may be using the TCP/IP stack and sending messages. We need to know which program on the destination computer needs to receive the message because it will be listening on a specific port.
  4. After going through the TCP layer, the packets proceed to the IP layer. This is where each packet receives its destination address, 5.6.7.8.
  5. Now that our message packets have a port number and an IP address, they are ready to be sent over the Internet. The hardware layer takes care of turning our packets containing the alphabetic text of our message into electronic signals and transmitting them over the phone line.
  6. On the other end of the phone line your ISP has a direct connection to the Internet. The ISP’s router examines the destination address in each packet and determines where to send it. Often, the packet’s next stop is another router. More on routers and Internet infrastructure later.
  7. Eventually, the packets reach computer 5.6.7.8. Here, the packets start at the bottom of the destination computer’s TCP/IP stack and work upwards.
  8. As the packets go upwards through the stack, all routing data that the sending computer’s stack added (such as IP address and port number) is stripped from the packets.
  9. When the data reaches the top of the stack, the packets have been re-assembled into their original form, “Hello computer 5.6.7.8!”

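Networking Infrastructure

So now you know how packets travel from one computer to another over the Internet. But what's in between? What actually makes up the Internet? Let's look at another diagram: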
network3.gif

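The ISP maintains a pool of modems for its dial-in customers. This is managed by some form of computer (usually a dedicated one) that controls data flow from the modem pool to a backbone or dedicated-line router. This setup may be referred to as a port server, as it "serves" access to the network. Billing and usage information is usually collected here as well.

After your packets traverse the phone network and your ISP's local equipment, they are routed onto the ISP's backbone or a backbone the ISP buys bandwidth from. From here the packets will usually journey through several routers and over several backbones, dedicated lines, and other networks until they find their destination, the computer with address 5.6.7.8. But wouldn't it be nice if we knew the exact route our packets were taking over the Internet? As it turns out, there is a way...

The Traceroute Program

If you're using Microsoft Windows or Linux and have a connection to the Internet, here is another handy Internet program. This one is called traceroute, and it shows the path your packets are taking to a given Internet destination. Like ping, traceroute must be used from a command prompt. In Windows, use tracert www.yahoo.com. From a Unix prompt, type traceroute www.yahoo.com. Like ping, you may also enter IP addresses instead of domain names. Traceroute will print out a list of all the routers, computers, and any other Internet entities that your packets must travel through to get to their destination.

If you use traceroute, you'll notice that your packets must travel through many things to get to their destination. Most have long names such as sjc2-core1-h2-0-0.atlas.digex.net and fddi0-0.br4.SJC.globalcenter.net. These are Internet routers that decide where to send your packets.

Original text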

So now you know how packets travel from one computer to another over the Internet. But what’s in-between? What actually makes up the Internet? Let’s look at another diagram:

Here we see Diagram 1 redrawn with more detail. The physical connection through the phone network to the Internet Service Provider might have been easy to guess, but beyond that might bear some explanation.

The ISP maintains a pool of modems for their dial-in customers. This is managed by some form of computer (usually a dedicated one) which controls data flow from the modem pool to a backbone or dedicated line router. This setup may be referred to as a port server, as it ‘serves’ access to the network. Billing and usage information is usually collected here as well.

After your packets traverse the phone network and your ISP’s local equipment, they are routed onto the ISP’s backbone or a backbone the ISP buys bandwidth from. From here the packets will usually journey through several routers and over several backbones, dedicated lines, and other networks until they find their destination, the computer with address 5.6.7.8. But wouldn’t it be nice if we knew the exact route our packets were taking over the Internet? As it turns out, there is a way…

Check It Out - The Traceroute Program

If you’re using Microsoft Windows or a flavor of Unix and have a connection to the Internet, here is another handy Internet program. This one is called traceroute and it shows the path your packets are taking to a given Internet destination. Like ping, you must use traceroute from a command prompt. In Windows, use tracert www.yahoo.com. From a Unix prompt, type traceroute www.yahoo.com. Like ping, you may also enter IP addresses instead of domain names. Traceroute will print out a list of all the routers, computers, and any other Internet entities that your packets must travel through to get to their destination.

If you use traceroute, you’ll notice that your packets must travel through many things to get to their destination. Most have long names such as sjc2-core1-h2-0-0.atlas.digex.net and fddi0-0.br4.SJC.globalcenter.net. These are Internet routers that decide where to send your packets. Several routers are shown in Diagram 3, but only a few. Diagram 3 is meant to show a simple network structure. The Internet is much more complex.

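Internet Infrastructure

The Internet backbone is made up of many large networks that interconnect with each other. These large networks are known as Network Service Providers, or NSPs. Some of the large NSPs are UUNet, CerfNet, IBM, BBN Planet, SprintNet, and PSINet, among others. These networks peer with each other to exchange packet traffic. Each NSP is required to connect to three Network Access Points, or NAPs. At the NAPs, packet traffic may jump from one NSP's backbone to another NSP's backbone. NSPs also interconnect at Metropolitan Area Exchanges, or MAEs. MAEs serve the same purpose as the NAPs but are privately owned. NAPs were the original Internet interconnect points. Both NAPs and MAEs are referred to as Internet Exchange Points, or IXs. NSPs also sell bandwidth to smaller networks, such as ISPs and smaller bandwidth providers. The diagram below shows this hierarchical infrastructure.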

network3.gif

Original text

The Internet backbone is made up of many large networks which interconnect with each other. These large networks are known as Network Service Providers or NSPs. Some of the large NSPs are UUNet, CerfNet, IBM, BBN Planet, SprintNet, PSINet, as well as others. These networks peer with each other to exchange packet traffic. Each NSP is required to connect to three Network Access Points or NAPs. At the NAPs, packet traffic may jump from one NSP’s backbone to another NSP’s backbone. NSPs also interconnect at Metropolitan Area Exchanges or MAEs. MAEs serve the same purpose as the NAPs but are privately owned. NAPs were the original Internet interconnect points. Both NAPs and MAEs are referred to as Internet Exchange Points or IXs. NSPs also sell bandwidth to smaller networks, such as ISPs and smaller bandwidth providers. Below is a picture showing this hierarchical infrastructure.

This is not a true representation of an actual piece of the Internet. Diagram 4 is only meant to demonstrate how the NSPs could interconnect with each other and smaller ISPs. None of the physical network components are shown in Diagram 4 as they are in Diagram 3. This is because a single NSP’s backbone infrastructure is a complex drawing by itself. Most NSPs publish maps of their network infrastructure on their web sites, where they can be found easily. To draw an actual map of the Internet would be nearly impossible due to its size, complexity, and ever-changing structure.

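The Internet Routing Hierarchy

So how do packets find their way across the Internet? Does every computer connected to the Internet know where the other computers are? Do packets simply get "broadcast" to every computer on the Internet? The answer to both of the preceding questions is "no". No computer knows where any of the other computers are, and packets do not get sent to every computer. The information used to get packets to their destinations is contained in routing tables kept by each router connected to the Internet.

Routers are packet switches. A router is usually connected between networks to route packets between them. Each router knows about its sub-networks and which IP addresses they use. The router usually doesn't know what IP addresses are "above" it. Examine Diagram 5 below. The black boxes connecting the backbones are routers. The larger NSP backbones at the top are connected at a NAP. Under them are several sub-networks, and under those, more sub-networks. At the bottom are two local area networks with computers attached.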

network5.gif

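When a packet arrives at a router, the router examines the IP address put there by the IP protocol layer on the originating computer. The router checks its routing table. If the network containing the IP address is found, the packet is sent to that network. If the network containing the IP address is not found, the router sends the packet on a default route, usually up the backbone hierarchy to the next router. Hopefully the next router will know where to send the packet. If it does not, the packet is again routed upwards until it reaches an NSP backbone. The routers connected to the NSP backbones hold the largest routing tables, and here the packet will be routed to the correct backbone, where it will begin its journey "downward" through smaller and smaller networks until it finds its destination.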
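
The lookup a single router performs can be sketched in a few lines of Python. The table entries and next-hop names below are hypothetical, and preferring the most specific matching network (the longest prefix) is an assumption about how ties are broken, since the text only says the router looks for "the network containing the IP address".

```python
import ipaddress

# Hypothetical routing table: (network, next hop). All names are illustrative.
ROUTES = [
    (ipaddress.ip_network("5.6.7.0/24"), "local-lan"),
    (ipaddress.ip_network("5.6.0.0/16"), "regional-router"),
]
DEFAULT_ROUTE = "upstream-backbone"  # used when no known network matches


def next_hop(destination: str) -> str:
    """Pick where to send a packet addressed to `destination`."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    if not matches:
        return DEFAULT_ROUTE
    # Prefer the most specific network, i.e. the longest prefix.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("5.6.7.8"))  # local-lan
print(next_hop("5.6.9.1"))  # regional-router
print(next_hop("1.2.3.4"))  # upstream-backbone
```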

Original text

So how do packets find their way across the Internet? Does every computer connected to the Internet know where the other computers are? Do packets simply get ‘broadcast’ to every computer on the Internet? The answer to both the preceding questions is ‘no’. No computer knows where any of the other computers are, and packets do not get sent to every computer. The information used to get packets to their destinations is contained in routing tables kept by each router connected to the Internet.

Routers are packet switches. A router is usually connected between networks to route packets between them. Each router knows about its sub-networks and which IP addresses they use. The router usually doesn’t know what IP addresses are ‘above’ it. Examine Diagram 5 below. The black boxes connecting the backbones are routers. The larger NSP backbones at the top are connected at a NAP. Under them are several sub-networks, and under them, more sub-networks. At the bottom are two local area networks with computers attached.

When a packet arrives at a router, the router examines the IP address put there by the IP protocol layer on the originating computer. The router checks its routing table. If the network containing the IP address is found, the packet is sent to that network. If the network containing the IP address is not found, then the router sends the packet on a default route, usually up the backbone hierarchy to the next router. Hopefully the next router will know where to send the packet. If it does not, again the packet is routed upwards until it reaches an NSP backbone. The routers connected to the NSP backbones hold the largest routing tables and here the packet will be routed to the correct backbone, where it will begin its journey ‘downward’ through smaller and smaller networks until it finds its destination.

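Domain Names and Address Resolution

But what if you don't know the IP address of the computer you want to connect to? What if you need to access a web server referred to as www.anothercomputer.com? How does your web browser know where on the Internet this computer lives? The answer to all these questions is the Domain Name Service, or DNS (formally, the Domain Name System). The DNS is a distributed database that keeps track of computers' names and their corresponding IP addresses on the Internet.

Many computers connected to the Internet host part of the DNS database and the software that allows others to access it. These computers are known as DNS servers. No DNS server contains the entire database; each contains only a subset of it. If a DNS server does not contain the domain name requested by another computer, the DNS server redirects the requesting computer to another DNS server.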

network6.gif

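The Domain Name Service is structured as a hierarchy similar to the IP routing hierarchy. The computer requesting a name resolution will be redirected "up" the hierarchy until a DNS server is found that can resolve the domain name in the request. Figure 6 illustrates a portion of the hierarchy. At the top of the tree are the domain roots. Some of the older, more common domains are seen near the top. What is not shown is the multitude of DNS servers around the world that form the rest of the hierarchy.

When an Internet connection is set up (e.g. for a LAN or Dial-Up Networking in Windows), one primary and one or more secondary DNS servers are usually specified as part of the installation. This way, any Internet applications that need domain name resolution will be able to function correctly. For example, when you enter a web address into your web browser, the browser first connects to your primary DNS server. After obtaining the IP address for the domain name you entered, the browser then connects to the target computer and requests the web page you wanted.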
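
The redirect behavior described above can be modeled with a toy lookup in Python. This is a simplified in-memory sketch, not a real DNS client: the server names, the records, and the referral chain are all invented for illustration, and the addresses come from the 192.0.2.0/24 block reserved for documentation.

```python
# Hypothetical "servers": each holds a subset of names and may refer elsewhere.
SERVERS = {
    "primary": {
        "records": {"intranet.example": "192.0.2.5"},
        "refer_to": "upstream",
    },
    "upstream": {
        "records": {"www.anothercomputer.com": "192.0.2.80"},
        "refer_to": None,
    },
}


def resolve(name: str, server: str = "primary", max_referrals: int = 5):
    """Follow referrals until some server knows the name or none is left."""
    for _ in range(max_referrals):
        entry = SERVERS[server]
        if name in entry["records"]:
            return entry["records"][name], server
        if entry["refer_to"] is None:
            return None, server  # nobody further up to ask
        server = entry["refer_to"]
    return None, server


print(resolve("www.anothercomputer.com"))  # ('192.0.2.80', 'upstream')
print(resolve("intranet.example"))         # ('192.0.2.5', 'primary')
print(resolve("nosuch.example"))           # (None, 'upstream')
```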

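Disable DNS in Windows

If you're using Windows 95/NT and access the Internet, you may view your DNS server(s) and even disable them.

If you use Dial-Up Networking:
Open your Dial-Up Networking window (which can be found in Windows Explorer under your CD-ROM drive and above Network Neighborhood). Right-click on your Internet connection and click Properties. Near the bottom of the connection properties window, press the TCP/IP Settings... button.

If you have a permanent connection to the Internet:
Right-click on Network Neighborhood and click Properties. Click TCP/IP Properties. Select the DNS Configuration tab at the top.

You should now be looking at your DNS servers' IP addresses. Here you may disable DNS or set your DNS servers to 0.0.0.0. (Write down your DNS servers' IP addresses first. You will probably have to restart Windows as well.) Now enter an address into your web browser. The browser won't be able to resolve the domain name, and you will probably get a nasty dialog box explaining that a DNS server couldn't be found. However, if you enter the corresponding IP address instead of the domain name, the browser will be able to retrieve the desired web page. (Use ping to get the IP address before disabling DNS.) Other Microsoft operating systems are similar.

Original text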

But what if you don’t know the IP address of the computer you want to connect to? What if you need to access a web server referred to as www.anothercomputer.com? How does your web browser know where on the Internet this computer lives? The answer to all these questions is the Domain Name Service or DNS. The DNS is a distributed database which keeps track of computers’ names and their corresponding IP addresses on the Internet.

Many computers connected to the Internet host part of the DNS database and the software that allows others to access it. These computers are known as DNS servers. No DNS server contains the entire database; they only contain a subset of it. If a DNS server does not contain the domain name requested by another computer, the DNS server re-directs the requesting computer to another DNS server.

The Domain Name Service is structured as a hierarchy similar to the IP routing hierarchy. The computer requesting a name resolution will be re-directed ‘up’ the hierarchy until a DNS server is found that can resolve the domain name in the request. Figure 6 illustrates a portion of the hierarchy. At the top of the tree are the domain roots. Some of the older, more common domains are seen near the top. What is not shown are the multitude of DNS servers around the world which form the rest of the hierarchy.

When an Internet connection is set up (e.g. for a LAN or Dial-Up Networking in Windows), one primary and one or more secondary DNS servers are usually specified as part of the installation. This way, any Internet applications that need domain name resolution will be able to function correctly. For example, when you enter a web address into your web browser, the browser first connects to your primary DNS server. After obtaining the IP address for the domain name you entered, the browser then connects to the target computer and requests the web page you wanted.

Check It Out - Disable DNS in Windows

If you’re using Windows 95/NT and access the Internet, you may view your DNS server(s) and even disable them.
If you use Dial-Up Networking:
Open your Dial-Up Networking window (which can be found in Windows Explorer under your CD-ROM drive and above Network Neighborhood). Right click on your Internet connection and click Properties. Near the bottom of the connection properties window press the TCP/IP Settings… button.

If you have a permanent connection to the Internet:
Right click on Network Neighborhood and click Properties. Click TCP/IP Properties. Select the DNS Configuration tab at the top.

You should now be looking at your DNS servers’ IP addresses. Here you may disable DNS or set your DNS servers to 0.0.0.0. (Write down your DNS servers’ IP addresses first. You will probably have to restart Windows as well.) Now enter an address into your web browser. The browser won’t be able to resolve the domain name and you will probably get a nasty dialog box explaining that a DNS server couldn’t be found. However, if you enter the corresponding IP address instead of the domain name, the browser will be able to retrieve the desired web page. (Use ping to get the IP address prior to disabling DNS.) Other Microsoft operating systems are similar.

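Internet Protocols Revisited

As hinted at earlier in the section about protocol stacks, one may surmise that many protocols are used on the Internet. This is true; many communication protocols are required for the Internet to function. These include the TCP and IP protocols, routing protocols, medium access control protocols, application-level protocols, and so on. The following sections describe some of the more important and commonly used protocols on the Internet. Higher-level protocols are discussed first, followed by lower-level protocols.

Original text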

As hinted at earlier in the section about protocol stacks, one may surmise that there are many protocols that are used on the Internet. This is true; there are many communication protocols required for the Internet to function. These include the TCP and IP protocols, routing protocols, medium access control protocols, application level protocols, etc. The following sections describe some of the more important and commonly used protocols on the Internet. Higher level protocols are discussed first, followed by lower level protocols.

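Application Protocols: HTTP and the World Wide Web

One of the most commonly used services on the Internet is the World Wide Web (WWW). The application protocol that makes the web work is the Hypertext Transfer Protocol, or HTTP. Do not confuse this with the Hypertext Markup Language (HTML). HTML is the language used to write web pages, while HTTP is the protocol that web browsers and web servers use to communicate with each other over the Internet. It is an application-level protocol because it sits on top of the TCP layer in the protocol stack and is used by specific applications to talk to one another. In this case the applications are web browsers and web servers.

HTTP is a text-based protocol that the author describes as connectionless: it runs over a TCP connection, but in the HTTP/1.0 usage described here that connection is not kept open between requests. Clients (web browsers) send requests to web servers for web elements such as web pages and images. After the request is serviced by a server, the connection between client and server across the Internet is closed, and a new connection must be made for each request. Most protocols are connection oriented, meaning that the two computers keep the connection open over the Internet while they communicate. HTTP, as described here, does not: before a client can make an HTTP request, a new connection must be made to the server. (Later versions such as HTTP/1.1 can reuse one connection for several requests.)

When you type a URL into a web browser, this is what happens: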

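  1. If the URL contains a domain name, the browser first connects to a domain name server and retrieves the corresponding IP address for the web server.
  2. The web browser connects to the web server and sends an HTTP request (via the protocol stack) for the desired web page.
  3. The web server receives the request and checks for the desired page. If the page exists, the web server sends it. If the server cannot find the requested page, it will send an HTTP 404 error message. (404 means "Page Not Found", as anyone who has surfed the web probably knows.)
  4. The web browser receives the page back and the connection is closed.
  5. The browser then parses the page and looks for other page elements it needs to complete the web page. These usually include images, applets, etc.
  6. For each element needed, the browser makes an additional connection and HTTP request to the server.
  7. When the browser has finished loading all images, applets, etc., the page will be completely loaded in the browser window.

Use Your Telnet Client to Retrieve a Web Page Using HTTP

Telnet is a remote terminal service used on the Internet. Its use has declined lately, but it is a very useful tool for studying the Internet. In Windows, find the default telnet program. It may be located in the Windows directory and named telnet.exe. When it is open, pull down the Terminal menu and select Preferences. In the preferences window, check Local Echo. (This is so you can see your HTTP request as you type it.) Now pull down the Connection menu and select Remote System. Enter www.google.com for the Host Name and 80 for the Port. (Web servers usually listen on port 80 by default.) Press Connect. Now type

GET / HTTP/1.0

and press Enter twice. This is a simple HTTP request to a web server for its root page. You should see a web page flash by, and then a dialog box should pop up to tell you the connection was lost. If you'd like to save the retrieved page, turn on logging in the Telnet program. You may then browse through the web page and see the HTML that was used to write it.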
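
The request typed into Telnet is just text, which a short Python sketch can make explicit. The two helper functions are hypothetical names for this example, and the sample response is made up to show the shape of a status line; nothing here contacts a real server.

```python
def build_get_request(path: str = "/", version: str = "HTTP/1.0") -> bytes:
    """Return the bytes typed in the Telnet exercise.

    The request line is followed by an empty line; those are the two
    Enter presses, each sent as carriage return plus line feed.
    """
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def parse_status_line(response: bytes):
    """Split the first line of a response into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# A made-up response, shaped like what a server sends for a missing page.
sample = b"HTTP/1.0 404 Not Found\r\nContent-Type: text/html\r\n\r\n<html>...</html>"

print(build_get_request())        # b'GET / HTTP/1.0\r\n\r\n'
print(parse_status_line(sample))  # ('HTTP/1.0', 404, 'Not Found')
```

To try this against a live server, the request bytes could be sent over a TCP connection opened with `socket.create_connection((host, 80))`, which is what the Telnet program does on your behalf.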

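Most Internet protocols are specified by Internet documents known as Requests For Comments, or RFCs. RFCs may be found at several locations on the Internet.

Original text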

One of the most commonly used services on the Internet is the World Wide Web (WWW). The application protocol that makes the web work is Hypertext Transfer Protocol or HTTP. Do not confuse this with the Hypertext Markup Language (HTML). HTML is the language used to write web pages. HTTP is the protocol that web browsers and web servers use to communicate with each other over the Internet. It is an application level protocol because it sits on top of the TCP layer in the protocol stack and is used by specific applications to talk to one another. In this case the applications are web browsers and web servers.

HTTP is a connectionless text based protocol. Clients (web browsers) send requests to web servers for web elements such as web pages and images. After the request is serviced by a server, the connection between client and server across the Internet is disconnected. A new connection must be made for each request. Most protocols are connection oriented. This means that the two computers communicating with each other keep the connection open over the Internet. HTTP does not however. Before an HTTP request can be made by a client, a new connection must be made to the server.

When you type a URL into a web browser, this is what happens:

  1. If the URL contains a domain name, the browser first connects to a domain name server and retrieves the corresponding IP address for the web server.
  2. The web browser connects to the web server and sends an HTTP request (via the protocol stack) for the desired web page.
  3. The web server receives the request and checks for the desired page. If the page exists, the web server sends it. If the server cannot find the requested page, it will send an HTTP 404 error message. (404 means ‘Page Not Found’ as anyone who has surfed the web probably knows.)
  4. The web browser receives the page back and the connection is closed.
  5. The browser then parses through the page and looks for other page elements it needs to complete the web page. These usually include images, applets, etc.
  6. For each element needed, the browser makes additional connections and HTTP requests to the server for each element.
  7. When the browser has finished loading all images, applets, etc. the page will be completely loaded in the browser window.
Check It Out - Use Your Telnet Client to Retrieve a Web Page Using HTTP

Telnet is a remote terminal service used on the Internet. Its use has declined lately, but it is a very useful tool to study the Internet. In Windows find the default telnet program. It may be located in the Windows directory named telnet.exe. When opened, pull down the Terminal menu and select Preferences. In the preferences window, check Local Echo. (This is so you can see your HTTP request when you type it.) Now pull down the Connection menu and select Remote System. Enter www.google.com for the Host Name and 80 for the Port. (Web servers usually listen on port 80 by default.) Press Connect. Now type
GET / HTTP/1.0

and press Enter twice. This is a simple HTTP request to a web server for its root page. You should see a web page flash by and then a dialog box should pop up to tell you the connection was lost. If you’d like to save the retrieved page, turn on logging in the Telnet program. You may then browse through the web page and see the HTML that was used to write it.

Most Internet protocols are specified by Internet documents known as Requests For Comments or RFCs. RFCs may be found at several locations on the Internet. See the Resources section below for appropriate URLs. HTTP version 1.0 is specified by RFC 1945.

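Application Protocols: SMTP and Electronic Mail

Another commonly used Internet service is electronic mail. E-mail uses an application-level protocol called Simple Mail Transfer Protocol, or SMTP. SMTP is also a text-based protocol, but unlike HTTP, SMTP is connection oriented. SMTP is also more complicated than HTTP. There are many more commands and considerations in SMTP than there are in HTTP.

When you open your mail client to read your e-mail, this is what typically happens:

  1. The mail client (Netscape Mail, Lotus Notes, Microsoft Outlook, etc.) opens a connection to its default mail server. The mail server's IP address or domain name is typically set up when the mail client is installed.
  2. The mail server will always transmit the first message to identify itself.
  3. The client will send an SMTP HELO command, to which the server will respond with a 250 OK message.
  4. Depending on whether the client is checking mail, sending mail, etc., the appropriate SMTP commands will be sent to the server, which will respond accordingly.
  5. This request/response transaction will continue until the client sends an SMTP QUIT command. The server will then say goodbye and the connection will be closed.

A simple "conversation" between an SMTP client and an SMTP server is shown below.
R: denotes messages sent by the server (receiver).
S: denotes messages sent by the client (sender).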

This SMTP example shows mail sent by Smith at host USC-ISIF, to
Jones, Green, and Brown at host BBN-UNIX. Here we assume that
host USC-ISIF contacts host BBN-UNIX directly. The mail is
accepted for Jones and Brown. Green does not have a mailbox at
host BBN-UNIX.

-------------------------------------------------------------

R: 220 BBN-UNIX.ARPA Simple Mail Transfer Service Ready
S: HELO USC-ISIF.ARPA
R: 250 BBN-UNIX.ARPA

S: MAIL FROM:<Smith@USC-ISIF.ARPA>
R: 250 OK

S: RCPT TO:<Jones@BBN-UNIX.ARPA>
R: 250 OK

S: RCPT TO:<Green@BBN-UNIX.ARPA>
R: 550 No such user here

S: RCPT TO:<Brown@BBN-UNIX.ARPA>
R: 250 OK

S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah...
S: ...etc. etc. etc.
S: .
R: 250 OK

S: QUIT
R: 221 BBN-UNIX.ARPA Service closing transmission channel

This SMTP transaction is taken from RFC 821, which specifies SMTP.
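
Each server line in the transcript starts with a three-digit reply code, and the first digit tells the client broadly how the command went. The Python sketch below pulls the codes out of a shortened copy of the transcript and labels them. The function name and the wording of the category labels are choices made for this example.

```python
# A shortened copy of the transcript above.
TRANSCRIPT = """\
R: 220 BBN-UNIX.ARPA Simple Mail Transfer Service Ready
S: HELO USC-ISIF.ARPA
R: 250 BBN-UNIX.ARPA
S: RCPT TO:<Green@BBN-UNIX.ARPA>
R: 550 No such user here
S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
"""

# Meaning of the first digit of an SMTP reply code.
CATEGORIES = {
    "2": "success",
    "3": "more input expected",
    "4": "temporary failure",
    "5": "permanent failure",
}


def server_replies(transcript: str) -> list:
    """Return (code, category, text) for every line sent by the server."""
    replies = []
    for line in transcript.splitlines():
        if line.startswith("R: "):
            code, _, text = line[3:].partition(" ")
            replies.append((int(code), CATEGORIES.get(code[0], "unknown"), text))
    return replies


for code, category, text in server_replies(TRANSCRIPT):
    print(code, category, "-", text)
```

Read this way, the 550 reply for Green is a permanent failure for that one recipient, while the 354 reply to DATA tells the client to go ahead and send the message body.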

Original text

Another commonly used Internet service is electronic mail. E-mail uses an application level protocol called Simple Mail Transfer Protocol or SMTP. SMTP is also a text based protocol, but unlike HTTP, SMTP is connection oriented. SMTP is also more complicated than HTTP. There are many more commands and considerations in SMTP than there are in HTTP.

When you open your mail client to read your e-mail, this is what typically happens:

  1. The mail client (Netscape Mail, Lotus Notes, Microsoft Outlook, etc.) opens a connection to its default mail server. The mail server’s IP address or domain name is typically set up when the mail client is installed.
  2. The mail server will always transmit the first message to identify itself.
  3. The client will send an SMTP HELO command to which the server will respond with a 250 OK message.
  4. Depending on whether the client is checking mail, sending mail, etc. the appropriate SMTP commands will be sent to the server, which will respond accordingly.
  5. This request/response transaction will continue until the client sends an SMTP QUIT command. The server will then say goodbye and the connection will be closed.

A simple ‘conversation’ between an SMTP client and SMTP server is shown below. R: denotes messages sent by the server (receiver) and S: denotes messages sent by the client (sender).

This SMTP example shows mail sent by Smith at host USC-ISIF, to
Jones, Green, and Brown at host BBN-UNIX. Here we assume that
host USC-ISIF contacts host BBN-UNIX directly. The mail is
accepted for Jones and Brown. Green does not have a mailbox at
host BBN-UNIX.

-------------------------------------------------------------

R: 220 BBN-UNIX.ARPA Simple Mail Transfer Service Ready
S: HELO USC-ISIF.ARPA
R: 250 BBN-UNIX.ARPA

S: MAIL FROM:<Smith@USC-ISIF.ARPA>
R: 250 OK

S: RCPT TO:<Jones@BBN-UNIX.ARPA>
R: 250 OK

S: RCPT TO:<Green@BBN-UNIX.ARPA>
R: 550 No such user here

S: RCPT TO:<Brown@BBN-UNIX.ARPA>
R: 250 OK

S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah...
S: ...etc. etc. etc.
S: .
R: 250 OK

S: QUIT
R: 221 BBN-UNIX.ARPA Service closing transmission channel

This SMTP transaction is taken from RFC 821, which specifies SMTP.

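Transmission Control Protocol

Transmission Control Protocol (TCP)
Under the application layer in the protocol stack is the TCP layer. When applications open a connection to another computer on the Internet, the messages they send (using a specific application-layer protocol) get passed down the stack to the TCP layer. TCP is responsible for routing application protocols to the correct application on the destination computer. To accomplish this, port numbers are used. Ports can be thought of as separate channels on each computer. For example, you can surf the web while reading e-mail. This is because these two applications (the web browser and the mail client) use different port numbers. When a packet arrives at a computer and makes its way up the protocol stack, the TCP layer decides which application receives the packet based on the port number.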

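TCP works like this:

  • When the TCP layer receives the application-layer protocol data from above, it segments it into manageable "chunks" and then adds a TCP header with specific TCP information to each "chunk". The information contained in the TCP header includes the port number of the application the data needs to be sent to.
  • When the TCP layer receives a packet from the IP layer below it, the TCP layer strips the TCP header data from the packet, does some data reconstruction if necessary, and then sends the data to the correct application using the port number taken from the TCP header.

This is how TCP routes the data moving through the protocol stack to the correct application.

TCP is not a textual protocol. TCP is a connection-oriented, reliable, byte-stream service. Connection-oriented means that two applications using TCP must first establish a connection before exchanging data. TCP is reliable because for each packet received, an acknowledgement is sent to the sender to confirm the delivery. TCP also includes a checksum in its header for error-checking the received data. The TCP header looks like this: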

network7.gif

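Notice that there is no place for an IP address in the TCP header. This is because TCP doesn't know anything about IP addresses. TCP's job is to get application-level data from application to application reliably. The task of getting data from computer to computer is the job of IP.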
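
How a port number steers data to the right application can be sketched in Python. A real TCP header begins with a 16-bit source port and a 16-bit destination port; this sketch models only those two fields and leaves out everything else (sequence numbers, acknowledgements, checksum). The `LISTENERS` table and the function names are assumptions made for illustration.

```python
import struct

# Hypothetical applications listening on this computer, keyed by port number.
LISTENERS = {80: "web server", 25: "mail server"}


def make_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with source and destination ports (16 bits each)."""
    return struct.pack("!HH", src_port, dst_port) + payload


def deliver(segment: bytes):
    """Strip the port fields and pick the application by destination port."""
    _src_port, dst_port = struct.unpack("!HH", segment[:4])
    application = LISTENERS.get(dst_port, "no listener")
    return application, segment[4:]


print(deliver(make_segment(49152, 80, b"GET / HTTP/1.0")))
# ('web server', b'GET / HTTP/1.0')
print(deliver(make_segment(49153, 25, b"HELO USC-ISIF.ARPA")))
# ('mail server', b'HELO USC-ISIF.ARPA')
```

Note that no IP address appears anywhere in the sketch, which mirrors the point made about the TCP header.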

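Well Known Internet Port Numbers

Listed below are the port numbers for some of the more commonly used Internet services.

FTP: 20/21
Telnet: 23
SMTP: 25
HTTP: 80
Quake III Arena: 27960

Original text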

Under the application layer in the protocol stack is the TCP layer. When applications open a connection to another computer on the Internet, the messages they send (using a specific application layer protocol) get passed down the stack to the TCP layer. TCP is responsible for routing application protocols to the correct application on the destination computer. To accomplish this, port numbers are used. Ports can be thought of as separate channels on each computer. For example, you can surf the web while reading e-mail. This is because these two applications (the web browser and the mail client) use different port numbers. When a packet arrives at a computer and makes its way up the protocol stack, the TCP layer decides which application receives the packet based on a port number.

TCP works like this:

  • When the TCP layer receives the application layer protocol data from above, it segments it into manageable ‘chunks’ and then adds a TCP header with specific TCP information to each ‘chunk’. The information contained in the TCP header includes the port number of the application the data needs to be sent to.
  • When the TCP layer receives a packet from the IP layer below it, the TCP layer strips the TCP header data from the packet, does some data reconstruction if necessary, and then sends the data to the correct application using the port number taken from the TCP header.

This is how TCP routes the data moving through the protocol stack to the correct application.

TCP is not a textual protocol. TCP is a connection-oriented, reliable, byte stream service. Connection-oriented means that two applications using TCP must first establish a connection before exchanging data. TCP is reliable because for each packet received, an acknowledgement is sent to the sender to confirm the delivery. TCP also includes a checksum in its header for error-checking the received data. The TCP header looks like this:

Notice that there is no place for an IP address in the TCP header. This is because TCP doesn’t know anything about IP addresses. TCP’s job is to get application level data from application to application reliably. The task of getting data from computer to computer is the job of IP.

Check It Out - Well Known Internet Port Numbers

Listed below are the port numbers for some of the more commonly used Internet services.

Service            Port
FTP                20/21
Telnet             23
SMTP               25
HTTP               80
Quake III Arena    27960

Internet Protocol (IP)

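Unlike TCP, IP is an unreliable, connectionless protocol. IP does not care whether a packet reaches its destination, and it knows nothing about connections or port numbers. IP's job is to send and route packets to other computers. IP packets are independent entities and may arrive out of order or not at all. It is TCP's job to make sure packets arrive and are put in the correct order. About the only thing IP has in common with TCP is the way it receives data and adds its own IP header information to the TCP data. The IP header looks like this: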

network8.gif

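Above, the IP header contains the IP addresses of the sending and receiving computers. Below is what a packet looks like after passing through the application layer, the TCP layer, and the IP layer. The application-layer data is segmented in the TCP layer and a TCP header is added; the packet then continues to the IP layer, where an IP header is added; finally the packet is transmitted across the Internet.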

network9.gif
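
The wrapping described above can be sketched with Python's `struct` and `socket` modules. This is a simplified illustration under stated assumptions: `build_ip_packet` and `internet_checksum` are hypothetical helper names, the header has no options, the TTL of 64 is an arbitrary choice, and the `tcp_segment` bytes are a placeholder rather than a real TCP header plus data. The addresses 1.2.3.4 and 5.6.7.8 are the example addresses used earlier in this article.

```python
import socket
import struct

# Fixed 20-byte IPv4 header (no options), network byte order:
# version+IHL, type of service, total length, identification, flags+fragment offset,
# time to live, protocol, header checksum, source address, destination address.
IP_HEADER = struct.Struct("!BBHHHBBH4s4s")


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used for the IPv4 header checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_packet(src_ip: str, dst_ip: str, tcp_segment: bytes) -> bytes:
    """Prepend an IP header to whatever the TCP layer handed down."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 32-bit words
    total_length = IP_HEADER.size + len(tcp_segment)
    fields = [version_ihl, 0, total_length, 0, 0,
              64,                    # time to live (arbitrary for this sketch)
              socket.IPPROTO_TCP,    # payload is a TCP segment
              0,                     # checksum placeholder, filled in below
              socket.inet_aton(src_ip), socket.inet_aton(dst_ip)]
    fields[7] = internet_checksum(IP_HEADER.pack(*fields))
    return IP_HEADER.pack(*fields) + tcp_segment


tcp_segment = b"<TCP header + application data>"  # placeholder bytes
packet = build_ip_packet("1.2.3.4", "5.6.7.8", tcp_segment)
```

The IP layer never looks inside `tcp_segment`; it only records where the packet comes from and where it is going, then puts its own header in front.
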

Original text

Unlike TCP, IP is an unreliable, connectionless protocol. IP doesn’t care whether a packet gets to its destination or not. Nor does IP know about connections and port numbers. IP’s job is to send and route packets to other computers. IP packets are independent entities and may arrive out of order or not at all. It is TCP’s job to make sure packets arrive and are in the correct order. About the only thing IP has in common with TCP is the way it receives data and adds its own IP header information to the TCP data. The IP header looks like this:

Above we see the IP addresses of the sending and receiving computers in the IP header. Below is what a packet looks like after passing through the application layer, TCP layer, and IP layer. The application layer data is segmented in the TCP layer, the TCP header is added, the packet continues to the IP layer, the IP header is added, and then the packet is transmitted across the Internet.

Wrap Up

Now you know how the Internet works. But how long will it stay this way? The version of IP currently used on the Internet (version 4) only allows 2^32 addresses. Eventually there won’t be any free IP addresses left. Surprised? Don’t worry. IP version 6 is being tested right now on a research backbone by a consortium of research institutions and corporations. And after that? Who knows. The Internet has come a long way since its inception as a Defense Department research project. No one really knows what the Internet will become. One thing is sure, however. The Internet will unite the world like no other mechanism ever has. The Information Age is in full stride and I am glad to be a part of it.

—— Rus Shuler, 1998
—— Updates made 2002

Glossary:

  • ISP: Internet Service Provider
  • LAN: Local Area Network
  • DHCP: Dynamic Host Configuration Protocol
  • ICMP: Internet Control Message Protocol
  • NSP: Network Service Provider
  • NAP: Network Access Point
  • MAE: Metropolitan Area Exchange
  • NAPs and MAEs are both referred to as Internet Exchange Points (IXs)
