Wednesday, August 20, 2008

Troubleshooting Connectivity Problems on Windows Networks (Part 1)

This article series will explain various troubleshooting techniques that you can use when machines on a Windows network have difficulty communicating with each other.

Today’s network hardware and software is more reliable than ever but even so, things do occasionally go wrong. In this article series, I am going to discuss some troubleshooting techniques that you can use when a host on your Windows network has trouble communicating with other network hosts. For the sake of those with less experience in working with the TCP/IP protocol, I’m going to start with the basics, and then work toward the more advanced techniques.

Verify Network Connectivity

When one host has trouble communicating with another, the first thing that you must do is to gather some information about the problem. More specifically, you need to document the host’s configuration, find out if the host is having trouble communicating with any other machines on the network, and find out if the problem effects any other hosts.

For example, suppose that a workstation is having trouble communicating with a particular server. That in itself doesn’t really give you a lot to go on. However, if you were to dig a little bit deeper into the problem and found out that the workstation couldn’t communicate with any of the network servers, then you would know to check for a disconnected network cable, a bad switch port, or maybe a network configuration problem.

Likewise, if the workstation were able to communicate with some of the network servers, but not all of them, that too would give you a hint as to where to look for the problem. In that type of situation, you would probably want to check to see what the servers that could not be contacted had in common. Are they all on a common subnet? If so, then a routing problem is probably to blame.

If multiple workstations are having trouble communicating with a specific server, then the problem probably isn’t related to the workstations unless those workstations were recently reconfigured. More than likely, it is the server itself that is malfunctioning.

The point is that by starting out with a few basic tests, you can gain a lot of insight into the problem at hand. The tests that I am about to show you will rarely show you the cause of the problem, but they will help to narrow things down so that you will know where to begin the troubleshooting process.

PING

PING is probably the simplest TCP/IP diagnostic utility ever created, but the information that it can provide you with is invaluable. Simply put, PING tells you whether or not your workstation can communicate with another machine.

The first thing that I recommend doing is opening a Command Prompt window, and then entering the PING command, followed by the IP address of the machine that you are having trouble communicating with. When you do, the machine that you have specified should produce four replies, as shown in Figure A.

Figure A: The specified machine should generate four replies

The responses essentially tell you how long it took the specified machine to respond with thirty two bytes of data. For example, in Figure A, each of the four responses were received in less than four milliseconds.

Typically, when you issue the PING command, one of four things will happen, each of which has its own meaning.

The first thing that can happen is that the specified machine will produce four replies. This indicates that the workstation is able to communicate with the specified host at the TCP/IP level.

The second thing that can happen is that all four requests time out, as shown in Figure B. If you look at Figure A, you will notice that each response ends in TTL=128. TTL stands for Time To Live. What this means is that each of the four queries and responses must be completed within 128 milliseconds. The TTL is also decremented once for each hop on the way back. A hop occurs when a packet moves from one network to another. I will be talking a lot more about hops later on in this series.

Figure B: If all four requests time out, it could indicate a communications failure

At any rate, if all four requests have timed out, it means that the TTL expired before the reply was received. This can mean one of three things:

  • Communications problems are preventing packets from flowing between the two machines. This could be caused by a disconnected cable, a bad routing table, or a number of other issues.
  • Communications are occurring, but are too slow for PING to acknowledge. This can be caused by extreme network congestion, or by faulty network hardware or wiring.
  • Communications are functional, but a firewall is blocking ICMP traffic. PING will not work unless the destination machine’s firewall (and any firewalls between the two machines) allow ICMP echos.

A third thing that can happen when you enter the PING command is that some replies are received, while others time out. This can point to bad network cabling, faulty hardware, or extreme network congestion.

The fourth thing that can occur when pinging a host is that you receive an error similar to the one that is shown in Figure C.

Figure C: This type of error indicates that TCP/IP is not configured correctly

The PING: Transmit Failed error indicates that TCP/IP is not configured correctly on the machine on which you are trying to enter the PING command. This particular error is specific to Vista though. Older versions of Windows produce an error when TCP/IP is configured incorrectly, but the error message is “Destination Host Unreachable”

What if the PING is Successful?

Believe it or not, it is not uncommon for a ping to succeed, even though two machines are having trouble communicating with each other. If this happens, it means that the underlying network infrastructure is good, and that the machines are able to communicate at the TCP/IP level. Typically, this is good news, because it means that the problem that is occurring is not very serious.

If normal communications between two machines are failing, but the two machines can PING each other successfully (be sure to run the PING command from both machines), then there is something else that you can try. Rather than pinging the network host by IP address, try replacing the IP address with the host’s fully qualified domain name, as shown in Figure D.

Figure D: Try pinging the network host by its fully qualified domain name

If you are able to ping the machine by its IP address, but not by its fully qualified domain name, then you most likely have a DNS issue. The workstation may be configured to use the wrong DNS server, or the DNS server may not contain a host record for the machine that you are trying to ping.

If you look at Figure D, you can see that the machine’s IP address is listed just to the right of its fully qualified domain name. This proves that the machine was able to resolve the fully qualified domain name. Make sure that the IP address that the name was resolved to is correct. If you see a different IP address than the one that you expected, then you may have an incorrect DNS host record.





Comment Box is loading comments...