Thursday, September 5, 2019
Network Troubleshooting Symptoms And Solutions
Data transfer from one point to another is the most important aspect of computing. Networks should be safe, reliable and secure for data movement, and any problem in a computer network can cause havoc. Understanding networks, how they work and how they are built helps a network administrator identify and fix problems.

11.2 Procedure to troubleshoot network problems

A simple procedure that helps network administrators solve most kinds of network problem is:

1. Identify the symptoms.
2. Identify the affected area.
3. Determine what has changed.
4. Select the most probable cause.
5. Implement a solution.
6. Test the result.
7. Recognise the potential effects of the solution.
8. Document the solution.

11.2.1 Identify symptoms

Indicators are physical or logical symptoms that help determine the nature and the reach of a problem. These symptoms enable a network administrator to take timely measures and solve the problem before it grows beyond control.

System or operator problems

System errors arise from a computer, network device or process and are not related to a user's direct interaction with the system or network. Such errors can occur due to hardware failure or faults in the process of data transfer or manipulation.

Operator errors are a direct consequence of a user's action. Actions that can cause such errors include incorrect logins, wrong connections to a server, misidentification of servers or network devices, and incomplete network connections. On the network administrator's side, common causes of operator errors are misconfigured devices, programs or services.

Link lights

When a networking device detects a network connection, a green or amber Light-Emitting Diode (LED) is turned on. This is the link light. Network components are designed with link lights to show the state of the network connection.
When a physical network connection is present, the link light remains on, and a second light shows the current activity of the network card by blinking (pulsing) during data transfer. Link lights are designed not to light up if the network cable is connected incorrectly. By examining the link light of a device, a user can determine whether a network connection is up.

Collision lights

A collision light indicates whether a connection is experiencing problems because packets are colliding with one another. The collision light (activity light) is green while data is being sent and received, and turns yellow or orange when a collision is detected. The packet being sent or received is lost when a collision occurs. Faulty cables or hubs can generate spurious packets from other packets or from electrical interference; in large volumes this is called chatter. Network chatter can halt an entire network because of data packet collisions, so network administrators and users should monitor these lights to detect it early.

Power lights

A power light indicates whether the networking device is receiving power. If there is no power supply, the power light is off. When troubleshooting a network problem, check that the power cables and wall connectors are properly connected.

Error displays

A device failure or malfunction is indicated by an error display. This may be an error dialogue box on the computer or an LED error display on the network device. These displays also describe the problem that was detected. Typically, an error display gives an error code that the user should look up to identify the cause and a suitable solution. The manufacturer's documentation usually lists the recommended solution for each error code.

Error logs and displays

An error log is a list of the errors encountered on a network device.
An error log entry records the time of the error, the nature of the error, its probable cause and other processes affected by it, and sometimes a suggested solution. The information in an error log is often not sufficient on its own to solve a problem and needs to be read alongside the related documentation. An error display gives a visual alert of a problem and records it in the error log. Not all error displays require immediate attention; some are warnings that do not indicate an existing error but still need attention.

Event Viewer is the error logging tool typical of Windows-based operating systems, both clients and servers. It is a critical tool for diagnosing and resolving problems. Errors that have occurred appear in Event Viewer as red-X entries. Event Viewer is an application that reads the binary log files stored in the system32\config folder, so the network administrator needs access to that folder to view the error logs.

There are three types of error log, which the network administrator should monitor regularly:

System log: Error messages related to device driver failures, service start failures and general information about OS events are recorded in the system log.

Security log: When auditing has been enabled, all security-related events are recorded in the security log.

Application log: Events generated by applications running on top of the OS are recorded in the application log.

Identify network problems

Troubleshooting is one of the key concepts in networking. Identifying network problems and deciding how to troubleshoot them is vital for the smooth functioning of a network. The problem is usually first reported by a network user.
The user's report alone should not be the basis on which an administrator attacks a problem. It is advisable to observe the problem in person along with the user who reported it. This helps the network administrator confirm whether the problem is real or just a user error. Some users have limited knowledge of computers and networks, and when such a user reports a problem, first-hand inspection and confirmation are necessary.

The best approach to solving a problem is to determine its scope, because the nature of the problem determines the line of attack. Gathering information helps the network administrator narrow down to the root of the problem and avoids time spent on unnecessary work. Once the administrator has pinpointed the cause, a solution can be found.

A network administrator must first gather information to find out whether the problem lies with a single computer or with the entire network. If the problem is local, the rest of the network is not affected and a solution can usually be found easily. The first logical step is to check all cable connections to and from the system. It is not advisable to look into bigger issues when the cause might be very simple. A system may be unable to connect to other systems merely because the network cable is not plugged in properly; once the cable is connected properly, the network connection is up and running. For example, if two systems in a network cannot communicate with each other, the network administrator can start with simple checks such as verifying the connections between the systems or their connections to the network.

A network administrator should also check whether a problem is consistent and replicable. Is the reported problem unique to one system, or can it be replicated on other systems in the network?
If the same problem is reported from another system as well, the problem is consistent and replicable. The impact is high in such a case because many systems are affected. If the problem is found to be with the network, the network administrator must work towards the cause step by step.

A large computer network requires a lot of effort from network administrators and users to run smoothly, and fixing problems adds to the task of maintenance. With numerous workgroups and workstations, it becomes difficult to identify the actual problem. It is advisable to approach large network problems by trial and error, working outwards from the point of the report. The administrator should first check the local system from which the problem was reported, including its cable connections, network links and power supplies. If the problem is not with the local system, other systems in the vicinity should be checked. The routers to and from the system and the various connections should be verified. One of the best ways to check and fix network problems is to try connecting to other systems and parts of the network by pinging them.

11.2.2 Identify the affected area

Once the symptoms are understood, the next logical step is to isolate the affected area. This step helps an administrator narrow down to the core of the problem. With many issues to handle at a time, administrators must prioritise. Issues that affect work to a large extent should be fixed first and the rest should follow in order. This reduces downtime and gets the system fixed faster.

Same line, different computer

While trying to solve a problem it is important to isolate the affected system. A simple way to test whether the problem is replicable is to replace the original workstation with another system that is known to work.
By confirming whether the problem lies with the local system or beyond it, the network administrator eliminates one factor. This step determines whether a problem is computer specific.

Same computer, different line

When a user reports a problem, another way to reach the cause is to change the network cable or connection used by the system. This confirms or rules out a network error. If the system works properly on the new connection, the problem is with the original network connection and not with the computer.

Swapping components

Hubs, cables and terminators can be swapped with those of other systems to check whether a problem is replicable or consistent. This helps locate a faulty component, and the original user's work is not disrupted because a spare is in place. This step helps determine the scope of the problem and address it appropriately. Prioritising work is very important for network administrators, since the time and effort spent fixing a problem should get users back to work faster. Bigger, critical problems require immediate action, while smaller ones can be addressed after the bigger ones are fixed.

Isolating segments of the network

Isolating parts of the network contains the spread of a problem so that the entire network does not collapse. The systems that have reported issues should be disconnected from the network and terminators plugged in. This is a safe practice because it lets the network administrators fix the problem with less data traffic and load to contend with. Steps for problem isolation are given in Table 11.1.

Step | Action
Determine which systems are and which are not showing symptoms. | Separate the systems that are showing symptoms from those that are not with hubs or terminators.
Rule out simple issues. | Reset all major connections to and from the system.
Eliminate cable problems. | Check for physical damage or erroneous connections of cables.
Eliminate serious cable issues. | Use a TDR (time-domain reflectometer) to find cable problems.

Table 11.1 Methods to isolate network problems.

11.2.3 Determine what has changed

Computer networks have many components, both hardware and software, that can be replaced or reset to meet the requirements of the business. Such changes can themselves cause a network problem. For example, if a user reports a problem after a system in the network has been replaced, the administrator should check whether the address of the new system is properly recorded and whether it is correctly connected to the existing network. To be able to fix problems that follow changes, it is advisable to maintain proper records of the new and existing network, such as the points of change, the components changed, their versions, IP addresses and network cables. A detailed documentation system helps fix such issues quickly.

Checking the status of servers

Servers are an integral and crucial resource in a network, and their health is very important for its functioning. It is therefore logical to check the server status when faced with a problem. If server issues are not addressed in time, the damage can be extensive. Server monitoring tasks include the following:

Check services
Check error logs
Check connectivity
Monitor performance and network traffic
Confirm alerts and alarms
Verify backup logs along with test restores

Checking error logs

Error logs are an important source of information for a network administrator. They show which errors have occurred and their nature, and the amount of damage can also be assessed from them. The administrator can prioritise errors by the extent of damage and fix them in that order.
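The prioritisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the severity names, their ranking and the `LogEntry` fields are hypothetical and do not correspond to any particular device or log format.

```python
from dataclasses import dataclass

# Hypothetical severity ranking: a lower number means "fix first".
SEVERITY_ORDER = {"critical": 0, "error": 1, "warning": 2, "information": 3}


@dataclass
class LogEntry:
    timestamp: str  # ISO 8601 text, so plain string sorting is chronological
    severity: str   # one of the keys of SEVERITY_ORDER (assumed)
    message: str


def prioritise(entries):
    """Return entries ordered by severity, then oldest first.

    Unknown severities are placed after all known ones.
    """
    unknown_rank = len(SEVERITY_ORDER)
    return sorted(
        entries,
        key=lambda e: (SEVERITY_ORDER.get(e.severity.lower(), unknown_rank),
                       e.timestamp),
    )
```

Calling `prioritise` on the day's entries gives the administrator a work list with the most damaging and longest-standing errors at the top.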
It is important to review the error log daily, because certain errors have dependencies that can spread the damage quickly. Making it a habit to check the error logs at a set time each day helps the network work better.

Connectivity between systems or servers can be tested using ping. If the system at the other end responds to the ping message, the connection is intact; otherwise the connection should be checked thoroughly. Regular checks should be carried out for server overload, since an overloaded server slows down system performance. Backup servers should also be monitored for updates and performance, because backup data and resources are crucial in an emergency. An alert system can be designed to raise alarms when predefined limits are exceeded. This is a good preventive measure that helps a network function properly.

Checking for configuration problems

Before introducing a new resource into the network it is necessary to verify the existing configurations and connections. If the existing settings are incorrect, the new resource cannot work. For example, before setting up a new server it is good practice to check the base OS, TCP/IP settings, network cables, error logs and memory allocation. This protects performance and prevents the system from deteriorating. After the existing settings and connections are verified, the new resource and any additional services must be configured accurately. A few tasks require the entire network to be turned off for a few minutes before rebooting; these should be scheduled for hours of low workload. A few critical services that determine the functioning of the entire network, and that require constant monitoring and accurate configuration, are:

DNS

Microsoft Active Directory and other Internet-based applications rely on this service.
A detailed plan should be in place before configuring DNS, which requires information such as the domain name to be gathered before installation.

WINS

WINS is a service similar to DNS that resolves NetBIOS names to IP addresses. It is a dynamic service that can add, modify and delete name registrations, which avoids human error and saves time. WINS has many configuration options, and the user can add static mappings for clients or servers.

Host file

Host files and DNS are similar in function. A host file requires manual configuration of a database with exact mappings of hostnames to IP addresses. Host files reside on every computer, which makes updating them difficult. It is very important to provide correct hostname to IP address mappings; the rules that apply to DNS also apply to host files. To avoid typing errors while configuring host files, it is safer to copy the existing hosts file to each machine than to retype it.

Checking for viruses

Viruses are a common and serious threat to computers. A computer network is at greater risk because the number of computers is large and the damage can be extensive. Protecting both networked and standalone computers from viruses is a top priority. New viruses appear constantly, and the mechanisms to detect and remove them are updated just as constantly. It is the job of the network administrator to keep the network free of viruses. Antivirus software, regular updates of virus definition files and scheduled scans are the most popular and effective ways to fight them. Preventive measures work best when deployed round the clock. All resources in a network should be scanned for viruses and guarded against them, since every piece of software and hardware is crucial to the smooth working of a network or system.
A number of virus scanning utilities on the market enable computers to update virus definition files automatically from a central server, which saves the administrator from making trips to each workstation.

Checking the validity of account name and password

The account name and password are the gates that lead the user to a whole world of services, applications and data, so their validity is essential for access. Many services run under the built-in system account, while a few others require the user to log on to a remote system, which needs an account name and password that reside in the network account database. Activating certain services or applications requires administrative privileges or membership of certain groups, which again depends on the account name and password. Quite a few system-related tasks require administrative rights, which allow the user to modify settings as needed.

The worst situation arises when a network administrator has configured many applications and services to run under a personal administrator account, and that account is deleted when the administrator leaves the organisation. All services and applications using the account then stop working and are denied access, which is very difficult to fix.

Rechecking operator logon procedures

Users very often face problems with passwords. They try to log on to a part of the network to which they have not been granted access, forget passwords, forget that passwords are case sensitive, and so on. Often a user tries more than three times to log on with the same or different passwords, after which the account is locked out. To resolve this minor but widespread issue the administrator must reset the password for the user. Passwords should be changed at regular intervals for safety and to avoid expiry, and many users find this difficult.
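The lockout behaviour described above can be illustrated with a minimal Python sketch. It is not how any particular operating system implements lockout; the class name, the threshold of three failed attempts and the `admin_reset` method are assumptions chosen to mirror the text.

```python
class LoginAttemptTracker:
    """Counts consecutive failed logons per user and locks the account
    once a threshold is reached (threshold value is an assumption)."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self._failures = {}   # user name -> consecutive failed attempts
        self._locked = set()  # user names that are currently locked out

    def record_failure(self, user):
        if user in self._locked:
            return  # already locked; further attempts change nothing
        self._failures[user] = self._failures.get(user, 0) + 1
        if self._failures[user] >= self.max_failures:
            self._locked.add(user)

    def record_success(self, user):
        # A successful logon clears the failure count, unless locked out.
        if user not in self._locked:
            self._failures.pop(user, None)

    def is_locked(self, user):
        return user in self._locked

    def admin_reset(self, user):
        # Models the administrator unlocking the account after a reset.
        self._locked.discard(user)
        self._failures.pop(user, None)
```

The point of the sketch is that a locked account stays locked no matter what the user types next; only the administrator's reset clears it, which is why these calls end up on the administrator's desk.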
Selecting and running appropriate diagnostics

Diagnostics are an essential tool for evening out variations and eliminating potential problems in a network. Although this is a preventive mechanism, its benefits are many. Diagnostic tools look for bottlenecks and problematic situations, and bring out drawbacks and limitations that can be fixed before they grow into big problems. When choosing a diagnostic program, the user should bear in mind the requirements of the network on which it is to be used. Smaller networks should use simple diagnostic programs, while large networks require extensive protocol analysis and packet sniffing products. Free diagnostic products such as Performance Monitor and Network Monitor work well for a medium-sized network. To get the most from a tool without affecting network performance, the network administrator must study the product in depth. With experience and products like these, an administrator can identify a problem in time and resolve it effectively. A reliable baseline of activity must be established for testing. Snapshots of different activities at different times of the day, week and month help assess the network efficiently and accurately.

11.2.4 Select the most probable cause

Of all the factors that help solve a problem, the experience of the network administrator matters a great deal. If the network administrator is not well versed in the network or in common network issues, resolving a problem becomes a tough task. The way a network administrator approaches a problem largely determines the outcome, because it can guide or misguide the way the solution is built. If a new or outside network administrator is brought in to solve a network problem, the chances of long system downtime are high. The new network administrator has to become familiar with the network first, and only then look for probable causes.
The more experienced the network administrator is, the easier it is to solve the problem. Problems are often similar across systems, and a network administrator can draw on past experience to fix them faster. A company benefits greatly from a full-time network administrator who knows the details of the network thoroughly.

Common problems and their probable causes

Common network problems and their probable causes are given in Table 11.2.

Problem | Probable cause
Cannot connect to a computer on a remote network. | A routing issue in all probability. Check if it is possible to connect to a local system and ping the router or another system on the remote network.
Communication in the entire network is down. | If in a coax-based network, check for loose connections. If in a twisted-pair network, check if the hub is operational. If in a token ring network, check that no computer is beaconing.
Takes a very long time to connect to a network resource. | Network may be overloaded.
A device on the system is not functioning and network connection is not possible. | A network card configuration issue in most cases. Check if the NIC is configured properly. The driver may be loaded incorrectly.
Communication in a local network is not possible, but other networks are working. | Check that the hub/switch is not locked up. Check if the network adapter is configured properly.
No Internet access. | Check the Internet gateway. Check that the router has a dedicated Internet connection. Check the Internet provider's network.
Token ring network is locked up. | Someone in the network is beaconing. Also check if the bridge is locking up.

Table 11.2 Certain problems and probable causes

11.2.5 Implement a solution

To fix a problem, a network administrator can consult others, read related documents, research on the Internet and seek help from vendor help lines.
Once the most suitable solution is in hand, it should be implemented without delay.

11.2.6 Test the result

Confirming that the implemented solution is correct and has solved the problem is crucial to the problem-solving process. A user contacts the network administrator in the hope of having the problem fixed and getting back to work. If the network administrator leaves without confirming that the solution works, the purpose of having a network administrator is defeated. It is the duty of the network administrator to ensure that the problem the user reported does not recur.

11.2.7 Recognise the potential effects of the solution

With a working solution in place, the next factor a network administrator should consider is the aftermath of the solution. There are many instances where a solution to one problem has triggered problems in other parts of the system or network. This cascading effect needs to be monitored and checked. For example, a user may report a communication problem that is solved by reseating the network cables. The local problem of the system being unable to communicate might be solved, but the system might still be unable to connect to some other parts of the network. Such ripple effects require attention from the network administrator. Implementing the solution properly, confirming that it works and eliminating its side effects completes the solution phase.

11.2.8 Documenting the solution

After a reported problem has been solved and work is back to normal, it is the job of the network administrator who solved it to document it properly for future use. Documentation is necessary because certain problems recur after some time, and because a new network administrator might face a problem that was already solved earlier.
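One way to keep such documentation consistent is to record each incident with the same fields as the troubleshooting steps in section 11.2. The Python sketch below is only an illustration; the class name, field names and the `missing_sections` helper are hypothetical and not part of any standard.

```python
from dataclasses import dataclass, fields


@dataclass
class TroubleshootingRecord:
    """One documented incident, with one field per troubleshooting step."""
    symptoms: str = ""
    affected_area: str = ""
    what_changed: str = ""
    probable_cause: str = ""
    solution: str = ""
    test_result: str = ""
    side_effects: str = ""


def missing_sections(record):
    """Return the names of fields that are still empty, in step order."""
    return [f.name for f in fields(record)
            if not getattr(record, f.name).strip()]
```

A record whose `missing_sections` list is empty covers every step, which makes it easier for a later administrator to follow what was seen, what was tried and what finally worked.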
An organisation benefits from proper troubleshooting documentation when network administrators change, because the new administrator does not have to repeat the whole process once an apt solution has been found. It also helps when a solution has faded from an administrator's memory. Proper documentation of each aspect of troubleshooting is as valuable as providing an appropriate solution on time.

11.3 Common connectivity issues in a network

Common connectivity issues in a network are of two types: physical problems such as device or cable issues, and logical problems such as invalid IP addresses or VLAN problems.

11.3.1 Physical issues

Cabling problems are the most common physical issues. Visual indicators such as link lights, activity lights and collision lights can be used to diagnose them. A few common tasks for a network administrator are:

Recognising abnormal physical conditions

To recognise deviations from the normal, a good knowledge of what is normal is essential. A user who is not aware of the default state cannot tell when something has changed or reset it. The following signs should prompt a closer look:

Authentication takes more time.
More errors are logged than usual.
Printing takes more time.
Connecting to the network is getting slower.
Connections to resources are being lost.

Isolating and correcting problems in the physical media

Network cables are the most vulnerable resources in a network. They lead to a whole range of problems, but these are usually easy to fix. The cable at highest risk is the one from the workstation to the wall jack. A solution as simple as plugging it back in can solve network problems at times. If the problem is not solved, try other cables until contact is established.
Cable problem | Probable solution
Communication in the entire network is down. | Check if the cable is intact. The point at which the cable has been damaged should be reconnected with a new cable.
The new UTP cable is not enabling network communication. The network is working with the test cable. | The new UTP cable might be a crossover cable. Test the connections with a cable tester and replace the UTP cable if it is faulty.
A system was moved to a new location and is not able to communicate now. The system is working properly. | Cables might have been damaged during transit. Replace the old cables with new ones for a proper connection.

Table 11.3: Common cable problems and probable solutions.

Crosstalk

Crosstalk occurs when signals on adjacent wires interfere with a wire. The first indication of crosstalk is signal degradation. Using another cable type with multiple layers of shielding is the best solution for this problem.

Near-end crosstalk

Network connectivity issues occur when a wire in a cable causes electromagnetic interference in the wires adjacent to it and induces a current. The possibility of crosstalk is strongest in the first part of the wire, where it is connected to a connector, switch or NIC. A near-end crosstalk (NEXT) measurement quantifies this type of crosstalk.

Attenuation

Signals degrade as the distance they travel increases; this is called attenuation. If a user finds it difficult to communicate with a system that is far away, the maximum length for that type of cable has probably been exceeded. In such a case, a repeater can be placed at some point along the cable to reamplify the signal, or a different cable type can be used.

Collisions

Data collision is a common issue when there are many systems in a network. Data packets travelling across the network collide with each other, affecting network performance.
To resolve this issue, network hubs can be replaced with switches. Each port on a switch has its own network segment, which ensures that data does not collide.

Shorts

A short in a network cable leads to network downtime. A cable analyser can be used to locate shorts so that the affected cable can be repaired or replaced.

Open impedance mismatch (echo)

Network signals bounce back and cause communication problems when impedance is high. Signals bounce because of miswired cables or incorrect connectors. Recrimping the cable is a good way to avoid high impedance problems.

Interference

Network cables experience signal interference from external sources such as power cables and backup lines. Network cables should be laid away from anything that can interfere with their signals.

11.3.2 Logical issues

Logical issues have simple solutions but can lead to huge problems if not addressed properly. A few logical issues and solutions are listed in this section.

Port speed and duplex settings

Confirming that the speed and duplex settings of the network card are set correctly avoids problems when systems connect to each other across a network.

Incorrect VLAN

Communication between systems on different VLANs is not possible unless there is routing between them. It is therefore important to place each system in the VLAN it belongs to.

Incorrect IP address

If IP addresses of systems a