Tuesday, December 8, 2015

Managing Troubleshooting
----------------------------



Identify and locate the problem by determining whether it lies in the hardware, the operating system, application software, configuration, or the user. It is easy to overlook the most obvious things when troubleshooting, so always start with the simplest things first. (Is the machine plugged in?)

Ask the following questions:
- What:
What specifically is happening? Have the user walk through what he’s doing when the error occurs. Work directly with the person having problems and monitor in real time rather than getting second- or thirdhand information. Get screenshots or the error messages themselves.

- Who:
Who’s being impacted? Is it one or two users, a specific group of users, or everyone? Also, is it your production, test, or development system? Never assume that because someone is excited, it must be production. Nothing is more embarrassing than trying to fix the wrong database.

- Where:
Are the affected users spread over a wide geographic area, or are they in a specific city or building?

- When:
How long has this been occurring and has it occurred before? Also, does it happen every time or just sometimes?
If it only happens occasionally, drill down into what is being done just before the error. If it only started after a recent system change (such as a patch, upgrade, or reboot), that can be a valuable clue. “What has recently changed on the system?” is always a great question to ask!

- How bad:
Is this a total loss of service where the company is stopped or is it just an annoyance on a seldom-used development system?
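The “what has recently changed” question can often be partly answered from the system itself. As a minimal sketch, assuming a Unix-like system whose configuration lives under a directory such as `/etc`, the following Python lists files modified within a recent time window. The function name `recently_modified`, the example directory, and the 48-hour default are illustrative assumptions, not part of any standard tool. Modification times only reveal changed files; they will not show a reboot or a change made on another machine.

```python
import os
import time


def recently_modified(root: str, hours: float = 48.0) -> list[tuple[float, str]]:
    """Return (mtime, path) pairs for files under `root` changed in the last `hours`, newest first."""
    cutoff = time.time() - hours * 3600
    hits = []
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort the scan.
                continue
            if mtime >= cutoff:
                hits.append((mtime, path))
    hits.sort(reverse=True)  # newest first
    return hits


if __name__ == "__main__":
    # "/etc" is only an example directory; adjust it for the system you are examining.
    for mtime, path in recently_modified("/etc"):
        print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

Sorting newest first puts the most likely suspects at the top of the list.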


1. Examine the symptoms: Take the time to get all the facts when the first signs of the problem are reported. Explore the following questions: Is this happening to one user or is it happening to everyone? Does the problem only happen on one particular system? Does it happen in an application, or is this a system process problem? By gathering as many facts as possible, you can get started in the right direction.

2. Examine the obvious: The most difficult-seeming problems often have a simple source. Don’t overlook the obvious! Even simple things, such as loose power cords or network cables, malfunctioning fans, or an engaged Caps Lock key, can cause larger problems than you may think. On the software side, make sure that the user knows how to use the program in question. Does the system have enough disk space? Is this a simple permissions problem? By checking the obvious causes first, you can quickly rule them out and move on to more in-depth examination of the systems you are checking.
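Two of the obvious checks in step 2, free disk space and file permissions, are easy to script. This is a hedged sketch using only Python’s standard library; the helper names `check_free_space` and `check_access` and the 10% free-space threshold are assumptions chosen for illustration, not a recommended limit.

```python
import os
import shutil


def check_free_space(path: str = "/", min_free_fraction: float = 0.10) -> tuple[bool, float]:
    """Return (ok, free_fraction) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction


def check_access(path: str) -> dict[str, bool]:
    """Report whether `path` exists and whether the (real) user can read and write it."""
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    ok, frac = check_free_space("/")
    print(f"{'OK' if ok else 'LOW'}: {frac:.1%} free on /")
    print(check_access(os.path.expanduser("~")))
```

Run `check_access` as the affected user, since permissions problems are usually specific to one account.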

3. Work your way from the simple to the complex: Always start troubleshooting with the simplest systems and work toward the more complex ones. For example, if the problem is reported on a user’s workstation, start troubleshooting there, and then work your way up the chain from the network to the server. Working methodically this way lets you eliminate the simplest and most obvious causes first.

4. Hardware or software: You should also quickly narrow down whether the problem is hardware- or software-related. You will waste a great deal of time and money swapping and replacing hardware parts if the source of the problem is actually software-related (and vice versa). Make sure that all of the hardware is operating normally, and that no warning lights, strange sounds, or smells are coming from any mechanical or electrical components. On the software side, take the time to recreate the problem on the same system. Then try the same thing on another person’s workstation: if the problem recurs there, the cause is more likely on the server side; if it does not, it is probably isolated to the original workstation.

5. OS or application: After you have determined that the problem is software-related, you must again narrow the issue down to either an operating system or an application issue. If it is an operating system issue, something within the system itself is causing the problem, such as incompatible versions or conflicting programming libraries. You can usually test for an application problem by trying to recreate it on another machine running the same application: if it fails there too, the application itself is the likely culprit; if it works, look more closely at the original machine’s operating system.

6. Examine log files: Check all log files for the operating system and applications. Examine the system log file for any warnings or error messages, and check the application logs for malfunctions.
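Scanning a long log by eye is slow. A minimal sketch, assuming the log marks severity with plain words such as ERROR or WARNING (log formats vary widely between systems and applications), counts the matching lines and keeps the first few for review. The `scan_log` name and the keyword list are illustrative assumptions.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed severity keywords; extend this for the log format you are reading.
SEVERITY = re.compile(r"\b(CRITICAL|ERROR|WARN(?:ING)?|FAIL(?:ED|URE)?)\b", re.IGNORECASE)


def scan_log(lines: Iterable[str], max_hits: int = 20) -> tuple[Counter, list[tuple[int, str]]]:
    """Return (count per severity keyword, first `max_hits` matching (line number, text) pairs)."""
    counts: Counter = Counter()
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1).upper()] += 1  # normalize case so "error" and "ERROR" merge
            if len(hits) < max_hits:
                hits.append((lineno, line.rstrip("\n")))
    return counts, hits
```

For example, `with open("/var/log/syslog", errors="replace") as f: counts, hits = scan_log(f)`. The path is an assumption; the system log lives elsewhere on some distributions (for example `/var/log/messages`), and applications usually keep their own logs.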

7. Examine configuration: If you have narrowed down the problem to a specific process or application, examine its configuration file to ensure that it has been set up properly. Compare it with the corresponding configuration files on other servers, and ensure that it doesn’t contain any errors. If you make a change to a configuration file, remember to restart the particular process or application you are working on so that the change takes effect.
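Comparing a suspect configuration file against a known-good copy from another server is the kind of job Python’s `difflib` module handles well. The sketch below assumes `#`-style comments, which many but not all configuration formats use, and by default ignores them along with blank lines so that only substantive differences show up. `diff_configs` is a hypothetical helper name.

```python
import difflib


def _normalize(lines, ignore_comments: bool) -> list[str]:
    """Strip trailing whitespace; optionally drop blank lines and '#' comment lines."""
    out = []
    for line in lines:
        line = line.rstrip()
        if ignore_comments and (not line or line.lstrip().startswith("#")):
            continue
        out.append(line)
    return out


def diff_configs(reference_lines, candidate_lines,
                 reference_name: str = "reference", candidate_name: str = "candidate",
                 ignore_comments: bool = True) -> list[str]:
    """Unified diff from a known-good config to the suspect one; [] means no differences."""
    return list(difflib.unified_diff(
        _normalize(reference_lines, ignore_comments),
        _normalize(candidate_lines, ignore_comments),
        fromfile=reference_name,
        tofile=candidate_name,
        lineterm="",  # lines were already stripped of their newlines
    ))
```

Read each file with `open(path).readlines()` and pass the known-good server’s copy as the reference. Lines starting with `-` exist only in the good copy, and lines starting with `+` exist only in the suspect one.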

8. Use as many resources as possible: When you are stumped, don’t be afraid to draw on outside resources. If you have a maintenance contract with your software or hardware vendor, call them immediately. Use resources on the Internet, such as technical and vendor Web sites with searchable troubleshooting databases.

9. Document your solution: After you have solved your problem, document the solution in detail. You may need this information again in the future if the same problem appears, and you can avoid the additional troubleshooting time by referring to your own notes.
