Problem Solving (aka Troubleshooting)

Once you understand the operating system and networking, you will generally know how things are supposed to work, and you can start to troubleshoot abnormal behaviour. Unless the cause is obvious, the first thing I usually do is find a way to reproduce the problem, so I can step through it and debug it. Then I list the possible causes and eliminate them one by one until I find the source. I like to rank the candidates, either by how likely they are based on experience or by how easy they are to check and rule out, and then move on to the more complex issues.
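
The ranking step can be sketched in a few lines of Python. This is only an illustration of the idea, not a formal method: the `Hypothesis` class, the example causes, and the likelihood and time estimates are all made up, and dividing likelihood by check time is just one reasonable way to combine the two criteria.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One possible cause of the problem (hypothetical structure)."""
    description: str
    likelihood: float     # gut estimate from experience, between 0 and 1
    check_minutes: float  # rough time needed to confirm or rule it out


def triage_order(hypotheses):
    """Sort so that likely, quick-to-check causes come first."""
    return sorted(
        hypotheses,
        key=lambda h: h.likelihood / h.check_minutes,
        reverse=True,
    )


# Illustrative values only; substitute your own estimates.
candidates = [
    Hypothesis("kernel bug in NIC driver", 0.05, 120),
    Hypothesis("bad config pushed in last deploy", 0.40, 10),
    Hypothesis("disk full on /var", 0.30, 1),
    Hypothesis("DNS resolution failing", 0.20, 2),
]

for h in triage_order(candidates):
    print(h.description)
```

With these numbers the quick checks (full disk, DNS) come before the likely but slower config review, and the unlikely, expensive kernel investigation comes last.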

The key to making informed decisions and quickly troubleshooting issues is having extensive monitoring in place before catastrophe strikes. Learn how to configure and use host monitoring and alerting software, like Nagios, which will show you whether machines and services are responding as expected. Learn how to configure and use a resource graphing system, like Ganglia, to see what your systems are doing. Together, these monitoring and graphing tools let you correlate errors with resource state and capacity. I also suggest that you collect all syslog messages on a central server and run some type of analysis on them. Something as simple as a central log host running tenshi works well. Or go the extra mile and use something like Logstash to get a visual interface where you can graph and search for interesting events. Monitoring gives you the situational awareness needed to diagnose issues and recover quickly.
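
To make the monitoring idea concrete, here is a minimal sketch of a disk usage check written in the style of a Nagios plugin. It relies on the plugin convention that the exit status signals the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that the first line of output is the status message. The function names and the 80% and 90% thresholds are assumptions chosen for illustration, not defaults from any tool.

```python
import shutil

# Nagios plugin convention: the exit status encodes the service state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage to a (status, message) pair.

    The default thresholds are illustrative assumptions.
    """
    if used_pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn_pct:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Check disk usage for the filesystem that contains path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # If the check itself cannot run, report UNKNOWN rather than OK.
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct)


def main():
    status, message = check_disk("/")
    print(message)
    return status

# When saved as a standalone plugin script, finish with:
#     import sys; sys.exit(main())
```

The monitoring server would run a script like this on a schedule and alert when the exit status changes, which is exactly the kind of early warning you want before a full disk takes a service down.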

Text Editor

It seems simple, but as a sysadmin you will use a text editor extensively. I often keep one open at all times and use it as a scratch pad for copying error messages, commands, and other items. No matter which editor you choose, learn the keyboard shortcuts and use them. I also recommend using an editor with good regex support, so that a complex search and replace is easy when you need one.
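
As an example of the kind of search and replace that regex support makes easy, the sketch below renumbers every address in a 10.1.x.y network to 10.2.x.y while leaving other lines alone. It uses Python's `re` module rather than any particular editor, and the host names and addresses are invented for illustration. Most editors accept a very similar pattern, although the backreference syntax in the replacement (`\1` versus `$1`) varies between them.

```python
import re

# Made-up hosts file content for illustration.
hosts = """\
10.1.4.20   web01.example.com web01
10.1.4.21   web02.example.com web02
192.168.0.5 backup.example.com backup
"""

# Match 10.1.X.Y at the start of a line and capture the last two octets.
pattern = re.compile(r"^10\.1\.(\d{1,3})\.(\d{1,3})\b", flags=re.MULTILINE)

# \1 and \2 put the captured octets back into the new address.
renumbered = pattern.sub(r"10.2.\1.\2", hosts)
print(renumbered)
```

Anchoring the pattern to the start of the line and escaping the dots keeps it from touching addresses such as 110.1.4.20 or text that merely contains "10.1".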

Hopefully you found this episode useful. It is a lot to take in, but I wanted to put something down and show the direction the site is headed. In the coming years, I plan to create many episodes on each of these topics. So stay tuned, and as always, if you have any feedback or suggestions, please shoot me an email.

