Once you develop an understanding of the
operating system and networking, you will generally know how things are
supposed to work, and you can start to troubleshoot abnormal behaviour. Unless
the cause is obvious, the first thing I usually do is find a way to reproduce the
problem, so I can step through and debug it. Then you can list the
possible causes of the problem and start eliminating them
until you find the source. I like to rank these either by how likely they are, based
on experience, or by how easy they are to check and eliminate, and then move on to the more
complex issues.
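
As a rough illustration of that ranking habit, here is a minimal Python sketch. The candidate causes, the likelihood guesses, and the idea of dividing likelihood by minutes-to-check are all assumptions made up for this example; in practice the ranking usually happens in your head or on a scratch pad.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    likelihood: float      # rough guess from experience, 0.0 to 1.0 (assumed values)
    minutes_to_check: int  # rough effort to confirm or rule out (assumed values)


# Hypothetical candidates for an imaginary "web site is down" report.
candidates = [
    Hypothesis("kernel bug", 0.02, 240),
    Hypothesis("bad deploy of the app", 0.40, 15),
    Hypothesis("disk full on /var", 0.30, 1),
    Hypothesis("DNS resolution failing", 0.20, 2),
]


def check_order(hypotheses):
    """Order checks so that likely and cheap-to-eliminate causes come first."""
    return sorted(
        hypotheses,
        key=lambda h: h.likelihood / h.minutes_to_check,
        reverse=True,
    )


for h in check_order(candidates):
    print(f"{h.cause}: likelihood={h.likelihood}, minutes={h.minutes_to_check}")
```

The point is only that a one-minute check with a decent chance of being the culprit is worth doing before a four-hour investigation of something unlikely.
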
The key to making informed decisions and
quickly troubleshooting issues is having extensive monitoring in place before
catastrophe strikes. Learn how to configure and use host monitoring and
alerting software, like Nagios, which will allow you to see whether a machine and
its services are responding as expected. Learn how to configure and use a resource
graphing system like Ganglia to see what your systems are doing. These monitoring and
graphing tools will allow you to correlate errors with resource state and
capacity. I also suggest that you collect all syslog messages on a central
server and run some type of analysis on them. Something as simple as a central
log host with tenshi works well. Or go the extra mile and use something
like Logstash to get a visual interface where you can graph and search
for interesting events. Monitoring will give you the situational awareness
needed to diagnose issues and recover quickly.
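
To give a feel for what "some type of analysis" on centralized syslog can mean, here is a minimal Python sketch that counts flagged events per host. It is not how tenshi or Logstash work internally; the patterns, the sample log lines, and the `summarize` helper are assumptions invented for this example, and it expects the traditional "Mon DD HH:MM:SS host tag: message" line layout.

```python
import re
from collections import Counter

# Hypothetical patterns worth flagging; tune these for your own environment.
PATTERNS = {
    "oom": re.compile(r"out of memory", re.IGNORECASE),
    "auth_failure": re.compile(r"authentication failure|failed password", re.IGNORECASE),
    "disk_error": re.compile(r"i/o error", re.IGNORECASE),
}


def summarize(lines):
    """Count flagged events per (host, label) from traditional syslog lines."""
    counts = Counter()
    for line in lines:
        # Fields: month, day, time, host, then the rest of the message.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip lines that do not look like syslog
        host, message = fields[3], fields[4]
        for label, pattern in PATTERNS.items():
            if pattern.search(message):
                counts[(host, label)] += 1
    return counts


# Made-up sample lines standing in for a central log file.
sample = [
    "Jan 12 03:14:07 web01 sshd[2211]: Failed password for root from 203.0.113.9 port 4242 ssh2",
    "Jan 12 03:14:09 web01 sshd[2211]: Failed password for root from 203.0.113.9 port 4243 ssh2",
    "Jan 12 03:15:30 db01 kernel: Out of memory: Kill process 1234 (mysqld)",
    "Jan 12 03:16:00 db01 cron[99]: (root) CMD (run-parts /etc/cron.hourly)",
]

for (host, label), n in sorted(summarize(sample).items()):
    print(host, label, n)
```

Even a crude summary like this, run regularly against the central log host, tells you which machine to look at first.
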
It seems simple, but as a sysadmin, you
will use a text editor extensively. I often have one open at all times and use
it as a scratch pad for copying error messages, commands, and other
items. No matter which editor you choose, learn the keyboard shortcuts, and use
them. I also recommend using an editor that has good regex support, so that if
you need to do a complex search and replace, it is easy.
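
For a sense of what a regex search and replace with capture groups looks like, here is a minimal sketch using Python's `re` module rather than any particular editor. The config snippet and host names are made up for this example; most editors with regex support offer the same idea, though the backreference syntax (`\1` versus `$1`) varies.

```python
import re

# Made-up config fragment; the host names are placeholders.
config = """\
server web01.old.example.com:8080
server web02.old.example.com:8080
# server web03.old.example.com:8080
"""

# Capture the line prefix, the short host name, and the port,
# then rewrite only the domain part in between.
pattern = re.compile(
    r"^(\s*#?\s*server\s+)(\w+)\.old\.example\.com:(\d+)$",
    re.MULTILINE,
)
updated = pattern.sub(r"\1\2.new.example.com:\3", config)

print(updated)
```

The groups let you keep the parts that differ on each line (host name, port, whether the line is commented out) while changing only the part you care about.
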
Hopefully you found this episode useful. It is
a lot to take in, but I just wanted to put something down, and show the
direction of where the site is headed. In the coming years, I plan to create
many episodes on each of these topics. So stay tuned, and as always, if you
have any feedback or suggestions, please shoot me an email.