Trouble with Troubleshooting?
What, exactly, is “Troubleshooting”? It’s a term those of us in technology use all the time. Definitions vary a bit, and some are more helpful than others.
The most useful definitions describe troubleshooting as a form of problem solving: a process to follow for identifying and resolving a technical issue. Discovering, diagnosing, and resolving these issues is the heart of it. Most importantly, troubleshooting is a critical skill that every technologist needs to learn and practice.
Interestingly, there are very few college or professional courses about developing troubleshooting skills. If you search, you will likely find courses, blog posts, or articles about finding and resolving problems, but they tend to focus on specific products or industries. How, then, does one learn to be an efficient and effective troubleshooter?
Find a Mentor
One of the best ways to learn just about anything is to find someone who’s already an expert and learn from them. Better yet, find several of them. Within your organization, figure out who the go-to people are when critical issues arise, and ask to observe them, and eventually help, as they solve problems. Then document the process they use so you can help others who are more junior than you. Try to observe different kinds of issues being worked, not just the ones you are personally interested in. Note where their processes are consistent and where they diverge, and ask questions (perhaps after the fact, so you don’t interfere) about why certain steps were taken.
Follow a Process
Once you are ready to jump in and start helping, make sure you stick to a process. Generally, you should start with the most basic steps and progress to more advanced ones as you eliminate possibilities. Think of all those standard questions you have to answer for front-line support, whether it’s your home internet provider or VMware GSS! They ask the basic questions first to rule out the most obvious and most common problems. Once you’ve moved past those, start digging deeper, one logical layer at a time, and work your way down.
Keep a Log
Write it all down! Better yet, document it somewhere shared by your team. Record everything you test, everything you change while you are testing, and the dates and times. EVERYTHING! This can help you correlate events with entries in log files, and it will help you or your team if a similar issue occurs again in the future.
Change One Thing at a Time
While you’re running tests and making notes, make sure you only ever change one variable between test runs. If you change more than one configuration, system setting, or file at a time, it becomes difficult to tell which change caused the result you observe. And of course, make sure you’ve taken backups or snapshots before making any changes in your environment!
Practice, Practice, Practice!
When it comes to troubleshooting, what they say is mostly true: practice makes it easier, if not necessarily perfect. With practice, you will gain confidence in your ability to work your way through issues to a root cause, a solution, or both. And the more often you practice and gain experience, the more efficient you will become.
Validate Backups and Disaster Recovery Strategy
It goes without saying that you should have backups, as well as a plan in place to recover from a severe disaster or incident. But how confident are you that you can actually recover from those backups with little to no data loss? Your DR plans must be tested, and those tests should happen on a regular schedule.
For more information on Troubleshooting Processes, watch for Kim or Mandy at your local VMUG UserCon when they present “Troubleshooting: Science or Witchcraft”.