In-Depth

Your IT Operations Guide

Running behind? Too much to do? Worried about the future? Take a breather and consider how to do your job better. This report shares 10 best practices that will put you and your IT staff ahead of the next fire.

It’s time to admit it. No matter how much pride you take in your job, there’s always room for improvement. As IT professionals, we tend to focus on technology when it comes to getting the job done. In fact, that’s what most of us would say we’re known for. However, there’s much more to IT than just the management of hardware, software and network devices.

I’ve worked for many different organizations in my career (sometimes as a consultant and sometimes as an employee). I’ve had the benefit of seeing many different IT groups in action. Some operated like well-oiled machines. Everyone knew what was going on and worked toward the same goals. Others worked as if gears were out of alignment. Tasks weren’t synchronized and one cog had no idea what the other was doing when it came to work like deploying a new server or moving a Web site. Simple issues would become critical because nobody dealt with them in a timely manner. Everything was an emergency, and IT spent most of its time trying to stay “afloat.” Don’t get me wrong: In general, the IT staff was dedicated and worked hard. However, a lack of structure in these companies was the root cause of many IT-related problems.

In this article I share 10 best practices that will help you improve efficiency in your environment. If you’ve never developed a guide to document the processes and procedures of your operations, consider this a starting place and a source of ideas. You can tweak and tune these tips based on your experience and your environment. Whether you work as a member of a team or the sole person in your IT department, I think you’ll find the advice useful.

1. Talk More:
Communicate with and Educate Your Users

I once met an IT staffer who actually said that if a user’s machine is infected by a virus, “That’s their problem.” Shortsighted, to say the least. The fundamental purpose of an IT organization should be to help the business meet its goals (whatever those may be). Therefore, it’s of paramount importance that IT staff communicate with all areas of the organization. All too often, it seems like IT departments work in a vacuum, handling requests without seeing the big picture. Gaining some valuable insight about a marketing program over lunch or when passing someone in the hallway might be very useful when you’re planning for server capacity in the future.

Similarly, training users can have a major payoff. Wouldn’t it be great if you could get everyone in your company to help you do your job? Think of all the time you’d save if people cleaned out their own home directories periodically and did their part to ensure their files were being backed up. Taking the time to teach users to perform common tasks without your assistance can be a great investment in the end.

Common operating systems and applications usually have much more capability than most users take advantage of. Show users the benefits of sharing documents (using Microsoft Word’s Revision and “Track Changes” features or the use of Public Folders on your Exchange Server) and help them understand the benefits of using the company’s intranet to make information more easily available. Granted, there will be many users who just won’t get it, but others will.

Although it can be painful, it’s always a good idea to get feedback from users. You might know that you’re doing a great job, but what do your “customers” think? Soliciting feedback can be difficult, especially when you have to read negative comments. However, users will generally feel better knowing that you care (“… but thanks for asking!”), and you might gain some valuable insight into what’s really important. Imagine, for example, that your users really aren’t concerned about disk quotas (something you’ve spent a lot of time trying to administer), but it really bothers them when a virus scanner runs every time they log in. In this case, there may be a quick and easy way to improve performance and alleviate this point of pain.

We’ve all come to expect information to be readily available. In the last two years, how many times have you picked up the phone and asked a vendor to mail you a floppy disk containing drivers for a network card? That was common practice just a few years ago. Now we expect all of our vendors to have easy-to-use Web sites that let us serve ourselves. The result is a much more efficient way to get what we need (plus, it frees technical support staff to focus on real issues). The same should be true for the IT department’s clients. If someone complains that a machine isn’t working properly, make it your problem, and follow through to make sure it’s resolved. Keep users informed through an intranet site and, when it’s absolutely required, e-mail. Make sure the people you support know where to get information and that it’s kept up to date. When network problems or other issues arise, they should feel confident that they can get the latest information from the site.

Best practice: Take the time to understand the needs of your users and to provide them information about the IT department. Put an emphasis on self-service for common tasks and seek feedback. In many cases, a little information is all that users need to improve productivity and to cooperate with IT.

2. The Grind:
Document Regular Tasks

Have you ever gotten that sinking feeling when a manager asks if you’ve verified backups or checked the configuration on a critical server? If so, you know that it’s difficult to remember the day-in, day-out tasks. You need to develop and maintain checklists for common operations. All too often it’s easy to forget some “little detail” that’s going to lead to a loss of productivity or repeated calls to the help desk.

Here, I present a few suggestions for the types of tasks you should be responsible for performing regularly. Remember, though, that every environment is different and that you should design your own lists based on your priorities.

Regular tasks are the types of things for which you’re always responsible. In many cases, the exact nature of these tasks is difficult to predict. For example, if you’re responsible for user support, it’s anyone’s guess what wacky problems you’ll come across on any given day. (If you could predict server failures, you could probably earn a lot more money as a professional IT psychic.) Still, every IT group has many jobs that need to be performed on a regular basis.

How To Prioritize

I’m willing to bet that most IT professionals reading this article are overworked. That is, no matter how much effort you exert, you’ll find that there are still more tasks to perform. And everything is prioritized as “critical,” whether it’s a VP’s mouse that “sometimes skips” or an internal server that has intermittent slowdowns. In some ways, that’s good; it keeps you on your toes and focused on the job. Assuming that all tasks are important, though, how should you decide which ones to do first? A good way to look at them is to determine the value of the tasks and compare them with the effort required to complete them. Figure A provides an example.

Figure A. How to graph value vs. effort for the IT tasks you face in any given time period.

All too often, we tend to work on easy tasks first, things like defragging hard drives and installing software (though they’re tedious and not the most rewarding). Or we choose to react to what seems like the immediate problem—an upset manager, for example. Given that you have limited resources to get the job done, you must figure out what’s important.

A good example is performance optimization (Figure A). Although the job will probably never be complete (performance could always be “better”), you may be able to make more efficient use of your investments with just a few mouse-clicks. This would have a lot of value, but would require little effort (that’s good and should have a high priority). However, you’ll reach a point where you would have to exert significant effort to receive a marginal improvement in performance. That would be the opposite: low value, high effort (that’s bad and should be a lower priority). Overall, use this (or some other) methodology to determine what you should be working on and compare that with what you are working on.
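The value-versus-effort comparison can be made concrete with a few lines of code. The following Python sketch is only an illustration: the 1-to-10 scores, the midpoint of 5, the quadrant labels and the sample tasks are hypothetical assumptions, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    value: int   # hypothetical score: business value, 1 (low) to 10 (high)
    effort: int  # hypothetical score: effort required, 1 (low) to 10 (high)


def quadrant(task: Task, midpoint: int = 5) -> str:
    """Place a task in one of four value/effort quadrants (labels are illustrative)."""
    high_value = task.value > midpoint
    high_effort = task.effort > midpoint
    if high_value and not high_effort:
        return "do first"           # high value, low effort
    if high_value:
        return "plan and schedule"  # high value, high effort
    if not high_effort:
        return "fill-in work"       # low value, low effort
    return "defer"                  # low value, high effort


def prioritize(tasks: list[Task]) -> list[Task]:
    """Order tasks by value per unit of effort, best first."""
    return sorted(tasks, key=lambda t: t.value / t.effort, reverse=True)


if __name__ == "__main__":
    backlog = [
        Task("Tune SQL Server memory settings", value=8, effort=2),
        Task("Squeeze the last few percent of disk throughput", value=2, effort=9),
        Task("Defragment workstation drives", value=3, effort=3),
    ]
    for task in prioritize(backlog):
        print(f"{task.name}: {quadrant(task)}")
```

Sorting by the value-to-effort ratio is just one way to apply the graph; the point is to compare what the list says you should be doing with what you’re actually doing.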

Some regular tasks need to be done every day. For example, you should review the status of all open help desk requests and other issues daily. Other tasks can happen less frequently. Once a week, for instance, it might be important to verify that the weekly full backups have been performed on schedule and without unexpected errors. Monthly tasks usually focus on higher-level analysis and management of your network environment; these are crucial for ensuring the overall efficiency of the hardware, software and networks you support. Table 1 provides some examples of tasks, along with a suggested frequency. Use it as a baseline to create your own list of regular jobs.

| Task | Frequency | Description | Tools | Estimated duration |
| --- | --- | --- | --- | --- |
| Review the status of help desk requests. | Daily | Review reported issues and update all "open" issues that haven’t been modified within the last three days. | Help desk software/tools or e-mail system | 15 minutes |
| Analyze disk space usage reports for all servers. | Weekly | Determine trends in disk space usage and predict when new capacity might be needed. | Performance Monitor (% Free Space counters); Excel spreadsheet containing graphs of disk space use over time | 30 minutes |
| Review all scheduled jobs on SQL Server 2000 machines. | Weekly | Verify that all jobs are running properly. | SQL Server Enterprise Manager; custom SQL scripts; Event Viewer | One hour |
| Review audit logs. | Weekly | Look for suspicious patterns of activity (such as failed logon attempts or access to sensitive files). | Manual inspection using Event Viewer filters, or third-party tools | 30 minutes (for eight servers) |
| Review user accounts. | Weekly | Ensure that all accounts are configured as required; remove unneeded accounts. | Manual inspection using the Active Directory Users and Computers tool | 30 minutes |
| Review anticipated IT-related changes. | Weekly | Plan for expected moves, adds and changes for servers and user workstations. | E-mail, intranet | 15 minutes |
| Update project status. | Weekly | Update status information for all open IT projects. | Project tracking tools, spreadsheets and e-mail | 15 minutes |
| Verify licensing. | Weekly | Review purchase orders and compare with software inventory. | Excel, SMS or third-party tools; Windows License Manager | 30 minutes |
| Review and update configuration documentation. | Monthly | Ensure that all configuration documentation is up to date. | Word, SMS or third-party tools | Two hours |
| Verify backups for an entire server. | Monthly | Verify a full server restore from tape; record the time taken and any problems encountered. | Backup utilities | Three hours |
| Verify disaster recovery procedures. | Quarterly | Attempt to rebuild a critical server, assuming that all hardware and data is lost; help test disaster recovery processes. | IT intranet for disaster recovery process information; Word for documentation of results | Four hours |
Table 1. A list of regular IT tasks can remind you to do the essential, regular chores every IT staff faces.
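The daily help desk review in the first row of Table 1 boils down to a simple filter. This Python sketch assumes, hypothetically, that tickets can be exported as records with "id", "status" and "last_modified" fields; real help desk tools store this information in their own formats.

```python
from datetime import datetime, timedelta


def stale_tickets(tickets, now, max_age_days=3):
    """Return the IDs of open tickets that nobody has updated in max_age_days.

    Each ticket is a dict with hypothetical keys "id", "status" and
    "last_modified" (a datetime).
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        t["id"]
        for t in tickets
        if t["status"] == "open" and t["last_modified"] < cutoff
    ]


if __name__ == "__main__":
    tickets = [
        {"id": 101, "status": "open", "last_modified": datetime(2001, 10, 5, 14, 0)},
        {"id": 102, "status": "open", "last_modified": datetime(2001, 10, 9, 8, 30)},
        {"id": 103, "status": "closed", "last_modified": datetime(2001, 9, 28, 16, 0)},
    ]
    print(stale_tickets(tickets, now=datetime(2001, 10, 10, 9, 0)))  # prints [101]
```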

Depending on the size of the environment and the number of IT staffers you have, you’ll probably want to delegate responsibilities to specific people. Be sure that everyone knows what they’re responsible for and when the tasks need to be performed. Also, the amount of time that’s required for these tasks will vary based on the size of your environment. Note, however, that most don’t take very long—it’s just important to be sure you set aside the time to do them. A few minutes of dedicated effort here and there can really help ensure that your environment is working optimally. Finally, keeping such a checklist can help you determine where your time is going and provide hints as to what might need to change.
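A checklist like Table 1 can also drive a small script that prints what’s due on a given day and how long it should take. In this Python sketch, the schedule rules (weekly tasks on Mondays, monthly tasks on the 1st, quarterly tasks on the 1st of January, April, July and October) are assumptions chosen for illustration; only the task names, frequencies and durations come from Table 1.

```python
from datetime import date

# A few rows from Table 1: (task, frequency, estimated minutes)
TASKS = [
    ("Review the status of help desk requests", "daily", 15),
    ("Analyze disk space usage reports", "weekly", 30),
    ("Review audit logs", "weekly", 30),
    ("Verify backups for an entire server", "monthly", 180),
    ("Verify disaster recovery procedures", "quarterly", 240),
]


def is_due(frequency: str, day: date, weekly_on: int = 0) -> bool:
    """Decide whether a task with this frequency is due on `day`.

    Assumed schedule: weekly tasks on `weekly_on` (0 = Monday), monthly tasks
    on the 1st, quarterly tasks on the 1st of Jan/Apr/Jul/Oct.
    """
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return day.weekday() == weekly_on
    if frequency == "monthly":
        return day.day == 1
    if frequency == "quarterly":
        return day.day == 1 and day.month in (1, 4, 7, 10)
    raise ValueError(f"unknown frequency: {frequency}")


def checklist(day: date):
    """Return (list of (task, minutes) due on `day`, total estimated minutes)."""
    due = [(name, mins) for name, freq, mins in TASKS if is_due(freq, day)]
    return due, sum(mins for _, mins in due)


if __name__ == "__main__":
    due, minutes = checklist(date.today())
    for name, mins in due:
        print(f"[ ] {name} ({mins} min)")
    print(f"Total: {minutes} minutes")
```

Keeping the durations alongside the tasks also gives you the running record of where your time goes.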

Best practice: Develop a checklist for the tasks that you know you need to perform regularly. Use this information to find areas for improvement and to make sure you don’t ignore important tasks.

Rainy Day Activities for IT

A downturn in the economy can bring with it some sweeping changes in an organization. Those $4,000 routers that you may have been able to purchase on your own a few months ago may now require the approval of a half-dozen managers within your organization. Many IT groups are settling into “maintenance mode,” as their companies aren’t hiring new staff or buying hardware as quickly as before. And, although many people debate the extent of the downturn and its real effects, the fact is that many companies have scaled back on spending.

So what can you do when management has told you that you can’t purchase any new hardware, software or network devices (and, by the way, your responsibilities and goals haven’t changed)? Well, it takes a different mindset, but one potentially beneficial approach is to use this time to do all of the things you didn’t have the opportunity to do before. When things are slow, there’s no excuse for neglecting good maintenance practices. Here are some ideas:

  • Rethink your strategies. Decisions are often made with a focus on speed of implementation, based on the best information available at the time. For example, someone might ask you to set up a new installation of SQL Server 2000 for use by the marketing department. Although a new server might get the job done, it’s not always the most efficient approach; usually, fewer servers are easier to manage. Remember the “Customers” database that was expected to hit a gigabyte but is still only 100MB? Consider consolidating it onto another SQL Server machine. You’ll save the cost of the new server, and backups, performance tuning and other common tasks will be much easier to manage. Similarly, look for areas in which you can do more with fewer resources; they’re out there, but it takes some time, effort and skill to find them.
  • Make an IT wish list. Take the time to determine what you’d really like to do when things turn around. It’s great to be able to back up your ideas with facts. For example, you could state, “The average development user requires 500MB of disk space in his or her home directory.” Use that figure to make decisions about how much disk space you might need in the future. If you need a lot, it might be cost-effective to invest in a disk array or a network-attached storage device.
  • Live for today, plan for tomorrow. Although a downturn in the economy can really reduce the pace of business, it’s important to realize that there will be a turnaround. Your job is to ensure that your company is in the best possible situation when that occurs. We just don’t know how quickly that will happen or when it will begin. Use this time to make plans for the future, including estimates on how you’ll deal with rapid growth (if that’s expected). Often, planning for the future can help you make better decisions when building and managing your current environment.
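The wish-list example’s 500MB-per-user figure can be turned into a rough capacity estimate. The Python sketch below is hypothetical arithmetic rather than a sizing method from this article: the user count, the annual growth rate and the 25 percent headroom are assumptions you’d replace with your own measurements.

```python
def projected_storage_gb(users, avg_mb_per_user, annual_growth, years, headroom=0.25):
    """Estimate home-directory storage (in GB) needed after `years` of growth.

    users           -- current number of users
    avg_mb_per_user -- measured average home-directory size, in MB
    annual_growth   -- assumed yearly growth in user count (0.2 means 20%)
    headroom        -- spare capacity added on top (0.25 means 25%)
    """
    future_users = users * (1 + annual_growth) ** years
    needed_mb = future_users * avg_mb_per_user * (1 + headroom)
    return needed_mb / 1024  # 1GB = 1024MB


if __name__ == "__main__":
    # e.g. 40 developers today at 500MB each, growing 20% a year, sized for two years
    print(f"{projected_storage_gb(40, 500, 0.20, 2):.1f} GB")
```

Comparing a figure like this with the capacity you already own is what tells you whether a disk array or network-attached storage device belongs on the list.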

With any luck, the amount of time you spend in planning will pay off. This may sound overly optimistic, but think of an economic downturn as a different kind of opportunity instead of just as a setback!

3. Get in the Fast Lane:
Monitor and Optimize Performance

A fundamental task for IT staffers is to maximize their organization’s investment in hardware, software and network devices. If just anyone could deploy a Windows 2000 Active Directory domain controller in its optimal configuration, it’s possible that no one would need you at all! A great way to maximize investments in client- and server-side hardware is to implement routine performance monitoring and optimization cycles. The process should include the following:

  • Establish a baseline.
  • Identify a bottleneck.
  • Make changes in an attempt to improve performance.
  • Remeasure performance and compare with the baseline.
  • Repeat, as desired.

Although this might seem like a lot of work in theory, in practice it can take as little as a few minutes. Contrary to what some vendors would want you to believe, you don’t have to invest in hundreds of thousands of dollars of software just to manage a few servers. Figure 1 provides a report generated by Win2K’s performance tool. The graph (which represents information collected over several hours) can provide many valuable insights. For example, the chart shows the amount of memory SQL Server was using on the server throughout the day as well as information about the number of users connected to the server.
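The “establish a baseline” and “remeasure” steps amount to comparing two sets of counter samples. This Python sketch assumes the samples have already been exported from Performance Monitor logs into lists of numbers; that export step and the sample values are illustrative assumptions.

```python
from statistics import mean


def compare_runs(baseline: dict[str, list[float]],
                 current: dict[str, list[float]]) -> dict[str, float]:
    """Percent change in the mean of each counter measured in both runs.

    Negative numbers mean the counter went down (for % Processor Time or
    Pages/sec, that's usually an improvement).
    """
    report = {}
    for counter in sorted(baseline.keys() & current.keys()):
        before = mean(baseline[counter])
        after = mean(current[counter])
        if before == 0:
            continue  # percent change is undefined against a zero baseline
        report[counter] = round((after - before) / before * 100, 1)
    return report


if __name__ == "__main__":
    baseline = {
        r"\Processor(_Total)\% Processor Time": [72.0, 80.0, 76.0],
        r"\Memory\Pages/sec": [310.0, 290.0, 300.0],
    }
    after_ram_upgrade = {
        r"\Processor(_Total)\% Processor Time": [55.0, 60.0, 62.0],
        r"\Memory\Pages/sec": [40.0, 35.0, 45.0],
    }
    for counter, change in compare_runs(baseline, after_ram_upgrade).items():
        print(f"{counter}: {change:+.1f}%")
```

Only counters present in both runs are compared, so adding a new counter to the log later doesn’t skew the report.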

Figure 1. A view of logged performance data on a Windows 2000 Server.

Always demand details. When you’re troubleshooting problems, you might hear vague comments about a machine’s reliability, or a systems administrator might claim that a machine is “overloaded” and that’s the reason for the slow performance. Don’t accept such vague answers. If I told you my car was “broken,” you’d probably want details. Does it start? When did the problem begin? Why don’t other similar cars have this problem? Demand the same from IT staff. What does “overloaded” mean? Are we talking about excessive CPU utilization? If so, we should see sustained periods of high CPU usage during peak times. A simple Performance Monitor measurement would prove or disprove this theory. You might find, for example, that CPU usage is low when the server is slow. In that case, you’ll need to look for other bottlenecks, such as disk I/O, memory (paging), network utilization and so on. Based on these results, you’ll be able to make much better decisions on upgrades or the placement of critical applications. You might find, for instance, that the engineering department really doesn’t need a brand new server to run a defect-tracking application.
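To check whether CPU usage really stays high during peak periods, rather than spiking briefly, you can look for runs of consecutive high readings. In this Python sketch, the 85 percent threshold and the five-sample minimum are assumptions to tune for your own servers, and the samples are assumed to be % Processor Time values logged at a fixed interval.

```python
def sustained_periods(samples, threshold=85.0, min_samples=5):
    """Find stretches where CPU stayed at or above `threshold`.

    samples     -- % Processor Time readings taken at a fixed interval
    min_samples -- how many consecutive high readings count as "sustained"
    Returns a list of (first_index, last_index) pairs, inclusive.
    """
    periods = []
    start = None  # index where the current high stretch began, if any
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                periods.append((start, i - 1))
            start = None
    # A stretch that runs to the last sample still counts.
    if start is not None and len(samples) - start >= min_samples:
        periods.append((start, len(samples) - 1))
    return periods


if __name__ == "__main__":
    # Hypothetical one-minute readings: a brief spike, then a six-minute plateau.
    readings = [20, 92, 30, 25, 88, 90, 94, 97, 91, 89, 40, 35]
    print(sustained_periods(readings))
```

An empty result while users complain about slowness is your cue to look at disk, paging or the network instead.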

Best practice: Take some time to get familiar with performance logging and monitoring tools, as well as performance methodology. In the end, this will help you maximize your IT investments and better understand your server bottlenecks.

4. Plan for the Worst:
Develop and Test Backup and Recovery Procedures

If you were asked to list the top 10 IT tasks, backup and recovery would probably be two of the first things you’d mention. They’re also probably close to the top of the list of the most annoying, tedious and mundane IT tasks. Nevertheless, backup and its not-so-distant cousin recovery are truly important.

A good data protection plan starts with determining your recovery requirements. Find out what data needs to be backed up, why it must be backed up and how often it should be backed up. A simple table like Table 2 can help.

| Data type | Backup frequency | Acceptable downtime (recovery window) | Acceptable data loss | Fault tolerance requirements | Notes |
| --- | --- | --- | --- | --- | --- |
| User home directories | Nightly | Two business hours | One day | Survive server disk failure. | Data is stored on multiple file servers. |
| Marketing Shared Data | Nightly | One business day | One day | Survive disk, network or server failure. | Large volume of data is changed frequently. |
| Engineering defect-tracking system | Nightly | One business day | One day | Survive disk failure. | |
| Sales Database Application | Hourly | One business hour | One hour | Survive disk, network or server failure. | Application runs on a SQL Server 2000 database. |
Table 2. Setting data protection levels will help your group design an optimal backup and recovery plan.
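Table 2 also lends itself to a quick sanity check. In the worst case, a failure just before the next scheduled backup loses everything written since the previous one, so the backup interval should never exceed the acceptable data loss. The Python sketch below encodes the table’s rows under that assumption; the dictionary layout is illustrative.

```python
from datetime import timedelta

# Table 2 rows: data type -> (backup interval, acceptable data loss)
REQUIREMENTS = {
    "User home directories": (timedelta(days=1), timedelta(days=1)),
    "Marketing Shared Data": (timedelta(days=1), timedelta(days=1)),
    "Engineering defect-tracking system": (timedelta(days=1), timedelta(days=1)),
    "Sales Database Application": (timedelta(hours=1), timedelta(hours=1)),
}


def violations(requirements):
    """Data types whose backup interval is longer than the acceptable data loss.

    Worst case: a failure just before the next backup loses everything written
    since the previous one, i.e. up to one full backup interval.
    """
    return [name for name, (interval, max_loss) in requirements.items()
            if interval > max_loss]


if __name__ == "__main__":
    print(violations(REQUIREMENTS) or "Backup schedule meets every data-loss requirement.")
```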

When you start with a recovery plan that includes well-defined requirements, you’ll probably be able to come up with some creative ways to back up your data. For example, if a server must be backed up hourly but only the latest backup must be retained, you could simply copy the changed data to another network share and use that share as a backup device, instead of bogging down your tape drives. Furthermore, if the server must be able to survive a disk failure, you could simply implement RAID technology (such as disk mirroring or disk striping with parity) on its disk systems. Once the plan has been defined, make sure you get sign-off from the appropriate people. Everyone should be involved in this process so there are no surprises. For example, your vice president of sales might not think that losing two hours of data is reasonable until you explain the potential costs of better data protection.
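Copying only the changed data to another share can be as simple as comparing file sizes and modification times. This Python sketch is an illustration, not a replacement for a backup product: it never deletes files from the destination, it trusts timestamps, and the folder names in the demo are hypothetical stand-ins for a data folder and a share on another server.

```python
import shutil
import tempfile
from pathlib import Path


def copy_changed_files(source: Path, dest: Path) -> list[Path]:
    """Copy files that are new or changed since the last run from source to dest.

    A file counts as changed if its size differs or its modification time is
    newer than the copy's. Returns the relative paths that were copied.
    Files deleted from the source are NOT removed from dest.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        target = dest / rel
        if target.exists():
            s, t = src.stat(), target.stat()
            if s.st_size == t.st_size and s.st_mtime <= t.st_mtime:
                continue  # unchanged since the last copy
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also copies the modification time
        copied.append(rel)
    return copied


if __name__ == "__main__":
    # Demo with throwaway folders; in practice dest would be a folder on
    # another server's share.
    with tempfile.TemporaryDirectory() as tmp:
        data, share = Path(tmp, "data"), Path(tmp, "backup_share")
        data.mkdir()
        (data / "orders.dat").write_bytes(b"x" * 10)
        print("First run copied:", copy_changed_files(data, share))
        print("Second run copied:", copy_changed_files(data, share))
```

Because `shutil.copy2` carries the modification time over to the copy, an unchanged file matches on both size and timestamp the next time the script runs and is skipped.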

Many backup plans tend to be ad hoc. That is, when a new server goes up on the network, systems administrators just add the entire machine to the backup schedule. While that may get the job done, it also backs up a lot of information that you may not need (like operating system directories if you don’t plan to restore the entire OS from tape). I’m willing to bet you’ll find many of your servers overprotected when it comes to backup and recovery. If you take some time to back up only what’s important, you can make much more efficient use of tape, disk, network and other resources.

Now comes the harder part: Don’t forget to practice recovery operations. You’ll generally learn a lot by going through a simulated failure. First, you’ll know exactly what you need to do when an emergency arises; you’ll be thinking a lot more clearly during this “rehearsal” than when your CTO is breathing down your neck. Next, you’ll know exactly how long it takes to recover the systems. If you know that the process will take four hours, for example, others in the organization can react accordingly (by using alternate systems, canceling sales calls or whatever else needs to be done).

Best practice: Create and define recovery requirements for all of the data that the IT team backs up. Then, based on these requirements, review your backup strategy and implementation. Once that’s in place, be sure to go through regular full dress rehearsals to make sure you can quickly and reliably restore data. Remember, the goal is to meet your business needs for data protection.

5. Keep Up with Technology:
Work on Training (and Cross-Training)

Modern hardware and software can be complicated tools, and rarely do we get a chance to understand and implement all of the features. This is especially true when it comes to feature-packed operating systems like Win2K. Fortunately, most techies enjoy the challenges and benefits associated with learning something new. That’s where training (and cross-training) can be a win-win situation. No, I’m not talking about excruciating four-hour workouts here (although some IT staffers might benefit from the exercise). IT-related training can take many forms. Traditionally, companies would send employees to instructor-led classes (or larger organizations would have instructors visit their facilities). This can be somewhat expensive and disruptive to business tasks. (Who among us wouldn’t be missed if we took an entire week off?)

Fortunately, we have other options. As an MCP Magazine reader, you’ve no doubt taken advantage of many of the print and electronic technical resources that are out there. Books and Web sites can be great resources for learning; sometimes all it takes is a good article to help you add a new technique to your IT bag of tricks.

Cross-training also presents many potential benefits. Staff members who are experts in an area transfer their knowledge to co-workers. Often all it takes is free pizza to get parts of the IT group together over lunch to learn some new technical topic. Even if it doesn’t pertain directly to their jobs, this can greatly help IT staff stay motivated and keep the gears in their heads turning. Not only is it inexpensive, but it’ll help staff develop their “softer skills” (like those related to presentation and communication of technical information). Cross-training can really help foster a sense of teamwork in an environment (for example, systems administrators might better learn what black arts the SQL Server DBAs practice).

Perhaps the most important thing about training is to be sure to add it to your list of things to do. If you can’t afford to take days off regularly to attend training classes, be sure that you set aside at least a few hours a week to read through articles or portions of books that you think might be valuable. Also be sure to keep track of new technologies that you want to learn.

Best practice: Set aside time for training and cross-training. Whether you manage a staff of 50 or just yourself, be sure that you’re constantly learning new and useful technologies. Remember, in the IT industry, if you’re not moving ahead, you’re falling behind!

6. Knowledge is Power:
Understand Your Environment

Many of us manage IT environments reactively. That is, IT staff wait until users report problems before taking care of them. For example, you might depend on users to report an inability to print documents before you check the print server to ensure there’s sufficient disk space for the spooler to operate. Then IT staffers work to resolve the problem as quickly as possible, often under time pressure. A much better scenario would be one in which you anticipated the problem before it happened.

In many cases, an ounce of IT-related prevention can save many pounds of IT-related cures. For example, if you determine that you’ll need new disk space for one of your file servers within a month, you’ll have time to find a good deal on the necessary hardware, install the drive and move directories (if necessary). You might also choose to implement disk quotas or to have a few users clear out some disk space. On the other hand, if you wait until your users are complaining that they’re getting “out of disk space” messages, you won’t have as much time to solve the problem. This will undoubtedly lead to unhappy users and a tough job for IT to manage.

There are other benefits to tracking trend information. In general, the more you know about your environment, the better. Suppose some salesperson introduces you to the miracles of Storage Area Networks (SANs). Your job is to determine if a SAN would save your organization money over time. Based on trend information you’ve collected (and on some educated extrapolations), you could determine how much disk space you’ll need in the future. Then you could figure out whether or not it’s worth the time and expense to implement a shared-storage solution. Figure 2 shows a simple Excel spreadsheet that includes disk storage information for a number of servers. You can easily collect the information needed to identify trends through the Computer Management tool in Win2K.
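The “educated extrapolations” mentioned above can be as simple as fitting a straight line through your disk usage readings. This Python sketch assumes the spreadsheet’s readings are available as (date, used GB) pairs and that growth is roughly linear; both are simplifying assumptions, and the sample figures are made up for illustration.

```python
from datetime import date, timedelta


def projected_full_date(samples, capacity_gb):
    """Estimate when a volume fills, from a least-squares line through the readings.

    samples     -- list of (date, used_gb) readings covering at least two distinct dates
    capacity_gb -- total size of the volume
    Returns the projected date, or None if usage is flat or shrinking.
    """
    start = samples[0][0]
    xs = [(d - start).days for d, _ in samples]  # days since the first reading
    ys = [used for _, used in samples]
    n = len(samples)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))  # GB per day
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    days_to_full = (capacity_gb - intercept) / slope
    return start + timedelta(days=round(days_to_full))


if __name__ == "__main__":
    readings = [  # made-up weekly readings for one file server volume
        (date(2001, 9, 3), 21.0),
        (date(2001, 9, 10), 22.4),
        (date(2001, 9, 17), 23.1),
        (date(2001, 9, 24), 24.9),
        (date(2001, 10, 1), 26.0),
    ]
    print("Volume projected to fill around", projected_full_date(readings, capacity_gb=36.0))
```

A straight line is a deliberately crude model; if usage jumps every time a project starts, treat the projection as a lower bound on how soon you’ll need space.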

Figure 2. You can track disk usage over time using a simple Excel spreadsheet and Windows 2000’s Computer Management tool.

Best practice: Take the time to understand and track various aspects of your environment. By monitoring disk space usage, network utilization and specific applications, you might be able to address issues before users notice them.

7. The Only Constant is Change:
Implement a Change Control Log

I’ll bet that you’ve talked to a user before whose machine “suddenly stopped working.” When asked what was changed, you’ve gotten a simple, “I haven’t done anything.” Later, you drill down into the details and find that a dozen or so high-end games have filled up the hard disk and three instant messaging clients are bogging down the machine. Wouldn’t it have been much easier if you had known this up front?

The same applies for server management. You should keep track of what changes are implemented on servers. For example, a change to the IP address of a Web server (with a corresponding DNS change) might not seem like a big deal to a network administrator. But, if a poorly designed application depended on a hard-coded IP address, it might suddenly break “for no reason.” Tracking down the change could take hours, especially in larger environments. If, however, you had a single place to look for this information, you could quickly determine what has changed recently. A simple Change Control log might look like the one shown in Table 3.

| # | Date | Time | Affected machine(s) | Change | Performed by | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 10/10/2001 | 9:45 a.m. | Web01, Web05 | Added a new virtual directory called "ProjectA." | Anil Desai | Added to support needs of Marketing department. |
| 2 | 10/07/2001 | 7:00 p.m. | QATest01 | Added 256MB RAM (now has 512MB total). | Jane Admin | Added based on performance issues. |
| 3 | 10/05/2001 | 11:15 a.m. | Web03 | Restarted WWW service. | Joe Admin | GUI showed service as "Started," but machine was not responding to requests. |
| 4 | 10/01/2001 | 3:00 p.m. | Web03 | Deployed new COM objects for Web site. | Jane Admin | Installed per instructions from Engineering group. |
Table 3. A sample change control log.

With this type of information, you can find trends. For example, does Web03 seem to have problems during certain times of day? Did the issues crop up after other changes were made? Having this data can save hours when troubleshooting new problems. Time after time, I’ve found this type of information invaluable when working on many types of server problems.
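Once the log is kept in a structured form, a question like “What changed on Web03 before it started misbehaving?” becomes a simple query. This Python sketch stores Table 3’s entries as records (the Notes column is omitted for brevity); the 14-day look-back window is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Change(NamedTuple):
    when: datetime
    machines: tuple[str, ...]
    change: str
    performed_by: str


CHANGE_LOG = [
    Change(datetime(2001, 10, 10, 9, 45), ("Web01", "Web05"),
           'Added a new virtual directory called "ProjectA."', "Anil Desai"),
    Change(datetime(2001, 10, 7, 19, 0), ("QATest01",),
           "Added 256MB RAM (now has 512MB total).", "Jane Admin"),
    Change(datetime(2001, 10, 5, 11, 15), ("Web03",),
           "Restarted WWW service.", "Joe Admin"),
    Change(datetime(2001, 10, 1, 15, 0), ("Web03",),
           "Deployed new COM objects for Web site.", "Jane Admin"),
]


def recent_changes(log, machine, problem_time, days=14):
    """Changes to `machine` in the `days` before `problem_time`, newest first."""
    window_start = problem_time - timedelta(days=days)
    hits = [c for c in log
            if machine in c.machines and window_start <= c.when <= problem_time]
    return sorted(hits, key=lambda c: c.when, reverse=True)


if __name__ == "__main__":
    for c in recent_changes(CHANGE_LOG, "Web03", datetime(2001, 10, 6, 8, 0)):
        print(f"{c.when:%m/%d/%Y %I:%M %p}  {c.performed_by}: {c.change}")
```

For Web03, this surfaces the COM object deployment a few days before the WWW service had to be restarted, which is exactly the kind of connection the log is meant to reveal.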

If you’re dealing with a larger environment (one that supports hundreds of servers), you might want to look at asset management or help desk solutions that will help you keep track of changes and machine configurations. Many products are available for tracking this type of information and for making the results easily accessible (usually through a Web browser). One word of advice, though: Spend most of your time developing content, not the presentation. An IT site just has to provide useful information—people won’t care about all the animated GIFs you’re able to place on the site if they can’t find what they want.

Best practice: Start documenting the configuration of the servers that you support and implement a configuration-management policy. Be sure that all documentation is kept up to date and that IT members have easy access to this information.

8. Build an IT Robot:
Automate the Boring Stuff

When you have breathing room in your environment, you should determine which tasks might be good candidates for automation. Give priority to simple, repetitive tasks that take time but require no human judgment. For example, if you routinely move SQL Server backups from one machine to another, consider automating the job with file copy scripts. Using the standard Xcopy command (or the much more powerful Robocopy utility from the Win2K Server Resource Kit) together with the Windows Task Scheduler, you can make sure that the copies run automatically. Then you can simply verify that the file copy operations have been performed. If even that’s too much trouble (OK, maybe now you’re getting spoiled), download a simple utility that will send you an e-mail message when the job finishes.
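The verification step can be automated too. This Python sketch compares SHA-256 hashes of the backup files in a source folder against their copies in a destination folder; the `*.bak` naming pattern and the folder layout are assumptions, and the demo uses throwaway temporary folders rather than real backup shares.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1MB chunks so large backup files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unverified_copies(source_dir: Path, dest_dir: Path, pattern: str = "*.bak") -> list[str]:
    """Names of backup files that are missing from dest_dir or differ from the source."""
    problems = []
    for src in sorted(source_dir.glob(pattern)):
        dst = dest_dir / src.name
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(src.name)
    return problems


if __name__ == "__main__":
    # Demo with throwaway folders standing in for the SQL Server backup
    # folder and the destination share (hypothetical).
    with tempfile.TemporaryDirectory() as tmp:
        src_dir, dst_dir = Path(tmp, "MSSQL_BACKUP"), Path(tmp, "share")
        src_dir.mkdir()
        dst_dir.mkdir()
        (src_dir / "sales.bak").write_bytes(b"backup data")
        print(unverified_copies(src_dir, dst_dir))  # the copy is missing
```

An empty list means every backup arrived intact; anything else is worth the e-mail alert.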

Another example is the common task of restoring files. Perhaps your backup utility provides you with a method to script common actions. Be sure to look into this feature. Or if you find yourself frequently setting and resetting permissions on directories, write some batch files that do the job for you. Win2K includes scheduling capabilities from the command line (using the “at” command) or via the more user-friendly “Scheduled Tasks” Control Panel item.

Best practice: Identify some common, repetitive tasks that would be good candidates for automation and find a way to ease the burden through the use of scheduled scripts and batch files.

9. Stay Legit:
Get Current on Licensing

Many organizations have started seeing notices that range from subtle reminders to outright threats regarding the licensing of the software they use. Although software audits are still fairly rare, it’s important to make sure all of your machines comply with licensing agreements. Because compliance is IT’s responsibility, that includes checking users’ machines to make sure they haven’t installed any unapproved or unlicensed software packages.

There are many ways to go about software auditing. Smaller organizations could add some simple batch commands to a logon script that write the contents of the Start menu (or the Program Files directory) to a text file on a server. You could then populate an Excel spreadsheet with the necessary information. Larger organizations should consider dedicated management tools, such as Microsoft’s Systems Management Server (SMS). Such tools can regularly inventory the hardware and software on each machine and store the results in a relational database for better reporting.
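The logon-script approach above could also be sketched in Python, purely as an illustration. The sketch assumes, hypothetically, that the top-level folders under Program Files are a reasonable proxy for installed software and that a shared CSV file on a server is acceptable. The names collect_inventory and summarize are invented for this example. The CSV opens directly in Excel, and summarize counts how many machines have each folder, which is the kind of tally that might reveal many separately purchased copies of the same application.

```python
# Minimal sketch of a logon-script software inventory.
# Hypothetical assumptions: top-level folder names under Program Files are a rough
# proxy for installed software, and the report file lives on a writable server share.
import csv
import datetime
import socket
from collections import Counter
from pathlib import Path
from typing import Optional


def collect_inventory(program_dir: Path, report: Path, machine: Optional[str] = None) -> int:
    """Append one CSV row per top-level folder in `program_dir`; return the row count."""
    machine = machine or socket.gethostname()
    today = datetime.date.today().isoformat()
    folders = sorted(p.name for p in program_dir.iterdir() if p.is_dir())
    write_header = not report.exists()
    with report.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["machine", "date", "folder"])
        for name in folders:
            writer.writerow([machine, today, name])
    return len(folders)


def summarize(report: Path) -> Counter:
    """Count the distinct machines on which each folder name appears."""
    seen = set()
    with report.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            seen.add((row["machine"], row["folder"]))
    return Counter(folder for _, folder in seen)
```

Folder names are not product names or license counts, so the output is a starting point for review rather than a compliance report. Also, if many users log on at the same moment, appends to a single shared file could collide; writing one file per machine and merging them later is a safer variant.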

Many people fear software audits because they’re worried about what they’ll find. No one wants to go to the CFO and ask for $20,000 for software that’s already deployed. However, auditing software usage has potential benefits, including cost savings. Suppose you find that the marketing department has individually purchased many copies of a popular image-editing application. You might be able to consolidate future purchases and get a volume discount from a preferred vendor. Also, once you have a better handle on what software is out there, you can set up an intranet page that helps users easily find updates, patches and other useful information.

Best practice: Get serious about software licensing. In addition to being able to sleep better at night knowing that you’ve done the right thing, you might find some hidden cost savings and you’ll be able to better support your users.

10. Put It in Writing:
Document Your Environment

If there’s one task most IT staff avoid like the plague, it’s documentation. Granted, it can be difficult to sit in front of Microsoft Word, typing out things you think everyone already knows, or should know, anyway (and I’m not just saying that because I’m writing this article right now). However, accurate, up-to-date configuration information can be very helpful, especially in larger environments.

Technical documentation has two important features: scope and audience. First, determine what you want to document and how detailed the documentation should be. In general, you want your documentation to be “strategic” rather than “tactical.” Strategic documentation stays at a higher level. For example, you might state that employee home directories are stored on the “Users” share on Server12. If someone needs more detail (such as exactly which users have home directories, or the permissions on those directories), he or she should be able to go to the server to find that information. Next, determine the intended audience: Are you writing a SQL Server configuration manual for systems administrators who are unfamiliar with relational databases, or for DBAs? Also, wherever possible, avoid repetition by linking to the other documents, articles or Web sites you use for background information.

Documenting your environment also involves developing processes. It’s often useful to have well-defined procedures for handling help desk escalations or reacting to critical server problems. A simple example is shown in Figure 3: a basic flowchart that documents the steps to perform before an issue is escalated. If an issue does need to be bumped up, the flowchart outlines the types of information that should be provided to the next level of technician.

Figure 3. A sample help desk issue resolution flowchart.

OK, so I’ve done nothing to convince you that writing and maintaining documentation will be fun. But, wait, there’s more! Keep in mind that most documents will never be “finished.” Factor in time to maintain your documentation. If your configuration information refers to how your servers were configured seven months ago, no one’s going to use it. Therefore, keep the details in the documentation at a high level and update the documentation whenever important changes occur.
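One way to keep documentation from quietly going stale is to flag files that haven’t changed in a while. The minimal Python sketch below assumes, hypothetically, that your documents live in a single folder tree and that 90 days is a sensible review threshold; find_stale_docs is a name invented for this example. A file’s modification date is only a proxy: an old file may still be accurate and a recent one may not be, so treat the result as a review queue rather than a verdict.

```python
# Minimal sketch: list documentation files that haven't been modified recently.
# Hypothetical assumptions: docs live under one folder tree, and the age threshold
# (90 days by default here) is a local policy choice, not a standard.
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def find_stale_docs(doc_root: Path, max_age_days: int = 90,
                    now: Optional[float] = None) -> List[str]:
    """Return paths (relative to doc_root) of files older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in sorted(doc_root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path.relative_to(doc_root).as_posix())
    return stale
```

Run on a schedule, a list like this could feed a short monthly reminder to review whichever documents have drifted past the threshold.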

Best practice: Create high-level documentation for the configuration of your network environment. This information can be incredibly helpful when others need information and you’re not available. Also, get in the habit of updating documentation whenever necessary.

Making the Best of Your Environment
If you felt overworked before you started reading this article, I’m afraid I probably haven’t done much to convince you otherwise. Remember: The point of all of the advice I’ve presented is not to make your job harder or to add more work. In fact, it’s quite the opposite: to ensure that you’re working as efficiently as possible. We covered many different types of tasks that are required in IT environments. If you haven’t yet implemented all of these ideas, don’t worry; few people have (myself included). If you’re routinely working 14 hours per day and still barely find time for the necessities, clearly something needs to change. An easy option might be to hire more staff (assuming your budget allows it), but a more realistic one might be to take a long, hard look at what you’re actually doing with your time. You might find that you’re spending 80 percent of your time on the 20 percent of tasks that could be postponed or overlooked. Worse, the problems you’re neglecting might be causing all the rest.

Make no mistake: For most environments, implementing the operations practices I’ve covered will take time, resources and effort. However, the goal of improving overall IT operations should be worth the investment in the long run. We all want to build a better, more manageable environment. Now is as good a time as any to get started.
