Your Annual Networking Review
From our resident network pro: A network 'health' checklist that just might save the day.
- By Wendy Chretien
- 06/01/09
AH, SUMMERTIME. With fewer users around, this is the campus network professional's chance to take a step back and verify that networks are on sound footing and can fully support converged, delay-sensitive traffic. Yet, while many of us perform annual reviews of our staff, the sad truth is that we don't do the same for our networks. Entropy (or perhaps evolution?) is a tough adversary; it's so gradual that we don't see it occurring. But as we add numerous services and components to meet users' needs (especially if those changes are made in emergency mode), the unintended result can be a tangled, complex conglomeration. That all but guarantees future problems, not to mention turning troubleshooting those problems into a nightmare. So what's a campus network pro to do?
Let's Get Physical
A thorough approach starts at the foundation, the physical layer, which is the root cause of a large majority of network problems. Precisely how long has it been since you assessed your campus network equipment rooms and cabling infrastructure? And I don't mean just the data centers, but also all those secondary "closets" (the cabling industry standards organization BICSI calls these technology rooms, or TRs) where cables aggregate and connect to network switches.
With nearly all of its 11 campuses experiencing growth, one statewide community college system we've worked with began suffering service disruptions on its networks but had trouble pinpointing the cause or causes. The system's administrators decided to bring in a third party to evaluate the situation.
Among the assessment's key findings was that equipment rooms had become trouble spots due to the gradual addition of cables and equipment over a period of years. For example, perhaps the physics department at one campus was growing and suddenly needed to outfit part of one wing with new technology systems and/or labs. That meant additional network cabling and switches in the equipment room serving that area. Those rooms often had too little physical space to begin with, so the additions made matters worse. On top of that, the extra heat generated by the new switches, especially those equipped with Power over Ethernet (PoE) ports, often caused the room temperature to rise significantly, shortening the expected lifetime of that new equipment. In many cases, the lack of space contributed to tangles of cabling (both network and electrical) lying on the floor. At some locations, overly long patch cables were used (likely just because they were readily available at the moment), resulting in a nearly impenetrable mass of cabling cascading down the fronts of the equipment racks. Some rooms, left unlocked, ended up being used for storage, making network components vulnerable to both accidental and intentional damage. And to add insult to injury, most of the changes made over the years were not documented, leaving network managers in the dark about the current situation.
The resolution required rethinking the physical structure altogether. The technologists determined they needed to set standards regarding space and cooling for equipment rooms as well as the cabling infrastructure, and last year began "cleaning up" on a campus-by-campus basis.
Check Your Backbone
Another element we too often forget about is the capacity of the links among switches within local area networks. For example, have you migrated from 10/100 to gigabit (10/100/1000) switches for end-user connectivity, but perhaps still have just one gigabit uplink per switch back to an aggregation or core switch? In that case, the total user traffic on a 48-port switch could climb as high as 48 Gbps, all competing for 1 Gbps on the uplink. Known as "oversubscription," this situation greatly increases the probability of network congestion, which is one cause of degraded voice and video signals. Often an oversubscribed network will work just fine until that new bandwidth-intensive security-camera video application (or whatever new app may come along) hits the network. Then, suddenly, the network is on its knees. Do you know if your campus network is poised to suffer a similar fate? To find out, you can use network management software to check the utilization (percent of total capacity in use) for individual switch uplink ports, thus pinpointing potential congestion points. Thankfully, this type of oversubscription is fairly easy to remedy by increasing the number of gigabit or 10-gigabit uplinks along the congested paths. But first be sure you have enough fiber strands between the switch locations to accommodate those additions.
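The two checks above come down to simple arithmetic. The sketch below is illustrative only: it assumes your management software already collects octet counts from the 64-bit IF-MIB interface counters (ifHCInOctets or ifHCOutOctets), and the function names and the 70 percent alert threshold are hypothetical choices, not features of any particular product. Utilization is computed per direction, since a full-duplex link has its full capacity in each direction.

```python
"""Oversubscription and uplink-utilization arithmetic (illustrative sketch)."""

# Assumption: octet counts come from 64-bit IF-MIB counters (ifHCInOctets or
# ifHCOutOctets) gathered by your management software; this sketch only does
# the arithmetic. Measure each direction of a full-duplex link separately.
COUNTER64_MODULUS = 2**64

# Hypothetical alert threshold; choose one that suits your own traffic mix.
DEFAULT_THRESHOLD_PCT = 70.0


def oversubscription_ratio(access_ports: int, port_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Worst-case edge capacity divided by total uplink capacity."""
    return (access_ports * port_gbps) / (uplinks * uplink_gbps)


def utilization_pct(octets_t0: int, octets_t1: int,
                    interval_s: float, link_bps: float) -> float:
    """Percent of one direction's capacity used between two counter samples."""
    # The modulus keeps the delta correct if the counter wrapped once.
    delta_octets = (octets_t1 - octets_t0) % COUNTER64_MODULUS
    return 100.0 * (delta_octets * 8) / (interval_s * link_bps)


def congested_uplinks(samples: dict[str, tuple[int, int, float, float]],
                      threshold_pct: float = DEFAULT_THRESHOLD_PCT) -> list[str]:
    """Return names of uplinks at or above the threshold, sorted by name.

    Each sample is (octets_t0, octets_t1, interval_s, link_bps).
    """
    return sorted(name for name, s in samples.items()
                  if utilization_pct(*s) >= threshold_pct)
```

For the article's example, `oversubscription_ratio(48, 1.0, 1, 1.0)` gives 48.0 (48:1), while two 10-gigabit uplinks bring it down to 2.4. A 1 Gbps uplink that moved 3,000,000,000 octets in 300 seconds averaged 8 percent utilization. Keep in mind that averages over long polling intervals can hide short bursts that still disrupt voice and video.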
Operate Thoughtfully
Moving up a step in the infrastructure, what about the operating system (OS) software on your switches (yes, switches; not servers)? How long has it been since that OS was updated? Manufacturers often release new versions to address documented or potential vulnerabilities in the previous code. But new or improved features, such as a better implementation of traffic prioritization, may also make upgrading worthwhile.
Another review question: Are all switches on the same version of OS software? If this hasn't been checked lately, prepare yourself for a surprise. One college (which shall remain nameless) found it had seven different versions of OS software on switches on the same local area network. While this is very seldom the root cause of a network failure, it certainly can make fixing problems more difficult and time-consuming.
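Once you have a list of switches and the OS version each one reports (many devices include it in the SNMP sysDescr string, or your vendor's management tool may export it), spotting version sprawl is a matter of grouping. The sketch below assumes such an inventory already exists as a simple name-to-version mapping; the switch names, version strings, and function names are all hypothetical.

```python
"""Check whether switches on a LAN run one consistent OS version (sketch)."""
from collections import defaultdict


def versions_in_use(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Group switch names by the OS version string each one reports."""
    by_version = defaultdict(list)
    for switch, version in inventory.items():
        by_version[version].append(switch)
    # Plain string sort, used only to make the report stable and readable;
    # it is not a true version-number ordering.
    return {v: sorted(names) for v, names in sorted(by_version.items())}


def off_standard(inventory: dict[str, str], standard: str) -> list[str]:
    """List switches not running the version chosen as the campus standard."""
    return sorted(name for name, v in inventory.items() if v != standard)


if __name__ == "__main__":
    # Hypothetical inventory: switch names and version strings are invented.
    inventory = {
        "library-sw1": "5.1.2",
        "science-sw1": "5.0.7",
        "science-sw2": "5.1.2",
        "admin-sw1": "4.9.3",
    }
    for version, switches in versions_in_use(inventory).items():
        print(version, switches)
    print("Needs upgrade:", off_standard(inventory, "5.1.2"))
```

More than one key in the result of `versions_in_use` means the LAN is not on a single version, and `off_standard` gives you the upgrade list once you have settled on a standard release.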
Most manageable switches sold in the last few years allow software upgrades to be performed "in band," or over the network itself. This is the most efficient way to upgrade, but it isn't automatic; you still have to initiate the process. Older switches may instead require the "out of band" hands-on approach: individually connecting a computer via a console cable to upload the new software to each switch. Unfortunately, this is just as laborious as it sounds. A complicating factor is that many switches must be powered down and then powered back up after an upgrade, so careful scheduling of upgrade installations is important in order to avoid, or at least lessen, user downtime. Oh, and remember to verify that you have the proper software license rights to make those upgrades. Many manufacturers charge for new OS software versions, which is one of the key reasons some institutions avoid regular upgrades. But even if you don't upgrade to the latest and greatest, at least ensure that what you do have is consistent across the organization and makes the grade when it comes to security.
So if it's been a while since you've conducted a thorough exam of your networks, there's no time like the present. And while you're thinking about it, maybe you'll want to put next year's exam on your calendar, too!