I run a test lab. Setups are mutable from one project to the next, and life is rarely static. Configurations change, and the hundred or so cores in our NOC might not have the same servers running on them from one day to the next, let alone for a week or a month. We’re used to rapid assessment, troubleshooting, and moving on.
Inside one of my servers is a folder of tools, divided into two groups. The first group sets up systems, while the second group is for administrative, post-installation work. Another directory has images, ISOs, and blobs of this-and-that. Some days we code, we configure, we update… sometimes all in the same hour.
Tools are important to us. Fortunately, a handful of useful tools is guaranteed to be in your favorite Unix-alike distribution, unless it's a horribly stripped-down version of Solaris, Linux, BSD, etc. We add to these standards to suggest the 10 most indispensable NOC tools: the ones you need in the network operations center. Don't log on without them. (See also our companion article, 10 Indispensable NOC Software Tools for Windows Server Administrators.)
Don’t let civilians near these; they’re administrator tools and you can hurt someone if you’re not careful. Few civilians understand the difference between etc and bin, anyway.
It's not particularly beautiful, and it's a work in progress, but Nagios is a very useful monitoring kit. In the olden days it was more painful to use, but today you can download Nagios as a VMware appliance that's staggeringly simple to install (and that can therefore run on, or be converted for, other hypervisors compatible with VMware).
The Nagios system monitors Linux, BSD, and Windows machines, whether client, server, or appliance. You set triggers on the monitors to send messages (SMS, e-mail, and lots of other options) when a monitored component, such as disk space, crosses a threshold you set. Nagios can track a lot of information, and your first big job is tuning thresholds to suit your organization's profile. Set them too tight, and you get volumes of nags per second; set them too loose, and Nagios isn't working for you.
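Nagios checks are typically small plugins that print one status line and report their verdict through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The minimal Python sketch below, a hypothetical disk-space check, shows where the thresholds you tune actually live. The function names and the 20%/10% defaults are our assumptions; the standard Nagios plugins already ship their own check_disk, so this is only meant to show the mechanics.

```python
import shutil

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(pct_free: float, warn_pct: float, crit_pct: float) -> int:
    """Map a free-space percentage to a Nagios state.

    warn_pct and crit_pct are the thresholds you tune. Less free space is
    worse, so crit_pct should be lower than warn_pct.
    """
    if pct_free <= crit_pct:
        return CRITICAL
    if pct_free <= warn_pct:
        return WARNING
    return OK


def check_disk(path: str = "/", warn_pct: float = 20.0, crit_pct: float = 10.0) -> int:
    """Print one Nagios-style status line for `path` and return its exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:  # missing path, permission problem, etc.
        print(f"DISK UNKNOWN - {path}: {exc}")
        return UNKNOWN
    if usage.total == 0:  # some pseudo-filesystems report zero size
        print(f"DISK UNKNOWN - {path}: zero-size filesystem")
        return UNKNOWN
    pct_free = 100.0 * usage.free / usage.total
    state = classify(pct_free, warn_pct, crit_pct)
    print(f"DISK {LABELS[state]} - {path} {pct_free:.1f}% free")
    return state
```

Installed as a real plugin, the script would end with `sys.exit(check_disk(...))` so that Nagios sees the exit code. Loosening or tightening the alerts is then just a matter of the two percentages passed in.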
You can then use Nagios's reporting and alerts as a baseline NOC management system, where management means reactive response to conditions. Secondarily, Nagios can be used for planning, because it keeps a long history of the conditions it finds. Currently it can't do things like move VMs around, or yell at your boss for more budget for fatter pipes and bigger routers. But especially for small organizations or branch NOCs, it's plenty helpful.
Like Nagios, Zenoss is a community-supported management plane with a commercially backed organization behind it. It also has more than 150 plug-ins for what its makers like to call "deep monitoring" of server facets. In our experience, it's a bit heavier-weight than Nagios, but if you don't mind the kitchen-sink approach, it's great for facilities monitoring of a heterogeneous NOC infrastructure (yes, Windows and even Solaris are included).
Not to be confused with Xen, Zenoss is a database-driven product. And that’s always been my gripe with Zenoss for lightweight, fast-on-your-feet work: The database (you can use several kinds) needs to be protected, even mirrored, as it’s the heartbeat of what goes on.
The database centricity also means that you can get great reporting. That's nice if you're in an audited environment, but a hangover if you're in constant-change mode (meaning non-production sites). In a stable production NOC, though, Zenoss can cover about two-thirds of your tool kit.
Zenoss is easy to code against once you understand its API, along with the REST or XML calls used to communicate with monitored devices. The community support behind Zenoss is strong, and often helpful if you shoot yourself in the foot. The theory of operation is easy to grasp, but getting up to speed takes a while. If you like quick-and-dirty, go someplace else. If you need concrete and steel, stop here.
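As a hedged illustration of the API point, here is how a call might be packaged. As far as we know, recent Zenoss versions expose a JSON API through "router" URLs under /zport/dmd/ (device_router, for example) that accept an Ext.Direct-style envelope. The host name, credentials, router, method, and data fields below are all assumptions to verify against your version's API documentation. The function only builds the request; nothing is sent until you hand it to urllib.request.urlopen().

```python
import base64
import json
import urllib.request


def build_zenoss_request(base_url: str, user: str, password: str,
                         router: str, action: str, method: str,
                         data: dict, tid: int = 1) -> urllib.request.Request:
    """Build (but do not send) a Zenoss JSON API call.

    Assumption: the action/method/data/type/tid envelope matches your
    Zenoss version; check its API documentation before relying on it.
    """
    payload = {
        "action": action,  # router class, e.g. "DeviceRouter" (assumed)
        "method": method,  # router method, e.g. "getDevices" (assumed)
        "data": [data],    # method arguments, wrapped in a list
        "type": "rpc",
        "tid": tid,        # transaction id, echoed back in the response
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/zport/dmd/{router}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical lab host and credentials.
req = build_zenoss_request("http://zenoss.example.lab:8080", "admin", "secret",
                           router="device_router", action="DeviceRouter",
                           method="getDevices",
                           data={"uid": "/zport/dmd/Devices", "limit": 50})
```

Keeping request construction separate from sending makes it easy to log or dry-run calls in a lab before pointing them at a production Zenoss instance.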
We reviewed Puppet MCollective elsewhere after we saw it in an Ubuntu Server release. MCollective is a configuration and command-and-control system from Puppet Labs. It is Apache-licensed and included in Ubuntu Server editions, and commercial support is available.
The Puppet “master” controls its “marionettes” (the “mcollective”) through a series of easily learned steps. Once prepared, the puppet master can roll out VMware (and compatible) Linux instances that also respond to the master, on your own NOC server strata or on Amazon Web Services (AWS), at a rate of literally hundreds per minute. Or, if you just need a handful, it can do that, too.
You need to know a bit about how the underlying processes work. A pre-loaded daemon in each instance listens for commands relayed through a message broker. The commands can be easily filtered, so you can tell the master to tell all of the web servers in Milwaukee to report a specific status, or to perform such-and-such a command. You can roll out servers to do things like video rendering or DNA segment analysis, then shut them all down and let them evaporate into the ether, if that's your desire. It's very much like having your own intelligent, yet secure, botnet. Use it for good.
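MCollective's own client API is beyond this article, so what follows is only a toy Python model of the filtering idea. Each node advertises classes and facts, every request carries a filter, and only the nodes whose classes and facts match act on the command. The Node, Filter, and broadcast names, and the "site" fact, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A managed instance as the fleet sees it (hypothetical model)."""
    hostname: str
    classes: set[str] = field(default_factory=set)       # e.g. Puppet classes applied
    facts: dict[str, str] = field(default_factory=dict)  # e.g. {"site": "milwaukee"}


@dataclass
class Filter:
    """A node matches only if it has every listed class and every listed fact value."""
    classes: set[str] = field(default_factory=set)
    facts: dict[str, str] = field(default_factory=dict)

    def matches(self, node: Node) -> bool:
        return self.classes <= node.classes and all(
            node.facts.get(key) == value for key, value in self.facts.items()
        )


def broadcast(nodes: list[Node], flt: Filter, command: str) -> list[str]:
    """Send the command to everyone; each node checks the filter and only matches respond."""
    return [f"{n.hostname}: ran {command!r}" for n in nodes if flt.matches(n)]


fleet = [
    Node("web01", {"webserver"}, {"site": "milwaukee"}),
    Node("web02", {"webserver"}, {"site": "chicago"}),
    Node("db01", {"database"}, {"site": "milwaukee"}),
]
# "All of the web servers in Milwaukee, report your status."
replies = broadcast(fleet, Filter({"webserver"}, {"site": "milwaukee"}), "status")
```

Letting each node evaluate the filter itself, rather than having the master keep a perfect inventory, is roughly why this style of command-and-control copes with fleets that appear and evaporate by the hundreds.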
CloudPassage is not unlike Nagios, save that it's specific to Linux kernel 2.16+. It's a daemon that checks your configuration for sanity and reports to the CloudPassage website, for free (other services, at an actual cost, are pending). It's like your Unix/Linux teacher, grading you on how tightened-up your server is. Did I mention: free?
You have to register. You have to allow your CloudPassage daemons to talk across your security perimeter to CloudPassage. You have to use a compatible version of Linux in the server (or client, for that matter) that you deployed with the CloudPassage daemon. It has to be run initially as root, which will send chills down the spine of security purists, but that’s the only time this need be done.
Then, wait for the CloudPassage daemon to look under the rocks of your configuration files, then diss you for the fool you are. It will find a very long list of chapter-and-verse mistakes you made, probably because you deployed the server with default settings. CloudPassage knows a lot of applications and server settings, and points out to you the error of your ways, citing references right down to the salient CVE bulletins to show just how asleep at the wheel you are.
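We can't show CloudPassage's actual rule set, but the kind of check it runs is easy to picture. The hypothetical Python sketch below audits OpenSSH sshd_config text against three made-up rules; the rule list, the function name, and the output format are our assumptions, not CloudPassage's.

```python
# Hypothetical rules: keyword (lowercased) -> (risky value, advice).
RULES = {
    "permitrootlogin": ("yes", "disable direct root logins over SSH"),
    "passwordauthentication": ("yes", "prefer key-based authentication"),
    "permitemptypasswords": ("yes", "never allow empty passwords"),
}


def audit_sshd_config(text: str) -> list[str]:
    """Flag risky settings in sshd_config text.

    Keywords are case-insensitive, and sshd uses the first value it finds
    for each keyword, so later duplicates are ignored. Settings inside Match
    blocks are conditional, so scanning stops at the first Match line. The
    "Keyword=value" form is not handled, and keywords that are absent fall
    back to sshd's built-in defaults, which this sketch does not check.
    """
    findings: list[str] = []
    seen: set[str] = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break
        if len(parts) < 2 or keyword in seen:
            continue
        seen.add(keyword)
        value = parts[1].strip()
        rule = RULES.get(keyword)
        if rule and value == rule[0]:
            findings.append(f"line {lineno}: {parts[0]} {value}: {rule[1]}")
    return findings


sample = """\
PermitRootLogin yes
# PasswordAuthentication yes
PasswordAuthentication no
PermitRootLogin no
"""
for finding in audit_sshd_config(sample):
    print(finding)  # prints: line 1: PermitRootLogin yes: disable direct root logins over SSH
```

Note that the second PermitRootLogin line does not rescue you: sshd honors the first one, which is exactly the sort of chapter-and-verse mistake a configuration auditor exists to catch.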
CloudPassage is worth its weight in gold, considering you can deploy it in any compatible instance, even in the myriad public cloud VMs you just rented. It watches, reports, and rats you out. You'll love it.
Your SWAT Team
Linking Linux to SMB and Microsoft's Active Directory is done with Samba, which, while wonderful, needs a GUI for administrator stress relief. The Samba Web Administration Tool (SWAT) is just such a GUI, and it's included in some versions of Ubuntu and other Linux distributions.
Most people who install SWAT can't get it going. The instructions are a bit vague, but the real reason lies outside SWAT's own code: the inetd daemon blocks it until its inetd.conf file is modified. You also have to leave a port open (TCP 901 by default), which is a no-no to security barbarians.
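The conf-file fix itself is small. On inetd-based systems, SWAT needs a live entry in /etc/inetd.conf (plus a swat 901/tcp line in /etc/services, which is usually already there), and many distributions ship that entry commented out. The hypothetical Python helpers below check for such an entry and enable it. The binary path in SWAT_LINE is an assumption that varies by distribution, and xinetd's different file format is not handled.

```python
import re

# Typical inetd.conf entry for SWAT; the binary path varies by distribution (assumption).
SWAT_LINE = "swat  stream  tcp  nowait.400  root  /usr/sbin/swat  swat"

# A leading "#" (or Debian-style "#<off>#") disables an inetd.conf entry.
_DISABLED_PREFIX = re.compile(r"^#(?:<off>#)?\s*")


def _swat_entry(raw: str):
    """Return (uncommented_line, was_disabled) if raw is a swat service entry, else None."""
    line = raw.strip()
    body = _DISABLED_PREFIX.sub("", line)
    fields = body.split()
    # inetd.conf entries have at least six fields: service, socket type,
    # protocol, wait/nowait, user, server program (plus optional arguments).
    if len(fields) >= 6 and fields[0] == "swat":
        return body, body != line
    return None


def swat_inetd_status(conf_text: str) -> str:
    """Report 'enabled', 'disabled', or 'missing' for SWAT in inetd.conf text."""
    status = "missing"
    for raw in conf_text.splitlines():
        entry = _swat_entry(raw)
        if entry is None:
            continue
        if not entry[1]:
            return "enabled"
        status = "disabled"
    return status


def enable_swat(conf_text: str) -> str:
    """Uncomment any disabled swat entry, or append SWAT_LINE if there is none."""
    out, found = [], False
    for raw in conf_text.splitlines():
        entry = _swat_entry(raw)
        if entry is not None:
            out.append(entry[0])
            found = True
        else:
            out.append(raw)
    if not found:
        out.append(SWAT_LINE)
    return "\n".join(out) + "\n"
```

After writing the edited file back, inetd still has to re-read its configuration, for example via a restart or a HUP signal, before port 901 answers.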
Once you make the conf-file change, suddenly you're working with Samba in a whole new way, without wrestling with command-line syntax from hell. This means you can more simply align Samba configurations with your Windows Active Directory (AD), and vice versa. Even if you're a black belt in Samba, the visuals can be worth the installation. It's free and fairly well documented; we've rarely seen it fail or act silly.
LogMeIn makes Hamachi, a dangerous tool for making VPN links. Hamachi solves the problem of linking networks that sit behind Network Address Translation (NAT). Many do, including most home networks, which rarely have static, public-facing IP addresses.
Because NAT trips up VPN protocols like IPsec, Hamachi takes their place, but it does so by breaking some of the “rules.” That said, it can work across boundaries that address translation would otherwise block. LogMeIn also has other products for personal remote access via web browsers.
The danger comes from the fact that some networks aren't supposed to be joined together in a VPN, whether because of company policy, security, or clashing naming conventions. The upside to Hamachi is that it solves, with near-total transparency, a problem that can completely confuse IPsec (a Layer 3 VPN): NAT. But sometimes you don't want that to happen. I often do, but I know organizations that would start shooting if they found Hamachi on their networks.
The htop app lets you see all of the processes running on a machine, and so much more than you can do with top or ps. You see dependencies. You see memory used. You can do important things, like identify strange stuff and document or kill it. In its tree view, oddly named processes have their parent/child relationships revealed, so you can understand which element of what is doing what. It's almost as nice as Activity Monitor, the Mac OS utility.
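On Linux, htop gets its process information from the /proc filesystem. The minimal Python sketch below (our illustration, not htop's code, and Linux-only) pulls each process's PID, parent PID, and name from /proc/<pid>/stat and renders the same kind of parent/child tree that htop's tree view shows. The read_processes and render_tree names are hypothetical.

```python
import os
from collections import defaultdict


def read_processes(proc: str = "/proc") -> list[tuple[int, int, str]]:
    """Return (pid, ppid, name) for every process listed under /proc."""
    procs = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were looking
        # The name sits in parentheses and may itself contain spaces or ')',
        # so split around the last ')' rather than on whitespace.
        lpar, rpar = stat.index("("), stat.rindex(")")
        name = stat[lpar + 1:rpar]
        rest = stat[rpar + 2:].split()  # rest[0] is the state, rest[1] the parent PID
        procs.append((int(entry), int(rest[1]), name))
    return procs


def render_tree(procs: list[tuple[int, int, str]], root: int = 0) -> list[str]:
    """Indent each process under its parent; PID 0 is the parent of init and kthreadd."""
    children = defaultdict(list)
    for pid, ppid, name in procs:
        children[ppid].append((pid, name))
    lines: list[str] = []

    def walk(ppid: int, depth: int) -> None:
        for pid, name in sorted(children[ppid]):
            lines.append(f"{'  ' * depth}{pid} {name}")
            walk(pid, depth + 1)

    walk(root, 0)
    return lines
```

Usage is simply `for line in render_tree(read_processes()): print(line)`. A process whose parent vanished between reads can be missing from the snapshot, which is one reason htop refreshes continuously rather than taking a single picture.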
Htop is an interactive, text-mode graphical tool rather than one that depends on a GUI or window manager. It does require a compatible version of ncurses, but I haven't found one that isn't compatible.
Beyond its educational value, htop is a great tool for emergency diagnosis of problems, provided you know what you're doing (we hope you do). It's not server- or NOC-specific; it works just about anywhere in Linuxville. Think of it as DTrace-lite: DTrace is the marvelous Sun (now Oracle) dynamic tracing framework that can instrument running applications to reveal their constituent elemental characteristics from a diagnostics perspective. Htop can diagnose your server's ills, or just verify process characteristics. It's among the great tools that belong in every server tools directory, hidden away from hapless civilians who might play with it.
Much of the virtualization and cloud revolution owes its success to the humble little hypervisor Xen. You wanted a cheap way of consolidating servers? Xen still works. It's still free. It's still moderately ugly. Yet several utilities can be used to manage it, including a few moderately ugly GUI apps.
But Xen is pretty rock-solid, and you can get it commercially in the form of Citrix XenServer; Microsoft's Hyper-V is its strange cousin. I realize that the “lite” and “community” versions of the three major Xen-related hypervisor families (Xen, XenServer, Hyper-V) are often free as in beer, but good old Xen is pretty lightweight, and the price is free whether it's one server or 24 cores in a hundred servers. Tough to beat the price.
And finally, two more tools that also appear in the 10 NOC Tools for Windows list:
SysRescCD.ISO can make you a hero, or a heroine, or a conspirator. Why is it included with Linux and BSD tools? It's a Linux-based System Rescue CD, used to rescue Windows systems where the password has somehow been forgotten or mangled. Download the file and burn it to a CD or, better still, write it to a USB drive.
Make sure you can legally use this file. This tool specifically allows you to replace the administrator/most-privileged user account password. You may need very specific permission to do this; laws on password cracking vary from region to region. Check yours first.
Boot the locked-out machine from the CD or USB drive. The instructions are inside the ISO download, or you can refer to its website for documentation on rescuing everything from Windows 2000, Windows XP, Windows Server 2003, and interim editions.
You won't need this tool often, but it's a help-desk must-have. Inside the NOC, where things are busy and documentation gets forgotten, it may be the only way to reset the administrative password. Keep this secret, and remember that nothing protects a machine from a user (you) who has physical access to it.
How many times has Wireshark pulled me out of the drink? Plenty. Here's what it does: it puts an Ethernet (or Wi-Fi) interface into promiscuous mode, captures packets, and decodes them. You can export captures, sort them, pair up conversations, and ultimately sniff the wire and the air. Because capturing all of the traffic can be overwhelming, Wireshark also lets you filter the captured packet volume down to something manageable.
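Exported captures can be post-processed with a few lines of code. The sketch below parses the classic libpcap file format, which is simple enough to read with Python's struct module. Newer Wireshark versions save in pcapng by default, a different layout this sketch rejects, so choose the older pcap format when exporting. read_pcap is a hypothetical helper, and the demo capture is synthetic.

```python
import struct

# Classic pcap magic numbers as stored on disk -> (byte order, timestamp ticks per second).
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def read_pcap(data: bytes) -> tuple[int, list[tuple[float, int, bytes]]]:
    """Return (link_type, [(timestamp, original_length, captured_bytes), ...])."""
    if len(data) < 24 or data[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file (pcapng uses a different layout)")
    order, ticks = _MAGICS[data[:4]]
    # 24-byte global header: magic, version major/minor, thiszone, sigfigs,
    # snaplen, link type (1 = Ethernet).
    link_type = struct.unpack(order + "IHHiIII", data[:24])[6]
    packets = []
    offset = 24
    while offset + 16 <= len(data):
        # 16-byte record header: seconds, sub-second ticks, captured length, original length.
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(order + "IIII", data[offset:offset + 16])
        offset += 16
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # truncated final record, e.g. from an interrupted capture
        offset += incl_len
        packets.append((ts_sec + ts_frac / ticks, orig_len, frame))
    return link_type, packets


# Synthetic capture: one 60-byte all-zero Ethernet frame.
demo = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
demo += struct.pack("<IIII", 1_700_000_000, 500_000, 60, 60) + bytes(60)
link_type, packets = read_pcap(demo)
```

Against a real export, `with open("capture.pcap", "rb") as f: link_type, packets = read_pcap(f.read())` does the same job (the file name is hypothetical). From there, counting frames per host or picking out DHCP chatter is ordinary Python.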
There are plenty of tricks to Wireshark and other packet-trace tools. Most managed Layer 2/3 switches let you mirror ports, so you can capture traffic on segments you're not logically connected to. (The wisdom of port mirroring is up to the astute user. Meaning: be careful.)
With Wireshark, you can diagnose, and verify the fix for, numerous server maladies, from understanding server traffic overloads to decoding packet- and segment-related error messages at the protocol (rather than stupid operating system) level. Then you can do things like whip DHCP servers back into shape, find rogues and malware phone-homes, and tackle a myriad of other problems. To use it well, expect to go back to the book from your network protocols class, the one you slept through, but you'll have great fun.
Remember that your organization may be guided by extreme privacy principles, and the data you see may not be for your eyes. Using a tool like this implies that you know the rules, that you know what you're doing, and that you will use it for good and not for evil.
What other tools do you consider must-have? Tell us about them in the comments, so we can add to the community’s body of knowledge.