System log files were originally designed to be a centralized repository of event information. Often the events were exceedingly boring, and informational in nature. But a few pieces of information were important: error messages, exceptions, and application messages.
The concept of a text log to list every system event works well, but scrolling through messages and then trying to make sense of them can be harrowing—especially on a busy system with lots of messages, or a verbose application that spews screen after screen of important, yet indecipherable and unthreaded messages. The word “yikes” comes to mind.
Syslog managers are designed to make some sense of the contents, and to let system administrators, QA support, and help-desk personnel interpret the messages and act on their meaning.
In the Microsoft world, the number of error logs has mushroomed, and still it seems to be climbing. Microsoft’s Systems Center products can make sense of much of the messaging produced by Microsoft infrastructure and applications, but they don’t quite know what to do with Oracle’s messages, or log files generated by other applications. Relationships in and among applications and servers can become highly complex. The situation begs for a deciphering tool.
You can’t just point your finger at Windows. Linux, BSD, and Solaris have syslogs that are also crammed to the gills with messages. Some divide messaging into secure.log, system.log, and whatever other files any number of daemons might shoot messages into under the /var/log directory. The logs get filled, then renamed, so that the current log file has a finite size—but not a finite date boundary.
It’s a mess.
With a syslog manager, all (or as many as can be made sense of) the logs are digested. Depending on their smarts, you can set these admin applications to monitor for keywords or conditions that, in turn, can spawn activity as necessary.
Many products encompass these features, but some focus on specific types of logs, like Unix’s /var/logs/syslog; others look at security logs or application logs. One product’s focus might be for administrators, another for regulatory compliance reasons. Some syslogs are designed to actuate administration and management based on log content: “I see a problem, I raise or send an alarm, or automatically take action based on what I find.”
The number of syslog managers is huge, and their focus can be all-encompassing, or tuned to specific applications, or to segments thereof, such as httpd or networking. There are literally hundreds of offerings using commercial, open source, and hybrid models. They can be OS-specific, but the trend is really towards heterogeneous OS and application support.
In the ideal world, syslog managers organize the data in the log file so that informational messages are tucked away in an easily-found-and-summarized place, while an ugly event spawns an alert that goes through a delivery chain of action types until the alert is acknowledged.
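The delivery-chain idea can be sketched in a few lines. This is a minimal illustration, not any product's implementation; the delivery methods, the `send` callback, and the acknowledgment hook are all hypothetical names invented for the example.

```python
"""Sketch of an alert escalation chain: informational messages get filed
away for later summary, while error-level events walk an ordered chain of
delivery methods until someone acknowledges the alert."""
import time

# Ordered delivery chain; escalate down the list until acknowledged.
ESCALATION_CHAIN = ["email", "sms", "pager"]

def file_informational(message, archive):
    """Tuck routine informational messages into a summarizable archive."""
    archive.append(message)

def escalate(message, acknowledged, send, wait_seconds=0):
    """Try each delivery method in turn until the alert is acknowledged.

    acknowledged: zero-arg callable polled after each delivery attempt.
    send: callable(method, message) that performs the actual delivery.
    """
    for method in ESCALATION_CHAIN:
        send(method, message)
        time.sleep(wait_seconds)   # give the on-call admin time to respond
        if acknowledged():
            return method          # the method that finally got a response
    return None                    # nobody acknowledged; needs human triage
```

In practice the wait between steps would be minutes, not zero, and the acknowledgment would come from a console click rather than a polled callable.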
Better still is the ability to correlate the message with a probable solution, with a reliable reference to its cause, effect, remedy, or all three. Sometimes there might even be a way to apply a known-successful remedy, do the job, test that it was completed, and report back with a message like: “I fixed it, nothin’ to worry about here, but you should know that I did this.”
When the domain of syslog messaging also relates to user information, or connected client data, the syslog manager may extend its reach to a domain of machines, thereby covering a local network. A dividing line emerges in the varieties here, as some networks contain persistent devices (always-available applications), while other networks are composed of non-persistent components (job-focused, often used once, or re-used but not consistently deployed instances). Both types can be (and frequently are) virtual machines (VMs) or VM appliances, and may or may not be cloud-based.
No matter the persistency, the location, or the virtualization state, all of these applications and systems have the status logs we love and loathe, and therefore multiple-source syslog management is usually needed. The breadth of, and responsibility for, system logs starts to climb.
One architectural approach is to place a daemon or service into each device that you desire to monitor. A daemon or service in the host periodically parses and digests pertinent log files. Application logic in the localhost then decides if it should pass a message or take other action based on the entries in the log. Messaging then consists only of data that’s sent to a “mothership” system where an application console or other management system does something with the message based on administrator-set criteria.
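The agent-side half of that architecture is simple to sketch. The code below is a bare illustration of the local decision logic, assuming keyword matching as the triage rule and an abstract `send` transport; a real agent would use structured parsing and an authenticated channel to the mothership.

```python
"""Sketch of the per-host daemon described above: parse pertinent log
entries locally and forward only the noteworthy ones to the mothership,
so the network carries alerts rather than raw chatter."""

# Illustrative triage rule: any of these substrings marks an entry as
# worth forwarding. A real agent would use far richer criteria.
ALERT_KEYWORDS = ("error", "fail", "panic", "denied")

def digest(log_lines):
    """Return only the entries worth bothering the mothership about."""
    return [line for line in log_lines
            if any(word in line.lower() for word in ALERT_KEYWORDS)]

def forward(log_lines, send):
    """Local decision logic: pass along alerts, swallow routine chatter."""
    for entry in digest(log_lines):
        send(entry)   # e.g. over TLS to the central console
```

The point of the pattern is the filtering: the mothership only sees traffic when the localhost logic decides something matters.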
On the surface, the mothership application only bothers an administrator when logic trips an alert condition, and messaging activity doesn’t place much burden on networking infrastructure unless multiple hosts complain concurrently, which doesn’t happen often. Underneath, however, when a mothership application queries the hosts, it also adds a verification process that ensures that the networking path to the monitored host is working, and that the host is alive.
Instead of using push-type messaging from client to server, there’s an interactive process between host and client, thus also verifying that all the relationships are alive. It additionally adds an open port on the client (perhaps two, if separate send and receive ports are opened) unless basic messaging ports (SSH, SSH2, or mail exchanges) are used as the transport. Novel transports are sometimes used as well, such as port locations rarely used for RPC responders. The attack surface is increased, but additional reliability is gained, along with knowledge of the overall state of the clients.
Another architecture uses a proprietary mothership that’s an installed server system (for example, AlertLogic’s LogManager), which, in turn, logs into a cloud system and provides the role of an agent for remote locales and ops centers. In the cloud, the data from the system logs it monitors is digested, and warnings are available for the covered clients. The upside: constantly increasing intelligence. The disadvantage is that the mothership must log onto each machine and continually digest its logs, very slightly increasing each machine’s attack surface, along with adding a public-facing IP address to each locale. For some, this is not much of a sacrifice.
Novell’s Sentinel Log Manager takes a more reporting-focused bent by using an agent-mothership approach, then taking the resulting data and putting it into desirable formats. These include the PCI Security Standards Council (SSC) Data Security Standard (DSS) format, formats for HIPAA, Sarbanes-Oxley, and more. It also has vastly more heterogeneous platform support than many of the products that I’ve seen.
The biggest problem with all these syslogs is noise and cryptic entries. Some applications and daemons/services are unbelievably chatty. And they often tell you nothing, as in an error message number, or a message vomited by an application in pain that a programmer thought no one would ever, ever see. Sometimes the message is informational, almost pompous in nature, but obligatory. I like the ones where apps tell you they exited abnormally. The nature of the messages is that they’re often obscure on a good day, and made to sound frightening on others. Then there are so many messages that it’s difficult to tell the comparative importance of any one of them, or to know which ones need saving for one particular compliance or regulatory problem or another.
Some syslog managers can identify and normalize messaging, an invaluable trait, as daemons that write to syslogs or event files often encounter a problem, fail, write a message, retry, fail, and write a message once again. Rinse and repeat. Receiving ten alert texts on a cell phone, then a hundred or more on the same message, gets expensive. Intelligent log managers know how to discern what’s already been reported, then digest subsequent messages of the exact same type. This takes talent.
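A toy version of that normalize-and-suppress behavior looks like this. The normalization rule (collapse all digits so retries of the same failure compare equal) is a simplifying assumption for the sketch; real products use far more careful fingerprinting.

```python
"""Sketch of alert suppression: normalize messages so that repeated
copies of the same failure collapse into one alert plus a repeat count,
instead of a hundred identical pages."""
import re

def normalize(message):
    # Collapse digits (timestamps, PIDs, counters) so retries of the
    # same failure map to the same key. A deliberate oversimplification.
    return re.sub(r"\d+", "N", message).strip()

class Suppressor:
    def __init__(self):
        self.seen = {}   # normalized message -> occurrence count

    def should_alert(self, message):
        """Alert only on the first occurrence; just count the repeats."""
        key = normalize(message)
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] == 1
```

A fuller version would also age keys out of `seen`, so a failure that recurs a week later alerts again.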
Further intelligence is needed to correlate events. Some syslog managers can correlate events across multiple servers and installations. These might be break-in attempt messages, or router messages that indicate networking layer issues. Correlation, once performed, can also make alerts more meaningful. One event in South Africa may mean little, but when the same event is happening in Nairobi, Johannesburg, and Miami, something more ominous is likely happening. Better the devil(s) you know than encountering each devil sequentially. To an extent, AlertLogic’s architecture plays to distributed event correlation constructs by having an off-premises mothership that contacts other local motherships actively to digest messages. Such central organization protects and provides organization-wide correlated application monitoring.
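The Nairobi/Johannesburg/Miami scenario reduces to counting distinct sites reporting the same event inside a time window. Here is a minimal sketch of that idea; the threshold and window values are arbitrary assumptions, not anyone's shipping defaults.

```python
"""Sketch of cross-site event correlation: one break-in attempt at a
single site is noise, but the same event at several sites within a short
window gets escalated."""
from collections import defaultdict

class Correlator:
    def __init__(self, threshold=3, window=300):
        self.threshold = threshold            # distinct sites before escalating
        self.window = window                  # correlation window, in seconds
        self.events = defaultdict(list)       # event -> [(site, timestamp)]

    def record(self, event, site, timestamp):
        """Return True when the same event spans enough distinct sites."""
        # Keep only sightings still inside the window, then add this one.
        hits = [h for h in self.events[event]
                if timestamp - h[1] <= self.window]
        hits.append((site, timestamp))
        self.events[event] = hits
        return len({s for s, _ in hits}) >= self.threshold
```

Real correlators also weight event types and fold in topology, but the distinct-site count is the core of why three cities matter more than one.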
There are indeed standards and RFCs that programmers should use to report events to system logs and event files. The IETF publishes RFC 3164, which is a guide to formatting, structuring, parsing, and communicating syslog messages. Syslog-NG builds on RFC 3164 to form a still more comprehensive approach to messages and to communicating them. But none of the standards impart philosophy like “you talk too much,” or “the message, while correctly formatted, is meaningless.”
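To see what the RFC 3164 format buys a log manager, here is a rough parser for its common shape: a `<PRI>` field (facility × 8 + severity) followed by timestamp, hostname, and message. This regex handles only the well-formed case; real-world messages stray from the RFC constantly, which is much of the noise problem described above.

```python
"""Rough parser for the classic RFC 3164 syslog line. The PRI field
encodes facility and severity numerically, so a manager can triage
without guessing at free-form text."""
import re

RFC3164 = re.compile(
    r"<(?P<pri>\d{1,3})>"                               # PRI = facility*8 + severity
    r"(?P<timestamp>\w{3} [ \d]\d \d\d:\d\d:\d\d) "     # e.g. "Oct 11 22:14:15"
    r"(?P<host>\S+) "
    r"(?P<msg>.*)")

def parse(line):
    m = RFC3164.match(line)
    if not m:
        return None            # not RFC 3164-shaped; punt to a fallback parser
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,  # 0 = kernel, 4 = auth, ...
        "severity": pri % 8,   # 0 = emergency ... 7 = debug
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "msg": m.group("msg"),
    }
```

Fed RFC 3164's own example line (`<34>Oct 11 22:14:15 mymachine su: ...`), this yields facility 4 (auth) and severity 2 (critical), which is exactly the triage signal free-form messages lack.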
Certain messages ought not to show up by themselves. Especially bothersome to me are “success” messages without an “attempt” event to correlate. The success message makes me contemplate what happened to spawn it, and why that wasn’t documented in a proper sequence of messages, and I have no time to contemplate the madness of inarticulate, malformed system messages. I want to think about old Johnny Carson curses, like the wish for early automatic transmission failures, or the flies of a thousand camels infesting a certain programming team’s armpits.
A syslog management system can be heaven-sent. But none of them wakes up and does the work for you. They require integration, and then you’ll still need to check the results of their automation. You may not need to check as often, and your quality of administration may improve. If nothing else, you’ll be automating a tremendously tedious process, and one that’s tedious enough to be dangerous—because you’re motivated to skip the important checks.