Successful system monitoring is a matter of balancing the information presented against the workload of the responsible technicians. The information gathered needs to be thorough, but filtered so that technicians can respond to issues in a timely fashion without being overwhelmed by the sheer volume of it.
Logs are important, especially when looking at system and network trending. Retroactive analysis also benefits from having a more complete data set to work with. But technicians are human beings, and there is a limit to the amount of information that can be put to practical use before some of it is overlooked. When a person is bombarded with information, as is all too common today, anything that is not attended to immediately can become buried (take a look at your email inbox), become dated, and eventually be lost. If that information is critical, the overabundance of information can cause the monitoring process itself to fail.
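To make the idea of filtering concrete, here is a minimal sketch of one possible approach: drop events below a chosen severity and suppress repeats of the same alert within a time window. The `Event` fields, the severity names, and the 15-minute window are all assumptions made for illustration, not the behaviour of any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real products define their own levels.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Event:
    host: str         # device that raised the event
    check: str        # name of the check, e.g. "disk" or "http"
    severity: str     # one of the keys in SEVERITY
    timestamp: float  # seconds since some reference point


def filter_alerts(events, min_severity="warning", repeat_window=900.0):
    """Return the events worth a technician's attention.

    An event is kept when it is at or above min_severity and the same
    (host, check) pair has not already produced a kept alert within
    the last repeat_window seconds.
    """
    threshold = SEVERITY[min_severity]
    last_sent = {}  # (host, check) -> timestamp of the last kept alert
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if SEVERITY[ev.severity] < threshold:
            continue  # too minor to alert on; it still belongs in the logs
        key = (ev.host, ev.check)
        previous = last_sent.get(key)
        if previous is not None and ev.timestamp - previous < repeat_window:
            continue  # repeat of an alert that was sent recently
        last_sent[key] = ev.timestamp
        kept.append(ev)
    return kept
```

Nothing is thrown away in this sketch: the full event stream can still be logged for trending and retroactive analysis, while only the filtered list reaches a person.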
Many companies, with the best of intentions, have a difficult time implementing a monitoring system that evolves with their network and acts as a useful preemptive tool for the IT team. Just as system maintenance needs to be counted among the regular duties of a system administrator, so do the maintenance and configuration of the monitoring system. Unfortunately, for many companies, the manpower is just not available. Having a pre-disaster alerting system in place can save even a medium-sized company thousands of dollars in labour costs, but this return on investment is not easily communicated to executives outside of the IT field. In fact, the assumption in most cases is that once the monitoring software and system are installed, you are up and running, no further problems should be expected, and nothing else is needed.
By default, most monitoring systems can run a “discovery” and then throw ping requests at the devices on the network to tell you which are in an “UP” state, but this does not tell the IT team whether those devices are actually running and healthy. The user base relying on Exchange for their email will not care that the network card is returning ping replies if the Information Store service has crashed. Nor will they care that the BlackBerry Enterprise Server is powered on if the Agent Process has died and messages are not being delivered.
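The gap between "answers ping" and "actually working" can be illustrated with a small sketch. The two functions below are hypothetical helpers written for this article, using only Python's standard `socket` module: the first checks that a TCP port accepts connections, and the second goes one step further and checks that a mail server sends the `220` greeting that SMTP servers normally open with. This is an assumption-laden illustration, not a complete health check; confirming that a specific Windows service such as the Information Store is running would require an agent or a management query on the server itself, which is not shown here.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time.

    This is already a stronger signal than ping: it shows that some
    process is listening, not merely that the network card is alive.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False


def smtp_banner_ok(host: str, port: int = 25, timeout: float = 3.0) -> bool:
    """Return True if the server greets us with an SMTP 220 banner.

    A listening port with no greeting suggests the mail service is
    hung or failing even though the machine itself is "UP".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(512)
    except OSError:
        return False
    return banner.startswith(b"220")
```

A monitoring job might call `smtp_banner_ok("mail.example.com")` on a schedule (the hostname is a placeholder) and raise an alert when it returns `False`, regardless of whether the host still answers ping.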
There is an alternative, however: monitoring can be outsourced for a fraction of the cost. With the infrastructure already in place and expertise in systems analysis and alert filtering, a company like System Lifeline will complement any IT team without the cost of hiring another body.
I am not suggesting that outsourcing your system monitoring is the only answer to providing reliable IT services, but a properly configured monitoring and alerting system is an important cog in the overall IT infrastructure. As is the case with any complex system, making any portion more reliable benefits the entire scheme.