One of the first monitoring situations you will undoubtedly encounter with OpsMgr 2007 is the “Failed to Ping Computer” alert. This alert is generated after an Agent Watcher loses contact with a health service on an agent-managed computer. It is a well-conceived automatic check, but it isn’t quite what you might expect from its name. When you think of a “failed to ping” alert, you normally think of a ping monitor that runs continuously against an IP address and alerts when the ping fails.
Similarly, the out-of-box monitoring for a network device is to pull a MIB-2 OID from the box as a connectivity check. This is also not the common conception of network device availability or an equivalent to a network ping.
In both cases, it would be nice to have a “plain old ping.” For this reason, I’ve developed a management pack to provide this. This management pack includes:
- A ping performance data source module designed for collecting response time (performance), with overrideable parameters for the number of packets to send and the size of each packet.
- A ping unit monitor type designed for determining whether the target is responding (availability), with overrideable parameters for the number of attempts that will be made to ping the device and the delay desired between attempts.
- A rule that targets the SNMP Network Device class with the aforementioned ping performance data source module, with write actions to the operational database and data warehouse. This allows network device response time to be graphed, reported, etc.
- Two unit monitors that use the aforementioned ping unit monitor type. One is targeted to the SNMP Network Device class and one is targeted to the Health Service Watcher class. The proxy agent to which any given network device is assigned will run the monitor and therefore do the actual ping for that network device. The agent watcher for any given health service should do the actual ping for that health service, but in my test environment this always seems to be the RMS (Root Management Server). Let me know what you experience.
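The retry behavior of the availability monitor type can be pictured roughly as follows. This is a minimal Python sketch, not the script shipped in the management pack: `is_reachable`, `probe`, `attempts`, and `delay_seconds` are illustrative names, and the probe is passed in as a function so that any real ping implementation could be substituted.

```python
import time
from typing import Callable


def is_reachable(
    probe: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one probe succeeds, False if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        # Wait between attempts, but not after the last one.
        if attempt < attempts:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Hypothetical probe for illustration: fails twice, then answers.
    replies = iter([False, False, True])
    print(is_reachable(lambda: next(replies), attempts=3, delay_seconds=0))
```

The point of the sketch is that a single lost packet does not mark the target as down; only a full run of failed attempts does.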
When you override the parameters for either the performance rule or the availability monitors, be sure to consider the total time your settings can take and increase the script timeout if required. If you have retries and delays set such that the evaluation of the monitor could take more than 2 minutes, you must increase the script timeout above 120 seconds; otherwise the script is cut off before it can report a result. If the script always times out, you will never receive an alert. The same applies to the performance rule, where a timeout will prevent any performance data from being collected.
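A quick way to check whether a set of overrides fits inside the script timeout is to add up the worst case. The sketch below is only a back-of-the-envelope calculation in Python; the function names, the safety margin, and in particular `per_attempt_timeout_seconds` (how long a single unanswered ping attempt waits) are assumptions for illustration, not values taken from the management pack.

```python
def worst_case_seconds(
    attempts: int,
    delay_seconds: float,
    per_attempt_timeout_seconds: float,
) -> float:
    """Longest possible run: every attempt times out, with a delay between attempts."""
    return attempts * per_attempt_timeout_seconds + (attempts - 1) * delay_seconds


def required_script_timeout(
    attempts: int,
    delay_seconds: float,
    per_attempt_timeout_seconds: float,
    current_timeout_seconds: float = 120.0,
    margin_seconds: float = 10.0,  # assumed headroom for script startup
) -> float:
    """Return the current timeout if it is sufficient, else worst case plus margin."""
    needed = worst_case_seconds(
        attempts, delay_seconds, per_attempt_timeout_seconds
    ) + margin_seconds
    return max(current_timeout_seconds, needed)
```

For example, five attempts with a 30-second delay and an assumed 4-second wait per attempt give a worst case of 5 × 4 + 4 × 30 = 140 seconds, which no longer fits in a 120-second script timeout.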
The current version of the sealed MP is 188.8.131.526. Our public key token is b77a5c561934e089. It is available here: