Monitoring and Other Mayhem

April 18, 2016

Out of the Box

My network is more complex than the average home installation, but not terribly so. I have a handful of Cisco SG300 switches, a Synology DS1513, a VMware host, a bunch of VMs, and a few other odds and ends. This was the big test: how much would it take to get all this monitored?

Hint: it’s not hard.

For all the x86_64 Linux hosts I have around, it was trivial: just throw the agent on, apply Template OS Linux to the Host, and you’re done. Anything you want to do beyond that is gravy. It’ll automatically monitor CPU, memory, disks, and more.

The Synology was, surprisingly enough, equally simple. Just add the Host and add the various generic SNMP templates that come with Zabbix. It automagically grabbed all the disks and whatnot and set them up to be monitored. The only changes I had to make were to the triggers; they default to alert at <20% free space on a volume, and I generally run a number of my volumes a lot fuller than that.
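To make that threshold change concrete, here is a minimal Python sketch of the comparison a free-space trigger performs. This is not Zabbix’s trigger expression syntax. The function name, the volume sizes, and the 5% value are all made up for illustration.

```python
def volume_alerts(free_bytes, total_bytes, threshold_pct=20.0):
    """Return True when free space falls below threshold_pct percent."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    return free_pct < threshold_pct


# Hypothetical 4 TB volume with 400 GB free, i.e. 10% free.
total = 4_000_000_000_000
free = 400_000_000_000

print(volume_alerts(free, total))                     # True: 10% is below the 20% default
print(volume_alerts(free, total, threshold_pct=5.0))  # False: 10% is above a relaxed 5%
```

With the default threshold a volume like this alerts constantly, which is why I lowered it for the volumes I deliberately keep full.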

I also learned that Zabbix has a convenient collection of community-supplied templates that you can add. In my case I imported a Synology template from that source and added it alongside the pre-existing SNMP templates. This gave me a few additional bits and pieces, like the unit’s temperature.

The Cisco switches were almost as easy, but for one thing.

If you use SNMP to monitor something, you have to watch out for Zabbix’s default settings. With a Linux host they probably won’t be an issue, but they don’t work out of the box with my cheap Cisco gear. It took a bit of digging to find out why.

The short version: Zabbix uses SNMP v2c bulk requests by default, and (at least according to the Zabbix folk) Cisco’s implementation of this is broken. To make matters worse, the failure was intermittent. Zabbix would start gathering data, but I’d have big gaps in the collection. The server log showed that the various monitored interfaces were randomly “becoming not supported”.

Disabling bulk SNMP requests for the switches solved the problem. This is one area where Zabbix needs work; the defaults are not always sane, and there’s really no reason they shouldn’t be.
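If you would rather script that change than make it by hand, here is a rough sketch of a JSON-RPC request body for the Zabbix API. It assumes a Zabbix version where the host interface object has a top-level `bulk` property and where the token travels in an `auth` field; other releases may lay this out differently, so check the API docs for your version. The interface ID and token are placeholders, and the sketch only builds the payload without sending it anywhere.

```python
import json


def build_disable_bulk_request(interface_id, auth_token, request_id=1):
    """Build a JSON-RPC body that turns off bulk SNMP requests for one interface."""
    return {
        "jsonrpc": "2.0",
        "method": "hostinterface.update",
        "params": {
            "interfaceid": str(interface_id),
            # Assumption: 0 means "don't use bulk requests" in this Zabbix version.
            "bulk": 0,
        },
        "auth": auth_token,  # assumption: token passed in the request body
        "id": request_id,
    }


# Hypothetical interface ID and token, for illustration only.
payload = build_disable_bulk_request("30084", "PLACEHOLDER_TOKEN")
print(json.dumps(payload, indent=2))
```

You would POST that body to your server’s API endpoint once per SNMP interface on each switch.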

Other Gotchas

Really, there aren’t many that I’ve run into yet. The Web Scenarios I’ve already mentioned; those are slightly annoying. Beyond that, most of it is just learning curve; the trigger and calculated item formulas, for example, need better documentation. Once you learn them, though, they’re not hard.

The one big gotcha that I ran into, though, was in trying to force things to happen.

Example: I applied Template SNMP Disk to my Synology host. At first glance, nothing happened. Nothing continued to happen for an hour, and then — suddenly — the volumes and network interfaces appeared as items. An hour later. WAT?

Turns out that Zabbix has some weird rules surrounding discovery intervals, and there’s no way (that I know of) to force an immediate discovery. I can’t see any real reason for this, either. If nothing else, a template should run its discovery rules immediately upon being applied to a host. This would make things much more predictable.
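Here is a toy model of why the wait can be anywhere up to a full interval. It is not Zabbix’s actual scheduler, just a simplified assumption that a rule fires on fixed slots every `interval` seconds; the function name and the `offset` parameter are mine.

```python
def seconds_until_next_run(now, interval, offset=0):
    """Seconds from `now` until the next slot where t % interval == offset."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    offset %= interval
    wait = (offset - now) % interval
    # If `now` sits exactly on a slot, the next run is a full interval away.
    return wait if wait > 0 else interval


# Hourly discovery rule (3600 seconds), template applied at two different moments.
print(seconds_until_next_run(now=1, interval=3600))     # 3599: just missed a slot
print(seconds_until_next_run(now=3599, interval=3600))  # 1: got lucky
```

Under that model, applying a template right after a slot passes means waiting nearly the whole hour, which matches what I saw.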

So if nothing seems to be happening, wait an hour and see if nothing is still happening.