Monitoring and Other Mayhem

April 18, 2016

Pros And Cons

Zabbix isn’t bad. The main issue I have with it is that someone let a developer design the interface (again!). It’s fairly convoluted, and thus not very easy to navigate unless you’re already familiar with it. Most of the learning curve was simply figuring out where everything is (and it’s not always obvious).

Once I got past that, I found Zabbix pretty sweet. So far, anyway.

The interface is cluttered. Some of that comes down to serious UI design problems (want to view your host’s data, then click a link on that page to reconfigure something? Good luck with that). Other things are just annoyances. It took me forever to find the global macros, for example, because instead of sitting on a tab bar like almost everything else (filter selection excepted), they’re on a dropdown at the right-hand side, where my eyes don’t automatically look. It’s just this one menu that happens to be a dropdown.

Once you get used to it and know where everything is in the interface, however, it’s better than Nagios by a mile. I can at least understand what’s going on in it. Could it use work? Yes. But I’d give it three stars, as opposed to Nagios’ one star (“Was this thing designed in 1996 or something?!”).

The basic structure is pretty clear:

  • Hosts are self-explanatory.
  • Host Groups are groups of hosts.  They’re mostly used for filtering various lists and whatnot, and for assigning permissions for various users to operate on the contents.
  • Items are things you want to monitor on a given host.
  • Triggers fire when some condition on an item’s data is met, which can in turn set off an Action (such as sending an email).
  • Graphs are also self-explanatory.
  • Templates are collections of Items, Triggers, Graphs, and other stuff that can be applied en masse to a host. If you update the template, all hosts that are linked to it are also updated.
  • Actions are things that happen, like sending an email.
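To make the relationships concrete, here’s a toy Python model of how those pieces hang together. It is not how Zabbix stores anything internally; the class names, the item keys, and the `run_actions` notifier hook are all illustrative assumptions. The point it shows is the template link: change the template once and every linked host picks up the change.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trigger:
    # A named condition evaluated against the latest value of one item.
    name: str
    item_key: str
    condition: Callable[[float], bool]


@dataclass
class Template:
    # A reusable bundle of item keys and triggers.
    name: str
    items: list[str] = field(default_factory=list)
    triggers: list[Trigger] = field(default_factory=list)


@dataclass
class Host:
    name: str
    groups: set[str] = field(default_factory=set)  # host groups, for filtering/permissions
    templates: list[Template] = field(default_factory=list)

    def items(self) -> list[str]:
        # In this toy model items are read through the linked templates,
        # so a change to a template shows up on every linked host.
        return [key for t in self.templates for key in t.items]

    def fired_triggers(self, values: dict[str, float]) -> list[str]:
        # Evaluate every inherited trigger against the latest item values.
        return [
            trig.name
            for t in self.templates
            for trig in t.triggers
            if trig.item_key in values and trig.condition(values[trig.item_key])
        ]


def run_actions(host: Host, fired: list[str], send: Callable[[str], None]) -> None:
    # A stand-in "action": hand one message per fired trigger to a notifier.
    for name in fired:
        send(f"{host.name}: {name}")
```

Linking two hosts to one template and then adding a disk-space item and trigger to the template gives both hosts the new item, and `web1.fired_triggers({"vfs.fs.size[/,pfree]": 4.0})` reports `["Low disk space"]`.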

There are also more advanced features that I won’t go into here (such as automatic discovery), but these are the main ones. Once you have them down, you’re in pretty good shape for understanding what’s going on in Zabbix.

And then there are things that could have used a bit more thought, like Web Scenarios.

Web Scenarios are, in the abstract, pretty cool: you define a series of web requests to be made in order, and if they all pass, all is well. You can then alert on these scenarios in ways ranging from “it failed” to “step #3 failed” or “step #6 took too long”. They’re pretty flexible.
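The logic of a web scenario is easy to sketch. The Python below is a minimal stand-in, not Zabbix’s implementation: it runs the steps in order, stops at the first step that returns the wrong status code, and separately notes steps that passed but exceeded a per-step time limit. The `Step` fields (`required_status`, `max_seconds`) are hypothetical names, and the fetcher and clock are injectable so the logic can be tested without a network.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    url: str
    required_status: int = 200
    max_seconds: float = 2.0  # hypothetical "took too long" limit for this step


@dataclass
class ScenarioReport:
    failed_step: Optional[str] = None                     # first step with a bad status
    slow_steps: list[str] = field(default_factory=list)   # passed, but too slowly

    @property
    def ok(self) -> bool:
        return self.failed_step is None and not self.slow_steps


def run_scenario(
    steps: list[Step],
    fetch: Callable[[str], int],
    clock: Callable[[], float] = time.monotonic,
) -> ScenarioReport:
    """Run the steps in order; report the first failure and any slow steps."""
    report = ScenarioReport()
    for step in steps:
        start = clock()
        try:
            status = fetch(step.url)
        except OSError:
            status = 0  # connection errors and timeouts count as failures
        elapsed = clock() - start
        if status != step.required_status:
            # Later steps usually depend on earlier ones (log in, then browse),
            # so this sketch stops at the first failing step.
            report.failed_step = step.name
            return report
        if elapsed > step.max_seconds:
            report.slow_steps.append(step.name)
    return report


def http_fetch(url: str) -> int:
    """Fetch a URL with the standard library and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
```

In real use you would call `run_scenario(steps, http_fetch)`; keeping “failed” and “slow” as separate fields mirrors the distinction between “step #3 failed” and “step #6 took too long”.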

They’re also more or less designed to test external web servers, which is unfortunate.

If I have a web scenario, it’s most commonly going to be attached to a Host to test a web service that runs on that host. “Okay,” I said. “If I use http://localhost as the target URL, that’ll work, right?”

Bzzzzt.

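The requests are made by the Zabbix server (or a proxy), not by the agent on the monitored host, so localhost points at the Zabbix server itself. You end up having to specify the URL as “http://{HOST.CONN}/foo”, where {HOST.CONN} expands to the host’s IP address or DNS name. This is not exactly a big deal, but it’s another one of those annoyances I mentioned.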
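Here’s a toy Python version of that substitution, to show what the URL turns into before the request goes out. It assumes a host interface with an IP, a DNS name, and a flag for which of the two to connect by. {HOST.HOST}, {HOST.IP}, {HOST.DNS} and {HOST.CONN} are real Zabbix macros, but the `HostInterface` class and `expand_macros` function are illustrative only, not Zabbix’s resolver (which handles many more macros, including indexed forms).

```python
import re
from dataclasses import dataclass


@dataclass
class HostInterface:
    host: str     # the host's technical name
    ip: str
    dns: str
    use_ip: bool  # models the interface's "connect to" choice: IP or DNS


# Matches macros of the form {HOST.SOMETHING}.
MACRO = re.compile(r"\{(HOST\.[A-Z]+)\}")


def expand_macros(url: str, iface: HostInterface) -> str:
    values = {
        "HOST.HOST": iface.host,
        "HOST.IP": iface.ip,
        "HOST.DNS": iface.dns,
        # {HOST.CONN} follows whichever address the interface connects by.
        "HOST.CONN": iface.ip if iface.use_ip else iface.dns,
    }
    # Macros this sketch doesn't know are left as-is rather than blanked.
    return MACRO.sub(lambda m: values.get(m.group(1), m.group(0)), url)
```

With `use_ip=True`, `expand_macros("http://{HOST.CONN}/foo", iface)` yields the interface’s IP; flip the flag and you get its DNS name instead.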

The Agent

The Zabbix agent has some serious cool factor going for it. As I mentioned, I really like Nagios check scripts, but the Zabbix agent may just be cooler. At its core, it’s just an agent, but it’s also extensible in a surprisingly simple fashion. This became apparent when I wanted to collect trending data on various stats (like cache hit rates and such) from the varnish instance that sits in front of floating.io.

To do this, add a line like the following to /etc/zabbix/zabbix_agentd.d/varnish.conf:

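# With [*], Zabbix substitutes $1 to $9 into the command, even inside quotes; $$2 reaches awk as $2.
UserParameter=varnish.stat[*],(test -f /usr/bin/varnishstat && /usr/bin/varnishstat -1 -f $1 | awk -- '{print $$2}')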

…and restart your agent.
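The one gotcha with flexible ([*]) user parameters is that Zabbix substitutes the positional references $1 to $9 anywhere in the command, including inside awk’s quoted program, so awk field references have to be written as $$2. This minimal Python sketch mimics that substitution under stated assumptions: missing parameters become empty strings, $$N becomes a literal $N, and $0 (which Zabbix also handles) is simply left alone. `expand_user_parameter` is a hypothetical helper, not part of the agent.

```python
import re

# Matches "$N" or "$$N" for a single digit N.
_REF = re.compile(r"\$(\$?)([0-9])")


def expand_user_parameter(command: str, params: list[str]) -> str:
    def repl(m: re.Match) -> str:
        escaped, digit = m.group(1), m.group(2)
        if escaped:
            return "$" + digit  # "$$2" -> literal "$2" for awk or the shell
        n = int(digit)
        if n == 0:
            return m.group(0)   # $0 is not modelled in this sketch
        # Unused positional references expand to nothing.
        return params[n - 1] if n <= len(params) else ""
    return _REF.sub(repl, command)
```

Expanding `awk -- '{print $2}'` with a single parameter leaves `awk -- '{print }'`, which prints the whole varnishstat line instead of the value; `'{print $$2}'` comes out as the intended `'{print $2}'`.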

Once that’s in there, you can add any stat available from varnishstat to an item in Zabbix by using a key like “varnish.stat[MAIN.cache_hit]”. It’s that simple. The agent also comes with plenty of built-in item keys, though I’ve only explored a few of them.
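For the curious, here is roughly how a key like that maps back to a user parameter, as a simplified Python sketch. It splits the key into a name and a comma-separated parameter list (no quoting or nested arrays, which real Zabbix keys allow), then prefers an exact match before falling back to a flexible `name[*]` entry. `parse_key` and `find_user_parameter` are hypothetical helpers for illustration.

```python
def parse_key(key: str) -> tuple[str, list[str]]:
    """Split 'name[p1,p2]' into ('name', ['p1', 'p2']). Simplified: no quoting."""
    if "[" not in key:
        return key, []
    if not key.endswith("]"):
        raise ValueError(f"malformed item key: {key!r}")
    name, _, rest = key.partition("[")
    return name, [p.strip() for p in rest[:-1].split(",")]


def find_user_parameter(key: str, user_params: dict[str, str]) -> tuple[str, list[str]]:
    """Return (command, params) for an item key, or raise KeyError."""
    name, params = parse_key(key)
    if key in user_params:           # exact, non-flexible match
        return user_params[key], []
    flexible = f"{name}[*]"
    if flexible in user_params:      # flexible match: params feed $1..$9
        return user_params[flexible], params
    raise KeyError(f"unsupported item key: {key}")
```

With a single `varnish.stat[*]` entry, every `varnish.stat[...]` key resolves to the same command, and only the parameter differs.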

Another nice feature is automatic registration. First, you add an Auto Registration action (okay, now there are two annoying things on a non-obvious dropdown). That action can create a Host object, assign it to a Host Group, link Templates to it, and so forth, all based on conditions (for example, the domain it’s in). Then you set the ServerActive= directive in /etc/zabbix/zabbix_agentd.conf on a host and fire up the agent.

Now your host will automatically register with Zabbix according to what you set up. This would be extremely handy in a large environment with lots of hosts being added on a regular basis. It would also make monitoring “elastic” environments much simpler.
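Conceptually, the server-side half of this is a rule match: when an agent announces itself, each Auto Registration action whose condition matches contributes host groups and templates. The Python below is a toy model of that idea only; the `AutoRegAction` and `RegisteredHost` classes, the host-name suffix condition, and the group and template names are all hypothetical, and real Zabbix actions offer their own set of condition types.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AutoRegAction:
    # Hypothetical stand-in for an Auto Registration action.
    name: str
    condition: Callable[[str], bool]  # evaluated against the host name
    host_groups: list[str] = field(default_factory=list)
    templates: list[str] = field(default_factory=list)


@dataclass
class RegisteredHost:
    name: str
    groups: set[str] = field(default_factory=set)
    templates: set[str] = field(default_factory=set)


def register(hostname: str, actions: list[AutoRegAction],
             inventory: dict[str, RegisteredHost]) -> RegisteredHost:
    # Create the host on first contact; later contacts reuse the same object.
    host = inventory.setdefault(hostname, RegisteredHost(hostname))
    for action in actions:
        if action.condition(hostname):
            host.groups.update(action.host_groups)
            host.templates.update(action.templates)
    return host
```

A rule such as `lambda h: h.endswith(".web.example.com")` plays the part of “the domain it’s in”: any new cache box in that domain lands in the right group with the right templates and no manual setup.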

Zabbix gets a big win here.