Ranting, Technically Speaking

Tuesday, 26 January 2010

Essential Application Plugins

The nice thing about programs like Firefox and Thunderbird is that you can get a lot of community-created plugins to make the program look and do what you want. The downside of programs like Firefox and Thunderbird, is there is (at least for me) a few plugins that have to be installed before they work well. So to that end, I've started building up a list of essential plugins.

The plugin model isn't perfect, but it far exceeds the alternative which is that your applications all suck (Microsoft, I mean you). Heck, Nagios at the core doesn't do anything at all for you, it's all from plugins and I can't rave enough about how great an application Nagios is.

- Arch

Thursday, 31 December 2009

Fire Bad!

Battery backup at home went off today BEEEEEEEEEEEEEEEEEEEEEEP BEEEEEEEEEEEEEEEP! Everything shuts itself down and I go to reboot the UPS when *sniff* *sniff* ah yes, the distinctive smell of burned electronics. So that's it finally. Adios APC BackUPS 350. You will torture us no more with your intermittent failures!

Now I have to look for a new UPS. Preferrably small in size (it has to go under my desk) and monitored. APC's successor to the UPS model I had wasn't monitored last time I checked, maybe they've got a newer model that is though. If not, I'll have to look at other manufacturers and then that means looking at software support, etc.

Oh well, out with the old! Happy New Year's!

- Arch

Friday, 6 November 2009

Nagios Rules All

Nagios is a network monitoring application which itself provides no actual monitoring but rather specializes in scheduling checks and notifications. As a module framework, it works very well and there are a lot of monitoring plugins and all told, there aren't many (or any) systems that really compare, F/OSS, proprietary, or otherwise.

Since it's not a complete solution in and of itself, I know at least I found it a bit daunting to get in to. So I got this book:

Building a Monitoring Infrastructure with Nagios by David Josephsen

It's not a huge book like say, HP's OpenView manual(s), so read it first.

Nagios is super cool. You build definitions for each host on your network and each service on each host. Nagios checks each service recording the service's status. When a service fails, nagios will send a notification once it is sure the service is down and then periodically until it comes back up.

Fine, that's the basic premise. Now the configuration works pretty well because any host can inherit it's configuration from any other host definition including host definition templates. So set your general parameters once, and then override where necessary. It's the same for services. And you also have host and service groups which allow you logically group hosts (or services).

Nagios doesn't have any built-in way to check services, it's all through plugins. A plugin is simply an external script or program which exits with a status of 0 for OK, 1 for warning, or 2 for critical and optional 1 line of standard output for status text. Nagios has many standard plugins available, for example the check_ping plugin. This plugin is a little wrapper script which is invoked with arguments specifying the warning and critical thresholds for response time and packet loss. So in testing a plugin, you can simply invoke the plugin with the arguments that Nagios would be feeding it.

Now if a service goes down, Nagios will check if the host is down. Again, this is a plugin of the same type as for the service. Typically, this means check_ping. So you don't really need to have a check_ping "service" check, just for the host. So if your host runs a webserver, you would use check_http and if that fails, Nagios will check_ping on the host to see if that's down. If the host is down, well then obviously all services on that host are a write-off so Nagios will send a notification for that host once rather than for each individual service.

And when I say Nagios will send a notification, it doesn't know how to do that either. Notifications are also defined but typically the stock notification will suffice. On Fedora, it uses the "mail" program to send a mail message.

Ah, so who does it notify? Well, each service and each host defines contact groups and also contact hours. So Nagios will notify everyone in a contact group if it's during notification hours. So you can monitor your development systems as well as production ones and only get notifications when appropriate.

Nagios also provides escalations. So if a service (or host) remains down, you can define an escalation path. Maybe level one is help desk, and if they don't respond, it escalates to supervisors as well, and if they still don't respond, then it escalates to on-call staff, managers and eventually the head cheese.

What else is cool? Oh yeah, parent-child relationships. On each host, you can define parent hosts. So if you have say several routers throughout your network, connectivity to hosts would depend on connectivity to their routers. So if a router goes down, as with services on a host, Nagios will know to only notify of the router being down and not all the children individually.

There is also an agent for Nagios called the NRPE. It is totally optional but if you want local system checks, like disk, CPU, checking running processes, and not just network service checks, then NRPE lets you do this. Install NRPE on your monitored hosts, and NRPE is available for "Linux" and Windows, and it, I think, is like a little baby Nagios invoked by the mothership. So you install service check plugins with the NRPE and then on the server, your service checks are like check_nrpe!check_disk ... or something like that so the server sends the service check to the NRPE on the monitored system. I haven't used this yet, but will definitely be doing so.

The NEB is another cool part of Nagios. The Nagios Event Broker is an interface where can write programs which hook into Nagios's regular operations. There's a couple dozen callback functions you can hook into and this makes the possibilities for Nagios virtually endless.

The part I've left for last is the user interface. Well, once again, there is none. You configure it, it fires off notifications, that's the core. There is a web interface Nagios provides you can use if you want. It will show you host and service status and you can acknowledge alarms through it and schedule downtime for hosts. Now you can hook in more functionality, for example, historical graphs can be very useful. If you're checking disk usage with Nagios, why not keep a record of it? Well, when you get a service check result back, it comes back with one line of text from standard output, right? Well, there's packages that will build graphs from this data so that you can have your service status and historical reports too! Josephsen has a pretty extensive discussion on doing this kind of stuff and some great info on some of the options out there.

So, yeah. Get that book! Use Nagios! Monitor everything with it! Let it tell you when your toast is toasted or your beer needs a refill!

- Arch

Thursday, 1 October 2009

Fedora Bootable USB

LiveUSB Creator, it's a wonderful thing. Connect a USB key, get the LiveUSB Creator on your PC (Windows or "Linux"), point it either to a local .iso file for a Fedora live CD or let it download the version you want for you, click go, and shazzam! (yes, "shazzam") You've now got a bootable Fedora USB key. And if you gave it a block of persistent storage, you've got, well, persistent storage to use in this OS for data files etc.

- Arch

Wednesday, 30 September 2009

Processing Deferred Messages in Postfix

For anyone who's had to cleanup some mail problems with Postfix configuration (or more often with other things, like anti-spam, tied in but not part of postfix), it may be common enough that a large spool of mail gets queued up and needs to be pushed out. The easy way to do this is to do either "postfix flush" or "postqueue -f" which basically force Postfix to re-process pending messages (actually "deferred" usually) and send them out.

However, if either the queue is huge, or you don't really know if you have your problems resolved and want to try a few messages before unleashing the masses, I found it was not clear how this can be done. There is a straight-forward way to do this which is to put everything on hold using "postsuper -h ALL deferred", and then un-hold whichever messages you do want processed with "postsuper -H ".

Tres handy

Friday, 11 September 2009

Let's FUSE him with this juice!

Back in the olden days, like a year or two ago, Filesystem in Userspace (FUSE) was a fancy feature that allows users to mount file systems. Using FUSE means that you can create a file system driven by an application rather than a driver (e.g. a kernel module). When I first tried it, it meant customizing your kernel to include this feature and building a bunch of utilities and drivers and generally it was awesome, but not something one does for a "quick fix".

Fast forward to a few months later (or aeons in OSS terms) and there's standard kernels and packages to operate FUSE. You can pull everything you need from your distro's stock repository.

In particular, there is sshfs which is hella tight. "sshfs" is, as you might guess, a file system over SSH, e.g. in FUSE. This means the security and features of SSH including SSH keys and all that good fun. Installing "sshfs" and FUSE is a simple three step process:

yum install sshfs (or aptitude install sshfs for Debian / Ubuntu users)

Profit!

Similarly, once you've installed "sshfs", using it is a simple three step process:

sshfs myhost.example.com:/some/remote/path /some/local/path

Profit!

What could be simpler? If you're finding your virtual file system access in Gnome or KDE produces odd behaviour sometimes, just FUSE your remote file system instead. You get fully functional and secure access to remote file systems.

Oh, and just one last note, you use a FUSE command to disconnect the mount:

fusermount -u /some/local/path

Thanks, Toddz for mentioning FUSE the other day and getting me to revisit it.

Ciao,
- Arch

(title for this post nicked from an Invader Zim quote)

Friday, 4 September 2009

Crappy Power

I've had some problems in the somewhat recent past where my UPS goes into panic mode and because the battery was old / crappy, this made things "very bad". I've had no issues since replacing the battery, but now I'm getting a picture of why it was so awful from apcupsd:

Mon Aug 31 11:13:36 PDT 2009  Power is back. UPS running on mains.
Mon Aug 31 11:13:34 PDT 2009  Power failure.
Thu Aug 27 11:20:20 PDT 2009  Power is back. UPS running on mains.
Thu Aug 27 11:20:18 PDT 2009  Power failure.
Sat Aug 22 16:59:32 PDT 2009  Power is back. UPS running on mains.
Sat Aug 22 16:59:29 PDT 2009  Power failure.
Sat Aug 22 16:56:29 PDT 2009  Power is back. UPS running on mains.
Sat Aug 22 16:56:27 PDT 2009  Power failure.
Fri Aug 21 00:12:33 PDT 2009  Power is back. UPS running on mains.
Fri Aug 21 00:12:31 PDT 2009  Power failure.
Fri Aug 21 00:11:52 PDT 2009  Power is back. UPS running on mains.
Fri Aug 21 00:11:50 PDT 2009  Power failure.
... etc

There are a lot of power events going on. Given that the time of the "power failure" is always 2 seconds, my guess is that this just means power is fluctuating. I've lived in places where this happened a bit and where it happened not at all, but this is the worst I've seen.

The only thing I can say is: get a UPS if you don't have one! You may not need battery backup per se, but this is the kind of stuff that will send the power supply unit in your PC to an early grave. And if you're unlucky, the PSU may just take other components of your PC with it.

- Archangel