Friday 30 December 2011

Storing Passwords

The most effective way to manage your passwords for personal or professional use is to use a password manager.  This allows you to maintain unique logins for all the different resources you access (bank vs email vs general forums vs ...) while only having to remember one master password.  Pick a reputable password manager, like KeePass, and remember that backing up and restoring your password database is critical.
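On the backup front, even something as simple as a dated copy of the database file to a separate drive goes a long way.  A minimal sketch, assuming a KeePass 2 database at ~/passwords.kdbx and a USB drive mounted at /media/usb (both paths are just examples):

  # copy the password database with today's date in the name
  cp ~/passwords.kdbx /media/usb/passwords-$(date +%Y%m%d).kdbx
  # confirm the copy matches the original before relying on it
  cmp ~/passwords.kdbx /media/usb/passwords-$(date +%Y%m%d).kdbx && echo "backup OK"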

Keeping electronic copies is fine, but also consider keeping a hard copy in a relatively secure location.  One suggestion is to print off your passwords every time you change your master password (annually is pretty minimal) and write that master password on the printout so you can recover it if you forget it.  Useful if you do cycle your master password frequently.

Friday 16 December 2011

WiFi Routers and NAS

The last time I bought a new router was when the Linksys WRT54G was "the king" of home WiFi routers - and mostly because you could replace the useless stock firmware with DD-WRT.  Otherwise, it was "a router".  At the time, 4 years ago, which is like many generations in Internet time, you still had to manually set up security on your WiFi AP, so you saw lots of open WiFi hot-spots named "Linksys" or "Dlink" around.   Then the WiFi router manufacturers started providing security setup as part of their setup wizard, so you see more SSID customization and security enabled.  Now, apparently, everyone auto-configures security with a magic button called "WPS".  Then you've got other features like USB ports, so you can run a file server from a USB drive or a print server, and "guest networking", so you can isolate your workstations from other users.

"WPS" - WiFi Protected Setup is definately a cool feature.  It comes as a button on the router so when you press the button, its like the router goes into a sort of "security auto-config mode".  WPS, if its supported on your client (I assume it's a software install), will then automatically configure your client and your router with strong security settings. It means no more default passwords and streamlining the security options for users who frankly don't need to have "WEP" as an option.

[Edit: WPS is broken and should be disabled on all routers that support it according to SANS.]

Guest networking is another cool feature on some routers.  It is a separate SSID for, well, guests to use your WiFi from.  It is isolated from your main network so that guests won't have access to, for example, your network-attached printer or the media collection you stream from your laptop to your television.  This is just so cool for people who may be sharing their Internet connection with their neighbours or roommates but don't want their guests' surfing habits to infect their own systems :)


And the USB ports.  Many routers seem to have one or two USB ports on them, which is interesting, but what's more interesting is what you can do with them.  A lot of new routers have built-in file servers, so as soon as you attach some storage, you can share files and folders from it to the PCs on your network.  How convenient is that?  Some routers have more sophisticated web interfaces than others and let you specify which folders are or aren't shared - but either way, if you're buying a new WiFi router anyhow and you get this feature, it means you get a functional NAS for the cost of a USB key or USB-attached hard drive!  *And* some routers are starting to come out with USB 3 (SuperSpeed USB), which, considering these routers have not only 802.11n speed on the WiFi side but also Gigabit speed on the wired ports, is an awesome feature.

And that's not the only thing you can do with the USB port - some routers will also act as a print server!  So you attach your generic USB printer to the router, and it's now a network printer you can print to from any laptop or PC in the house.  Talk about a great value-added feature!  I love it!

And did I mention that new routers are all now wireless N with Gigabit LAN interfaces?  WiFi is still garbage and a ways away from being reliable outside very small deployments, but N is an improvement over previous specs.  Interestingly, I found out the other day as well that if you run your router in "dual band" to support both N and G clients, your wireless speeds on both N and G suffer.  So ironically if you have any wireless G clients, unless you really need your N devices to run at "slightly faster than G but nowhere near N speeds", you should still run G only.

Cool beans!  I'm liking some of the features I'm seeing on the box these days from some of the WiFi routers.  A nice change from the utter crap they used to schlep out, where the only smart thing to do was check if you could run a custom firmware on the device and replace the junk software it shipped with.

Wednesday 19 October 2011

Source Control for Server Admin

So you manage a server, or a lot of servers, alone or in a team.  However you are doing this, you are going to be tweaking configuration files often and creating custom scripts for automation.  There are two tools I use for revision control - RCS for configuration files (generally) and SVN for scripts (generally).

RCS.  The classic.  All the documentation you will ever need is in the man pages.  Well that and some context for how to use it.  RCS creates revision files in place.  So if you change /etc/dhcpd.conf, it will create /etc/dhcpd.conf,v.  This is a very useful setup when controlling local files in arbitrary locations - like most of /etc on most of your servers.  There are a few caveats to keep in mind:
  • RCS will put revision files (the ,v files) in an RCS folder if present
  • The default behaviour is to remove a file from its current path on check-in
Keeping these in mind, this is my general pattern for working with files under /etc.
  1. If there is no RCS folder (e.g. /etc/RCS), create it first
    • mkdir -m 700 ./RCS
    • Assuming your working folder is where the file in question is, this will create an RCS folder and protect it from other users (typically non-root)
  2. If a file doesn't exist in RCS, check it in first
    • ci -u dhcpd.conf && co -l dhcpd.conf
    • ci is short for "check-in"; unlike SVN or CVS, "ci" is its own command and not an argument to "rcs"
    • The -u "unlocks" the file leaving it in place (so dhcpd can read it)
    • co is "check-out" and -l "locks" the file for editing
      • I always leave files checked out to capture changes by other users or by the system (like rpm)
  3.  If the file does exist in RCS, check for any un-committed changes
    • rcsdiff dhcpd.conf
    • This does a diff against the last checked-in version by default but you can specify a version if you want to compare against earlier changes
    • Check-in any un-committed changes or find the person who made the changes and make them do it
  4. The file should always be left checked-out (per above comment), otherwise check it out
  5. Make changes
  6. Check-in changes, and check-out the file for the next user
    • ci -u dhcpd.conf && co -l dhcpd.conf
    • Give a brief log message indicating what the changes were and again, leave the file checked-out to capture changes by the system or other users
Now the last useful command I'll mention here is rlog, which lets you read the revision history log.
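Putting the pattern together, a typical session on a file under /etc looks roughly like this (a sketch only, using dhcpd.conf as the example):

  cd /etc
  mkdir -m 700 ./RCS                      # one-time: private folder for the ,v files
  ci -u dhcpd.conf && co -l dhcpd.conf    # one-time: initial check-in, then lock for editing
  rcsdiff dhcpd.conf                      # before editing: look for un-committed changes
  vi dhcpd.conf                           # make your changes
  ci -u dhcpd.conf && co -l dhcpd.conf    # check in (you'll be prompted for a log message), leave checked-out
  rlog dhcpd.conf                         # review the revision history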

Now SVN is a proper centralized source control system.  They have excellent documentation on setting up a repository.  This is very useful for system admin scripts. 

Although most system administration related scripts won't ever have "releases" or "branches", you probably still want to follow the SVN guide and create at least a trunk in case you ever do need to tag a specific version.   There's no cost to it, so I create a trunk even though I've never needed it, because restructuring the repository later is a pain.
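Creating the repository with a trunk is only a couple of commands.  A rough sketch, assuming a file-based repository under /srv/svn/admin (the path and URL are just examples):

  svnadmin create /srv/svn/admin
  svn mkdir file:///srv/svn/admin/trunk -m "create trunk"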

With SVN you'll want to keep an updated local working copy ("tip") either on a shared NFS location or locally on each server.  How you do it is up to you; just create a cronjob to run "svn update /path/to/tip" and then you can always run scripts from that path.
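A minimal sketch of that cronjob as an /etc/cron.d entry (the schedule is just an example):

  # /etc/cron.d/svn-tip - refresh the shared working copy every hour
  0 * * * *  root  /usr/bin/svn update -q /path/to/tip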

RapidSVN is a great tool - well, maybe not great, but it works very well for sys admin purposes anyhow and it's readily available.  So check out your own working copy of the trunk with RapidSVN.  I configured RapidSVN to use gedit as my standard editor and meld as my diff tool.

This gives you everything you need for day-to-day creation and maintenance of system configuration files and your toolbox of scripts for automated system maintenance.

Saturday 15 October 2011

Debugging Python Scripts

This is really just props for a site I found with a nice walk-through of using the Python Debugger - pdb.

http://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/

pdb is Python's built-in step-through debugger, letting you set breakpoints, inspect objects, and do all the usual things you need when developing a program.
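As a quick sketch of the basics (the script name is just an example), you can launch a script under the debugger straight from the shell:

  python -m pdb myscript.py
  # at the (Pdb) prompt, the usual suspects are:
  #   b 42        set a breakpoint at line 42
  #   n           execute the next line (step over)
  #   s           step into a function call
  #   p somevar   print the value of an object
  #   c           continue to the next breakpoint
  #   q           quit the debugger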

Friday 9 September 2011

Running the numbers

Two interesting tools popped up recently.

Good old Linux Counter has been passed down to a new maintainer. This is a classic project which attempts to gather Linux usage data from user input. It's hard to tell if it's particularly relevant, but it is interesting to see relative usage across platforms and by region. As for estimating global Linux use? It's hard to believe this provides a good enough sample to be convincing. Nevertheless, I keep my machines at home registered there. Or at least some of them :P

Another one I really like is Debian Popcon, which tracks the popularity of Debian packages by installs and by "votes". Popcon is actually just a Debian package which phones home your installed package list, and it is installed by default on some distros but not others. What I like about popcon is that when there is a wide variety of F/OSS tools available, you can check the list to see which tools are ranked highest, so you can at least start by trying the most used tool rather than taking a total wild guess. For example, in looking for an SVN GUI tool, I did a "yum search svn" and there were a lot of hits. So I opened up popcon, searched the list top to bottom for "svn", and took the highest hit that was a GUI tool, which was RapidSVN. Well, then I checked with Dante to see which tool he used, and lo and behold, it was RapidSVN :)
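Popcon publishes its rankings as plain text, so you can skip the browser entirely.  A rough sketch, assuming the by_vote results file on popcon.debian.org keeps its usual layout (rank first, package name second):

  wget -q http://popcon.debian.org/by_vote -O /tmp/by_vote
  grep -i svn /tmp/by_vote | head    # highest-ranked svn-related packages first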

Thursday 7 July 2011

Reorganizing Ubuntu Partitions

My personal PC at home died. It was an old PC no matter which way you look at it. Every part had been replaced or upgraded over time (case, PSU, optical drive, hard drive, memory, CPU, mainboard, NIC, video card) so knowing its actual age is pretty hard, but it looks like "Friday" as a PC existed for 7 years. Checking my blog, the first reference I found was October 31, 2004, indicating Friday was the new name for an old PC called Michael.

Time for a new PC. I've reused the optical drive but everything else is new in Agnes (named from Immortality by Milan Kundera). I did your basic "install Windows first, Ubuntu second", so pretty much just a mommy-install. Until I realized I really hadn't made a big enough Windows partition.

I figured it would be a pain moving the first Ubuntu partition back on the drive, so I backed everything up and booted from the Live CD. "gparted" is included on the live CD and it was painless to shrink the Ubuntu partition, move it "right", and extend the Windows partition. I didn't have to reinstall grub or do anything else; it pretty much just worked - for both OSes. It's always so nice when things just work.

But I will say, it's pretty dumb that Ubuntu doesn't use LVM by default. As I have posted before, LVM is very useful. What would be nice is if I could have just lumped most of the free space into LVM and then carved out an LV for home and another for media so I could grow them as needed. Rather than fiddle too much with that though, I ended up just going with a relatively large /home partition which I will grow as needed, and if I need space for other things - like more storage under the 'doze - I can put a partition at the end of the disk.
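For the record, the LVM setup I had in mind is only a few commands.  A sketch, assuming the free space is on /dev/sda3 (the device, names, and sizes are all just examples):

  pvcreate /dev/sda3                    # mark the partition as an LVM physical volume
  vgcreate vg_agnes /dev/sda3           # pool it into a volume group
  lvcreate -L 50G -n home vg_agnes      # carve out an LV for /home
  lvcreate -L 200G -n media vg_agnes    # and one for media, leaving the rest of the VG free
  mkfs.ext4 /dev/vg_agnes/home
  mkfs.ext4 /dev/vg_agnes/media
  # later, when /home fills up:
  lvextend -L +20G /dev/vg_agnes/home
  resize2fs /dev/vg_agnes/home          # grow the file system into the new space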

Thursday 16 June 2011

Heartbeat

I have recently tested out running Heartbeat (finally - it took too long to get to this, but that's another story). This is a cluster resource manager (CRM) which polls nodes in a cluster and brings resources up when a node failure is detected.

It's interesting. I wouldn't call it elegant really; maybe the newer Pacemaker would seem cleaner. But it is simple, and at least in testing it is effective, especially when combined with DRBD which I posted on earlier. The thing is, where DRBD really seems built for top-notch resiliency and flexibility, Heartbeat seems like it will work, but it's not obvious that you'll get what you expected - maybe it's just that the documentation on DRBD was really well done.

At any rate, there is great documentation on getting Heartbeat up with DRBD both from the CentOS wiki and from DRBD. I used heartbeat with drbd83 in CentOS.

What Heartbeat will do is listen for heartbeats from a peer node in a cluster and if a peer goes down, it will bring up services on the working node. There's a handful of important things about this to keep in mind.

First is the heartbeat - this is just a stand-alone network connection between two nodes, so if that connection goes down or the heartbeats get choked out by competing traffic, Heartbeat may well decide you have a node failure. This is not a trivial problem because now that Heartbeat can kill services on an active node, this is potentially a new point of failure. And this is common to many HA configurations, including DRBD itself, though as we know, DRBD will identify split-brain and give you some recourse for repairs. So the suggestion here is to use a dedicated connection, preferably a point-to-point connection with a cross-over cable or a serial port - and the point-to-point connection is not uncommon for clusters, like in this white paper for Microsoft Storage Server.

Then there is the issue of resource management - when the CRM is managing the resources, the usual OS procedures should not. If Heartbeat is in charge of bringing up MySQL, you shouldn't be starting MySQL from the init scripts when the OS boots. Now the nice thing with DRBD is that its behaviour is consistent with this paradigm - when DRBD resources start up, they are in "secondary" mode and cannot be accessed by the OS. So if you have a file share protected by DRBD, Samba wouldn't be started by the OS, and likewise, that file system would be unavailable when the OS starts (by default at least). So here, Heartbeat makes a lot of sense. Take a 2-node cluster for example: when the nodes start up, Heartbeat looks for the peer, picks one to become active, makes the DRBD resource "primary" on that node, mounts the file system, and starts smb. On the stand-by node, 'smb' stays off and the file system is not writeable, which helps ensure consistency.
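To make that concrete, an R1-style setup really only needs two files on each node, /etc/ha.d/ha.cf and /etc/ha.d/haresources (plus authkeys for authentication).  This is a sketch only - node names, interface, addresses, and the DRBD resource name are all made up:

  # /etc/ha.d/ha.cf
  keepalive 2            # seconds between heartbeats
  deadtime 30            # declare the peer dead after 30s of silence
  bcast eth1             # dedicated heartbeat interface (e.g. cross-over cable)
  auto_failback off
  node node1 node2       # must match uname -n on each host

  # /etc/ha.d/haresources
  # node1 is the preferred node; the active node promotes DRBD, mounts the
  # file system, brings up the service IP, and starts Samba, in that order
  node1 drbddisk::r0 Filesystem::/dev/drbd0::/srv/share::ext3 192.168.1.50 smb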

I guess I could go on about Heartbeat quite a bit, but there's one last thing to mention specifically here and that's the style of cluster. There are "R1" style clusters, which are simple but limited to 2 nodes (and other limitations), and then there are CRM-enabled clusters, which are more robust but more complicated to configure. I have only used R1 because it was sufficient for my needs - 2 nodes, one known "preferred" node, and keeping the cluster configuration in sync "manually" wasn't onerous. But CRM-enabled clusters are more interesting because you can add more nodes and it will distribute the cluster configuration automatically, etc.

The one thing I haven't really touched on is the quorum, which those more familiar with cluster management will know better than I do. Basically, with Heartbeat in an R1-style cluster, there isn't going to be a quorum. Your configuration is maintained pretty much manually, services are only running on one node, etc. In CRM-style Heartbeat or other application clusters, the quorum is what all the nodes agree on and it is typically stored in a file. On Windows Storage Server and other clusters, the quorum is stored on the shared disk, which means any problem there means the cluster fails. With Heartbeat, the quorum file is copied among the nodes, but this is susceptible to becoming out of sync, like if there is a communication failure on the heartbeat channel leading to a split brain. Or this is my limited understanding of it. At any rate, it is a problem and it isn't trivial when working with active/active or multi-node configurations.

Friday 13 May 2011

Cache in Openfire

In the course of troubleshooting the office Jabber server the other day, I came across some interesting info about the various caches that Openfire has. If you log on to the admin console of your Openfire server and go to the cache summary page, you can see what the usage and effectiveness of your various caches are. Specifically, I found that a couple of caches were full - Roster and VCard. The Roster cache was limited to 0.5 MB by default, it seemed, and its hit rate was less than 20% at the time.

It is a fairly common issue and it has been discussed in the Ignite Realtime forums. The solution posted is to set a couple system properties to override the default:

cache.username2roster.size
cache.vcardCache.size

Both of these are given in bytes. The post in that thread says to go to 5 MB; I found that my VCard cache didn't need to be much bigger than the default and the Roster cache only needed 2 or 3 MB.
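For reference, the sort of values that worked for me look like this, set as system properties in the admin console (the exact numbers are illustrative):

  cache.username2roster.size = 3145728    # 3 MB for the Roster cache
  cache.vcardCache.size = 1048576         # 1 MB for the VCard cache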

After changing this, both cache hit rates are closer to 90%.

Our system is very small (less than a couple hundred users total), so the effect is not big on regular usage. But well worth checking on your server as it is a quick and easy optimization.

Thursday 5 May 2011

Again with the tapes

In a previous post I said that to get around devices changing their numbering, it was useful to use /dev/tape/by-id instead of the generic /dev/nst0. Unfortunately, as I've just learned, this is also imperfect: the device which was previously "scsi-35000e11138aa0001-nst" this time came up as "scsi-35000e11138aa0001". And you can guess how gracefully the software handled that (not at all). Now I don't know if it was a driver update (possible), or if the device was switched to a different SAS interface (also possible), or maybe just the gremlins. Whatever it was, once again, I had to reconfigure the software to find the new device ID.
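The quickest way to see what the kernel decided to call things on a given boot is to list the by-id links and see where they point (the output will obviously vary by hardware):

  ls -l /dev/tape/by-id/
  # each entry is a symlink pointing at the current nstX / stX / sgX device node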

Thursday 10 March 2011

Flexible Storage Replication

I have recently been looking quite a lot at different storage setups, including storage replication, and have so far mostly been relying on running rsync to copy a file system to an appropriate secondary host. For large file systems - either with a lot of files or simply a lot of changing data - this is slow and resource-intensive. Not really a problem in some cases, but very problematic if you want your secondary system to have very current data. If you want to cobble something together yourself from commodity hardware, DRBD is an excellent tool and very feature-rich.

First of all, I can't recommend the DRBD User Guide enough. It really lays out the features and usage not just of DRBD but also some common applications you would use alongside like LVM for storage management and Pacemaker and Heartbeat (and others) for clustering.

What DRBD is going to do is basically copy writes to a block device over the network to a replica device - this storage set is called a "resource". Generally, you will have two nodes for each resource. During normal operation, you will have one "Primary" node and one "Secondary" node for each resource, which logically indicates that one node is writing changes to the resource while the other is keeping a copy. DRBD is generally very slick in handling replication and the status of the nodes. First of all, when you configure the resource, you specify an IP address for the replication target, and generally you are going to want this to be on a separate network interface from your general data plane - for example a cross-over cable for a point-to-point connection between the two nodes. If the replication path goes down, DRBD is basically going to mark the point in time it happened and then keep track of which blocks changed since that point, so when the path comes back up, it has a list of which blocks need to be transferred instead of having to resync the whole device. That's another thing - it does the whole device sync for you too when you create the device. And you get basically the same behaviour if your secondary node tanks, or the primary node, or both nodes for that matter.

Unless both nodes end up in a "primary" state during some overlapping time. So if you automatically bring up the secondary node in case of a primary failure with Pacemaker, for example, but the issue was a path failure and not a node failure, then both nodes may end up in "primary" state. Since DRBD is tracking when communication is disrupted, it will detect this problem - a "split brain". You get several options for manual resolution (I think automatic as well) including taking the changes of one node or the other, the node with the "most" changes, the node with the "least" changes, the oldest primary, the youngest primary... You may still be stuck losing some data - but you can keep both nodes in split brain and consolidate externally (e.g. if you have critical data like financial data where you can never drop a transaction).

DRBD supports three replication "protocols" called, intuitively, A, B, and C. "A" is asynchronous, so writes unblock as soon as the local device finishes writing. "B" is "semi-synchronous", which unblocks once the data has reached the peer. And "C" is "synchronous", so the write operation is only complete once the data is written to both devices. I was finding that "A" and "B" got me similar speeds and "C" was slower - but this was not very rigorous testing and my replication link was 100Mbps through a shared data plane.
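Putting those pieces together, a bare-bones resource definition gives a feel for how all of this is specified.  A sketch only - host names, devices, and addresses are made up, and the syntax is the 8.3-era drbd.conf style:

  resource r0 {
    protocol C;                # fully synchronous replication
    device    /dev/drbd0;
    disk      /dev/sdb1;       # the backing block device on each node
    meta-disk internal;
    on node1 {
      address 10.0.0.1:7788;   # dedicated replication link, e.g. a cross-over cable
    }
    on node2 {
      address 10.0.0.2:7788;
    }
  }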

One of the things about any of these replication options compared to rsync is that they are going to generally be much nicer on your memory. I find that when rsync scrapes the file system, this effectively nukes the OS's disk cache such that after rsync runs, users may notice it takes a while to "warm up" again. But, replication is not a backup - if a virus eats your files on your primary node, it will eat them on the secondary node synchronously or asynchronously - your choice.

If you are using LVM (and you should be - I've posted about LVM before, so have others), you'll wonder whether you layer DRBD on top of LVM or vice-versa. As Chef would say: Use DRBD on top of your LVs. Dramatic over-simplification aside, it does depend on what you are doing. If you are using LVM to carve up a pool of storage, for example for virtualization, and then want the storage layer to replicate your VMs, it may make more sense to create your DRBD volume from the physical storage, since it will then replicate the whole LVM structure to your replica node. But there are complications, like ensuring LVM will even look at DRBD devices for PVs and managing size changes, etc. There's a time and a place for everything, and that's college.
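In the "DRBD on top of LVM" case, the only change is that the backing disk in the resource definition is an LV instead of a raw partition - something like this (the volume group and LV names are made up):

  lvcreate -L 100G -n drbd_data vg0    # carve out the backing store
  # then in the resource definition:
  #   disk /dev/vg0/drbd_data;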

Um, what else is awesome about DRBD? Offline initialization and "truck based replication" (a.k.a. sneakernet): replicate the node locally, ship it to the remote site, and turn up from there. DRBD Proxy (a paid feature) for when you need to buffer replication over slow or unreliable network links. Dual-primary operation (for use with something like GFS). 3-node operation by layering DRBD on top of DRBD.

Yeah, it's cool. It's Free and free. You can get it stock with Fedora and CentOS (probably Ubuntu and others, but haven't tried it yet).

And one last thing - you cannot mount a resource that is "Secondary". So if you are getting crazy error messages that you can neither mount nor even fsck your file system, it's probably in Secondary - don't bang your head against the wall, just do "drbdadm primary <resourcename>". Is clear?
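In other words, when you want a node to take over manually, the sequence is just (the resource name and mount point are made up):

  drbdadm primary r0              # promote the resource on this node
  mount /dev/drbd0 /srv/share     # now the file system mounts (and fscks) normally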

Wednesday 26 January 2011

Tape Devices for Amanda

I've found a few times now that my tape server can be a bit of a pain about tape devices. Generally, I have Amanda configured to use /dev/nst0, but the tape drive isn't always this device if I attach other devices (at least other drives). So rather than configuring "nst0" and then changing it to "nst1" after a few days of wondering why the backups aren't working, I've started using the "tape/by-id" device instead. So my amanda.conf now shows:

changerdev 	"/dev/tape/by-id/scsi-1IBM_3573-TL_00X2U78M1255_LL0"	# tape device controlled by mtx
tapedev "/dev/tape/by-id/scsi-35000e11138aa0001-nst" # the non-rewind


Is clear?

Saturday 22 January 2011

No more mail

One more service down - no more mail. All "real" email has been offloaded or canceled except for uro.mine.nu which has basically just been sacked. I've closed the ports for SMTP, POP, and IMAP. So now this is it, I'm down to just web applications that I'm hosting from home.

What I'd like to do is find a dirt-cheap web host for this stuff. None of it is high volume - the old URO forums, which are still used by some of my gamer buddies (I think - haven't checked in a while), some personal blogs including this one, and a couple of personal-site-type things I have up. iweb.ca is still offering hosting for $1.67 / mo, so I'd like to give them a shot. We shall see; I'll try a couple of different services over the next couple of months.
