A little while ago I was thinking about transcoding a bunch of my music from Ogg to MP3 so I could burn the music to CD for my MP3 CD player. Now I'm no stickler for audio quality, but clearly transcoding from one lossy format to another is going to make things worse than necessary. Given that I still want music available both streamed and on MP3 CD, I decided to start re-encoding my music collection to a lossless audio format, specifically FLAC.
Ripping to FLAC was going well initially, but there is one distinct downside to using a lossless format: the disk space requirements. My music collection will be jumping from ~15GB up to 100GB. After some messing around with my volumes on Siona, I could not come up with enough free space without destroying a lot of data. Well, I destroyed a bunch of data for good measure (and because it is important to test that you can restore from backups) and then I bought a new drive.
So the new drive is a spanking new 320GB Seagate drive. I slapped that bad-boy in there and formatted it with XFS. So far so good. I've moved all the existing data from the old WD 80GB drive on there and I've been ripping CDs merrily.
To assist with the task, I even created a little shell script that gives me a list of every CD already on the system and whether the CD is FLAC or not. It even lumps the Various Artists discs together. I'm almost up to a third of the CD collection (between both the wendigo's and mine).
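The script itself is not reproduced here, but the idea can be sketched roughly as follows. This is a Python approximation rather than the original shell script, and it assumes a hypothetical Artist/Album directory layout in which compilations sit under a single "Various Artists" directory; the function name and layout are illustrative only.

```python
import os

def list_albums(music_root):
    """Return sorted (artist, album, is_flac) tuples.

    Assumes a hypothetical layout of music_root/Artist/Album/tracks.
    """
    results = []
    for artist in sorted(os.listdir(music_root)):
        artist_dir = os.path.join(music_root, artist)
        if not os.path.isdir(artist_dir):
            continue
        for album in sorted(os.listdir(artist_dir)):
            album_dir = os.path.join(artist_dir, album)
            if not os.path.isdir(album_dir):
                continue
            # Treat the album as FLAC if any file in it ends in .flac
            is_flac = any(
                name.lower().endswith(".flac")
                for name in os.listdir(album_dir)
            )
            results.append((artist, album, is_flac))
    return results

def format_report(albums):
    """Turn the tuples into one printable line per album."""
    return [
        f"{'FLAC' if is_flac else 'other':5}  {artist} / {album}"
        for artist, album, is_flac in albums
    ]
```

Because the listing is sorted by artist directory, everything filed under one "Various Artists" directory ends up grouped together in the report.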
Thursday, 31 August 2006
Friday, 25 August 2006
The Fight Against Spam
The previous post was about my experience dabbling with greylisting, but what I really rely on for mail filtering is Amavis, SpamAssassin, ClamAV, and Procmail. Each tool has its place and does its job quite well. Except Procmail, but that's just a backwards old mail processing/delivery agent that uses syntax far too arcane for mere mortals... But that's just my griping for no good reason; it does the job.
Anyhow, the setup here is that Postfix hands all messages to Amavis for inspection. Amavis can run a message through any number of spam or virus checking programs, SpamAssassin and ClamAV in our case, and any of these programs can approve a message, mark it in some way, move it to a quarantine or reject it flat-out. Basically, Amavis is a front-end for these types of filtering applications. It's very versatile and Postfix works well with it, so it works.
The simple one is ClamAV. It runs a message through its virus definitions. If it finds a match, it quarantines the message and then sends a notice to the user saying what happened. Easy. ClamAV actually picks up many of the phishing scams too. It works great.
The other fun one is SpamAssassin (SA) which reads through a message and assigns it a score depending on many factors like whether the message headers appear corrupt. Low score means probably not spam, high score means probably spam. If it's a high score, SA can modify the message or discard it. The levels at which it takes "evasive action" are configurable so this has taken some tuning before I was really happy with the results.
So the setup here is that SA modifies every message by adding an X-Spam-Score: nn header. This way, I can see that the last message I got from my roommate was scored -3.787, for example.
At a score of 4.0, SA adds an additional header that reads X-Spam: Yes and also adds ***SPAM*** to the subject line of the message. This is where Procmail comes in. I have Procmail configured on my account to automatically move any messages with X-Spam: Yes to my Junk folder and out of my inbox. I have found some false positives, specifically my logwatch notices from Siona, which can sometimes go above 4.0, so at this level SA is configured to still deliver the message to the user.
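To make the filing step concrete, here is a small Python restatement of what that Procmail rule does. It is not Procmail syntax and not my actual recipe, just a sketch using the standard library `email` module; the function name `choose_folder` and the "INBOX" folder name are assumptions for illustration.

```python
from email import message_from_string

def choose_folder(raw_message):
    """Pick a delivery folder based on the X-Spam header.

    Mirrors the rule described above: anything flagged
    "X-Spam: Yes" goes to Junk, everything else stays in the inbox.
    """
    msg = message_from_string(raw_message)
    # A missing header is treated the same as "not spam"
    flag = msg.get("X-Spam", "")
    if flag.strip().lower() == "yes":
        return "Junk"
    return "INBOX"
```

The message is still delivered either way; only the destination folder changes, which is why an occasional false positive is easy to recover.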
Above 6.0, however, I have seen no false positives, so SA is configured to reject these messages. Above 10.0, SA will not even bother sending a delivery status notification (DSN) to the source of the email, on the assumption that the source email address is spoofed and the sending mail server is just a zombie host.
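Put together, the three levels work like a simple ladder. The sketch below restates the thresholds from this post in Python; it is not the real filter configuration, the action names are made up for illustration, and the handling of scores that land exactly on a boundary is an assumption.

```python
TAG_THRESHOLD = 4.0      # add X-Spam: Yes and ***SPAM*** but still deliver
REJECT_THRESHOLD = 6.0   # reject, but still send a DSN
SILENT_THRESHOLD = 10.0  # reject without sending a DSN

def action_for_score(score):
    """Map a spam score to the action described in this post."""
    if score > SILENT_THRESHOLD:
        return "reject_silently"
    if score > REJECT_THRESHOLD:
        return "reject_with_dsn"
    if score >= TAG_THRESHOLD:
        return "tag_and_deliver"
    return "deliver"
```

For example, `action_for_score(-3.787)` gives `"deliver"` for my roommate's message, while a logwatch notice that creeps up to 4.5 gets `"tag_and_deliver"`.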
This has been working out very well. My guess is that 90% of spam is scored over 6.0 and is outright discarded. Of the remaining 10% of spam, I would guess 80-90% of that is scored above 4.0 and is getting tagged as spam but delivered. A very small number of legit messages are scored between 4.0 and 6.0 (less than 1%), and no legit messages are scored above 6.0. All in all, this leaves a couple percent of the spam messages that are delivered to the users untagged.
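The arithmetic behind that "couple percent" can be checked quickly. The numbers are my guesses from above, not measurements, and the function name is just for illustration.

```python
def spam_breakdown(rejected_share, tagged_share_of_rest):
    """Split all spam into rejected, tagged-but-delivered, and untagged shares."""
    rest = 1.0 - rejected_share           # spam scoring 6.0 or below
    tagged = rest * tagged_share_of_rest  # scored above 4.0, delivered with a tag
    untagged = rest - tagged              # slips into the inbox unmarked
    return rejected_share, tagged, untagged

# Low and high ends of the 80-90% guess
for tagged_share in (0.80, 0.90):
    rejected, tagged, untagged = spam_breakdown(0.90, tagged_share)
    print(f"rejected {rejected:.0%}, tagged {tagged:.0%}, untagged {untagged:.0%}")
```

So with 90% rejected outright, roughly 8 to 9% of spam arrives tagged and about 1 to 2% reaches the inbox unmarked.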
This squad of applications works quite well together for fighting spam. I only have a couple of problems. The setup is a bit confusing out of the gate, but definitely doable, and there were a lot of tutorials online for this setup. The other problem is that we should not be seeing this much spam anyhow. Mail servers should be authenticating users that wish to relay, and message authenticity should be verifiable. If Joe Smith at example.com gets a message from my domain, the example.com mail server should be able to verify that someone at this domain actually sent the message. As it is, it is trivial to forge a source email address, and SPF is not the solution. We need something universal, unlike SenderID, or else what's the point? The closest is the domain signing that Yahoo (?) came up with (which Google actually uses for their mail service), but it is still not universal. Someday we will live with not much spam, someday...
Thursday, 3 August 2006
Greylisting
So I tried a spam deterring technique known as "greylisting". Basically, you tell your mailserver that any messages from an unknown source should be met with a temporary error under the expectation that very few spam agents will resend the message later whereas legitimate mail servers will.
Ok, interesting. This can be a highly effective technique and is likely to deter 99% of spam. However, it involves making your mailserver respond with an error by default.
What happens is that each incoming message is first checked by the greylisting service (I tried Postgrey). The greylist service checks "has this mail server tried to send a message from Alexa to Bree before?" and if not, a temporary 450 error ("mailbox unavailable") is sent. The sending server will then queue Alexa's message for later transmission. When it is retransmitted, the greylist service says "ah, I recognize this, so it is probably not spam" and allows the message from Alexa to Bree to pass through, along with subsequent messages from that source.
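The core of that check is small enough to sketch. The following is a toy Python model of the idea, not how Postgrey is actually implemented: the class name, the in-memory dictionary, and the 300 second minimum delay are all assumptions for illustration.

```python
import time

class Greylist:
    """Toy greylist keyed on (sending server, sender, recipient)."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.clock = clock          # injectable so the delay can be simulated
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return "defer" (temporary error) or "accept"."""
        key = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.get(key)
        if first is None:
            # Never seen before: remember it and ask the server to retry later
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly to count as a proper queue run
            return "defer"
        return "accept"
```

A first attempt from Alexa's server to Bree returns `"defer"`; the same triplet retried after the delay returns `"accept"`, while a different sender from the same server starts over with `"defer"`.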
The upshot is that it is a highly effective method of deterring spam; however, when messages are queued for delivery "later", that turns out to be anywhere from one to four hours later. For people who send lots of messages back and forth, this is not a big problem since the greylist tracks who has successfully sent messages in the past, but for arbitrary exchanges, say a customer sending a message to sales or tech support, greylisting really slows down mail delivery.
All-in-all, it's a pretty drastic anti-spam measure. I ended up disabling it once I realized just how long mailservers will arbitrarily queue mail.