Book RSS updated

The RSS feed of bookstore author events from PHP iCalendar was broken for University Books because of an entity replacement problem. I added é to the list of translated characters and everything works a bit better now.
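Entity problems like this come up because RSS is XML, and a stray non-ASCII character can break a feed whose encoding isn't what the reader expects. Below is a rough Ruby illustration of the general fix. It is not PHP iCalendar's code, which keeps its own list of translated characters. Instead, this sketch escapes the markup characters and then turns *every* non-ASCII character into a numeric character reference. The helper name `xml_escape_ascii` is hypothetical.

```ruby
# Hypothetical helper: escape XML markup characters, then replace every
# non-ASCII character with a numeric character reference (é -> &#233;).
def xml_escape_ascii(text)
  escaped = text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
  escaped.gsub(/[^\x00-\x7F]/) { |ch| "&##{ch.ord};" }
end

puts xml_escape_ascii("Café & Books")   # prints Caf&#233; &amp; Books
```

Escaping `&` first matters. Otherwise the `&` in each newly inserted entity would get escaped a second time.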

Posted by Scott Laird Tue, 04 Nov 2003 01:04:06 GMT


The Future of Email

VentureBlog has an interesting bit on spam, claiming that spam is going to give Microsoft control over the entire email server market. The logic is kind of interesting; basically it boils down to using your Exchange server license as a bond against sending spam. If you spam, they yank your license, so owning a valid Exchange server license is an automatic key to spam whitelisting:

However, corporations are already shelling out big bucks for email - specifically for Microsoft Exchange or IBM/Lotus which between them have 75% of the corporate market.
Microsoft could just provide a stamp on each outgoing message (think public key cryptography) identifying that it came from a specific exchange server. This would be verified with Microsoft, which would provide a whitelist of valid exchange servers to every anti-spam company. [VentureBlog]
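To make the "stamp" idea concrete, here is a minimal Ruby sketch using the standard openssl library. The sending server signs each message with its private key, and a receiver accepts the stamp only if the signature verifies against a whitelisted public key. The server name, the whitelist, and the method names are all hypothetical. VentureBlog doesn't describe a concrete mechanism, and this isn't a description of any real Exchange feature.

```ruby
require 'openssl'

# Hypothetical stamping scheme: one server key pair, plus a whitelist that
# maps server names to their public keys (the role VentureBlog gives Microsoft).
SERVER_KEY = OpenSSL::PKey::RSA.generate(2048)
WHITELIST  = { 'mail.example.com' => SERVER_KEY.public_key }

# The sending server "stamps" a message by signing it.
def stamp(message)
  SERVER_KEY.sign(OpenSSL::Digest.new('SHA256'), message)
end

# A receiver trusts the message only if the claimed server is whitelisted
# and the signature matches the message exactly.
def whitelisted?(server, message, signature)
  public_key = WHITELIST[server]
  return false unless public_key
  public_key.verify(OpenSSL::Digest.new('SHA256'), signature, message)
end

msg = "From: alice@mail.example.com\r\n\r\nHello"
sig = stamp(msg)
puts whitelisted?('mail.example.com', msg, sig)        # true
puts whitelisted?('mail.example.com', msg + 'x', sig)  # false: altered message
puts whitelisted?('spam.example.net', msg, sig)        # false: unknown server
```

The weak point in the objections that follow is visible here. A virus running inside the network can hand its spam to the whitelisted server, and that server will sign the spam like any other message.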

Three problems with this:

  1. Bayesian filtering seems to work really well. My home email filter is over 99% effective right now, blocking roughly 200 messages per day with no false positives.
  2. Spammers are already using viruses to generate open relays. How long will it take before office computers are attacked deliberately to use their whitelisted Exchange server for spamming?
  3. The liability issues of point 2 will effectively keep Microsoft from blacklisting large customers, even when bushels of spam are pouring out of their servers.

So, in short, I think it’s a neat idea, and I wouldn’t be surprised if Microsoft tries it, but it isn’t going to help. In fact, it’ll probably just make corporate PCs even more attractive to spammers.

Posted by Scott Laird Mon, 03 Nov 2003 23:29:09 GMT


Snowing

I just woke up and it’s snowing outside. It’s not sticking, but it’s coming down pretty good. Seattle doesn’t get a lot of snow most years; this is actually the earliest snow that I can remember, but I haven’t been making notes :-).

We should take the kids somewhere where they can play in it. Gabe loves snow, and Sophie probably will too.

Posted by Scott Laird Sun, 02 Nov 2003 15:49:39 GMT


Spam-blocking update

My spam blocker is working better than expected; over the past two days, it’s been over 99% accurate, with 1 or 2 false negatives and no false positives. I’ve been receiving around 200 spams/day, and Apple’s Mail.app was only catching 80% of them, with a handful of false positives. I’m pretty happy with the new system.
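The "over 99%" figure is easy to sanity-check. Here is a quick Ruby calculation using the approximate numbers above, and taking the worst case of 2 false negatives out of about 400 spams over the two days:

```ruby
# Approximate figures from the post: ~200 spams/day for 2 days, at worst
# 2 false negatives, versus Mail.app catching about 80% of spam.
spams_per_day   = 200
days            = 2
false_negatives = 2

total_spam = spams_per_day * days                            # 400
catch_rate = 100.0 * (total_spam - false_negatives) / total_spam
missed_by_mail_app = (spams_per_day * (1 - 0.80)).round      # spams/day

puts format('spamprobe catch rate: %.1f%%', catch_rate)
puts "Mail.app would let about #{missed_by_mail_app} spams/day through"
```

There were no false positives, so this catch rate is also a lower bound on overall accuracy. Counting correctly delivered ham only raises that percentage.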

Posted by Scott Laird Sat, 01 Nov 2003 21:00:54 GMT


HTML screen-scraping in Ruby

My little author reading project is written in Ruby, my current scripting-language-of-choice.

Here’s an example of what it takes to grab web pages and extract content from them:

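require 'http-access2'   # HTTPAccess2::Client
require 'html/xmltree'   # HTMLTree::XMLParser (htmltools), builds a REXML tree

# BookEvent and BookTime are defined elsewhere in the script.
books = []

client=HTTPAccess2::Client.new
url="http://www.elliottbaybook.com/..."
parser = HTMLTree::XMLParser.new(false,false)
parser.feed(client.getContent(url))
xml=parser.document

# Every author event sits inside a <p class="small"> tag.
xml.elements.each('//p[@class="small"]') do |node|
  event=BookEvent.new
  event.store="Elliott Bay Book Company"
  event.location="Elliott Bay Book Company"
  # Strip all tags; BookTime parses the date out of the remaining text below.
  event.time=node.to_s.gsub(/<[^>]+>/,'')
  event.author=node.elements['./a[1]/b[1]'].text rescue nil
  event.title=nil
  event.note=node.elements['.'].to_s rescue ''

  next unless event.time and event.author

  # "... at 7:30 p.m. at Some Venue" means the event is held off-site.
  if event.note =~ / at [0-9].* at ([^<>]*)/
    event.location=$1
  end

  event.time=BookTime.new_from_string(event.time)
  next unless event.time

  books.push(event)
end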

The interesting bit is probably xml=parser.document; that’s where Ruby’s HTML parser hands its parse tree off to Ruby’s XML engine, REXML. This lets me use REXML’s XPath engine to search through the HTML mess that most bookstores use on their web sites. In this case, all author reading events are inside <p class="small"> tags, so I iterate through all of the matching tags and try to create a BookEvent object from each. The author name comes from an <a><b> block inside the <p> block. The time comes from the paragraph’s text once a regular expression strips out the tags, and a second regular expression picks out the location when the event isn’t at the store itself.
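The same XPath approach works on any well-formed document. Here is a self-contained Ruby sketch that uses REXML directly on a small, made-up XHTML fragment, because real bookstore pages first need an HTML-tolerant parser like the one above. The sample markup, the `Event` struct, and `extract_events` are hypothetical stand-ins for the script's own page, BookEvent, and loop.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample standing in for a bookstore page.
SAMPLE = <<~HTML
  <html><body>
    <p class="small"><a href="/x"><b>Jane Author</b></a> reads from her new novel.
    Tuesday, Nov 4 at 7:30 p.m. at Town Hall</p>
    <p class="big">Not an event</p>
    <p class="small">No author link here</p>
  </body></html>
HTML

Event = Struct.new(:author, :text, :location)

def extract_events(markup)
  doc = REXML::Document.new(markup)
  events = []
  doc.elements.each('//p[@class="small"]') do |node|
    bold = node.elements['./a[1]/b[1]']
    next unless bold                       # no <a><b> author, so not an event
    # Strip tags and collapse whitespace to get the plain event text.
    text = node.to_s.gsub(/<[^>]+>/, '').gsub(/\s+/, ' ').strip
    # Same heuristic as above: "at <time> at <venue>" names an off-site venue.
    location = text[/ at [0-9].* at ([^<>]*)/, 1]
    events << Event.new(bold.text, text, location)
  end
  events
end

extract_events(SAMPLE).each do |e|
  puts "#{e.author} @ #{e.location || 'store'}: #{e.text}"
end
```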

If bookstores had decent web pages, this’d be really easy, but as it is, I had to apply a few heuristics and flat-out guess at times, and I’ll have to revisit the code every time they reformat their web sites. But Ruby worked out really well this time.

Posted by Scott Laird Sat, 01 Nov 2003 20:54:13 GMT


More bookstore calendar updates

I’m still playing with the book store event calendar that I was talking about a day or two ago. I’ve cleaned up the code a little bit, split each bookstore into its own iCalendar file, reorganized the /books directory on scottstuff.net, and installed PHP iCalendar. So, here’s where we stand:

  • You can go to http://scottstuff.net/books/ and see all of the events on a convenient calendar.
  • From the calendar, you can subscribe to each of the individual iCalendar files for each bookstore.
  • You can manually subscribe to the aggregate of all 4 bookstores via webcal://scottstuff.net/books/calendars/seattle.ics.
  • PHP iCalendar can provide an RSS feed for each individual calendar, but it doesn’t work very well for me right now; I’ll probably write an RSS feed directly when I have time.
  • The calendar page looks totally different from the rest of the site. I started to adjust the CSS for it, but it’s a bigger job than I feel like tackling right now.
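An aggregate calendar like seattle.ics can be built by copying the VEVENT blocks out of each per-store file into a single VCALENDAR. Here is a minimal Ruby sketch of that idea. It assumes simple input with no VTIMEZONE blocks to carry over and no duplicate UIDs. The method name and PRODID value are hypothetical, and this isn't necessarily how the script actually does it.

```ruby
# Hypothetical sketch: merge per-store calendars by concatenating their VEVENTs.
def merge_calendars(ics_texts)
  events = ics_texts.flat_map do |ics|
    crlf = ics.gsub(/\r?\n/, "\r\n")   # iCalendar lines end in CRLF
    crlf.scan(/^BEGIN:VEVENT\r\n.*?^END:VEVENT\r\n/m)
  end
  header = ['BEGIN:VCALENDAR', 'VERSION:2.0',
            'PRODID:-//example//seattle-books//EN'].join("\r\n") + "\r\n"
  header + events.join + "END:VCALENDAR\r\n"
end

store_a = "BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\nUID:a1@example.com\n" \
          "SUMMARY:Reading A\nEND:VEVENT\nEND:VCALENDAR\n"
store_b = "BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\nUID:b1@example.com\n" \
          "SUMMARY:Reading B\nEND:VEVENT\nEND:VCALENDAR\n"
puts merge_calendars([store_a, store_b])
```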

I’ll probably add Barnes and Noble sometime this weekend.

Posted by Scott Laird Sat, 01 Nov 2003 20:29:44 GMT


Local blogs

I’ve tried searching for local-interest weblogs via Google a few times, but I’ve never really found anything interesting. It’s hard to come up with the right set of keywords, and Google isn’t location-sensitive, at least not yet.

This morning, I noticed this in my logs:

65.248.4.234 - - [31/Oct/2003:05:14:30 -0800] 
   "GET /scott/index.rdf HTTP/1.0" 304 - "-" 
   "Localfeeds: Geographic Syndication, 
   http://www.localfeeds.com using UltraLiberalFeedParser/2.5.3
   +http://diveintomark.org/projects/feed_parser/"
   0 scottstuff.net

Hmm. Could be cool. Since I just added latitude and longitude to my site yesterday, I was curious, so I asked Localfeeds who else it knew about within 5 miles of home; it returned 4 or 5 interesting results. It’ll even export an RSS aggregation of local blogs for you. Cool.

So take a look at http://www.localfeeds.com/, and add an ICBM tag to your website.
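An ICBM tag is just a meta element carrying latitude and longitude, for example `<meta name="ICBM" content="47.62, -122.35">`. A "within 5 miles" search then comes down to a great-circle distance check. Below is a small Ruby sketch using the haversine formula. The coordinates are made up, the regex assumes `name` comes before `content`, and I don't know how Localfeeds actually computes distance.

```ruby
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius

# Pull "lat, lon" out of an ICBM meta tag (assumes name= precedes content=).
def parse_icbm(html)
  m = html.match(/<meta\s+name="ICBM"\s+content="\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*"/i)
  m && [m[1].to_f, m[2].to_f]
end

# Great-circle distance between two points given in degrees (haversine).
def haversine_miles(lat1, lon1, lat2, lon2)
  rad  = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

home = [47.66, -122.31]   # hypothetical coordinates
site = parse_icbm('<meta name="ICBM" content="47.62, -122.35">')
dist = haversine_miles(*home, *site)
puts format('%.1f miles away; within 5 miles: %s', dist, dist <= 5)
```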

Posted by Scott Laird Fri, 31 Oct 2003 16:40:33 GMT


Seattle Book Tours

I’ve missed readings by several of my favorite authors over the past year, largely because I haven’t had a good way to track who’s coming to town when. So, being partially insane, I threw together a little Ruby script to extract author visit information from 4 local bookstores and turn it into an iCalendar file, suitable for iCal or Mozilla’s Calendar.

The stores are:

  • University Books
  • Third Place Books
  • Elliott Bay Book Company
  • Seattle Mystery Bookshop
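Turning scraped events into an iCalendar file mostly means writing one VEVENT per reading. Here is a minimal Ruby sketch. The `Reading` struct, the UID scheme, the PRODID, and the 90-minute duration are assumptions for illustration, and long lines are not folded at 75 octets as the spec asks. The sample event is the Berkeley Breathed signing at Third Place Books mentioned further down.

```ruby
require 'time'

Reading = Struct.new(:author, :store, :start)

# Escape text values per iCalendar rules: backslash, semicolon, comma, newline.
def ical_text(s)
  s.gsub(/[\\;,]/) { |c| "\\#{c}" }.gsub("\n") { '\n' }
end

# UTC timestamp in iCalendar's basic format; getutc leaves the original alone.
def ical_time(t)
  t.getutc.strftime('%Y%m%dT%H%M%SZ')
end

def to_icalendar(readings)
  lines = ['BEGIN:VCALENDAR', 'VERSION:2.0', 'PRODID:-//example//book-tours//EN']
  readings.each_with_index do |r, i|
    lines += [
      'BEGIN:VEVENT',
      "UID:reading-#{i}-#{r.start.to_i}@example.com",
      "DTSTAMP:#{ical_time(Time.now)}",
      "DTSTART:#{ical_time(r.start)}",
      "DTEND:#{ical_time(r.start + 90 * 60)}",   # assumed 90-minute event
      "SUMMARY:#{ical_text(r.author)}",
      "LOCATION:#{ical_text(r.store)}",
      'END:VEVENT'
    ]
  end
  lines << 'END:VCALENDAR'
  lines.join("\r\n") + "\r\n"
end

r = Reading.new('Berkeley Breathed', 'Third Place Books',
                Time.parse('2003-10-30 19:00 -0800'))
puts to_icalendar([r])
```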

There are a bunch of little things that I need to do to make this usable for people, but it’s almost 2:00 AM, and it works well enough for me. I’ll add a web-based iCalendar reader, add per-store iCalendar files, and maybe an RSS feed later. Oh yeah, and add it to cron. Can’t forget to add it to cron.

Update: I’ve made quite a few changes since I wrote this. See the category index for details.

Posted by Scott Laird Fri, 31 Oct 2003 09:50:10 GMT


Belkin iPod flash reader

I was tempted to get Belkin’s new iPod CompactFlash reader (Belkin F8E461) until I read this:

I am going to call tech support on Monday and find out what the hell is going on here- seems like they stuck a USB interface for a FW device or there is some bug in the software. Xfering 500 MB took 22 minutes, and I shudder to think the chances of getting through 1 full 1GB card without one of their batteries dying- if it weren’t so slow I wouldn’t sweat the batteries, but at that speed… jeesh. 300 kBps just can’t be right! [David Gawlowski on dpreview.com]

In short, it’s so slow that it’s utterly worthless for anyone with more than a handful of pictures. Belkin admits that it’s only supposed to do 0.3 MB/sec, and says that they aren’t planning on changing it. I’m not sure what they’re thinking; is there really a market for clunky-looking, painfully slow media readers?

At least the software support for this exists on the iPod now, and maybe someone like Griffin will come out with something that doesn’t suck.

Posted by Scott Laird Fri, 31 Oct 2003 01:02:17 GMT


Berke Breathed is in town

Argh! Why doesn’t anyone tell me these things! If I knew in advance, I could work it into my schedule.

Berkeley Breathed will be signing his latest book today at two locations: 3:30-5:30 p.m. at University Book Store, 4326 University Way N.E., Seattle, 206-545-4361; and 7 p.m. at Third Place Books, 17171 Bothell Way N.E., Lake Forest Park, 206-366-3333 [Seattle Times]

Seriously, I’d love to have either an RSS feed of author reading/signing events or an iCalendar that I could point iCal at. If either exists, Google can’t find it.

Posted by Scott Laird Fri, 31 Oct 2003 00:59:49 GMT


address-o-sync

Cool, I’ve been looking for something like this:

via MacMegasite: Address-O-Sync is a new freeware utility ($5 donation suggested) which lets you synchronize your addressbooks via Rendezvous without using iSync or .Mac. [Mac Net Journal]

I’ll have to give it a try when I get home. This plus a bluetooth dongle for my wife’s iMac, and we might finally have reached address-book nirvana: one family address book, editable anywhere (2 phones, 1 Palm, 2 Macs) and shared everywhere. Right now, all of my addresses and phone numbers are synced between my devices, but they aren’t shared with my wife’s, and her Mac and phone (T68i) don’t talk to each other.

I swear, I’m never going back to non-synced phone address books. I’d buy a sync-able 5.8GHz home phone in an instant, if anyone made one.

Posted by Scott Laird Thu, 30 Oct 2003 19:02:04 GMT


Spam blocking on scottstuff.net

I just added a mailto: link to each post on scottstuff.net. Since I’ve added a new spam filter, I’m not quite as worried about putting my email address up on a website. But, in a fit of spam prevention, I’m using a JavaScript mailto-rewriter. If your browser doesn’t support JavaScript, then you’ll get a nasty-but-still-understandable mailto: URL. If JavaScript works, then you should get a perfectly decent working URL without even knowing it.
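The trick behind a mailto rewriter is a reversible mapping. The page carries a mangled address that a human can still read, and script code in the browser undoes the mangling to build a working link. The real rewriter runs as JavaScript in the browser. The Ruby sketch below shows only the mapping itself, and the " AT "/" DOT " scheme and the sample address are hypothetical, not necessarily what this site uses.

```ruby
# Hypothetical obfuscation scheme, for illustration only.
def obfuscate(address)
  user, domain = address.split('@', 2)
  "#{user} AT #{domain.gsub('.', ' DOT ')}"
end

# The reverse mapping the browser-side rewriter would apply.
def deobfuscate(text)
  text.gsub(' AT ', '@').gsub(' DOT ', '.')
end

mangled = obfuscate('someone@example.com')
puts "mailto:#{mangled}"               # what a non-JavaScript browser shows
puts "mailto:#{deobfuscate(mangled)}"  # what the rewriter produces
```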

Posted by Scott Laird Thu, 30 Oct 2003 03:15:15 GMT


Handheld predictions

As mentioned before, I’m still waiting for my next handheld to be announced. I want something with a better-than-320x240 display, wireless networking (802.11b; .11a/g would be nice, and bluetooth would be nice), at least an SDIO slot (CF type 2 would be nice), and a built-in keyboard. So, I’ve been paying more attention than normal to handheld news for the past few months, and I’ve had a few people ask me where I think things are going over the next year or so.

First, we’re going to see a lot of 640x480 screens early in 2004. Toshiba’s e800 series is the first PocketPC with a VGA-resolution screen, but very few apps have support for it yet. Sharp’s C700-series Zauruses have had VGA screens in Japan for months now, but US models have been limited to 320x240. That’s supposed to change in January with the SL-6000. Sony will probably be the first PalmOS vendor with a VGA screen, most likely around the time they ship their first PalmOS 6 handheld.

Speaking of PalmOS 6, it’s supposed to be released by PalmSource at the end of this year, so we’ll probably see handhelds with it start to be announced in March or April. It sounds like this will finally be a real operating system, with multitasking, protected memory, and native ARM applications. PalmOS 5 only runs on ARM processors, but it’s still mostly emulated 68k code. Since current PalmOS devices can’t really multitask, background tasks like checking email and RSS feeds turn into a coding nightmare. I tried using a Tungsten C in a store a few months ago, and I just couldn’t cope with the delays inherent in switching between mail and web browsing. I really like the basic design of PalmOS, but it hasn’t scaled very well so far, and it was never designed to handle removable media or networking. It’ll be interesting to see what v6 includes; this is supposed to be the Palm equivalent of MacOS 9 -> OS X or Windows 95 -> NT, so it might be a real contender for me. At the very least, it’ll sync with OS X out of the box, unlike PocketPCs (third-party sync tool) or Zauruses (no OS X sync for current handhelds, although Sharp finally released a tool for syncing older models; no iSync).

Later next year, I’d be amazed if we don’t start seeing a few high-end handhelds with embedded hard drives, like Cornice’s 1.5GB $50 1” model, or something similar from Hitachi. These are similar to CompactFlash-based microdrives, but mounted directly onto the system board of the PDA. If these 1.5-4GB models sell well, then expect 1.8” (iPod) models in early or mid 2005, with up to 80GB of disk space. At that point, it’s unclear exactly which market niche they’re going for; that’s more than enough disk space to be a high-end MP3 player, hold thousands of digital still pictures, and store a few hours of digital video. It makes for a seriously bulky handheld, although probably not any worse than Sony’s NX/NZ series.

On the feature front, at least half of the over-$300 handhelds on the market should have wireless networking by early next year. Someone, probably Dell, will push that down to $200 or so later in the year. Integrated mini-keyboards will become more popular; right now, HP’s 4355 is the only PocketPC with one, although Palm has one and Sony has quite a few (NX/NZ series, TG50, UX series).

It looks like 3D graphics are starting to make inroads; the Toshiba e800 has an ATI video chip with 2 MB of RAM, the Tapwave Zodiac is a PalmOS 5 game machine with its own 3D chip, and the next rev of MS’s PocketPC software will have 3D support built in. I’m not sure how useful this is in a pure organizer, but higher-end handhelds have been headed towards mini-PC land for a while now, and once they get networking and larger storage capacities, they’re going to start acting a lot more like PCs in a lot of ways.

Two more things. First, we’re going to see mini-USB host ports on at least a couple of high-end handhelds next year. This will let the handheld talk to keyboards, digital cameras, CD burners, mice, printers, and all sorts of things that don’t really make a whole lot of sense but will happen anyway. Second, bluetooth-enabled handhelds are going to get bluetooth keyboard drivers eventually, and someone will start marketing a mini-keyboard with bluetooth.

Posted by Scott Laird Thu, 30 Oct 2003 00:14:09 GMT


Spam filtering

I finally turned on server-side spam filtering at home this weekend. I’ve fought doing it for a couple years; first due to lack of decent spam filtering software, and next because Jaguar’s Mail.app did decent spam filtering on its own. Why bother with server-side filtering when client-side filtering works well and is easier to train?

That argument held up for a while, but the latest rounds of pharmaceutical spam have been pretty good at getting past Mail.app’s filters, and moving to Panther hasn’t done anything to help. So, over the weekend, I added a Bayesian filter (Spamprobe) to my Courier-based mail server. As usual, it was more difficult than I had hoped it would be, but not as bad as I’d feared. The first step was turning on Courier’s maildrop filter. This was easy: just a simple edit of /etc/courier/courierd to add

DEFAULTDELIVERY="| /usr/bin/maildrop"

I could have used procmail, but I’ve grown increasingly irritated by its obtuse syntax over the years. I refuse to use languages that consist mostly of punctuation. Once maildrop was running, I added this to my .mailfilter file:

# save mail to the "saved" mbox, better safe than sorry
cc "$HOME/Maildir/.spam.saved"

# score the mail and tag it
SCORE=`spamprobe -8 receive`
xfilter "reformail -I \"X-SpamProbe: $SCORE\""

echo "Score: $SCORE"

# if it's spam, reroute it to the spamprobe mbox
if (/^X-SpamProbe: SPAM/)
{
  to "$HOME/Maildir/.spam.spam"
}

This is mostly copied from the README.maildrop that came with the Debian version of spamprobe, but I had to tweak it a bit before it’d drop mail into the right maildir. I then had to create $HOME/.spamprobe and spam/saved, spam/spam, and spam/ham mail folders.

Once this is in place, every message that comes through my system is copied to spam/saved and then scored. Spam is delivered to spam/spam, while non-spam mail is delivered normally.

The next step is to fill spam/spam and spam/ham (“ham is not spam”) with a bunch of samples of spam and non-spam mail. Fortunately, I had 500 or so of each just sitting around. I copied them into place, and then ran a script like this:

IMAPDIR=$HOME/Maildir
spamprobe good $IMAPDIR/.spam.ham/*/*
spamprobe spam $IMAPDIR/.spam.spam/*/*

This tells spamprobe to analyze the contents of my spam/spam and spam/ham folders to discover which keywords signify spam and which signify ham. I then added a cron job to re-run this script hourly.
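If you haven't seen a Bayesian filter up close, here is a toy Ruby version of what "discovering which keywords signify spam" means. It counts how often each word shows up in spam and in ham training messages, turns those counts into a per-word spam probability, and combines the probabilities for a new message. This is a simplified combiner loosely modeled on Paul Graham's naive Bayes approach, written for illustration. Spamprobe's actual tokenizer and scoring are more elaborate, and the class and method names here are made up.

```ruby
# Toy naive Bayes spam scorer (illustration only, not spamprobe's algorithm).
class ToyBayes
  def initialize
    @spam_counts = Hash.new(0)  # word => number of spam messages containing it
    @ham_counts  = Hash.new(0)  # word => number of ham messages containing it
    @spam_total  = 0
    @ham_total   = 0
  end

  # Lowercased words, each counted once per message.
  def tokens(text)
    text.downcase.scan(/[a-z0-9$']+/).uniq
  end

  def train(text, spam:)
    counts = spam ? @spam_counts : @ham_counts
    tokens(text).each { |t| counts[t] += 1 }
    if spam
      @spam_total += 1
    else
      @ham_total += 1
    end
  end

  # Probability that a message containing this word is spam, clamped so no
  # single word decides the outcome; unseen words lean slightly toward ham.
  def word_prob(word)
    s = @spam_counts[word].to_f / [@spam_total, 1].max
    h = @ham_counts[word].to_f / [@ham_total, 1].max
    return 0.4 if s + h == 0
    (s / (s + h)).clamp(0.01, 0.99)
  end

  # Combine per-word probabilities: prod(p) / (prod(p) + prod(1 - p)).
  def score(text)
    probs  = tokens(text).map { |w| word_prob(w) }
    spammy = probs.reduce(1.0) { |acc, p| acc * p }
    hammy  = probs.reduce(1.0) { |acc, p| acc * (1 - p) }
    spammy / (spammy + hammy)
  end
end

filter = ToyBayes.new
filter.train('cheap pills online now', spam: true)
filter.train('buy cheap pills today', spam: true)
filter.train('meeting notes for the ruby project', spam: false)
filter.train('lunch tomorrow with the ruby group', spam: false)
puts filter.score('cheap pills')            # close to 1.0
puts filter.score('ruby meeting tomorrow')  # close to 0.0
```

Retraining works the same way in the real setup. Each message dragged from one folder to another adds to the counts that the next cron run uses.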

To train the spam filter, all I need to do is drag messages around in Mail.app. If a spam message appears in my inbox, then I drag it to the spam/spam folder. From time to time, I check the spam/spam folder to look for false positives, and then drag them to spam/ham. The next time the cron job runs, my filter will adjust itself and do a better job categorizing spam.

So far, it’s working well. Most of the spam that I receive is addressed to one specific account that is forwarded from a previous employer; until last night, I was just dumping all of the mail from this account into a folder automatically, and then checking it a couple times per week to remove the ~150 spams/day that it receives. Last night, I stopped filtering it into its own box, and let spamprobe handle it. And, so far, it’s doing a good job. I’ve only seen 3 or 4 false negatives, and those were from early in the training process. Annoyingly, I’ve had 6 false positives that I had to pluck out of the spam folder; one was a MAILTO web form that went to my old college user group mailing list; it was categorized as spam, and that primed the pump so that several followups to the same list also went into the spam box. Once I moved them to the ham folder, mail for that list started making its way into my inbox correctly. Spamprobe also ate an opt-in ad from REI and a notice from a vendor that I wanted to see, but moving both of these to spam/ham seems to have fixed the problem.

According to a grep of my spam/spam folder, I received 217 spam messages yesterday and 101 so far today. Good riddance.

Posted by Scott Laird Wed, 29 Oct 2003 22:32:26 GMT


Photo season

I’m slowly adding pictures to my photo gallery. I still have most of a year’s pictures left to work through, but I’m starting to make headway.

I rented a set of lights and a couple backdrops from Glazer’s this weekend. On Saturday, I shot a ton of Halloween pictures at church, and then followed up with Christmas-card pictures of the kids on Sunday afternoon. I’m kind of amazed that the Halloween pictures are done; I had around 450 shots to sort through, and it took a few hours to get everything categorized, sorted, rotated, and uploaded to the website. The only thing left to do is color-correct everything and burn a CD for Costco to print.

I think I’m finally getting the hang of lighting, at least when I’m working with small groups. I’m actually happy with most of the Halloween shots; last year’s shots were kind of spotty, and I had to throw a lot of shots out because of inconsistent lighting. I was planning on renting a lighting kit with 3 heads, but Glazer’s was almost completely out of stock by early last week, so I had to settle for the same 2-head Dyna-Lite kit that I’ve rented 3 or 4 times before.

I used one of the Dyna-Lite heads at 45 degrees on my right with a big umbrella, and then put the other head on the left, as close to the background as I could get it. I then used a couple of black foam-core boards to keep that head from spilling onto the backdrop or directly into my lens. Finally, I used a 550EX on-camera in manual mode for fill, and as a wireless trigger for the Dyna-Lites. Since I was too cheap to rent a light meter, I set things up by shooting a white card with each light in turn and then adjusting it via the D60’s histogram; the main light was at F11, the side light was F8, and the 550EX was around F5.6. This would have been a lot easier if the Dyna-Lite let you set the power of each head individually; as it is, you can set 125/250/500 Ws per head, and then adjust the total power output, but you can’t fine-tune the heads with respect to each other. I think I’ll rent something a bit bigger next time and see how that goes.
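The F11/F8/F5.6 settings translate directly into lighting ratios. Each full stop multiplies the f-number by about √2 and halves the light, so the stop difference between two apertures is 2·log2(N1/N2). Below is a quick Ruby check of the numbers above. The marked f-numbers are rounded, so F11 to F8 comes out slightly under one full stop.

```ruby
# Stop difference between two f-numbers: each stop is a factor of sqrt(2)
# in f-number, i.e. a factor of 2 in light.
def stops_between(n1, n2)
  2 * Math.log2(n1.to_f / n2)
end

main, side, fill = 11, 8, 5.6
puts format('main vs side: %.1f stops', stops_between(main, side))  # 0.9
puts format('main vs fill: %.1f stops', stops_between(main, fill))  # 1.9
puts format('main:fill light ratio ~ %.0f:1', 2**stops_between(main, fill).round)
# The 125/250/500 Ws settings are each a doubling, i.e. one stop apart.
puts format('125 Ws to 500 Ws: %.1f stops', Math.log2(500.0 / 125))
```

So the key light was roughly two stops (about 4:1) over the on-camera fill, and each step on the pack's 125/250/500 Ws switch moves a head by a full stop. That coarse step is why per-head fine-tuning would have helped.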

I’m occasionally amazed that I’m still enjoying photography; it’s been nearly 4 years now. I’m obviously not the world’s greatest photographer, but it’s still fun.

Posted by Scott Laird Tue, 28 Oct 2003 03:28:57 GMT