Has it been six months already?
Wow, I just realized that I’ve been writing here for over six months. It’s been kinda fun, but it’s made me very aware of my shortcomings as a writer. Something to work on, I guess.
On the other hand, it’s helped with organization, which is usually my Achilles’ heel. At the very least, I have a record of what I’ve been working on, as well as a place to put reviews of the bits of technological flotsam that drift through my life.
Strange fact: I started this blog as a place to put reviews of assorted camera equipment. How many photography reviews have I posted? None. It just goes to show that Moltke was right: no plan survives first contact with the enemy.
Even more blog spam
They’re back again, 100+ blog spams for some casino. Rather than delete them automatically, I added the mt-blacklist plugin. It includes the ability to bulk-delete comments based on IP address.
Interestingly enough, last week’s anti-blog-spam measures didn’t really help–the spammer followed the comment form right to the new, renamed comment CGI. So, it looks like we’re headed for a real spam arms race. Bastards.
Blog spammers must die
Overnight, I was hit with 108 comment spams for Xenical from 66.36.249.149. Very irritating, especially since MT doesn’t have a good way to delete bulk spam. This spammer was kind of interesting–it looks like he was actually following the HTML from my archive pages, rather than blindly attacking /mt/mt-comments.cgi. That means that simply renaming the comment CGI probably wouldn’t have stopped this attack.
Here’s a chunk of the access log for those who are interested:
66.36.249.149 - - [23/Jan/2004:03:42:54 -0800] "GET /scott/archives/000001.html HTTP/1.0" 200 5368 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:02 -0800] "POST /mt/mt-comments.cgi HTTP/1.0" 200 59 "-" "http://@nonymouse.com/ (Unix)" 3 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:05 -0800] "GET /scott/archives/000002.html HTTP/1.0" 404 220 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:13 -0800] "GET /scott/archives/000003.html HTTP/1.0" 200 8678 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:17 -0800] "POST /mt/mt-comments.cgi HTTP/1.0" 200 59 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
Google suggests that ‘@nonymouse.com’ is an anonymizer, so the spammer was actually abusing two services, not just mine. Which also means that the IP address given isn’t very useful.
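For anyone who wants to pull the same pattern out of their own logs, here is a minimal Python sketch. It assumes log lines shaped like the ones above (Apache-style, with the referrer and user agent in quotes); the names LOG_PATTERN, parse_line, and count_comment_posts are made up for illustration and aren't part of MT or Apache.

```python
import re
from collections import Counter

# Assumed layout: ip, two ignored fields, [timestamp], "METHOD path proto",
# status, size, "referrer", "user agent". Trailing fields are ignored.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_comment_posts(lines, comment_path="/mt/mt-comments.cgi"):
    """Count POSTs to the comment CGI, grouped by client IP address."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["method"] == "POST" and entry["path"] == comment_path:
            counts[entry["ip"]] += 1
    return counts
```

Feeding it an open log file and calling most_common() on the result should surface any address that is hammering the comment script, although, as noted above, an anonymizer makes the address itself less meaningful.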
I’m not sure how best to handle this sort of thing in the future–I’ll try renaming mt-comments.cgi to something less obvious, and probably JavaScript-ify the comment link on my pages. That’s rude to the poor users without JavaScript enabled in their browser, but I don’t want to spend hours deleting spam again.
Longer-term, it’d be nice if MT added moderated comments, and a way to automatically change the open/moderated/closed status of entries after a set period of time. That way, new posts could have open comments, and then be auto-moderated after a week or two. That seems like a decent compromise to me, and it’s orthogonal to most of the other anti-blog-spam suggestions that I’ve seen.
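To make the idea concrete, here is a rough Python sketch of the age-based rule. MT doesn’t do this as far as I know; comment_status and its parameters are hypothetical names, and the two-week window is just the figure suggested above.

```python
from datetime import datetime, timedelta


def comment_status(posted_at, now, open_for=timedelta(days=14), moderated_for=None):
    """Pick a comment policy for an entry based on how old it is.

    Entries start out "open", switch to "moderated" once they are older
    than open_for, and (only if moderated_for is given) become "closed"
    after that second window has also passed.
    """
    age = now - posted_at
    if age <= open_for:
        return "open"
    if moderated_for is None or age <= open_for + moderated_for:
        return "moderated"
    return "closed"
```

A nightly job could run this over every entry and update whichever ones have changed state, so nothing has to happen at comment-posting time.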
Bastards.
Comments closed: bizarrely enough, this post gets more comment spam than any other page on my blog (and nearly more than all other pages), so I’ve closed comments.
Note to self, never mention Paris Hilton in blog again
It’s been a weird day, traffic-wise: MSN has decided that my simple mention of Paris Hilton referrer spam is good enough to make their search engine love me–I’m number 18 on their list of sites when searching for “Paris Hilton Video.” I’ve had at least 55 different users today arrive from MSN’s search engine.
Update: Gack, Google’s at it now, too. I just got a hit from “paris hilton jpegs.” Except this one was from someone with too much time on their hands–I’m apparently on the 44th page of listings on Google.
Update 2: I’m now up to number 9 on MSN’s site. I’ll probably break 150 Paris Hilton hits today. It’s not a lot of traffic, but it’s just so bizarre that they’ve decided to send it my way.
Access logs
It seems like there’s always something interesting lurking in web access logs, but actually finding the interesting bit is a pain in the neck. For instance, over the past day or so, I discovered that I was briefly the #1 Google listing for “andy serkis seattle” and that Lockergnome included me on their list of 2004 PDA predictions. I didn’t see that coming. Cool. My MPx200 notes have generated a bit more traffic than usual, too, and they’re only a day or so old. Searches for the Sony/Ericsson CAR-100 have finally slacked off; Google was sending me piles of CAR-100 traffic for a while.
I’m seeing a bit of referrer spam, too–mostly for paris-hilton-video.blogspot.com. Either that, or they’ve linked to me somewhere that I can’t see, and that link has generated a dozen hits over the last month, all from different IP addresses in different countries.
The thing is, I spotted all of these trends manually, by running tail -f on the log files, and then grepping for interesting strings. None of the web log analyzers seems quite appropriate for blog traffic. And, interestingly enough, searching for “web log analyzer” in Google hits way too many (web logs) on (analysis). If anyone has any suggestions or recommendations, feel free to leave a comment.
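In the meantime, a few lines of Python can do some of the grepping. This is only a sketch: it assumes you’ve already pulled the referrer field out of each log line, and that the search engine passes the query in a q parameter, which may not hold for every engine. The name search_terms is made up for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def search_terms(referrers, param="q"):
    """Tally the search phrases found in a list of referrer URLs.

    Referrers without the given query parameter (including the "-"
    placeholder for a missing referrer) are simply skipped.
    """
    terms = Counter()
    for ref in referrers:
        query = parse_qs(urlparse(ref).query)
        for value in query.get(param, []):
            # Normalize case and whitespace so repeats are counted together.
            terms[value.strip().lower()] += 1
    return terms
```

Calling most_common(10) on the result gives a quick list of what people were searching for when they landed here, which is most of what I was getting out of tail and grep.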
Comment spam
Ugh, I got hit with 3 comment spams from 66.80.241.23, all for Bulgaria-based pharmacy sites. I’ll move the comment CGI around tomorrow; that’s supposed to be an easy fix for 95% of the comment spammers.
...and added Atom
Now that MT 2.65 is running, I added the Atom feed template and added a link for Atom autodiscovery to my main page. Now if I only had an Atom reader…
For those of you who lost track, oh, 20 minutes ago, Atom is more or less another version of RSS, but redesigned from the ground up and with a rigorous specification and conformance tests. With RSS, the specs never really specified whether <title> blocks could contain HTML, for example. Atom adds an explicit way to say how individual pieces of content are encoded.
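Here’s a small Python sketch of what “saying how the content is encoded” can look like. It assumes the type attribute and namespace from the later Atom 1.0 spec (RFC 4287); the draft available when this was written may spell things differently, and atom_title is just a name invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace from Atom 1.0 (RFC 4287); an assumption, since earlier
# drafts used a different namespace.
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_title(text, is_html=False):
    """Build an Atom <title> element that declares how its text is encoded.

    type="text" means the content is plain text; type="html" means the
    content is escaped HTML that a reader should unescape and render.
    """
    title = ET.Element("{%s}title" % ATOM_NS)
    title.set("type", "html" if is_html else "text")
    title.text = text  # ElementTree escapes <, > and & on output
    return ET.tostring(title, encoding="unicode")
```

The point is that a feed reader no longer has to guess whether a stray <b> in a title is markup or literal text; the attribute says which.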