Posted by Scott Laird
Fri, 19 Oct 2007 23:28:44 GMT
So, as part of my new home server series, I want to explain why I’m using OpenSolaris instead of Linux.
I’ve used Linux since 0.97.1, in August of 1992. I’ve had at least one Linux box at home continuously since 1993 or so. I’ve had a few small chunks of my code added to the kernel over the years. I’ve built several install disks and one embedded appliance distro from scratch, starting with a kernel and busybox and going on up from there. I’ve written X drivers, camera drivers, and drivers for embedded devices on the motherboard. I’ve managed Great Heaping Big Gobs of Hardware at various jobs. Basically, I know Linux well, and I’ve used it for almost half of my life.
That in itself might mean that it’s time for a change–professionally, I’ve been very tightly focused on Linux, and diversity is a good thing. But that’s not why I’m using Solaris this week. I’m using it because I’m fed up with losing data to weird RAID issues with Linux, and I believe that OpenSolaris with ZFS will be substantially more reliable long-term. Things I’m specifically fed up with:
- md (the Linux RAID driver)’s response to any sort of drive error, even a transient timeout, is to kick the drive from the array, no matter what. Most of the IDE drives that I’ve had over the years have been prone to random timeouts every few months, at least once you bundle more than 2 or 3 of them in a single box and then try snaking massive ribbon cable through the case. My SATA experiences haven’t been substantially better. Linux will happily bump an otherwise working 4-drive RAID 5 array to a 3-drive degraded RAID 5 array on the first failure, and then on to a 2-drive failed array on the second failure. Even when a simple retry would have cleared both errors. This has cost me data repeatedly, because I’ve been forced to manually intervene and re-add “failed” disks to RAID arrays. If I was too slow, then a second drive failure risked total data loss. Even worse, these random transient failures blind you to real drive failures, like the one that ate my NAS box last weekend.
- Actual drive failures can hang the kernel. I’ve had at least 3 cases at home where broken drives either caused system lockups or completely kept the system from booting. That sucks. Odds are some drivers are good while others are broken; apparently I’ve just had bad luck.
- None of Linux’s filesystems are particularly resilient in the face of on-disk data corruption. Compare with ZFS, which checksums everything that it reads or writes.
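To make the checksum point in the last bullet concrete, here is a toy sketch of the idea: keep a checksum with every block, verify it on read, and fall back to another copy when it doesn't match. This is only an illustration, not ZFS's on-disk format or code; the `write_block` and `read_block` helpers and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib

def write_block(data: bytes) -> dict:
    """Store a block together with a checksum of its contents."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def read_block(copies: list) -> bytes:
    """Return the data from the first copy that still matches its checksum."""
    for copy in copies:
        if hashlib.sha256(copy["data"]).hexdigest() == copy["checksum"]:
            return copy["data"]
    raise IOError("every copy failed its checksum")

# Two mirrored copies; the first one has been silently corrupted.
good = write_block(b"family photos")
bad = dict(good, data=b"family phxtos")
recovered = read_block([bad, good])
print(recovered)
```

A filesystem without per-block checksums has no way to tell which of the two copies is the damaged one.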
In short: everything works great when things are perfect, but building a reliable multi-drive storage system requires careful component and kernel compatibility work, and then you have to stay right on top of things if you want everything to keep working. When things stop working, they usually fail badly. That’s almost the complete antithesis of what I want for home: plug it in, and it just keeps working. I don’t want small failures to cascade through the system. Little failures should be isolated, identified, and automatically repaired whenever possible. OpenSolaris and ZFS seem to provide that, while Linux with md and ext3 does not.
That’s why I’m planning on using ZFS. My logic for building a server vs. buying another little NAS box is simple: none of the little NAS boxes on the market use ZFS right now, and none of the cheap ones have room for more than 5 drives. I’m planning on using a double-parity system (RAID 6 or ZFS’s raidz2, where the system can cope with a 2-drive failure) plus a spare drive, and that’d only leave me with 2 data disks. The only way that I can get enough data with only 2 disks would be to use 1TB drives, and they’re too pricy right now.
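The drive-count arithmetic is simple enough to write down. A minimal sketch, assuming equal-sized drives and ignoring filesystem overhead; `data_disks` and `usable_tb` are names invented for this example.

```python
def data_disks(bays: int, parity: int = 2, spares: int = 1) -> int:
    """Drives left for data after double parity and a hot spare."""
    return bays - parity - spares

def usable_tb(bays: int, drive_tb: float, parity: int = 2, spares: int = 1) -> float:
    """Rough usable capacity, ignoring filesystem overhead."""
    return data_disks(bays, parity, spares) * drive_tb

# A 5-bay NAS box with double parity plus one spare:
print(data_disks(5))      # 2 data disks
print(usable_tb(5, 1.0))  # 2.0 TB, even with 1 TB drives
```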
So, I’m willing to spend the time to build a somewhat complex server because I believe (hope?) that it’ll save me time in the future, and it’ll let me avoid ever having to do the reconstruct-from-the-source dance again. I don’t think I lost anything critical last weekend, and I’m reasonably confident that I’ll be able to get things limping along well enough to recover data anyway, but I’ve now done this 3 times in the past 4 years, and I’ve had it.
Coming up soon: backups, OpenSolaris hardware compatibility, and GC-RAMDISK performance benchmarks. Stay tuned :-).
Tags linux, opensolaris, raid, solaris, storage, zfs | 3 comments
Posted by Scott Laird
Thu, 07 Jul 2005 23:46:19 GMT
CNet says that there’s a security bug in zlib 1.2.2. There’s no exploit yet, but since everything uses zlib, this will probably turn into a problem for those who don’t upgrade to 1.2.3 once it becomes available.
Since libpng and OpenSSL both use zlib, we’re going to see a lot of network-based programs with issues.
Posted in Computer Security | Tags linux, openssl, security, zlib | no comments
Posted by Scott Laird
Wed, 25 May 2005 18:52:31 GMT
According to a number of sources, Nokia has just announced a new tablet-like wireless internet device, the Nokia 770. No one really seems to know what to do with it–it’s slightly larger than a PDA with a 4.13 inch 800x480 LCD, 802.11 and Bluetooth, 64 MB of RAM, 128 MB of flash, and an RS-MMC socket. Nokia’s positioning it as a cheaper, more portable alternative to the laptop, and equipping it with a web browser and email software. There have been a number of products with similar aims in the past, but none of them have been able to achieve any amount of success.
The 770 will probably fail, too. It does have a couple things going for it, though–it’s a relatively open platform (it supposedly runs Debian Linux), and the software for the device is open-source. The hardware is surprisingly capable for the cost–at $350, this is cheaper than any PocketPC with a VGA screen. It’s a bit limited on the storage front, with room for only a single RS-MMC card (up to 512 MB), but that’s not really all that bad.
Personally, I wouldn’t mind something like this, but I’d be tempted to use it as a portable video player, and I doubt that the 770’s 200-ish MHz OMAP chip has enough oomph to play back video at any reasonable resolution and frame rate.
I’m not really sure what Nokia has up their sleeves here. On one hand, the hardware looks pretty good. Unfortunately, the software is brand new and doesn’t seem to include any PDA-type features–it’s focused entirely on web browsing (using a scaled-down Opera), email, and RSS reading. If Nokia can keep the platform alive for a year or two, it might gain enough support to be interesting, but as it stands I don’t see how it’ll have much of a chance in the market.
Update: Martin at Telepocalypse sees the 770 rather differently, and is very positive on it. Om Malik agrees.
Posted in Handheld and PDA | Tags linux, nokia, nokia770, tablet | 2 comments
Posted by Scott Laird
Mon, 11 Apr 2005 18:12:25 GMT
Newsforge is running an interview with the three main participants in The Great Linux SCM Saga, Linus, Larry McVoy, and Tridge. By and large, it’s a good article, but I suspect that someone who didn’t know the people involved would assume that the whole mess was Tridge’s fault–he’s the one that was working on cloning BitKeeper, even though any sane person would know that it would really piss Larry off. Even after people pointed this fact out to him, he kept working on his BitKeeper tools.
I’d be remiss if I didn’t point out that Tridge has a history of doing this sort of thing. I’m aware of two other cases where he’s dug in and reverse-engineered similar sets of protocols and file formats. The first time, the result was Samba, which was (and still is) really one of Linux’s first killer apps. The second time, he decoded TiVo’s on-disk media format. Pretty much any tool on the net that knows how to extract video from TiVos (except for TiVo’s recent TiVo-to-Go release) is based on Tridge’s work.
That’s not to say that reverse-engineering is all that he does–rsync is his too.
I remember people questioning his ethics during his TiVo work–besides just downloading video from TiVos, his work could (in theory at least) allow someone to buy a TiVo and feed it program guide information without paying TiVo’s monthly subscription. Without that, TiVo’s revenue model falls apart, and the company would be forced to either sue their own users or go out of business. The BitKeeper folks might have paid attention to how he handled the TiVo issue–as I recall, he released the video download code, but kept the programming guide code to himself. In some ways, that actually helped TiVo–I had no qualms about buying a second TiVo, even when their financial footing was shaky. Without Tridge’s programming guide code, a TiVo box without TiVo, Inc would just be a big paperweight. Just knowing that the program guide code existed was enough to ensure that my TiVo would continue to be useful, because someone would pick up the torch if TiVo fell.
I don’t know what Tridge was planning to do with his BitKeeper tool, but based on his past record, I really doubt that he would have used it to sabotage BitMover. Or, at least not to do anything that he saw as sabotage. Clearly Larry McVoy (and to some extent Linus) saw things differently.
Getting off of BitKeeper is probably best for Linux in the long run. It’s a pity we couldn’t have waited for another year or so for open-source SCM software to mature more, though. There are a number of promising contenders, but they all have issues that keep them from being usable for the Linux kernel today.
Posted in Linux | Tags bitkeeper, linus, linux, samba, scm, tridge | no comments
Posted by Scott Laird
Wed, 06 Apr 2005 14:10:19 GMT
I finally have Xen working on a system at home. I hadn’t expected this to be very difficult, but apparently Xen doesn’t like my new Athlon 64 system (bought mostly for running Xen). They’ll fix it eventually, but for now I’m using an old Athlon 700 system that I had sitting around. It needed a new CPU fan (just try finding Slot A fans these days!), but I was able to scrounge up 512 MB of RAM and an 80 GB hard drive, so it’s perfectly usable.
I built a couple quick disk images and booted them under Xen, and everything worked as expected. This is always a good sign, and it suggests that I’ll be able to make progress on my little virtual-server project without a whole lot of trouble.
Posted in Xen, LWVS | Tags linux, virtualization, xen | 1 comment
Posted by Scott Laird
Tue, 08 Mar 2005 07:23:21 GMT
It’s sort of an axiom of programming that features that aren’t continually used or tested won’t actually work. A similar rule holds for system administration–any feature that hasn’t been tested since the last upgrade is probably broken. An obvious corollary suggests that systems get more reliable as their user load increases–more users means more features are used more frequently, and broken features will be spotted sooner. And the corollary to that is that any server wedged under a desk in someone’s home office is probably flakier than hell because it’s probably just sitting there collecting dust and not getting used.
I’m not convinced that that applies to my home gateway box. It’s a busy little beaver:
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes prot opt in out source destination
234M 75G all -- dsl0 * 0.0.0.0/0 0.0.0.0/0
47M 1001G all -- eth0 * 0.0.0.0/0 0.0.0.0/0
In the 25.75 days since I last rebooted this system, it’s received over 75 GB via its DSL link and around 1 TB over its main Ethernet link. If my math is right, that’s an average of 3.6 Mbps on the Ethernet link and around 270 kbps over DSL. I wasn’t keeping outgoing traffic stats when I first booted this box, but more recent estimates make it look like there’s almost as much outgoing traffic on dsl0 as there is incoming.
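For anyone who wants to check the math, here is a small sketch of the calculation. It assumes the `G` suffix in the counter output means 10^9 bytes; `average_bps` is a name made up for this example.

```python
def average_bps(total_bytes: float, days: float) -> float:
    """Average bit rate over an uptime given in days."""
    return total_bytes * 8 / (days * 86400)

# Byte counters from the listing above, assuming G = 10**9 bytes.
eth0_mbps = average_bps(1001e9, 25.75) / 1e6
dsl0_kbps = average_bps(75e9, 25.75) / 1e3

print(f"eth0: {eth0_mbps:.1f} Mbps")  # about 3.6
print(f"dsl0: {dsl0_kbps:.0f} kbps")  # about 270
```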
CPU load is similarly heavy–the box has averaged 51.9% idle since it was rebooted. My rule of thumb for years was that any production box that was under 80% idle was due to be upgraded soon, because it was probably pegging the CPU during peak times during the day. If the box was under 70% idle, then it was time to start scrounging for an immediate upgrade. By those metrics, this box is way overdue for a major upgrade. Fortunately for my wallet, those metrics don’t really apply to this box–it’s spending a lot of its CPU time on tasks that aren’t particularly critical. Also, Linux 2.6 made some changes to /proc/stat that procinfo doesn’t seem to have picked up on; once you factor those into the equation, the box is really closer to 75% idle. Subtract off the non-critical usage, and the system is probably only 10% busy. I’ll probably upgrade it later this year if my virtual-server project works out, but that’s more for security and reliability than pure performance.
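For reference, the 2.6 `cpu` line in /proc/stat carries iowait, irq, and softirq counters after the older user, nice, system, and idle columns. The sketch below computes an idle percentage from all of the counters on such a line. The sample numbers are invented for illustration, and this does not try to reproduce whatever procinfo actually does with the extra columns.

```python
def idle_percent(cpu_line: str) -> float:
    """Idle share of all the tick counters on a /proc/stat 'cpu' line."""
    counters = [int(x) for x in cpu_line.split()[1:]]
    idle = counters[3]  # column order: user nice system idle iowait irq softirq
    return 100.0 * idle / sum(counters)

# Made-up sample line, not taken from the gateway box.
sample = "cpu 2000 0 1000 6000 700 200 100"
print(idle_percent(sample))
```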
Posted in Computer System Administration | Tags dsl, home, linux, networking, router | no comments
Posted by Scott Laird
Wed, 23 Feb 2005 03:08:59 GMT
There’s an interesting thread going on right now on the Linux netdev mailing list, speculating about the network accelerator technology that Intel’s been talking about recently. No one’s quite sure what Intel is planning on adding, but for the past several years “network accelerator” has usually meant TCP offload engines (ToE), and Linux’s core networking guys are almost famously anti-ToE. Even though no one really knows what Intel’s up to, there’s a feeling that it’s not just ToE this time.
Several people have pointed out other technologies that can make a huge difference without requiring the sorts of compromises that ToE needs to work. For instance, this post by Lennert Buytenhek suggests that PCI and memory system latency is a big problem, but fixing it can have huge payoffs:
The reason a 1.4GHz IXP2800 processes 15Mpps while a high-end PC hardly
does 1Mpps is exactly because the PC spends all of its cycles stalling on
memory and PCI reads (i.e. ‘latency’), and the IXP2800 has various ways
of mitigating this cost that the PC doesn’t have. First of all, the IXP
has 16 cores which are 8-way ‘hyperthreaded’ each (128 threads total.)
I haven’t paid much attention to Intel’s IXP network processor family in the past, and that may be a mistake–from the description here, the IXP2800 sounds like a cross between Tera’s multithreaded CPU and IBM’s new Cell processor. Tera’s CPU, which was designed to support tons of threads, automatically switches between threads whenever one thread blocks due to I/O or memory access. The goal with Tera was to be able to remain efficient while the gap between CPU and memory speeds continued to grow. The IXP2800 isn’t as ambitious as the Tera, but the fundamental concept looks similar–support lots of threads in hardware, and switch when latency gets in the way. The IXP2800’s threaded CPUs aren’t full-blown processors, though–like the Cell, the IXP2800 contains one main CPU and a cluster of smaller domain-specific processors that are specialized for one specific task.
It’s unlikely that Intel will roll something like this into their Xeon CPUs anytime soon, though. It’s certainly not a quick fix–it’d require major changes in any OS that wanted to make use of it, and would probably take 3-6 years before it was really fully utilized.
Massively-multithreaded CPUs aren’t the only approach that has paid off for dedicated network processors, though. Some of FreeScale and Broadcom’s chips know how to pre-populate the CPU’s cache with headers from recently-received packets. This drastically cuts latency, but it seems to require that the CPU and network interface be very tightly coupled. Reducing the overhead needed to talk to the NIC can help, too–apparently some of Intel’s 865 and 875 motherboards use a version of their GigE chip that is connected directly to the north bridge, bypassing the PCI bus entirely, and some benchmarks show substantial improvements.
Reading the thread suggests that most of the effort going into Linux network optimization in the next few years will be happening on the receive end of things. Over the past several years, most higher-end NICs have added limited support for checksum generation and TCP segmentation offloading (TSO), where the CPU can hand the NIC a block of data and a TCP header template, and then have the NIC produce a stream of TCP packets without requiring the CPU to touch the data at all. Relatively little has happened on the receive side, but this seems to be changing. For example, Neterion’s newest card can separate headers from data, and is nearly able to re-assemble TCP streams on its own, sort of the inverse of transmit-time TSO. It’s not clear how many streams the card can handle at a time, though–even my little web server at home is currently maintaining 384 simultaneous TCP connections, and a busy system could easily have tens or hundreds of thousands of open streams. Odds are, throwing 100,000 streams at the card would run it out of RAM and completely negate any benefit that receive offloading would have. Unless it’s bright enough to be able to handle the 1,000 or so fastest streams and then let the main CPU handle the 99,000 that are dribbling data at 28k modem speeds.
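Here is a toy model of what segmentation offload and its receive-side inverse amount to. It is only a sketch: the `Segment` type, the 1460-byte MSS default, and the function names are assumptions for illustration, and it ignores real-world details like sequence-number wraparound, checksums, and retransmission.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    payload: bytes

def segment(data: bytes, start_seq: int, mss: int = 1460) -> list:
    """Transmit side: cut one large buffer into MSS-sized segments."""
    return [Segment(start_seq + off, data[off:off + mss])
            for off in range(0, len(data), mss)]

def reassemble(segments: list) -> bytes:
    """Receive side: put the segments back in order and join the payloads."""
    ordered = sorted(segments, key=lambda s: s.seq)
    return b"".join(s.payload for s in ordered)

data = b"x" * 4000
segs = segment(data, start_seq=1000)
print([(s.seq, len(s.payload)) for s in segs])
print(reassemble(list(reversed(segs))) == data)
```

The per-stream state that `reassemble` needs is exactly what a card would have to hold in its own RAM for every connection it tracks.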
This is a fascinating topic, and I can’t wait to see how this will turn out.
Posted in Linux, Computer Networking | Tags intel, linux, networking, sysadmin, toe | no comments
Posted by Scott Laird
Fri, 11 Feb 2005 21:52:02 GMT
As regular readers know, I recently turned up a new DSL circuit at home, replacing an older, slower line that Verizon had refused to upgrade for months. As part of the upgrade process, I needed to buy a new DSL modem. Instead of using an external DSL modem (DSL-Ethernet bridge would probably be more accurate, but “modem” seems to have stuck), I decided to buy a Sangoma S518 PCI ADSL modem. I had two main reasons for preferring this internal modem to a generic external model:
- Better control over upstream buffering, for better VoIP QoS.
- Better visibility into the modem’s state, so I can syslog minor outages and notice things like speed changes.
I chose the Sangoma model instead of a cheap, generic card because the manufacturer strongly supports its use with Linux, and a number of people on the Asterisk-Users mailing list have recommended it. I paid $115 plus shipping from BSD Mall.
Posted in Computer Networking | Tags adsl, dsl, linux, neetworking, review, s518, sangoma | 6 comments
Posted by Scott Laird
Wed, 09 Feb 2005 23:21:15 GMT
My DSL modem showed up yesterday, so I dropped it into my gateway box and fired it up. It immediately reported that it was unable to train; there was nothing to talk to on the other end of the phone line yet. Since my official install day is still a couple days out, that didn’t surprise me. Then this morning, I saw this in the logs:
Feb 9 08:19:22 guam kernel: wanpipe1: ADSL Link connected (Down 1792 kbps, Up 448 kbps)
Feb 9 08:19:30 guam kernel: wanpipe1: Link connected!
Feb 9 08:41:03 guam kernel: klogd 1.4.1#11, log source = /proc/kmsg started.
The gap between the second and third lines is the problem–the box went down, hard, right after the DSL line came up. On the other hand, it looks like I’m provisioned above 1.5/384 on the ATM side. Assuming a 20% cell tax, this gives me a usable connection of around 1430 kbps down and 360 kbps up, which isn’t too bad. Now I just have to keep the thing from crashing. I’m rolling my ADSL drivers back from the beta version that I’d started with to the most recent release; hopefully that’ll be good enough to fix my problem.
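The cell-tax estimate works out like this. A minimal sketch; the 20% figure is the rough assumption from the paragraph above rather than a measured value, and `usable_kbps` is a name invented for the example.

```python
def usable_kbps(atm_kbps: float, cell_tax: float = 0.20) -> float:
    """Payload rate left after subtracting an assumed ATM cell tax."""
    return atm_kbps * (1 - cell_tax)

down = usable_kbps(1792)
up = usable_kbps(448)
print(f"{down:.0f} kbps down, {up:.0f} kbps up")

# The 5-byte header on every 53-byte cell alone costs a bit over 9%;
# padding and encapsulation overhead account for the rest of the estimate.
header_only_tax = 5 / 53
```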
Posted in Computer Networking | Tags atm, dsl, linux, sangoma | no comments
Posted by Scott Laird
Mon, 10 Jan 2005 23:00:49 GMT
Okay, so my RAID array died because I wasn’t paying enough attention and my 3ware card had already kicked out one perfectly good drive for no obvious reason. No sweat, I can handle that. As I mentioned before, it took me most of a day, but I recovered almost all of the data off of the failed 4-drive array onto a new 2-drive RAID-0 array. Once the copy was complete, the goal was to destroy the old, broken RAID-5 array, create a new, working RAID-5 array, and then copy all of the data off of the RAID-0 array onto the new RAID-5 array. Then, when everything was complete, I was planning on using the RAID-0 disks as parity and spare drives for the RAID-5 set. Nice and simple, right?
So, by Friday night, I had 6 drives in front of me. One was bad, three were good, but part of the broken RAID array, and two held the data that had been on the RAID array. My goal was to take the 3 good drives and use them to build a new 4-drive RAID-5 array, so I built a software RAID-5 array in degraded mode–that way, I could get away with leaving out the 4th drive at the beginning. Once I copied the data off of the 5th and 6th drives, I was planning on adding them to the RAID-5 array so I’d have a 4th disk plus a spare.
I was very careful not to re-use the broken drive–it was on 3ware channel #2, so I cleverly built my new array using Linux’s sda, sdc, and sdd devices, skipping sdb. Once RAID-5 was running, I formatted the new array, copied everything from the RAID-0 set, broke down the RAID-0 set, and added the drives to the RAID-5 array. And promptly watched everything crumble to dust. My RAID-5 array started out in degraded mode, with 3 of 4 drives active. I then added 2 additional drives, and instead of watching it rebuild to 4 of 4 plus 1 spare, it went to 2 of 4 active. It even sent me this helpful email:
From: scott@mail.sigkill.org
Subject: Fail event on /dev/md1:nfs
Date: January 8, 2005 8:16:43 AM PST
To: scott@sigkill.org
This is an automatically generated mail message from mdadm
running on nfs
A Fail event had been detected on md device /dev/md1.
Faithfully yours, etc.
Although the array was still mounted, any attempt to access it generated a steady stream of I/O errors. What happened, you ask?
Basically, I was an idiot. Like I said, the drive on 3ware channel #2 failed, so I didn’t use drive sdb. Except that 3ware numbers their channels starting with 0. So channel #2 was drive number 3–sdc, not sdb. So I’d rebuilt my array using the bad drive, then copied my data onto the broken disk, and destroyed all of my good copies. I spent all morning Saturday trying to fix things, but I couldn’t even get the kernel to acknowledge that the RAID array existed. I finally gave up and tried cloning sdb onto sdc, to see if that’d work, but it didn’t make a bit of difference–I could at least get mdadm to tell me that sdb had once been a part of a RAID array, but it didn’t recognize any of the data on sdc as any part of anything.
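The off-by-one is easy to see when written out. A tiny sketch, assuming the disks show up as sda, sdb, and so on in channel order with no gaps and no other SCSI disks in the box; `channel_to_device` is a made-up helper.

```python
import string

def channel_to_device(channel: int) -> str:
    """Map a zero-based controller channel to the Nth sd device name."""
    return "sd" + string.ascii_lowercase[channel]

print(channel_to_device(2))  # channel #2 is the third drive
```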
In desperation, I tried re-creating the RAID array exactly as I’d first built it, using sda, sdc, and sdd. Amazingly enough, that worked, and I was able to mount the drive. I then carefully added sdc into the array, watched it rebuild the first 20% of the array, and then fail sdc back out of the array, leaving me back where I started. I finally turned off the computer in disgust and went and played with what was left of our snow.
Sunday was more snow, so I played with the kids, and then finally took one last swing at the computer. I re-built the RAID array again, and then built a RAID-0 array from sde and sdf. I then tried to copy anything that was salvageable off of the broken RAID-5 array. I figured that I’d be able to copy something before it croaked again. I checked back a couple hours later to discover that it’d copied all 216 GB without error. I was stunned–apparently the drive’s problem was really just corruption of a few sectors–writing new data back onto the drive overwrote the weak parts with a new, strong signal, and it was able to read them back safely. Ugh. It wouldn’t resync right because there were still a number of old sectors with old data on them–if I’d zeroed out the whole drive, it’d probably have worked right from the start, for at least a couple months, until it failed again.
So, I went back through the process again, destroying the array built from sda, sdc, and sdd, and then building a new one with sdb this time. There’s no way I’m going to trust the failing drive, even if it did work this time. I copied everything off of the little RAID-0 array, then carefully tore it apart and used its drives to rebuild the big array into its full RAID-5 glory. And it actually worked this time, without errors. Everything was finally finished around midnight last night, and I was able to reboot without problems.
All done, right?
Ha.
This morning I got up to find the screen full of syslogged Ethernet problems–apparently the network card had locked up. I could log in on the console, but I couldn’t ping anything. I rebooted, everything came up okay, and I tried copying a bunch of stuff onto the new RAID array. It copied just fine for about 5 minutes, and then the box locked up hard. No kernel panic or anything, just a dead box. The reset button didn’t help, and it ignored the soft power button, so I had to do the hold-the-power-button-for-5-seconds trick. After that, it didn’t boot right–there were 3ware card errors everywhere–timeouts, not drive problems. It locked up again halfway through booting.
So, practically speaking, I’m right back where I started on Friday morning–my box is dead, but the data is probably fine. I’m going to pop the box open and wiggle some cables, but I probably have bad hardware somewhere in the box–motherboard, 3ware card, or power supply. If this had happened at work, I’d just RMA the whole mess and let the vendor sort it out, but that’s not very useful at home, especially when dealing with a 4-year-old system with a second-hand RAID card. Ugh.
Update: I powered it off for a while, wiggled cables, removed spare hardware, rebooted, and found a nice kernel bug. If you have a RAID array with 4 drives plus a spare, and for some reason the spare’s RAID superblock has a higher timestamp than the 4 data drives, then the kernel’s RAID code will gladly kick the 4 good drives out of the array and keep just the spare. I sense a bug report in my near future.
Posted in Computer System Administration | Tags broken, ide, linux, raid | no comments
Posted by Scott Laird
Sat, 08 Jan 2005 08:29:24 GMT
I’ve lost a lot of hard drives over the years, but I’ve never really had the ability to put one under the microscope, so to speak, to see what happened and what I could have done to detect the failure before it became a problem. In general, even an extra 24 hours’ notice would greatly reduce the amount of data lost and reduce the pain involved in replacing failed drives. Drive makers understand this, and added the S.M.A.R.T. drive monitoring standard to drives years ago. Under Linux, the smartmontools package provides a number of tools for monitoring drives’ SMART status; I’ve been increasingly vigilant about running it on all of my systems, hoping that it’ll let me spot drive failures before data loss occurs.
I lost another drive this week. This is the first drive that I’ve lost that has been actively monitored by smartmontools the entire time, and the logs produced are instructive. Unfortunately, I didn’t pay close enough attention to SMART to prevent data loss, but there are a number of lessons contained in the logs produced. By understanding the precursors of this drive failure, we should be able to be more reactive when faced with future failures.
First, here are the basic specs on the system and drives involved:
- Athlon 700 (slot A)
- 384 MB RAM (PC133)
- Via KT133 chipset (Asus K7A MB, I think)
- 3ware 7500-8 8-channel IDE RAID controller
- 3 Maxtor 160 GB drives, 1 Hitachi 160 GB drive
The drive that failed was a Maxtor, on channel #2. Here’s what smartmontools 5.30 has to say about the drive in its current condition:
Device Model: Maxtor 4A160J0
Serial Number: A608B7WE
Firmware Version: RAMB1TU0
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Fri Jan 7 11:47:02 2005 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was
completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 24) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: ( 243) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   214   214   063    Pre-fail  Always       -       11805
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   249   249   063    Pre-fail  Always       -       41
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   244   187    Pre-fail  Always       -       34394
  9 Power_On_Hours          0x0032   224   224   000    Old_age   Always       -       24560
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       38
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       43456
196 Reallocated_Event_Count 0x0008   251   251   000    Old_age   Offline      -       2
197 Current_Pending_Sector  0x0008   249   249   000    Old_age   Offline      -       41
198 Offline_Uncorrectable   0x0008   253   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   216   000    Old_age   Always       -       37
202 TA_Increase_Count       0x000a   253   248   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   245   180    Pre-fail  Always       -       19
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   154   148   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
smartctl also reports a bunch of event log results after this, but they’re not completely relevant right now–the events in question didn’t occur until things started failing.
Looking at the results that smartctl reports, it doesn’t look like anything is particularly wrong. None of the pre-fail statistics are outside of their ideal range, and the old-age statistics make the drive look nearly new. Just looking at these numbers wouldn’t give you any indication that the drive was throwing uncorrectable read errors every few minutes.
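The "nothing looks wrong" reading can be checked mechanically. This sketch hard-codes the Pre-fail rows from the table above and applies the usual SMART rule that an attribute counts as failing once its normalized VALUE drops to or below THRESH. The `failing` helper and the tuple layout are made up for the example; this is not how smartctl itself is implemented.

```python
# (id, name, value, worst, thresh, raw) copied from the Pre-fail rows above
PREFAIL = [
    (3, "Spin_Up_Time", 214, 214, 63, 11805),
    (5, "Reallocated_Sector_Ct", 249, 249, 63, 41),
    (6, "Read_Channel_Margin", 253, 253, 100, 0),
    (8, "Seek_Time_Performance", 252, 244, 187, 34394),
    (10, "Spin_Retry_Count", 253, 252, 157, 0),
    (11, "Calibration_Retry_Count", 253, 252, 223, 0),
    (203, "Run_Out_Cancel", 253, 245, 180, 19),
]

def failing(rows):
    """Names of attributes whose normalized value is at or below threshold."""
    return [name for (_id, name, value, _worst, thresh, _raw) in rows
            if value <= thresh]

# Smallest headroom between current value and threshold across the rows.
closest = min(PREFAIL, key=lambda row: row[2] - row[4])

print(failing(PREFAIL))  # empty: every attribute is above its threshold
print(closest[1], closest[2] - closest[4])
```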
So, let’s move on to the syslog results. The smartmontools package actively monitors each of these parameters and logs changes to syslog from time to time. You can look at the raw logs if you want to see the whole picture, but it’s way too long to include in its entirety here. The short version goes like this:
Dec 5 07:31:06 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 252
Dec 5 15:01:06 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Dec 5 15:31:04 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 253 to 252
Dec 5 20:01:04 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
This pattern continues on like this the whole time, with Seek_Time_Performance wandering from 251 to 253 and back. All 3 of my Maxtor drives do this all the time, and have since they were brand-new. It’s just noise in the logs, not a real problem. Next:
Dec 8 01:31:06 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252
Dec 8 02:01:05 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 252 to 253
This is the first indication of trouble. Notice that it’s not very threatening–Hardware_ECC_Recovered just barely changed and it immediately flipped back to its old value. Plus, it’s marked as a “usage attribute,” which indicates that it’s non-threatening. Continuing:
Dec 13 04:50:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252
Dec 13 05:20:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 252 to 253
Dec 13 06:50:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 253 to 252
Dec 13 07:20:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Dec 13 09:50:57 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252
Dec 13 11:20:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 252 to 253
Dec 13 13:50:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 253 to 252
Dec 13 21:20:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252
Dec 13 21:50:57 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Dec 13 21:50:57 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 252 to 253
This is the first time that the Hardware_ECC_Recovered change recurred after its first appearance on the 8th. I left the Seek_Time_Performance lines in, just to show that the ECC lines aren’t particularly common–the Seek Time lines show up every couple of hours, day in, day out.
The ECC notices continue, showing up again on the 16th, 18th, 25th, and again at 5:20 AM on the 1st. That’s where things start getting interesting:
Jan 1 03:20:57 starting scheduled Long Self-Test.
Jan 1 03:50:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 253 to 251
Jan 1 05:20:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252
Jan 1 05:50:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 252
Jan 1 05:50:56 SMART Usage Attribute: 196 Reallocated_Event_Count changed from 253 to 252
Jan 1 05:50:56 SMART Usage Attribute: 198 Offline_Uncorrectable changed from 253 to 252
Jan 1 05:50:56 Self-Test Log error count increased from 0 to 1
Jan 1 06:20:55 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Jan 1 06:20:55 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 252 to 253
At this point, I hadn’t seen any actual errors yet, but the drive’s SMART self-test had spotted a bad sector. The 2nd and 3rd were basically the same–the self-test reported that the same sector was still bad. All hell started to break loose on the 4th:
Jan 4 02:50:56 SMART Usage Attribute: 196 Reallocated_Event_Count changed from 252 to 251
Jan 4 02:50:56 SMART Usage Attribute: 198 Offline_Uncorrectable changed from 252 to 253
Jan 4 07:35:40 ATA error count increased from 980 to 981
Jan 4 08:35:40 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Jan 5 02:05:42 starting scheduled Short Self-Test.
Jan 5 02:35:40 SMART Usage Attribute: 198 Offline_Uncorrectable changed from 253 to 252
Jan 5 02:35:40 Self-Test Log error count increased from 3 to 4
Jan 5 06:36:08 SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 253 to 252
Jan 5 06:36:08 SMART Usage Attribute: 197 Current_Pending_Sector changed from 253 to 252
Jan 5 06:36:10 ATA error count increased from 981 to 1293
Jan 5 07:14:45 ATA error count increased from 1293 to 2377
By this point, I was seeing errors in the filesystem. Syslog was filling up with 3ware and XFS errors about disk problems. Things were starting to suck. On the 6th, I ordered new drives, and this morning I started installing them. I’m currently attempting to recover whatever data I can off of the bad disk.
So, there are a couple things that we can learn from this. First, if I’d been paying attention and immediately migrated data off of the failing disk as soon as SMART told me that it had developed a bad sector, then I’d probably have been okay. It took 2 or 3 days before the problem got bad enough to be visible at the filesystem level. Second, if I’d had enough familiarity with this particular Maxtor drive, then I would have noticed that something weird was happening when the ECC messages started showing up. None of my other Maxtor drives have ever logged an ECC message; that makes the Hardware_ECC_Recovered message look kind of suspicious, but that probably only holds for this exact family of Maxtor drives. In a commercial environment, where I had dozens or hundreds of similar drives, I’d want to tell my log monitoring software to pay special attention to that message, because it looks like a good indicator of drive failure.
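As a rough illustration of what "pay special attention to that message" could look like, here is a small Python sketch that scans log lines in the abbreviated form quoted above. The watch list and noise list are assumptions drawn from this one drive's history, and `suspicious_lines` is a made-up helper, not a smartmontools feature. Real syslog lines also carry a hostname and process name, which the regular expressions below simply skip over.

```python
import re

# Matches the attribute-change messages quoted in this article.
CHANGE = re.compile(
    r"SMART (?P<kind>Prefailure|Usage) Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)
# Matches "ATA error count increased ..." and "Self-Test Log error count increased ...".
ERRORS = re.compile(r"error count increased from (?P<old>\d+) to (?P<new>\d+)")

# Assumption: attributes that were just noise on these particular Maxtor drives.
NOISE = {"Seek_Time_Performance"}
# Assumption: attributes that preceded this particular failure.
WATCH = {
    "Hardware_ECC_Recovered",
    "Reallocated_Event_Count",
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def suspicious_lines(lines):
    """Return the log lines worth a human look, dropping the known noise."""
    hits = []
    for line in lines:
        change = CHANGE.search(line)
        if change:
            name = change.group("name")
            if name in WATCH and name not in NOISE:
                hits.append(line.rstrip())
        elif ERRORS.search(line):
            hits.append(line.rstrip())
    return hits


SAMPLE_LOG = [
    "Jan 1 03:50:56 SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 253 to 251",
    "Jan 1 05:20:56 SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 253 to 252",
    "Jan 1 05:50:56 SMART Usage Attribute: 198 Offline_Uncorrectable changed from 253 to 252",
    "Jan 1 05:50:56 Self-Test Log error count increased from 0 to 1",
]

for hit in suspicious_lines(SAMPLE_LOG):
    print(hit)
```

On the sample lines, this drops the Seek_Time_Performance entry and keeps the other three. The lists would need tuning per drive family, since the noise on one model may be a warning sign on another.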
More importantly, though–if I’d been paying closer attention to my 3ware card, I would have noticed that this 4-drive RAID 5 array was running in degraded mode before the drive failed. If I’d fixed that at the time, the drive failure wouldn’t have cost me any data–the array would have dropped the failing drive and warned me, and that would have been that. Instead, I’m looking at a weekend’s worth of hassle as well as some data loss. When I get everything back up and running, I’m probably going to switch from using the 3ware card’s hardware RAID 5 to software RAID 5–I trust Linux’s RAID monitoring tools more than I trust 3ware’s. Also, I was only getting ~25 MB/sec writing with the 3ware’s hardware RAID 5, while I should get closer to 100 MB/sec with software RAID 5.
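With Linux software RAID, the kernel reports array state in /proc/mdstat, so spotting a degraded array can be scripted. The Python sketch below is one way to do that, not a description of any particular monitoring tool. It assumes the usual status format, where `[4/3] [UUU_]` means 4 devices expected and 3 active. The sample text is hypothetical and was not captured from the machine in this article.

```python
import re

ARRAY = re.compile(r"^(md\d+)\s*:")                # e.g. "md0 : active raid5 ..."
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[4/3] [UUU_]"


def degraded_arrays(mdstat_text):
    """Map each degraded array name to (active_devices, expected_devices)."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (active, expected)
    return degraded


# Hypothetical /proc/mdstat contents for a 4-drive RAID 5 that has lost one member.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1]
      732563712 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))  # prints {'md0': (3, 4)}
```

On a live system the text would come from reading /proc/mdstat. Run from cron with mail on non-empty output, something like this would have flagged the degraded array before the second drive went bad.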
Posted in Computer System Administration | Tags broken, drive, ide, linux, smart | 13 comments
Posted by Scott Laird
Fri, 07 Jan 2005 17:51:59 GMT
Well, this isn’t looking promising–my new drives arrived, but the big RAID array is throwing errors left and right. I’m not sure how much data I’ll be able to recover off the thing. Most of the data on the drive is reconstructible, but not everything. Most of the contents are old digital pictures, and I’ve tried to write them all to DVD before throwing them onto the RAID array. Odds are I missed some stuff, though.
Amazingly enough, the system logs seem to have survived unscathed, so I’ll write up an “anatomy of a drive failure” article later, showing what this looked like from a SMART perspective. Since I’ve never actually seen a SMART-monitored drive failure before, it should be somewhat educational.
Posted in Computer System Administration | Tags broken, drive, failure, linux, raid, smart | no comments
Posted by Scott Laird
Wed, 08 Dec 2004 20:50:17 GMT
If you’ve been living under a rock, then you might not have noticed that PalmSource has announced that they’re going to be building a version of the Palm operating system that runs on top of Linux. It’s not completely clear what this means–are they replacing the kernel in PalmOS 6 with Linux, or is this a parallel project, intended to fit into new niches? PalmSource released an open letter to the Linux community that provides a few details:
- Existing 68k-based Palm apps will work fine.
- Apps based on the new Cobalt API will need to be recompiled.
- ARM-based apps for PalmOS 5 aren’t mentioned; it’s probably safe to assume that most of them will break.
- They’re going to enhance the Linux kernel as needed and contribute their changes back to the community.
- It’ll be possible to run Linux apps underneath their UI, but if you want a user interface, you’ll need to use their API. In other words, it’ll be possible to run things like Apache and MySQL on PalmOS for Linux, but not X applications.
- Their licensing model for PalmOS itself isn’t changing–they’re still licensing the whole package to hardware manufacturers and expecting them to port it to their hardware. Presumably, this will become easier when using Linux, because it comes with more drivers and Linux driver programming is an easier skill to hire for than PalmOS driver programming.
Of course, that glosses over most of the important issues. Particularly, is any vendor actually going to ship this? Ever? PalmOS 6 (“Cobalt”) was released to manufacturers at the end of 2003, and not only is there no PalmOS 6 hardware available, there aren’t even any rumors of any on the horizon. It’s unclear if PalmOne will ever ship a PalmOS 6 device. It’s entirely possible that the only PalmOS 6 hardware to ship in 2005 will be from a fleet of small Asian contract manufacturers building for local markets, although Samsung may have something up their sleeves.
Given the glacial rate of PalmOS 6’s adoption, PalmSource will probably be best off focusing all of their attention onto PalmOS for Linux and calling it PalmOS 7, because there’s no way they can carry three software lines–PalmOS 5, PalmOS 6, and PalmOS for Linux. Since current PalmOS 6 applications won’t be binary-compatible with PalmOS for Linux, there’s no way they can call it PalmOS 6.2 and pretend that it’s an extension of the current 6.x line. If they’re going to push a Linux product at all, then they need to push it hard, and they can’t push two “next generation” products that are mutually incompatible.
Which brings up the big question: when will it be ready? After reading their press releases, I don’t think they’ve been working on this for very long. They certainly aren’t ready to ship anything, and I’d be surprised if they actually have much more than a proof-of-concept port in-house. On the other hand, they have a solid, well-known base to work from, so it’s not like they have to fight with alpha-grade build tools, flaky OSes, and all of the other moving targets that they presumably had to deal with when building PalmOS 6. Porting the current PalmOS to run on top of the Linux framebuffer device shouldn’t be very hard. Adding support for Linux’s network stack might be interesting–as I recall, PalmOS 5’s TCP stack was entirely located in user space, so the API might not be very close to the traditional BSD socket API, but I don’t really know. Porting 68k apps will be easy; they already have an emulator that runs on Linux and has for years. Adapting it to the new framework shouldn’t require a whole lot of work.
Unfortunately, the one thing that will probably be hardest is the thing that makes PalmOS so unique–its filesystem, or rather the lack of one. Traditionally, PalmOS applications haven’t really had the notion of saving or multitasking–everything lived in RAM, and switching between programs didn’t involve a whole lot of extra effort. Applications kept their data organized into databases, not files, and they edited the databases directly, without any sort of “save” step. This means that switching between apps is fast and gives a good user experience for simple applications, but it hasn’t scaled well because it doesn’t provide an easy way to manage block-based storage, like external flash cards or internal hard drives. Instead, PalmOS has had to add a whole extra API for accessing filesystem-based devices, and this has left us in a state where some applications won’t run off of flash cards, and many applications are unable to access data saved on flash cards.
With a virtual-memory based OS like Linux, it’s possible to fake a lot of this with mmap, but that isn’t ideal when you’re dealing with flash cards–it’s easy to wear out most flash cards today by sending them thousands of small writes, and that’s what I’d expect to see when changing an mmapped database. Also, what happens when a flash card is ejected while an application has a file mapped? Linux is never happy when removable devices go away, but causing applications to crash just because the card was removed is seriously user-unfriendly. If mmap won’t work, the big alternative is to copy things to RAM transparently and then copy them back out when done, but that will push the memory requirements up, which will push up costs and limit battery life.
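The two approaches can be sketched in a few lines of Python. This is only an illustration of the trade-off, not a guess at how PalmSource would build it. The function names and the record layout are invented for the example, and how many physical writes actually reach the card depends on the kernel and the flash controller.

```python
import mmap
import os
import tempfile


def edit_in_place(path, offset, data):
    """mmap approach: change bytes directly in the mapped file, with no 'save' step."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:
            mapped[offset:offset + len(data)] = data
            mapped.flush()  # each small edit can turn into a write to the device


def edit_via_copy(path, edits):
    """Copy approach: load into RAM, apply every edit there, write back once."""
    with open(path, "rb") as f:
        buf = bytearray(f.read())  # costs RAM proportional to the database size
    for offset, data in edits:
        buf[offset:offset + len(data)] = data
    with open(path, "wb") as f:
        f.write(buf)  # one bulk write instead of many small ones


# Tiny demonstration on a temporary file standing in for a database on a flash card.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.close(fd)

edit_in_place(demo_path, 2, b"AB")
edit_via_copy(demo_path, [(0, b"X"), (9, b"Y")])

with open(demo_path, "rb") as f:
    print(f.read())  # prints b'X1AB45678Y'
os.remove(demo_path)
```

The first function keeps the PalmOS feel of editing a database directly, but ties the application to the card staying in the slot. The second survives the card's absence between load and save, at the cost of holding the whole database in RAM.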
Given all of this, I’d be surprised to see a PalmOS for Linux device before mid-2006, and that’s a long ways away. It’s not clear that the Palm world can wait for another year and a half, falling further and further behind the networking and multitasking abilities of their PocketPC-based competitors. Given that, PalmSource must be feeling a lot of pressure from their licensees to switch to Linux, or they wouldn’t have made this announcement at all.
Posted in Linux, Handheld and PDA | Tags linux, palm, palmos, palmos6 | no comments
Posted by Scott Laird
Wed, 01 Dec 2004 05:04:25 GMT
I made a bit more MythTV progress today. DVD playing now works perfectly. I had had three problems:
- Audio was really quiet. After upgrading mplayer, I decided that this was really an issue with my receiver–it was decoding analog Dolby Surround correctly, but it wasn’t really configured for my speakers. A little bit of fiddling and it’s acceptable, if still a bit quiet. The reference source that I was using for comparisons is really loud–the meter on the receiver is peaking out all the time, while Finding Nemo (my DVD test today) is really just about where it should be.
- Mplayer was dropping frames while playing DVDs, but DVD rips played just fine. DMA wasn’t enabled on my DVD drive. Once I fixed that, it became perfect.
- I couldn’t eject DVDs without opening up a shell and umounting /dev/cdrom. I’m not sure what was up here, but something in KnoppMyth was automounting /dev/cdrom every few seconds. I commented out the entry in /etc/fstab, and everything seems okay–I can still play DVDs, but the eject button on the drive works now.
At this point, MythTV is an acceptable DVD player for me. It still isn’t perfect–it takes too many button pushes on the remote to start playing, and the remote buttons aren’t mapped quite right. In other words, it’s still kind of complex, but it works fine once you get through the complexity.
On the other hand, the image is stunning on the projector. I think the jump from NTSC DVD player to VGA DVD player is almost as big as the jump from VHS to NTSC DVD, at least in my setup.
Posted in Toys, MythTV | Tags dvd, entertainment, linux, mythtv | 2 comments
Posted by Scott Laird
Tue, 09 Nov 2004 17:42:27 GMT
Slashdot has an article this morning on the OpenBSD people’s new BGP daemon, OpenBGPD. In essence, the OpenBSD people did the same thing that they’ve done repeatedly before: they took a protocol that didn’t have an open, secure implementation and provided a clean, minimalistic, BSD-licensed tool.
Personally, I find OpenBGPD kind of fascinating, because I’ve worked with router jockeys for years, and I get dragged into “can we run a BGP daemon on this PC” discussions with surprising frequency.
OpenBGPD’s stated goals include this fun little snippet:
Provide a lean implementation, sufficient for a majority. Don’t try to support each and every obscure usage case, but cover the typical ones
And that’s where my problem lies. I don’t think I’ve ever been asked for a “lean implementation” of BGP. Every time I’ve been dragged into a BGP discussion, it’s been because network engineers have been trying to do something bizarre and creative with BGP, and the tools that they’re used to using aren’t sufficient. For instance, at Internap, we wanted to add per-prefix, per-peer prepending for a huge number of prefixes, and we wanted to change the path selection algorithm to include a bunch of extra information that we had on reachability and performance. In other cases, I’ve been asked for simulators and BGP loggers that could feed BGP prefix reachability information into a database. Inevitably, every time someone needed just a “lean implementation,” they’d already have a Cisco box handy and they’d use it instead of monkeying with BGP on a PC.
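For readers who haven't run into it, per-prefix, per-peer prepending means padding the AS path with extra copies of your own AS number for specific prefixes toward specific peers, so those paths look longer and less attractive. The toy Python model below shows only the idea. It is not how Internap implemented it and it is not OpenBGPD's configuration syntax. The AS numbers, peer names, and prefix come from ranges reserved for documentation, and `PREPENDS` and `advertised_path` are hypothetical names.

```python
import ipaddress

LOCAL_AS = 64500  # hypothetical local AS number (documentation range)

# Hypothetical policy table: (peer name, prefix) -> number of extra prepends.
PREPENDS = {
    ("peer-a", ipaddress.ip_network("192.0.2.0/24")): 2,
}


def advertised_path(peer, prefix, as_path):
    """Build the AS path sent to an eBGP peer for one prefix.

    The local AS is always added once; the policy table may add extra copies.
    """
    extra = PREPENDS.get((peer, ipaddress.ip_network(prefix)), 0)
    return [LOCAL_AS] * (1 + extra) + list(as_path)


print(advertised_path("peer-a", "192.0.2.0/24", [64510]))  # prints [64500, 64500, 64500, 64510]
print(advertised_path("peer-b", "192.0.2.0/24", [64510]))  # prints [64500, 64510]
```

The hard part in practice is not the lookup but maintaining a table like this for a huge number of prefixes, which is the kind of job a "lean" daemon is not aimed at.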
That’s not to say that PCs make lousy routers or anything like that–the price/performance is impossible to match with anything from Cisco–but that the total costs involved in any BGP peering that I’ve seen make the cost of the router little more than noise in the equation. If you’re paying tens of thousands of dollars per month for multiple pipes to providers, then what does saving $20k on a router buy you, besides maintenance and reliability headaches and a hard time finding network engineers familiar with your setup? Most of the time, it’s cheaper to spend $20k on hardware and make it up on productivity and reduced downtime.
So, while OpenBGPD is cool, I’m not sure how useful it really is outside of test labs and maybe small ISPs, if there are any of them left. On the other hand, I’d love a good OpenBGPD-ish OSPF implementation. I’ve played with Zebra, and the whole design of the thing just rubs me wrong (although Quagga might be better). I need to remember to actually give Xorp a try, too. OSPF is more useful inside of existing networks, and it makes a lot more sense on a LAN than BGP does.
When it gets down to it, I suppose my real point is this: it’s largely pointless to scale PC-based routers up to make them compete toe-to-toe against Cisco’s big WAN routers, because the network costs and the maintenance costs of doing one-off routers work against us. It’s also really hard to get reliable, well-tested WAN interface cards for anything faster than a T1. Try finding a PCI OC-12 POS card with Linux drivers sometime.
On the other hand, other alternatives make a huge amount of sense:
- Scale them down. You can build a cheap Linux router for almost no money these days–look at the Linksys WRT54G.
- Scale them out. Imagine a medium sized company replacing all of their assorted branch office routers with PCs talking to DSL and providing QoS, routing, firewalling, VPNs, VoIP, etc. It’s expensive to do it once, but you can replicate the work onto a hundred devices for very little additional cost.
- Push them into niches. There are cases where the fantastic flexibility of PCs can make them much more useful than an equivalent Cisco. Linux, for example, has no problem running multiple routing tables and a fantastic number of firewall rules. You can do amazingly creative things with just the stock tools, if you can figure out how to use them.
Posted in Computer Networking | Tags bgp, cisco, linux, openbgpd, openbsd, router | 1 comment