 

13.1 Troubleshooting


Troubleshooting your hard drive


This article is available in English only. Anyone who wants it translated should do so themselves. If a correct translation reaches me, it will be added here.


IT'S DEAD - NOW WHAT?

Okay, you've got something wrong with your hard disk. Let's classify the problem - break it down into a category where we can look at potential causes and solutions. Just scan through the list until you find the information you need. Do note, however, that some hard disk problems can be very subtle or actually be multiple problems; it never hurts to read the whole troubleshooting list. It can inspire good ideas.


I. CATASTROPHIC FAILURES

The disk won't turn on. Nothing happens when you flip the power switch - no spinup "vroooom", no "whick whick whick" of the Adaptec controller resetting, nothing.

This is a thoroughly sick drive. This usually means a power supply problem; power supplies fail more than anything else. Diagnosis: first, check the fuse. Yes, you'd be amazed how many people forget that. Despite the protestations from a generation of electrical engineering professors, yes, fuses do sometimes blow for no good reason. Try replacing the fuse (same value, of course, and that means the same value on both ends, volts and amps). Don't do something dumb like wrapping it in aluminum foil, or the jam-in-a-thick-piece-of-copper-wire trick. The fuse is trying to tell you your drive is inhaling too much electricity; if you bypass it, you may burn up the drive. This is embarrassing at best.

Not the fuse, eh? Well, make sure the drive is plugged in and that the power strip is "hot". A quick meter check of the power strip might not hurt. I've seen remote lightning bolts take out the MOV's (Metal Oxide Varistors, a power spike protector) and otherwise burn up a power strip.

Okay, it's getting power, still nothing. You'll have to open it up. First, check the power switch with your meter. Make sure it turns off and on (check for 220V AC drop across it in the off position).

CAUTION: THAT CURRENT IS DEADLY. DON'T TRY CHECKING IF YOU'RE NOT FAMILIAR WITH ELECTRONICS.

Second, check for continuity through the transformer. I'm deliberately using technicalese here; IF YOU DON'T KNOW WHAT THESE TERMS MEAN, PLEASE DON'T PLAY WITH YOUR POWER SUPPLY. There are dangerous voltages on those big chunks of metal "heat-sinking" the power transistors; I had one drive's power supply really zap me, so I know.

If that looks okay, check for +5 volts DC and +12 volts DC on ALL the power leads coming out of the supply. This is easy. The ACSI-SCSI board requires +5 VDC only (measure +5 to the ground lead running along with it). The hard drive itself and the Adaptec controller require +5 and +12 VDC. Usually the +5 wire is red and the +12 is orange, but don't count on it; it seems to depend on what spool of wire they were using the day they built the supply.

If power goes into the power supply and nothing comes out, you've got a bad supply. Make sure you test this with everything connected to the power supply; an unloaded power supply often shows no output at all! Admittedly, this makes it trickier to troubleshoot; a "dead load" (such as an old, useless hard disk that spins up, but doesn't work otherwise) is real handy here.

If the power supply gives +5 and +12 UNTIL something is connected, then either the supply is weak or the component being supplied is shorting out across the supply. This can be a little tricky to diagnose. A dead-short board will cause the power supply to "crowbar" and shut itself off to prevent damage, so you tend to "see" a dead supply. Unhook the suspect board, take a Pepsi break and let the supply sit and cool off, then try again.

Crowbarring is often signalled by a "click click click" noise from the drive; also sometimes you'll see the fan barely jerking as the supply turns on and off. That's a symptom of a dead short.

If you find one board that causes the power supply to shut down when it's hooked up, obviously, replace it. New Adaptec 4000's are available everywhere; ACSI-SCSI converters are available from the manufacturers. It is very helpful to have another drive to swap parts with. Make friends with your local dealer's service department - take them out for a beer sometime - and they might let you borrow parts from a shop drive to test yours.

If it's the actual hard disk mechanism that's broken (I've seen that happen several times), then you're probably stuck and you have probably lost all of your data. There are shops that can SOMETIMES repair hard disks and SOMETIMES get your data off dead hard disks, but they are VERY expensive.

Also, I've seen four hard disks that, once spinning, could keep spinning, but which couldn't get the motor started. There's a fix you can try, but it is GUARANTEED TO CAUSE TROUBLE; if you have to do this, it's a last ditch effort; be ready to get all the data you can off this drive. Have your floppies formatted and ready ...

Apply power, then reach to the head stepper motor shaft/cam, and gently wiggle it. That often can cause a head that's frozen to the disk surface to come loose! I know: it sounds like an awful thing to do, but this is desperation strategy.

Once it's spun up, copy all the data off that drive that you can get, and use it for target practice thereafter. You can bet the platters and head are damaged.

WARNING: The drive may never spin up again, so don't turn it off! This might be your only chance to recover your data, so don't waste it. Either make an image copy of the drive to a new drive, dump it to floppy disks, or both.

I have two Microscience HH 1050 20-meg drives that did this to me. As soon as I have Spectre GCR out the door, I'm taking them out for an appointment with my .270 rifle.

If your hard disk squeals unbearably all the time, look under the hard disk mechanism for where the head spindle touches. There's often a small copper "strap" here. A SMALL drop of oil here can cure the squeal. Don't overdo the oil! Oil attracts dust (that's why older cars used an oil-soaked air cleaner - it really pulled the dust out of the air) and if you apply too much, you really get a squeaky drive in a few days. Also, you can very gently loosen the strap, just a little bit, and see if that helps. Don't overdo any of this; the hard disk is incredibly fragile.

Look at your Adaptec (ACB) controller. Does its LED come on? (Atari wires this LED to the front of the hard disk case.) If not, and your ACB is getting power, then your ACB is sick; you'll probably have to replace it. The ACB should go through a power-up cycle that involves turning that LED on.

Finally, check all the power wiring. You'd be amazed at how many times the connector that brings power to the ACSI-SCSI board on Atari drives can come loose; that'll paralyze the unit. Make sure the plugs are plugged in fully. A loose plug can work for a while, then oxidize and quit.

I'm getting used to hard disk mechanisms going bad; I've gotten sort of blasé about it. For example, just before last Christmas, a 40-meg Miniscribe, 20-meg Microscience, and FOUR Hewlett-Packard 20-meg drives all gave up the ghost one night from an unexpected power glitch. That's what, 140 megabytes of storage? After that, I just keep a spare drive around and swap it in if I suspect drive problems.

If you still don't have the problem solved, start swapping parts, until you've completely rebuilt the drive. Swap the 50-pin, 34-pin and 20-pin cables FIRST; these are the least reliable parts of the system. (One good tug on a clamp-on cable will often kill it.) Then, swap the power supply, the drive mechanism, the ACB and the ACSI-SCSI board last. Again, you can see why it's really helpful to have a friend's drive to swap with.

If you run into a bad cable, THROW IT AWAY. Don't keep it in your junk box, where you might re-use it again. If you want to save the ribbon cable, fine; cut the connectors off with scissors or diagonal cutters. (Note: the clamp-on connectors are not reusable.)

If your drive has endured something like a lightning spike coming into it, it may be that everything inside is fried. Another possibility is a "ground loop" where your drive gets in the way of an accidental 220 VAC circuit. If either of these events happens, plan on replacing everything.



II. IT SPINS UP BUT ...

If the drive spins up ("Vrooom") but the head doesn't move ("whick whick whick"), either your Adaptec isn't sending out the head move commands, the drive is deaf and cannot listen, or the cabling moving the commands to the drive is bad. Swap and fix appropriately.

Next, TRY SWAPPING YOUR ST/HARD DISK CABLE. I've mentioned this once before, but it's worth repeating. I have had more trouble with that 19-pin cable than with anything else, period. It's just too short, and that causes bends and strains on the internal conductors.

Again, ICD will sell you a new cable. Unless you are darned good at soldering, don't try rolling your own; it's not much fun.

With any luck, your hardware will now be back to its normal self - you'll be able to turn on the hard disk, hear it spin up, hear the Adaptec reset and move the heads, and it's ready. If all else fails, try taking your hard disk mechanism to a friend's drive and try to get your data off that way. Dan and I have done this successfully a few times.



III. O.K., THE HARD DISK IS WAKING UP

Well, that's a good sign. Now, we need to make sure the communications between it and the ST are in good shape. Watch the hard disk's "busy" light carefully. At power-up it may either turn on and stay on, or turn on and eventually go off when the Adaptec is done resetting the drive. This depends on the Adaptec and isn't really a symptom of illness. Now turn on your ST. If the hard disk's light is on, it should immediately snap off as the ST says "Reset and Hello" through the hard disk interface. If that light does not snap off, it's a sure sign of trouble. I don't know the details of some aftermarket interfaces; this applies only to what I have seen: ICD, Supra, Atari interfaces.

The floppy drive will turn on and try to read in the first sector. If there's no disk in the drive, this will take around five seconds to finish; if there's a disk in the drive, it will take less than a second. Then the ST will try to read in the first sector of the hard disk. This will cause a brief flash of the hard disk drive's light. You can definitely see it, it's just mighty quick.

If this doesn't happen, then your ST is not commanding the hard disk to give it the first sector. Again, suspect the cable first. The next thing to suspect is the Atari DMA chip. I've had this go bad on me several times. The symptoms are that if you boot from the hard disk, the system freezes; if you try to boot from floppy, the floppy window instantly pops up and shows "0 bytes in 0 sectors". It never shows any data on the floppy disk, regardless of what is there. If these symptoms appear, it's your DMA chip - count on it. Almost all the time you can cure this by simply reseating the DMA chip. Open up your ST, find the DMA chip (it's usually in front of the hard disk port, so that the signal lines are as short as possible), pry it up from both sides a little at a time, so that you don't bend the pins to the side, and press it back down again. Of course, anti-static precautions are essential; if you don't know about this, get help from someone who does.

Reseating the chip fixes a lot of DMA problems. What you are doing is scraping off a microscopically thin layer of corrosion on the pins and on the socket. ST owners have been reseating chips for a long time (particularly the MMU and GLUE square chips); add the DMA chip to your bi-monthly reseating schedule.

Naturally, at this point you've cycled power to the hard disk, to be sure it's not getting "stuck" by the bogus commands sent during an ST power off. But watch for that light flashing; if it doesn't flash, something is wrong. If the cable and DMA fixes don't work, just carry your hard disk over to a friend's ST and try it; if it works, your ST is bad - take it to a dealer to get it fixed. If it still doesn't work, try troubleshooting as above.



IV. AUTOBOOT PROBLEMS

Next, let's assume the light flashes and your hard disk tries to self-boot - you see the HD light flashing a few times. Nothing happens, or maybe you get a "Self Boot" message, then the ST dies or the screen gets filled with gibberish or you experience some other bizarre symptom. You've probably got sick autoboot software: something has corrupted it.

If you're set for autoboot, you're going to have to beg and plead for the hard disk to let you boot up to the desktop enough just to fix it! If you've got a hard drive that can be turned on when the ST is running, and not crash the ST, then good! Turn on the ST, put in your hard disk utilities disk, and when the Desktop appears, turn on the hard disk. Re-install the autoboot utility (delete any old autoboot files from the hard disk). That ought to do it.

If your hard disk can't be turned on when the ST is connected - that is, it crashes the ST - welcome to the club. So does mine. Use the program Revive! in the spring 1987 issue of START (and repeated in the May 1989 issue) to make a "bootable floppy disk" which will force the ST to ignore the hard disk at boot. Power up the hard disk, then when the ST boots from this floppy, put in your utilities disk, and re-install the autoboot. That ought to fix it; it did when my hard disk went bad.

Revive! is one of those utilities you don't need often, but boy, when you need it, you NEED it! I keep a Revive! disk in my hard disk tool kit at all times (along with the Supra utilities disk).

If your hard disk used to self-boot, but now boots from floppy, then something has corrupted the autoboot. There are lots of possibilities. The "auto-boot" flag could have been shut off. The partition sector could have been damaged, which would disable the boot (the Atari would conclude it was not a "bootable sector"). The hard disk driver program on the hard disk (AHDI.SYS, SUPBOOT.PRG) might have gotten damaged. For instance, Magic Sac's MAGICHD program did this, when it tweaked a partition into Macintosh OS disk format.

Just re-install the auto-boot using the supplied software and the problem will correct itself.

You can do this by booting from floppy, then running the hard disk drive by hand. For instance, with the Supra software, you'd double- click on SUPBOOT.

Watch the hard disk at this point! You should see SUPBOOT poll the hard disk as it wakes up, looking for partitions. If nothing happens (no light flash), you've got a hardware problem. May I repeat? All together now: Distrust the ST/hard disk cable first!

After running SUPBOOT and seeing the light flash, your communications to the hard disk should be restored. Now re-install a hard disk icon (or use a command line shell) so that the ST can access the hard disk. (Note that when you boot from floppy, you're using the floppy's DESKTOP.INF file, so probably all of your Desktop icons are missing. You have to hand-install an icon to access the hard disk from the Desktop.)

Then, use SUPUTIL to re-install the automatic boot software. You should be all set.

If you run SUPBOOT but still cannot access your hard disk, there are several possibilities:


1. Your SUPBOOT is bad. Remember, some Atari machines cannot detect all floppy disk errors. Programs really CAN go bad in this manner.

2. The ST cannot talk to the hard drive (no drive light flash) or the communication is corrupted (DMA chip needs reseating).

3. Your partition sector is bad; the data in it has been damaged, such as if the partition tables have been zeroed out by a program error.

If this last has happened, then you'll see the hard disk flash, SUPBOOT will run and exit normally, but you will still not be allowed to talk to the hard disk. For instance, you'll double-click on the C icon, and get the "Non-existent disk drive" message.
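
For the curious, here is a rough sketch of how a "bootable sector" test can work. The rule used here (the 512-byte sector, read as 256 big-endian 16-bit words, must sum to 0x1234) is the commonly documented TOS convention, not something stated in this column, so treat it as an assumption. The function names are made up for illustration.

```python
import struct

BOOT_MAGIC = 0x1234   # assumed TOS convention, not taken from this article
SECTOR_SIZE = 512


def sector_checksum(sector: bytes) -> int:
    """Sum the sector as 256 big-endian 16-bit words, modulo 0x10000."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    return sum(struct.unpack(">256H", sector)) & 0xFFFF


def looks_bootable(sector: bytes) -> bool:
    """True if the checksum matches the assumed boot magic."""
    return sector_checksum(sector) == BOOT_MAGIC


def make_bootable(sector: bytes) -> bytes:
    """Return a copy whose last word is adjusted so the checksum matches."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    body = sector[:-2]
    partial = sum(struct.unpack(">255H", body)) & 0xFFFF
    fix = (BOOT_MAGIC - partial) & 0xFFFF
    return body + struct.pack(">H", fix)
```

Under that assumption, a single flipped bit anywhere in the sector is enough to change the sum, which would explain why a lightly damaged sector stops booting while the rest of the disk is untouched.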



V. TROUBLE IN THE PARTITION SECTOR

At this point you have serious trouble with your partition sector. You'll have to restore it. IF you took the time to write down all of its specs, as I recommend you do when the system is running okay, then you can use SUPEDIT and punch in the partition sector data. You'll then be able to retrieve the data from your hard disk - maybe only the partition sector is damaged!

If you're like most people and didn't write down the info and never used Meg-A-Minute-Elite to back up your partition sector, well then, you can either format (and lose all your data), or try to restore your partition sector by hand. Yes, it's true! Meg-A-Minute-Elite backs up what no other backup program does: the critical partition sector! It's a VERY good idea to use it at least once, just to get a backup of the partition sector; losing that one sector will shut off access to your hard disk.

It is not easy to restore a partition sector by hand. But try to remember exactly how many megabytes your partitions were.

Mostly, everyone uses even numbers: 5 or 10 megabytes. If so, try using these values for partition sizes:

1 megabyte = 2,000 sectors
2 megabytes = 4,000 sectors
5 megabytes = 10,000 sectors
10 megabytes = 20,000 sectors
15 megabytes = 30,000 sectors
16 megabytes = 32,000 sectors

Plug these values into the SUPEDIT partition editing utility, remembering to leave one-sector "slop" for fencepost error. I can hear the masses clamoring for an example, so here goes.

Let's assume I have a strange disk layout, of 1-meg, 5-meg, 10-meg and 2-meg partitions in that order (C, D, E, F). My partition table needs to look like this when I'm done with SUPEDIT:

Partition   First sector   Last sector   Size

C                      1          2001   ( 1 meg)
D                   2002         12002   ( 5 meg)
E                  12003         32003   (10 meg)
F                  32004         36004   ( 2 meg)


You can see what I mean about "fencepost error"; 2000 sectors takes us from sector 1 to sector 2001, so the next partition starts at sector 2002.
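
If you'd rather not do the arithmetic by hand, the rule above is easy to sketch in a few lines. This is only an illustration of the 2,000-sectors-per-meg rule of thumb plus the one-sector slop; the function name `layout` is made up, and it knows nothing about what SUPEDIT actually stores.

```python
SECTORS_PER_MEG = 2000  # the rule of thumb from the size list above


def layout(sizes_in_meg, first_sector=1):
    """Return (first, last) sector pairs, one per partition.

    Each partition runs from its first sector to first + length, which
    leaves the one-sector "slop" described above, and the next partition
    starts on the sector after that.
    """
    table = []
    start = first_sector
    for meg in sizes_in_meg:
        end = start + meg * SECTORS_PER_MEG
        table.append((start, end))
        start = end + 1
    return table


# The strange 1, 5, 10 and 2 meg layout (C, D, E, F) from the example.
for letter, (first, last) in zip("CDEF", layout([1, 5, 10, 2])):
    print(f"{letter}: {first}-{last}")
```

For that example the partitions start at sectors 1, 2002, 12003 and 32004.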

When done, tell SUPEDIT to write this out as your partition sector. It will question your sanity, but persist; you KNOW what you're doing, right? When it's done, you won't be able to boot from hard disk, but you will from floppy; run the SUPBOOT hard disk driver, install the icons, and test it out. Hopefully it will work.

If not, you've got even more trouble.

The reason I'm stressing this so much is that it is EASY for hard disk software to accidentally blow away the partition sector. The partition sector is sector 0, and 0 is a value often used in programs. If a program accidentally gets a 0 where it shouldn't (in the write sector number) and writes to the partition sector, then you've suddenly lost the ability to talk to all of the (still intact) data on your hard disk.
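
One defensive habit for anyone writing disk software is to make a write to sector 0 something you have to ask for explicitly. Here is a minimal sketch of the idea using an in-memory stand-in for a disk; the class `GuardedDisk` and its parameter names are hypothetical, not part of any ST driver.

```python
PARTITION_SECTOR = 0
SECTOR_SIZE = 512


class GuardedDisk:
    """Toy in-memory disk that refuses accidental writes to sector 0."""

    def __init__(self, sector_count):
        self.sectors = [bytes(SECTOR_SIZE)] * sector_count

    def write_sector(self, number, data, allow_partition_sector=False):
        if not 0 <= number < len(self.sectors):
            raise ValueError(f"sector {number} is out of range")
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector long")
        # A stray 0 in the sector number lands here instead of on the disk.
        if number == PARTITION_SECTOR and not allow_partition_sector:
            raise PermissionError("refusing to overwrite the partition sector")
        self.sectors[number] = bytes(data)
```

With a guard like this, an uninitialized sector number that happens to be 0 produces an error message instead of a disk nobody can read.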

Anyway, that's how to restore your partition sector if it gets damaged. Either use Meg-A-Minute-Elite at least once to back up your partition sector, or repair it by hand with SUPEDIT. I don't know another practical way to do it.

Once More For Emphasis: If your partition sector is damaged, the ST has forgotten where on your hard disk all your "drives" (directories) are. This means EVERYTHING ON YOUR HARD DISK IS INACCESSIBLE. That's why I'm stressing backup or fix; I see it happen all the time. No other sector on the hard disk is this critical.



VI. DOES YOUR DATA APPEAR DAMAGED?

Okay, let's say your hard disk walks, talks and boots, but strange things happen when you use it. For example, you run a program and get TOS ERROR #35. This means the operating system (TOS) tried to load the program, did so, and discovered the program wasn't a program; it was just random data, so TOS gave up.

This means you have data damage on your hard disk. Not PHYSICAL damage (well, not necessarily, although it could be; it usually isn't). This happens all the time. People make big bucks selling disk repair utilities. Or they write columns.
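
If you can copy a suspect program to another machine, a quick sanity check is to look at its first two bytes. The sketch below assumes the commonly documented convention that a GEMDOS executable starts with the word 0x601A; that detail is not from this column, and a matching header says nothing about whether the rest of the file is intact.

```python
PRG_MAGIC = b"\x60\x1a"  # assumed GEMDOS executable signature


def looks_like_program(first_bytes: bytes) -> bool:
    """True if the data starts with the assumed program signature."""
    return first_bytes[:2] == PRG_MAGIC


def file_looks_like_program(path: str) -> bool:
    """Read only the first two bytes of a file and check them."""
    with open(path, "rb") as handle:
        return looks_like_program(handle.read(2))
```

A file that fails this check is a good candidate for restoring from backup first.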

Atari's disk operating system has one particularly nasty bug, known popularly as the "40-folder limit". What happens is every time you access a folder (by opening it, or otherwise "touching" it), information about that folder is loaded into a memory table. Problem: the memory table's size is tiny. After you've touched around 40 folders, the table runs out of space, and the disk operating system goes berserk - writing sectors every which-where, resulting in lost clusters, cross-linked clusters and what-not. It's a nightmare.

To make things worse, folder "slots" are used up just by a drive being connected, by doing a "SHOW INFO" and other ways. It's mighty easy to run out of folder slots.

One common symptom of this is that when you open up a new directory box, you get data or program files that belong somewhere else. Or you get the dreaded "0 files in 0 items" box, faking you into thinking everything has just been erased.

If this happens, reboot immediately; if you write anything to that hard disk, you're going to damage the directory structure. Your data is probably still out there, and still okay. Upon restarting, go immediately to the offending directory, and try again; if it works this time, take a deep breath - you were lucky.

If this happens from within a program, you may be doomed. Be VERY CAREFUL accessing many folders from within a program; you can run out of slots very quickly that way.

Atari has released an "official" 40-folder bug fixer program, called FOLDRXXX, which is available from bulletin boards, user groups or the August 1989 issue of START. What you do is put this in your AUTO folder with the XXX replaced by how many folder slots you'd like to reserve. For instance, for 100 folders, name the program "FOLDR100.PRG" in the AUTO folder. (I use FOLDR800 on my system, because I have so many hard disks (six) attached to it, it's necessary.)

At boot-up, FOLDRXXX adds more memory to the dinky memory space, which just moves the problem into the future. Yes, eventually, you will still crash, but hopefully you have RESET or powered off the system before that point. Most people don't touch that many folders; most programmers are aware of the bug, and write their code not to. George Woodside's superb Turtle backup program, for instance, uses only a few folder slots in spite of the fact that it touches everything on the hard disk.
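
The naming rule is simple enough to sketch. The helper below is hypothetical, and the zero-padding for counts under 100 is my assumption; the only names actually given above are FOLDR100.PRG and FOLDR800.

```python
def foldr_filename(extra_folders: int) -> str:
    """Build the AUTO folder filename for a given number of folder slots.

    Zero-padding to three digits is an assumption for counts below 100.
    """
    if not 0 < extra_folders <= 999:
        raise ValueError("the XXX field only has room for 1 to 999")
    return f"FOLDR{extra_folders:03d}.PRG"


print(foldr_filename(100))
```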

Atari has rewritten GEMDOS, which is the true cause of this problem; by the time you read this, the new TOS 1.4 ROMs (which fix this problem among others) should be available. Contact your Atari dealer for information.

If your directory has been really damaged - say, you get the "0 bytes in 0 items" message - or if folder names are trashed (look like Greek letters, commonly) - or if "Show Info" crashes or shows weird information - you have trouble. (Fair Warning: Once Show Info has been confused once, it takes a reboot to make it work again. If you do a Show Info on a bad partition, then on a good one, it'll show bad data on both.) You have BIG problems. What you need is a good disk fixer program; I don't know of any for the ST! This is most unfortunate.

At this point, you'd best have backed up your hard disk. If you've used Meg-A-Minute-Elite, for instance, it'll take you only 10 minutes to fix a 10-megabyte partition COMPLETELY, directory structure, partition sector, and all. (Not bad, eh?) If you're using Turtle, you'll have to "zero" the partition with hard disk utilities (SUPUTIL) to get a basic directory out there, then go re-create all your folders, then copy the Turtled disks back into them. Be sure to zero the partition; that makes sure the directory structure is new and clean.

The other backup schemes I've seen are too slow to mention; they are so slow you will end up not using them religiously, which is how backups need to be done.

Finally, be sure to run a program to check your disk's structure periodically. Michtron's TUNEUP! does this automatically. A few other disk testers are available. Essentially, they check that what the directory THINKS the disk structure is, and what it really is are the same and let you fix the structure if there's a problem.
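
To give a feel for what such a checker does, here is a toy sketch. It models the allocation table as a dictionary from cluster number to next cluster, and the directory as a dictionary from filename to starting cluster. The `FREE` and `END` markers and the function name are stand-ins for illustration; this is not how TUNEUP! or any real checker is written.

```python
FREE = 0    # toy marker: cluster is not allocated
END = -1    # toy marker: last cluster of a chain


def check_structure(fat, directory):
    """Return (cross_linked, lost) for a toy allocation table.

    cross_linked maps a cluster to the files whose chains both reach it.
    lost lists clusters marked as in use that no file's chain reaches.
    """
    owners = {}
    for name, start in directory.items():
        seen = set()
        cluster = start
        # Follow the chain, stopping at the end marker, a free cluster,
        # or a loop back onto a cluster already visited for this file.
        while cluster not in (END, FREE) and cluster not in seen:
            seen.add(cluster)
            owners.setdefault(cluster, []).append(name)
            cluster = fat.get(cluster, END)
    cross_linked = {c: names for c, names in owners.items() if len(names) > 1}
    lost = sorted(c for c, nxt in fat.items() if nxt != FREE and c not in owners)
    return cross_linked, lost


fat = {2: 3, 3: END, 4: 5, 5: 3, 6: 7, 7: END, 8: FREE}
directory = {"A.PRG": 2, "B.DOC": 4}
print(check_structure(fat, directory))
```

In this example cluster 3 is claimed by both files (cross-linked), and clusters 6 and 7 are marked in use but belong to nobody (lost).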

TUNEUP! has one other very handy feature; it lets you pack all disk data towards the end of the disk. Briefly, whenever the Atari has to write to the disk, it looks for the first open disk sector (starting at 20 or wherever). If you have five or ten megabytes of data before the first open sector, the ST can take up to 30 milliseconds to find an open sector. This greatly slows any write operation, and if you're using lots of temporary files - as in, for example, the Alcyon C compiler or WordPerfect - then you're in big trouble.

Compressing the data to the end of the disk opens up the "fast" area of the hard disk for writing, so the ST doesn't have to search far for the first open sector. Things really zip along when you do this.
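
The effect is easy to see in a toy model. The sketch below treats the disk as a list of in-use flags and counts how many entries a simple front-to-back search examines before it finds a free one. It is an illustration of the idea only; the real search happens inside the operating system, and the sizes here are invented.

```python
def first_free(allocation, start=0):
    """Return (index of first free entry, number of entries examined)."""
    examined = 0
    for index in range(start, len(allocation)):
        examined += 1
        if not allocation[index]:
            return index, examined
    return None, examined


data_at_front = [True] * 5000 + [False] * 5000
data_at_end = [False] * 5000 + [True] * 5000

print(first_free(data_at_front))
print(first_free(data_at_end))
```

With the data at the front, the search wades through 5,000 used entries on every write; with the data packed at the end, it succeeds on the first look.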

I don't recommend TUNEUP! 100 percent, but I have used its data compaction many, many times. There's not much else like it on the market and it's very much worth having if you use your hard disk much.

Again, the best cure for data loss as the result of directory problems is a good backup schedule. It's much easier to pull data from a backup than it is to try to reconstruct it.

Generally, however, I find that people don't realize this until they've completely murdered a hard disk and have to rebuild it from scratch. The only positive thing I can say about that experience is that it tends to keep junk from accumulating on your hard disk.

If you have a book on IBM or MS-DOS disk structure, you could read it for additional information about your ST's disk structure; they are compatible. Just find the first sector of a given partition via the partition table, and start tracing. You'll even find the ST "boot block" information to be fairly compatible with IBM boot blocks.

If your directory has been damaged, there's no easy way to tell what programs might have been hurt, especially if you have many folders. Your data may be lost for good or require many hours of painstaking tracing and an intimate knowledge of MS-DOS disk structure (both of which are time intensive) to fix. At that point - and we have all been there - the only reasonable alternatives are a reformat or, if you have one, going back to your last backup.
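
As a starting point for that kind of tracing, here is a sketch that decodes the disk parameters from the first sector of a partition. It uses the standard MS-DOS layout (little-endian fields starting at byte offset 0x0B) and assumes, as the paragraph above suggests, that an ST partition follows it; the field names and the helper `first_data_sector` are my own labels, so check the results against a partition you know is good.

```python
import struct
from collections import namedtuple

BPB = namedtuple(
    "BPB",
    "bytes_per_sector sectors_per_cluster reserved_sectors fat_count "
    "root_entries total_sectors media sectors_per_fat sectors_per_track "
    "heads hidden_sectors",
)

BPB_OFFSET = 0x0B             # MS-DOS layout; assumed to apply to the ST
BPB_FORMAT = "<HBHBHHBHHHH"   # little-endian, no padding, 19 bytes


def parse_bpb(boot_sector: bytes) -> BPB:
    """Decode the disk parameter fields from a partition's first sector."""
    return BPB._make(struct.unpack_from(BPB_FORMAT, boot_sector, BPB_OFFSET))


def first_data_sector(bpb: BPB) -> int:
    """Sector (relative to the partition start) where file data begins."""
    root_bytes = bpb.root_entries * 32  # 32 bytes per directory entry
    root_sectors = (root_bytes + bpb.bytes_per_sector - 1) // bpb.bytes_per_sector
    return bpb.reserved_sectors + bpb.fat_count * bpb.sectors_per_fat + root_sectors
```

From there, the allocation tables start right after the reserved sectors, the root directory follows the tables, and the file data follows the root directory.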



CONCLUSION

Atari disk hardware is pretty good. It's much faster than, say, its IBM counterpart. It's a rare case where more speed costs less.

I've tried to give you some of the debugging techniques Dan and I have used over the years, show you some of the common faults we've run into and tell you how to fix them. Believe me when I say that this knowledge was paid for with much blood and sweat. Many of these aren't written up anywhere else (the "0 bytes in 0 folders" = DMA chip problem, for instance) and are a simple fix ... once you know what's wrong.

I wish you luck with your hard disk. May you never have to use the knowledge in this column the same way I had to - a deadline a few hours away, a completely dead system and the data possibly lost for good. But may you keep this article nearby to refer to if it ever does happen.






Copyright © Robert Schaffner (doit@doitarchive.de)
Last updated on 23 May 2004