13.1 Troubleshooting your hard drive
This article is available in English only. Anyone who wants it
translated should do so themselves. If a correct translation reaches
me, it will be added here.
Okay, you've got something wrong with your hard disk. Let's
classify the problem - break it down into a category where we can look
at potential causes and solutions. Just scan through the list until
you find the information you need. Do note, however, that some hard
disk problems can be very subtle or actually be multiple problems; it
never hurts to read the whole troubleshooting list. It can inspire
good ideas.
I. CATASTROPHIC FAILURES
The disk won't turn on. Nothing happens when you flip the power
switch - no spinup "vroooom", no "whick whick
whick" of the Adaptec controller resetting, nothing.
This is a thoroughly sick drive. This usually means a power supply
problem; power supplies fail more than anything else. Diagnosis:
First: check the fuse. Yes, you'd be amazed how many people forget
that. Despite the protestations from a generation of electrical
engineering professors, yes, fuses do sometimes blow for no good
reason. Try replacing the fuse (same value, of course, and that means
the same value on both ends, volts and amps). Don't do something dumb
like wrapping it in aluminum foil, or the
jam-in-a-thick-piece-of-copper-wire trick. The fuse is trying to tell you your
drive is inhaling too much electricity; if you bypass it, you may burn
up the drive. This is embarrassing at best.
Not the fuse, eh? Well, make sure the drive is plugged in and that
the power strip is "hot". A quick meter check of the power
strip might not hurt. I've seen remote lightning bolts take out the
MOV's (Metal Oxide Varistors, a power spike protector) and otherwise
burn up a power strip.
Okay, it's getting power, still nothing. You'll have to open it up. First, check the power switch with your meter. Make sure it turns off and on (check for 220V AC drop across it in the off position). CAUTION: THAT CURRENT IS DEADLY. DON'T TRY CHECKING IF YOU'RE NOT
FAMILIAR WITH ELECTRONICS.
Second, check for continuity through the transformer. I'm deliberately
using technicalese here; IF YOU DON'T KNOW WHAT THESE TERMS
MEAN, PLEASE DON'T PLAY WITH YOUR POWER SUPPLY. There are dangerous
voltages on those big chunks of metal "heat-sinking" the
power transistors; I had one drive's power supply really zap me, so
I know.
If that looks okay, check for +5 volts DC and +12 volts DC on ALL
the power leads coming out of the supply. This is easy. The ACSI-SCSI
board requires +5 VDC only (measure +5 to the ground lead running
along with it). The hard drive itself and the Adaptec controller
require +5 and +12 VDC. Usually the +5 wire is red and the +12 is
orange, but don't count on it; it seems to depend on what spool of
wire they were using the day they built the supply.
If power goes into the power supply and nothing comes out, you've
got a bad supply. Make sure you test this with everything connected to
the power supply; an unloaded power supply often shows no output at
all! Admittedly, this makes it trickier to troubleshoot; a "dead
load" (such as an old, useless hard disk that spins up, but
doesn't work otherwise) is real handy here.
If the power supply gives +5 and +12 UNTIL something is connected,
then either the supply is weak or the component being supplied is
shorting out across the supply. This can be a little tricky to diagnose.
A dead-short board will cause the power supply to
"crowbar" and shut itself off to prevent damage, so you tend
to "see" a dead supply. Unhook the suspect board, take a
Pepsi break and let the supply sit and cool off, then try again.
Crowbarring is often signalled by a "click click click"
noise from the drive; also sometimes you'll see the fan barely jerking
as the supply turns on and off. That's a symptom of a dead short.
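The power supply checks above boil down to two sets of meter readings: one with a known "dead load" attached, and one with the suspect board attached. Here is a minimal Python sketch of that reasoning. The function names and the 5 percent tolerance are assumptions made for illustration; the article gives no tolerance figure.

```python
from typing import Tuple

TOLERANCE = 0.05  # assumed +/-5 percent window; not a figure from the article


def rails_ok(reading: Tuple[float, float]) -> bool:
    """reading is (volts measured on the +5 lead, volts measured on the +12 lead)."""
    v5, v12 = reading
    return abs(v5 - 5.0) <= 5.0 * TOLERANCE and abs(v12 - 12.0) <= 12.0 * TOLERANCE


def diagnose_supply(with_dead_load: Tuple[float, float],
                    with_suspect_board: Tuple[float, float]) -> str:
    """Classify the supply from readings taken under two different loads."""
    if not rails_ok(with_dead_load):
        # Nothing useful comes out even with a harmless load attached.
        return "bad supply"
    if not rails_ok(with_suspect_board):
        # Output collapses only when this board is connected (possible crowbar).
        return "weak supply or shorted board"
    return "supply looks fine"


print(diagnose_supply((5.0, 12.0), (0.3, 0.1)))
```

Always take the first reading with a load attached, since an unloaded supply may show no output at all.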
If you find one board that causes the power supply to shut down
when it's hooked up, obviously, replace it. New Adaptec 4000's are
available everywhere; ACSI-SCSI converters are available from the
manufacturers. It is very helpful to have another drive to swap parts
with. Make friends with your local dealer's service department - take
them out for a beer sometime - and they might let you borrow parts
from a shop drive to test yours.
If it's the actual hard disk mechanism that's broken (I've seen
that happen several times), then you're probably stuck and you have
probably lost all of your data. There are shops that can SOMETIMES
repair hard disks and SOMETIMES get your data off dead hard disks, but
they are VERY expensive.
Also, I've seen four hard disks that, once spinning, could keep
spinning, but which couldn't get the motor started. There's a fix you
can try, but it is GUARANTEED TO CAUSE TROUBLE; if you have to do
this, it's a last ditch effort; be ready to get all the data you can
off this drive. Have your floppies formatted and ready ...
Apply power, then reach to the head stepper motor shaft/cam, and gently wiggle it. That often can cause a head that's frozen to the disk surface to come loose! I know: it sounds like an awful thing to do, but this is desperation strategy. Once it's spun up, copy all the data off that drive that you can
get, and use it for target practice thereafter. You can bet the
platters and head are damaged.
WARNING: The drive may never spin up again, so don't turn it off!
This might be your only chance to recover your data, so don't waste
it. Either make an image copy of the drive to a new drive, dump it to
floppy disks, or both.
I have two Microscience HH 1050 20-meg drives that did this to me. As soon as I have Spectre GCR out the door, I'm taking them out for an appointment with my .270 rifle.
If your hard disk squeals unbearably all the time, look under the
hard disk mechanism for where the head spindle touches. There's often
a small copper "strap" here. A SMALL drop of oil here can
cure the squeal. Don't overdo the oil! Oil attracts dust (that's why
older cars used an oil-soaked air cleaner - it really pulled the dust
out of the air) and if you apply too much, you really get a squeaky
drive in a few days. Also, you can very gently loosen the strap, just
a little bit, and see if that helps. Don't overdo any of this; the
hard disk is incredibly fragile.
Look at your Adaptec (ACB) controller. Does its LED come on?
(Atari wires this LED to the front of the hard disk case.) If not, and
your ACB is getting power, then your ACB is sick; you'll probably have
to replace it. The ACB should go through a power-up cycle that
involves turning that LED on.
Finally, check all the power wiring. You'd be amazed at how many
times the connector that brings power to the ACSI-SCSI board on Atari
drives can come loose; that'll paralyze the unit. Make sure the plugs
are plugged in fully. A loose plug can work for a while, then oxidize
and quit.
I'm getting used to hard disk mechanisms going bad; I've gotten
sort of blasé about it. For example, just before last Christmas, a
40-meg Miniscribe, 20-meg Microscience, and FOUR Hewlett-Packard
20-meg drives all gave up the ghost one night from an unexpected power
glitch. That's what, 140 megabytes of storage? After that, I just keep
a spare drive around and swap it in if I suspect drive problems.
If you still don't have the problem solved, start swapping parts, until you've completely rebuilt the drive. Swap the 50-pin, 34-pin and 20-pin cables FIRST; these are the least reliable parts of the system. (One good tug on a clamp-on cable will often kill it.) Then, swap the power supply, the drive mechanism, the ACB and the
ACSI-SCSI board last. Again, you can see why it's really helpful to
have a friend's drive to swap with.
If you run into a bad cable, THROW IT AWAY. Don't keep it in your junk box, where you might re-use it. If you want to save the ribbon cable, fine; cut the connectors off with scissors or diagonal cutters. (Note: the clamp-on connectors are not reusable.)
If your drive has endured something like a lightning spike coming
into it, it may be that everything inside is fried. Another possibility
is a "ground loop" where your drive gets in the way of
an accidental 220 VAC circuit. If either of these events happens, plan
on replacing everything.
II. IT SPINS UP BUT ...
If the drive spins up ("Vrooom") but the head doesn't
move ("whick whick whick"), either your Adaptec isn't
sending out the head move commands, the drive is deaf and cannot
listen, or the cabling carrying the commands to the drive is bad. Swap
and fix appropriately.
Next, TRY SWAPPING YOUR ST/HARD DISK CABLE. I've mentioned this once before, but it's worth repeating. I have had more trouble with that 19-pin cable than with anything else, period. It's just too short, and that causes bends and strains on the internal conductors. Again, ICD will sell you a new cable. Unless you are darned good
at soldering, don't try rolling your own; it's not much fun.
With any luck, your hardware will now be back to its normal self - you'll be able to turn on the hard disk, hear it spin up, hear the Adaptec reset and move the heads, and it's ready.
If all else
fails, try taking your hard disk mechanism to a friend's drive and try
to get your data off that way. Dan and I have done this successfully a
few times.
III. O.K., THE HARD DISK IS WAKING UP
Well, that's a good sign. Now, we need to make sure the communications
between it and the ST are in good shape. Watch the hard disk's
"busy" light carefully. At power-up it may either turn on
and stay on, or turn on and eventually go off when the Adaptec is done
resetting the drive. This depends on the Adaptec and isn't really a
symptom of illness. Now turn on your ST. If the hard disk's light is
on, it should immediately snap off as the ST says "Reset and
Hello" through the hard disk interface. If that light does not
snap off it's a sure sign of trouble. I don't know the details of some
aftermarket interfaces; this applies only to what I have seen: ICD,
Supra, Atari interfaces.
The floppy drive will turn on and try to read in the first sector.
If there's no disk in the drive, this will take around five seconds to
finish; if there's a disk in the drive, it will take less than a
second. Then the ST will try to read in the first sector of the hard
disk. This will cause a brief flash of the hard disk drive's light.
You can definitely see it, it's just mighty quick.
If this doesn't happen, then your ST is not commanding the hard
disk to give it the first sector. Again, suspect the cable first. The
next thing to suspect is the Atari DMA chip. I've had this go bad on
me several times. The symptoms are that if you boot from the hard
disk, the system freezes; if you try to boot from floppy, the floppy
window instantly pops up and shows "0 bytes in 0 sectors".
It never shows any data on the floppy disk, regardless of what is
there. If these symptoms appear, it's your DMA chip - count on it.
Almost all the time you can cure this by simply reseating the DMA
chip. Open up your ST, find the DMA chip (it's usually in front of the
hard disk port, so that the signal lines are as short as possible),
pry it up from both sides a little at a time, so that you don't bend
the pins to the side, and press it back down again. Of course,
anti-static precautions are essential; if you don't know about this,
get help from someone who does.
Reseating the chip fixes a lot of DMA problems. What you are doing
is scraping off a microscopically thin layer of corrosion on the pins
and on the socket. ST owners have been reseating chips for a long time
(particularly the MMU and GLUE square chips); add the DMA chip to your
bi-monthly reseating schedule.
Naturally, at this point you've cycled power to the hard disk, to
be sure it's not getting "stuck" by the bogus commands sent
during an ST power off. But watch for that light flashing; if it
doesn't flash, something is wrong. If the cable and DMA fixes don't
work, just carry your hard disk over to a friend's ST and try it; if
it works, your ST is bad - take it to a dealer to get it fixed. If it
still doesn't work, try troubleshooting as above.
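The boot-time observations in this section can be condensed into a small lookup. This is only a sketch of the reasoning above: the function name and the wording of the suspects are made up for illustration, and it covers only the symptoms described here.

```python
from typing import List


def boot_suspects(light_flashes: bool, floppy_window_empty: bool) -> List[str]:
    """Return suspects in the order this section suggests checking them.

    light_flashes: the hard disk light flashes briefly when the ST boots.
    floppy_window_empty: the floppy window shows no data for any disk.
    """
    if floppy_window_empty:
        # An empty floppy window for every disk points at the DMA chip.
        return ["DMA chip (reseat it)"]
    if light_flashes:
        # The ST is commanding the drive; the hardware path looks alive.
        return []
    return [
        "ST/hard disk cable",
        "DMA chip (reseat it)",
        "the ST itself (try the drive on a friend's ST)",
    ]


for suspect in boot_suspects(light_flashes=False, floppy_window_empty=False):
    print(suspect)
```

An empty list means the ST and the drive are talking, so the next place to look is the autoboot software.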
IV. AUTOBOOT PROBLEMS
Next, let's assume the light flashes and your hard disk tries to
self-boot - you see the HD light flashing a few times. Nothing
happens, or maybe you get a "Self Boot" message, then the ST
dies or the screen gets filled with gibberish or you experience some
other bizarre symptom. You've probably got sick autoboot software:
something has corrupted it.
If you're set for autoboot, you're going to have to beg and plead
for the hard disk to let you boot up to the desktop just long enough to fix
it! If you've got a hard drive that can be turned on when the ST is
running, and not crash the ST, then good! Turn on the ST, put in your
hard disk utilities disk, and when the Desktop appears, turn on the
hard disk. Re-install the autoboot utility (delete any old autoboot
files from the hard disk). That ought to do it.
If your hard disk can't be turned on when the ST is connected -
that is, it crashes the ST - welcome to the club. So does mine. Use
the program Revive! in the spring 1987 issue of START (and repeated in
the May 1989 issue) to make a "bootable floppy disk" which
will force the ST to ignore the hard disk at boot. Power up the hard
disk, then when the ST boots from this floppy, put in your utilities
disk, and re-install the autoboot. That ought to fix it; it did when
my hard disk went bad.
Revive! is one of those utilities you don't need often, but boy, when you need it, you NEED it! I keep a Revive! disk in my hard disk tool kit at all times (along with the Supra utilities disk).
If your hard disk used to self-boot, but now boots from floppy, then something has corrupted the autoboot. There are lots of possibilities. The "auto-boot" flag could have been shut off. The partition sector could have been damaged, which would disable the boot (the Atari would conclude it was not a "bootable sector"). The hard disk driver program on the hard disk (AHDI.SYS, SUPBOOT.PRG) might have gotten damaged. For instance, Magic Sac's MAGICHD program did this, when it tweaked a partition into Macintosh OS disk format.
Just re-install the auto-boot using the supplied software and the
problem will correct itself.
You can do this by booting from floppy, then running the hard disk
driver by hand. For instance, with the Supra software, you'd
double-click on SUPBOOT.
Watch the hard disk at this point! You should see SUPBOOT poll the hard disk as it wakes up, looking for partitions. If nothing happens (no light flash), you've got a hardware problem. May I repeat? All together now: Distrust the ST/hard disk cable first!
After running SUPBOOT and seeing the light flash, your communications
to the hard disk should be restored. Now re-install a hard
disk icon (or use a command line shell) so that the ST can access the
hard disk. (Note that when you boot from floppy, you're using the
floppy's DESKTOP.INF file, so probably all of your Desktop icons are
missing. You have to hand-install an icon to access the hard disk from
the Desktop.)
Then, use SUPUTIL to re-install the automatic boot software. You should be all set.
If you run SUPBOOT but still cannot access your hard disk, there
are several possibilities:
1. Your SUPBOOT is bad. Remember, some Atari machines cannot
detect all floppy disk errors. Programs really CAN go bad in this
manner.
2. The ST cannot talk to the hard drive (no drive light flash) or
the communication is corrupted (DMA chip needs reseating).
3. Your partition sector is bad; the data in it has been damaged,
such as if the partition tables have been zeroed out by a program
error.
If this last has happened, then you'll see the hard disk flash,
SUPBOOT will run and exit normally, but you will still not be allowed
to talk to the hard disk. For instance, you'll double-click on the C
icon, and get the "Non-existent disk drive" message.
V. TROUBLE IN THE PARTITION SECTOR
At this point you have serious trouble with your partition sector.
You'll have to restore it. IF you took the time to write down all of
its specs, as I recommend you do when the system is running okay,
then you can use SUPEDIT and punch in the partition sector data.
You'll then be able to retrieve the data from your hard disk - maybe
only the partition sector is damaged!
If you're like most people and didn't write down the info and never used Meg-A-Minute-Elite to back up your partition sector, well then, you can either format (and lose all your data), or try to restore your partition sector by hand. Yes, it's true! Meg-A-Minute-Elite backs up what no other backup program does: the critical
partition sector! It's a VERY good idea to use it at least once, just
to get a backup of the partition sector; losing that one sector will
shut off access to your hard disk.
It is not easy to restore a partition sector by hand. But try to remember exactly how many megabytes your partitions were. Mostly, everyone uses even numbers: 5 or 10 megabytes. If so, try
using these values for partition sizes:
1 megabyte = 2,000 sectors
Plug these values into the SUPEDIT partition editing utility,
remembering to leave one-sector "slop" for fencepost
error. I can hear the masses clamoring for an example, so here goes.
Let's assume I have a strange disk layout, of 1-meg, 5-meg, 10-meg
and 2-meg partitions in that order (C, D, E, F). My partition
table needs to look like this when I'm done with SUPEDIT:
Starting Sector # Length
1-2001 ( 1 meg)
You can see what I mean about "fencepost error"; 2000
sectors takes us from sector 1 to sector 2001, so the next partition
starts at sector 2002.
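The same arithmetic can be carried through for all four partitions. The Python sketch below simply repeats the rule from the 1-meg example (2,000 sectors per megabyte, each range running from its start sector to start plus length, the next one starting one sector later). The function name is made up, and whether SUPEDIT wants ranges or start/length pairs is not stated here, so treat the printout as a worksheet rather than as SUPEDIT's input format.

```python
SECTORS_PER_MEG = 2000  # the article's round figure (512-byte sectors would give 2048)


def partition_ranges(sizes_in_meg, first_sector=1):
    """Return (start, end, meg) rows using the article's fencepost convention."""
    rows = []
    start = first_sector
    for meg in sizes_in_meg:
        end = start + meg * SECTORS_PER_MEG  # 1 meg: sector 1 to sector 2001
        rows.append((start, end, meg))
        start = end + 1                      # next partition starts one past the end
    return rows


for start, end, meg in partition_ranges([1, 5, 10, 2]):
    print(f"{start}-{end} ({meg} meg)")
```

For the C, D, E, F layout above this prints 1-2001, 2002-12002, 12003-32003 and 32004-36004.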
When done, tell SUPEDIT to write this out as your partition sector. It will question your sanity, but persist; you KNOW what you're doing, right? When it's done, you won't be able to boot from hard disk, but you will from floppy; run the SUPBOOT hard disk driver, install the icons, and test it out. Hopefully it will work. If not, you've got even more trouble.
Anyway, that's how to restore your partition sector if it gets
damaged. Either use Meg-A-Minute-Elite at least once to back up your
partition sector, or repair it by hand with SUPEDIT. I don't know
another practical way to do it.
Once More For Emphasis: If your partition sector is damaged, the
ST has forgotten where on your hard disk all your "drives"
(directories) are. This means EVERYTHING ON YOUR HARD DISK IS
INACCESSIBLE. That's why I'm stressing backup or fix; I see it
happen all the time. No other sector on the hard disk is this
critical.
VI. DOES YOUR DATA APPEAR DAMAGED?
Okay, let's say your hard disk walks, talks and boots, but strange
things happen when you use it. For example, you run a program and get
TOS ERROR #35. This means The Operating System (TOS) tried to load the
program, did so, and discovered the program wasn't a program; it was
just random data, so TOS gave up.
This means you have data damage on your hard disk. Not PHYSICAL damage (well, not necessarily, although it could be; it usually isn't). This happens all the time. People make big bucks selling disk repair utilities. Or they write columns.
Atari's disk operating system has one particularly nasty bug, known popularly as the "40-folder limit". What happens is every time you access a folder (by opening it, or otherwise "touching" it), information about that folder is loaded into a memory table. Problem: the memory table's size is tiny. After you've touched around 40 folders, the table runs out of space, and the disk operating system goes berserk - writing sectors every which-where, resulting in lost clusters, cross-linked clusters and what-not. It's a nightmare.
To make things worse, folder "slots" are used up just by
a drive being connected, by doing a "SHOW INFO" and in other
ways. It's mighty easy to run out of folder slots.
One common symptom of this is that you open up a new directory box and get data or program files that belong somewhere else. Or you get the dreaded "0 files in 0 items" box, faking you into thinking everything has just been erased. If this happens, reboot immediately; if you write anything to that hard disk, you're going to damage the directory structure. Your data is probably still out there, and still okay. Upon restarting, go immediately to the offending directory, and try again; if it works this time, take a deep breath - you were lucky.
If this happens from within a program, you may be doomed. Be VERY
CAREFUL accessing many folders from within a program; you can run out
of slots very quickly that way.
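For readers who haven't met the terms: a "cross-linked" cluster is one claimed by more than one file, and a "lost" cluster is one marked as in use that no file claims. The toy Python sketch below shows the bookkeeping on made-up data. The function name and the inputs are hypothetical; it does not read a real disk.

```python
from collections import Counter


def check_chains(allocated, chains):
    """Find cross-linked and lost clusters.

    allocated: set of cluster numbers the allocation table marks as in use.
    chains: dict mapping a file name to the list of clusters it claims.
    """
    owners = Counter(c for chain in chains.values() for c in chain)
    # A cluster counted more than once is claimed twice (this also flags
    # a single file whose chain loops back onto itself).
    cross_linked = sorted(c for c, n in owners.items() if n > 1)
    # In use according to the table, but no file points at it.
    lost = sorted(set(allocated) - set(owners))
    return cross_linked, lost


cross, lost = check_chains(
    allocated={2, 3, 4, 5, 6, 7},
    chains={"A.PRG": [2, 3, 4], "B.DOC": [4, 5]},
)
print("cross-linked:", cross)
print("lost:", lost)
```

In this made-up case cluster 4 is cross-linked between the two files, and clusters 6 and 7 are lost.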
Atari has released an "official" 40-folder bug fixer
program, called FOLDRXXX, which is available from bulletin boards,
user groups or the August 1989 issue of START. What you do is put this
in your AUTO folder with the XXX replaced by how many folder slots
you'd like to reserve. For instance, for 100 folders, name the program
"FOLDR100.PRG" in the AUTO folder. (I use FOLDR800 on my
system, because I have so many hard disks (six) attached to it, it's
necessary.)
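The renaming rule can be written down in a couple of lines of Python. Treat this as a sketch: the helper name is made up, and padding counts below 100 with leading zeros (FOLDR050.PRG) is an assumption, since only three-digit examples are given above.

```python
def foldr_filename(folder_slots: int) -> str:
    """Build the AUTO folder file name for the requested number of folder slots."""
    if not 0 < folder_slots <= 999:
        # Only three characters are available for the XXX part of the name.
        raise ValueError("folder_slots must be between 1 and 999")
    return f"FOLDR{folder_slots:03d}.PRG"  # zero padding is an assumption


print(foldr_filename(100))
print(foldr_filename(800))
```

The two calls give FOLDR100.PRG and FOLDR800.PRG, matching the examples above.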
At boot-up, FOLDRXXX adds more memory to the dinky memory space,
which just moves the problem into the future. Yes, eventually, you
will still crash, but hopefully you have RESET or powered off the
system before that point. Most people don't touch that many folders;
most programmers are aware of the bug, and write their code not to.
George Woodside's superb Turtle backup program, for instance, uses
only a few folder slots in spite of the fact that it touches
everything on the hard disk.
Atari has rewritten GEMDOS, which is the true cause of this
problem; by the time you read this, the new TOS 1.4 ROMs (which fix
this problem among others) should be available. Contact your Atari
dealer for information.
If your directory has been really damaged - say, you get the
"0 bytes in 0 items" message - or if folder names are
trashed (look like Greek letters, commonly) - or if "Show
Info" crashes or shows weird information - you have trouble.
(Fair Warning: Once Show Info has been confused, it takes a
reboot to make it work again. If you do a Show Info on a bad
partition, then on a good one, it'll show bad data on both.) You have
BIG problems. What you need is a good disk fixer program; I don't know
of any for the ST! This is most unfortunate.
At this point, you'd best have backed up your hard disk. If you've
used Meg-A-Minute-Elite, for instance, it'll take you only 10 minutes
to fix a 10-megabyte partition COMPLETELY, directory structure,
partition sector, and all. (Not bad, eh?) If you're using Turtle,
you'll have to "zero" the partition with hard disk utilities
(SUPUTIL) to get a basic directory out there, then go re-create all
your folders, then copy the Turtled disks back into them. Be sure to
zero the partition; that makes sure the directory structure is new and
clean.
The other backup schemes I've seen are too slow to mention; they
are so slow you will end up not using them religiously; which is how
backups need to be done.
Finally, be sure to run a program to check your disk's structure periodically. Michtron's TUNEUP! does this automatically. A few other disk testers are available. Essentially, they check that what the directory THINKS the disk structure is and what it really is are the same, and let you fix the structure if there's a problem.
TUNEUP! has one other very handy feature; it lets you pack all disk
data towards the end of the disk. Briefly, whenever the Atari has to
write to the disk, it looks for the first open disk sector (starting
at 20 or wherever). If you have five or ten megabytes of data before
the first open sector, the ST can take up to 30 milliseconds to find
an open sector. This greatly slows any write operation, and if you're
using lots of temporary files - as with, for example, the Alcyon C compiler
or WordPerfect - then you're in big trouble.
Compressing the data to the end of the disk opens up the
"fast" area of the hard disk for writing, so the ST doesn't
have to search far for the first open sector. Things really zip along
when you do this.
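To see why packing helps, here is a toy model of that "first open sector" search in Python. It is only an illustration: the list of booleans stands in for the disk's allocation information, and the function name is made up. It says nothing about how the ST actually stores or scans that information.

```python
def first_free(in_use, start=0):
    """Scan forward from start; return (index of first free sector, sectors examined)."""
    examined = 0
    for i in range(start, len(in_use)):
        examined += 1
        if not in_use[i]:
            return i, examined
    return None, examined  # disk full


# Data at the front of the disk: the search wades through every used sector.
front_packed = [True] * 10000 + [False] * 10000
# Data packed towards the end: the very first sector examined is free.
end_packed = [False] * 10000 + [True] * 10000

print(first_free(front_packed))
print(first_free(end_packed))
```

With data at the front, the search examines 10,001 entries before finding a free one; with data packed at the end, it examines one.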
I don't recommend TUNEUP! 100 percent, but I have used its data compaction many, many times. There's not much else like it on the market and it's very much worth having if you use your hard disk much.
Again, the best cure for data loss as the result of directory
problems is a good backup schedule. It's much easier to pull data
from a backup than it is to try to reconstruct it.
Generally, however, I find that people don't realize this until they've completely murdered a hard disk and have to rebuild it from scratch. The only positive thing I can say about that experience is that it tends to keep junk from accumulating on your hard disk.
If you have a book on IBM or MS-DOS disk structure, you could read it for additional information about your ST's disk structure; they are compatible. Just find the first sector of a given partition via the partition table, and start tracing. You'll even find the ST "boot block" information to be fairly compatible with IBM boot blocks.
If your directory has been damaged, there's no easy way to tell
what programs might have been hurt, especially if you have many
folders. Your data may be lost for good or require many hours of
painstaking tracing and an intimate knowledge of MS-DOS disk structure
(both of which are time intensive) to fix. At that point - and we have
all been there - the only reasonable alternatives are a reformat or, if
you have one, going back to your last backup.
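If you do go tracing, the boot block is the place to start, because it tells you where the FATs and the root directory live. The Python sketch below decodes the disk parameter fields at the standard MS-DOS offsets (bytes 11 to 27 of the boot sector). It assumes your partition's boot sector follows that layout, which this column only calls "fairly compatible", and it assumes you have already copied the sector into a bytes object by some other means. The function and field names are made up.

```python
import struct

FIELD_NAMES = (
    "bytes_per_sector", "sectors_per_cluster", "reserved_sectors",
    "fat_count", "root_entries", "total_sectors", "media_byte",
    "sectors_per_fat", "sectors_per_track", "heads",
)


def parse_boot_block(boot_sector: bytes) -> dict:
    """Decode the MS-DOS style parameter block from a raw boot sector."""
    if len(boot_sector) < 28:
        raise ValueError("need at least the first 28 bytes of the boot sector")
    # Little-endian fields, starting at byte offset 11, with no padding.
    values = struct.unpack_from("<HBHBHHBHHH", boot_sector, 11)
    return dict(zip(FIELD_NAMES, values))


# Demo on a synthetic sector built with floppy-like values.
sector = bytearray(512)
struct.pack_into("<HBHBHHBHHH", sector, 11, 512, 2, 1, 2, 112, 1440, 0xF9, 5, 9, 2)
info = parse_boot_block(bytes(sector))
print(info["bytes_per_sector"], info["sectors_per_fat"], info["root_entries"])
```

From these numbers the first FAT starts right after the reserved sectors, and the root directory follows the last FAT.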
CONCLUSION
Atari disk hardware is pretty good. It's much faster than, say, its IBM counterpart. It's a rare case where more speed costs less.
I've tried to give you some of the debugging techniques Dan and I
have used over the years, show you some of the common faults we've run
into and tell you how to fix them. Believe me when I say that this
knowledge was paid for with much blood and sweat. Many of these aren't
written up anywhere else (the "0 bytes in 0 folders" = DMA
chip problem, for instance) and are a simple fix ... once you know
what's wrong.
I wish you luck with your hard disk. May you never have to use the
knowledge in this column the same way I had to - a deadline a few
hours away, a completely dead system and the data possibly lost for
good. But may you keep this article nearby to refer to if it ever does
happen.
Copyright © Robert Schaffner (doit@doitarchive.de) Last updated May 23, 2004