Difference between revisions of "Hard Drive Repair"

From Hack Sphere Labs Wiki

Revision as of 10:39, 28 February 2014

How to Reallocate Sectors

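Read the sector first, so the drive detects the bad block and records it, and then write to it, so the drive has valid data to store in the reallocated sector.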
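A minimal sketch of that read-then-write step, assuming a Unix-like system, a 512-byte logical sector, and a device path and LBA you supply yourself. The function name `read_then_write_sector` is made up for illustration. Reads that go through the kernel's page cache may not reach the media or may be issued in larger units, so treat this as an outline rather than a repair tool.

```python
import os

SECTOR_SIZE = 512  # assumption: logical sector size; many drives use 4096


def read_then_write_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector, then write to it. Returns True if the read succeeded.

    If the read fails, zeros are written instead, because the original
    contents could not be recovered.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        offset = lba * sector_size
        try:
            data = os.pread(fd, sector_size, offset)
        except OSError:
            # Unreadable sector: the original data is lost, write zeros.
            readable = False
            data = bytes(sector_size)
        else:
            if len(data) != sector_size:
                raise ValueError("sector lies past the end of the device")
            readable = True
        os.pwrite(fd, data, offset)
        os.fsync(fd)  # push the write out of the page cache
        return readable
    finally:
        os.close(fd)
```

A call would look like `read_then_write_sector("/dev/sdX", 123456)`, where both the device name and the sector number are placeholders. It normally needs root privileges, and it overwrites the sector with zeros whenever the read fails.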

How Firmware Works

Quote:


I used to write disk firmware for WD, and I once wrote the firmware which reassigned bad blocks.

First, most bad blocks are detected on reads, not writes. Writes are done blindly, meaning the data is written without being checked. Thus on a write if the media is bad, you won't know it until the host does a read to that sector. There is a small part of the sector (the sector header) which is read on writes to locate the correct sector, so that if there is an error in reading the sector header, the drive will reassign the sector and write it with the data received from the write command. But the vast majority of bad blocks are detected on reads, and just because a write succeeds to a sector doesn't mean the media is good or that the sector has been reassigned.

Now about bad block reassignment (also called reallocation). Yes, normally the drive will attempt to reassign a sector if the error is bad enough (i.e., the ECC failure is bad enough) but the drive still could recover the data after ECC correction. Usually this is done automatically. The only exception is that the host could have previously told the drive not to do automatic reallocations, but this is seldom done.

So what happens if the drive does a read and cannot recover the data? Nothing. The error is reported to the host, but no reassignment is done. The problem is that the drive could reassign the sector, but it doesn't have the slightest idea what data to write in the newly reassigned sector. If it just wrote a bunch of zeros, say, and then the sector was read again, it would return all the zeros without any indication that the data wasn't valid. This is essentially the same thing as data corruption. The drive can't count on the host keeping track of errors for a variety of reasons (for example, what if the drive was moved to a new host?), so the best course of action is to do nothing when the data can't be recovered.

Modern drives, however, will save the location of a bad sector when it can't be reallocated. The number of bad sectors awaiting reallocation can be found in the SMART data. If a write is then done to one of the bad sectors awaiting reallocation, the reallocation is performed, because the drive now has valid data to write to the new sector. Thus when people say writing to a bad sector will reallocate it, that's really only half the story. The drive must be read first so it can discover all the bad sectors that can't be reallocated automatically. You can write an entire drive and the SMART data will say there are no bad sectors awaiting reallocation, but you haven't necessarily cleared the drive of all bad sectors. So if you really want to clear a drive of all bad sectors, the best thing is to read the entire drive first and then write the entire drive (of course, this will destroy all previous data on the drive).
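The following toy model is an illustration added to this page, not part of the quoted post and not real firmware. It only mimics the rules described above: an unrecoverable read records the sector as pending, a write to a pending sector reallocates it, and any other write is blind. The class `ToyDrive` and its fields are invented for the sketch, and it leaves out ECC-recoverable reads and the sector-header check on writes.

```python
class ToyDrive:
    """Toy model of the reallocation rules described above."""

    def __init__(self, num_sectors, bad_media):
        self.num_sectors = num_sectors
        self.bad_media = set(bad_media)  # sectors whose media is unreadable
        self.pending = set()             # known bad, awaiting reallocation
        self.reallocated = set()         # remapped to spare sectors

    def read(self, lba):
        if lba in self.bad_media:
            # Report the error and remember the location, but do not
            # reassign: there is no valid data to put in the new sector.
            self.pending.add(lba)
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return b"data"

    def write(self, lba, payload):
        if lba in self.pending:
            # Valid data has arrived, so the pending sector is reassigned.
            self.pending.discard(lba)
            self.bad_media.discard(lba)
            self.reallocated.add(lba)
        # Any other write is blind: bad media goes unnoticed here.


def read_pass(drive):
    """Read every sector and return the LBAs that failed."""
    failed = []
    for lba in range(drive.num_sectors):
        try:
            drive.read(lba)
        except OSError:
            failed.append(lba)
    return failed


def write_pass(drive):
    """Overwrite every sector with zeros."""
    for lba in range(drive.num_sectors):
        drive.write(lba, bytes(512))


# Write-only pass: nothing is pending, yet the bad media is still there.
write_only = ToyDrive(8, bad_media={3, 5})
write_pass(write_only)

# Read pass followed by write pass: both bad sectors get reallocated.
read_first = ToyDrive(8, bad_media={3, 5})
failed_reads = read_pass(read_first)
write_pass(read_first)
```

After the write-only pass, `write_only.pending` is empty while `write_only.bad_media` still holds both sectors, which is the "half the story" case. After the read pass and write pass, `read_first.reallocated` holds both sectors and no bad media remains.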

There are other ways of dealing with bad blocks which can't be reallocated. If the drive is part of a redundant RAID configuration (i.e., anything but RAID 0), the RAID software should automatically recover the data for a bad sector from the other drives and write it to the reallocated sector. SCSI disks have an explicit reassign blocks command which the host can use to force the reassignment even when there is no valid data to write to the block, but its use is pretty low-level.
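As an illustration of the RAID case (added to this page, not part of the quoted post), here is a sketch of how a lost block can be rebuilt from the other drives. It assumes a single XOR parity block per stripe, as in RAID 5. A mirrored array would simply copy the block from the other drive. The function name and sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives plus one parity drive.
d0 = b"AAAA"
d1 = b"BBBB"
d2 = b"CCCC"
parity = xor_blocks([d0, d1, d2])

# The sector holding d1 becomes unreadable: rebuild it from the survivors.
rebuilt = xor_blocks([d0, d2, parity])
```

Because XOR is its own inverse, `rebuilt` equals the lost block `d1`. The RAID software can then write it back to the failing drive, which gives that drive the valid data it needs to reallocate the sector.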


Links

Notes