
#1 2018-09-02 09:28:36

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Hard disk failure and SMART

[NOTE] This thread has been moved here from the ModZone because there's nothing private about it, and other members might be interested in the topic.
---

If I suddenly stop posting, you'll know why. Yesterday the box started running like frozen treacle, with continuous hard disk activity. Today it was virtually unable to boot, with various read/write and I/O errors, and filesystems being switched to read-only.

I managed to boot a live session off a USB stick, and was able to run a backup of sorts onto a plug-in drive. It took much longer than usual but the disk was at least still readable somehow.

Then I thought of booting into BL Hydrogen which still lives on a different (LVM) partition. Same disk of course, and again much activity at boot, but I was able to get a workable desktop, read my email, and log into BL. That's the system I'm running now. There were a couple of mails from the SMART daemon about more unreadable sectors than before, but sometimes those sectors get patched over. Right now the disk is quiet.

Tomorrow I'll try booting into Helium again, but won't be surprised if it's impossible, or even if Hydrogen is no longer usable. (At least it allowed me to back up some data.)

Anyway, I might have to go and look for a new computer, so don't be alarmed if I don't post for a while.

Last edited by johnraff (2018-09-07 00:37:47)


...elevator in the Brain Hotel, broken down but just as well...
( a boring Japan blog (currently paused), now on Bluesky, there's also some GitStuff )

Introduction to the Bunsenlabs Boron Desktop


#2 2018-09-02 10:34:36

nobody
The Great
Registered: 2015-08-10
Posts: 3,655

Re: Hard disk failure and SMART

Yeah, the disk can reallocate sectors from a spare sector pool as long as the number of defective sectors stays below the size of the spare pool.

Good job backing up your data, though just cloning your existing disk to a new one might be difficult depending on the state the disk is in… You might want to grab an SSD this time; the popular Samsung models especially have gotten pretty cheap over the years.


#3 2018-09-02 18:10:28

hhh
Gaucho
From: High in the Custerdome
Registered: 2015-09-17
Posts: 16,036
Website

Re: Hard disk failure and SMART

bzzzt BZZZZZZTTTT

Not good.


No, he can't sleep on the floor. What do you think I'm yelling for?!!!


#4 2018-09-03 02:01:43

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

nobody wrote:

Yeah, the disk can reallocate sectors from a spare sector pool as long as the number of defective sectors stays below the size of the spare pool.

That must be what happened, because it booted more-or-less OK today. Took the chance to redo yesterday's backup properly, but noticed that parts of it went very slowly, suggesting that sometimes the disk is still hitting bad zones. What seems to be happening is that the read/write instructions are being repeated endlessly until they eventually go through. Maybe those bad disk sectors are also being reallocated at that time?

Where is this spare sector pool? Can I check its size? My HD does have several hundred GB of unused space, with an LVM partition setup.

Good job backing up your data, though just cloning your existing disk to a new one might be difficult depending on the state the disk is in…

That I wouldn't bother with, and anyway I have no disk big enough. I just back up my personal data along with all the config files. It makes the reinstall take longer, of course.

You might want to grab an SSD this time; the popular Samsung models especially have gotten pretty cheap over the years.

That might be nice, indeed. Very often the bottleneck to some operation is disk I/O, with the CPU sitting idly waiting.

But my current 4GB of RAM is a bit tight too (especially when running a VM), so I'm going to look for a "new" machine. The current one was second-hand from the start, and the graphics card is also quite old, so an all-round hardware update might be in order if I can find something for an affordable price.



#5 2018-09-04 06:05:39

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

@nobody many thanks!

The output (after a short self-test) doesn't look too bad (snippets):

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-8-amd64] (local build)

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K1000.C
Device Model:     Hitachi HDS721010CLA332
Serial Number:    <CENSORED>
LU WWN Device Id: 5 000cca 373d32104
Firmware Version: JP4OA3EA
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Sep  4 14:47:46 2018 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   086   086   005    -    396
196 Reallocated_Event_Count -O--CK   087   087   000    -    413
197 Current_Pending_Sector  -O---K   100   100   000    -    34
198 Offline_Uncorrectable   ---R--   100   100   000    -    0

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11976         -
# 2  Short offline       Completed without error       00%     11976         -
# 3  Short offline       Completed without error       00%      9087         -
# 4  Short offline       Completed without error       00%      1782         -
# 5  Short offline       Completed without error       00%      1076         -

So it's saying that the disk is now OK!
For about the past year I was getting daily email messages with "3 Currently unreadable (pending) sectors".
Yesterday when the whole thing seemed to have crashed, it went up to 42, and now is reporting 34, which is way up on 3 of course, with 396 reallocated sectors.
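
For anyone who wants to track these counts over time rather than wait for the daily mail, here is a minimal Python sketch that pulls the raw values out of an attribute table like the one pasted above. It assumes the column layout shown in this post (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE) and only handles rows whose raw value is a plain integer; the helper name `raw_values` is made up for illustration.

```python
# Sample text copied from the attribute table in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   086   086   005    -    396
196 Reallocated_Event_Count -O--CK   087   087   000    -    413
197 Current_Pending_Sector  -O---K   100   100   000    -    34
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
"""


def raw_values(table_text):
    """Map attribute name -> raw value for each data row of the table."""
    result = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 8 columns;
        # the header row ("ID# ...") is skipped by the isdigit() check.
        if len(fields) >= 8 and fields[0].isdigit() and fields[7].isdigit():
            result[fields[1]] = int(fields[7])
    return result


counts = raw_values(SAMPLE)
print("reallocated:", counts["Reallocated_Sector_Ct"])
print("pending:    ", counts["Current_Pending_Sector"])
```

Logging those two numbers once a day would make a jump like 3 to 42 pending sectors hard to miss.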

I'm amazed, to be honest, at the ability of the drive to heal itself like this. I'm sure that wouldn't have happened a few years ago.

Anyway, I'll take it as a warning that catastrophic failure could come any time, and go and get a replacement box ASAP. (Not today, though, as a powerful typhoon is passing close by.)



#6 2018-09-04 07:56:12

nobody
The Great
Registered: 2015-08-10
Posts: 3,655

Re: Hard disk failure and SMART

Yeah, with there still being pending sectors, replacing the drive soon sounds like a good idea.

Note that only a long self-test delivers reliable results, so if you launch one, the drive's parameters will likely get worse still.

Don't forget to run a long self-test on your new drive after purchase, and perhaps run a file system benchmark such as sysbench's on it a couple of times. Then check the kernel log/syslog for I/O errors and the SMART log for suspicious items before committing the drive to active use, just to rule out that it's a lemon.
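
A long test can be started with `smartctl -t long /dev/sdX` and its result read back later with `smartctl -l selftest /dev/sdX` (replace `/dev/sdX` with the real device). As a rough sketch of the "check the SMART log" step, the Python below parses a self-test log in the layout johnraff pasted earlier and lists any entry that did not complete cleanly. The function names are made up for illustration, and the regular expression assumes columns separated by two or more spaces, as in that paste.

```python
import re

# Sample rows copied from the self-test log earlier in this thread.
LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11976         -
# 2  Short offline       Completed without error       00%     11976         -
# 3  Short offline       Completed without error       00%      9087         -
"""

# "# N  <test>  <status>  NN%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(text):
    """Return one dict per self-test entry; header and other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            num, test, status, _remaining, hours, lba = m.groups()
            rows.append({"num": int(num), "test": test, "status": status,
                         "hours": int(hours), "lba": lba})
    return rows


def failed_tests(rows):
    """Entries whose status is anything other than a clean completion."""
    return [r for r in rows if r["status"] != "Completed without error"]


rows = parse_selftest_log(LOG)
print(len(rows), "entries,", len(failed_tests(rows)), "not clean")
```

On the log from this thread every entry is a short test that completed without error, which is exactly why a long test is still worth running.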


#7 2018-09-04 19:40:58

hhh
Gaucho
From: High in the Custerdome
Registered: 2015-09-17
Posts: 16,036
Website

Re: Hard disk failure and SMART

@nobody, what do you recommend to a layman for the disk format? I've always used ext3 or ext4. I've never tried btrfs.



#8 2018-09-04 20:21:35

hhh
Gaucho
From: High in the Custerdome
Registered: 2015-09-17
Posts: 16,036
Website

Re: Hard disk failure and SMART

Sounded pretty damn expert to me. :)



#9 2018-09-04 20:26:11

hhh
Gaucho
From: High in the Custerdome
Registered: 2015-09-17
Posts: 16,036
Website

Re: Hard disk failure and SMART

Why XFS for critical media? The Arch Wiki says you can manually run a data-corruption tool; I'd guess that's the reason...

https://wiki.archlinux.org/index.php/XF … corruption

Last edited by hhh (2018-09-04 20:27:39)



#10 2018-09-07 01:37:37

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

This disk is no more... it has ceased to be... it is an EX-disk.
(Bangs on counter.) "Wake up HDS721010CLA332!!"

No, it finally snuffed it. I'm posting from a live session, so the rest of the computer still works, but plenty of error messages come up during the boot. Thunar actually displays some of the partitions, but when I tried to open one it went into an endless spin, and the disk light just went, and stayed, on. (Now after yet another reboot the disk seems to be quiet.)

So now scouring the web for the best deal in 2nd hand boxes with 8GB RAM, an SSD drive + hard disk to bring total space to 1TB+, and an i5 or i7 CPU. It looks like something around 40,000 yen ($350~400), but I was only halfway through checking when the disk finally gave up.

Still have a question, though: smartctl is available in the live system :cool: and is able to read the disk data.

snippet:

user@debian:~$ sudo smartctl -x /dev/sda
=== START OF INFORMATION SECTION ===
<same as above, except:>
Write cache is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!  << THIS
Drive failure expected in less than 24 hours. SAVE ALL DATA.

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   016   016   016    NOW  4294932929
  5 Reallocated_Sector_Ct   PO--CK   001   001   005    NOW  1667
196 Reallocated_Event_Count -O--CK   008   008   000    -    1844
197 Current_Pending_Sector  -O---K   091   091   000    -    358

Is there anything that can be done from the live session that might make the disk bootable again, or at least mountable? (I don't know how much point there would be in doing that though, really...)

Take-home message: take those SMART daemon warnings seriously! The drive can patch over bad sectors, but I think I was breaking more and more places by carrying on using the system as normal. In particular, booting up a virtual machine, which calls for a lot of disk activity, was the straw that broke the camel's back. So when the amber light comes on, back up everything and look for a new disk.
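
As a footnote on how to read that FAILED report: smartctl treats an attribute as failed once its normalized VALUE has dropped to or below THRESH, and the FAIL column then shows NOW. The Python sketch below applies that rule to the table pasted in this post. It assumes the same column layout as above, and `failing_attributes` is a made-up helper name, not part of smartmontools.

```python
# Sample text copied from the attribute table in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   016   016   016    NOW  4294932929
  5 Reallocated_Sector_Ct   PO--CK   001   001   005    NOW  1667
196 Reallocated_Event_Count -O--CK   008   008   000    -    1844
197 Current_Pending_Sector  -O---K   091   091   000    -    358
"""


def failing_attributes(table_text):
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in table_text.splitlines():
        fields = line.split()
        # Columns: ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE
        if len(fields) >= 8 and fields[0].isdigit():
            name, value, thresh = fields[1], int(fields[3]), int(fields[5])
            if value <= thresh:
                failing.append(name)
    return failing


print(failing_attributes(SAMPLE))
```

The two names it reports are the same two rows marked NOW in the paste. Attributes with a threshold of 000, such as Current_Pending_Sector, never trip this check however large the raw count gets, which is why the earlier report could say PASSED with 34 pending sectors.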



#11 2018-09-07 03:16:18

hhh
Gaucho
From: High in the Custerdome
Registered: 2015-09-17
Posts: 16,036
Website

Re: Hard disk failure and SMART

johnraff wrote:

This disk is no more... it has ceased to be... it is an EX-disk.
(Bangs on counter.) "Wake up HDS721010CLA332!!"

Bereft of life, it Rests In Piece. If it wasn't screwed to the laptop, it would be pushing up the daisies.



#12 2018-09-07 04:33:28

ohnonot
...again
Registered: 2015-09-29
Posts: 5,592

Re: Hard disk failure and SMART

i only diagonal-read this, but why do you need a whole new box?

i'm a sucker for recycling, even electronics.
recently i had a closer s.m.a.r.t. look at my hard drives (2 of the big ones, one laptop-sized) and was appalled at how little lifetime was left.
it also emerged that my main drive was the slowest of them all!

went to the store and got the cheapest SSD they had. 40€ for 120GB (WD green iirc).
dd'd my / to it. no further changes, except for adjusting UUIDs in fstab. ext4 as before.
the difference was (still is) amazing. boots in a few seconds. graphical desktop comes up immediately (that's what took the longest before, even though it's just openbox).
i should've done this much earlier.
nowadays SSDs are making computers fast, not CPUs.
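
for anyone following along, the "adjusting UUIDs in fstab" step might look roughly like the Python sketch below. it only rewrites text: it swaps one `UUID=` entry in the device field and leaves comments and other lines alone. the UUIDs are invented placeholders (the real ones come from `blkid`), and `swap_uuid` is a made-up name; try it on a copy of /etc/fstab first.

```python
OLD = "1111aaaa-0000-4000-8000-000000000001"  # placeholder, not a real UUID
NEW = "2222bbbb-0000-4000-8000-000000000002"  # placeholder, not a real UUID

SAMPLE = f"""# / was on /dev/sda1 during installation
UUID={OLD} /     ext4 errors=remount-ro 0 1
UUID=3333cccc-0000-4000-8000-000000000003 none  swap sw 0 0
"""


def swap_uuid(fstab_text, old_uuid, new_uuid):
    """Return fstab_text with UUID=old_uuid replaced in the device field only."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # Only touch lines whose first field is exactly the old UUID entry.
        if fields and not is_comment and fields[0] == f"UUID={old_uuid}":
            line = line.replace(f"UUID={old_uuid}", f"UUID={new_uuid}", 1)
        out.append(line)
    return "\n".join(out) + "\n"


print(swap_uuid(SAMPLE, OLD, NEW), end="")
```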

we will see how long it lasts, being the cheapest and not samsung.
of course i'm making full backups to an external drive.

PS:
i still have the other hard drives, altogether it comes up to ~700GB right inside the box, plus another ~700GB on my server.
i have no idea how this could ever get filled up.
i delete movies after watching.

Last edited by ohnonot (2018-09-07 04:37:54)


#13 2018-09-08 04:06:39

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

^Yes, I gave serious thought to just replacing the drive, but the rest of the hardware is getting old too. In particular it would need more RAM (4GB is barely enough after starting up a VM and opening some browser tabs), the GPU is classified "heritage" or something, and one of the fans doesn't work, so in summer overheating sometimes causes freezes. A 125GB SSD alone would not be enough, not because I keep movies around but because of ISO files (some irreplaceable), git repos (debian-installer alone is several GB), virtual machines and the like. A lot of that is downloadable on demand, but having the code locally makes grep, find & co. much faster.

True, there was still some free space on my 1TB disk, but the 500GB /data partition was getting a bit tight. So I'd have to buy much more than 125GB of SSD, or add a hard drive to go with it. Along with the 4GB extra RAM and fixing the fan, a newer machine seemed a better deal, since the motherboard and everything else will be newer too.

I will however take your hint and look for a cheap SSD for the old machine. It would then be a perfectly usable computer - although not for my main workstation. Then to think of a sensible use to put it to...



#14 2018-09-08 08:46:24

Hyacinth
Member
Registered: 2018-03-26
Posts: 14

Re: Hard disk failure and SMART

The Monty Python references are absolutely hilarious!

I looked online at how much it costs to bring new life to it and found https://www.amazon.co.jp/Samsung-2-5インチ … B0796B3GL6

8000 yen for 250 GB is so cheap! When I bought an SSD you could maybe get a used 32 GB one for that money. But it’s still a fair sum for something you are not likely to use anymore. Maybe you can find a second hand SSD? They don’t die ever, I think. My fiancé is using one from ancient times daily. I don’t think the company that made it exists today, even! That was before Samsung was in the market. Even the one I have in my desktop computer is an Intel one from Q2 2012 that the Intel Windows tool says is in peak condition still.

Glad you made a backup and that Bunsenlabs has such an excellent live session now, and good luck on looking for a new computer!


#15 2018-09-10 08:44:40

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

^Hey thanks!
The machine I picked should arrive today.
39,999 yen, HP ProDesk 600, i5-4570 core, 120GB SSD+2TB HD, 8GB RAM, not to mention Windows 10. :rolleyes: I suppose with 2TB there's room to keep that around, though I haven't touched Windows for years. Anyway, we'll see how it all works out...



#16 2018-09-10 13:04:32

Jimbo_G
Member
From: France
Registered: 2017-05-12
Posts: 325

Re: Hard disk failure and SMART

^ If you haven't used Windows for a few years, Windows 10 might come as a bit of a shock... It would be interesting to see what you think of it though!


#17 2018-09-11 15:41:41

Hyacinth
Member
Registered: 2018-03-26
Posts: 14

Re: Hard disk failure and SMART

Hey that’s the CPU I almost have. Just a little faster! Does a great job for me. Enjoy it!


#18 2018-09-12 07:49:51

dbvolvox
Member
From: England
Registered: 2015-09-29
Posts: 111
Website

Re: Hard disk failure and SMART

johnraff wrote:

^Hey thanks!
The machine I picked should arrive today.
39,999 yen, HP ProDesk 600, i5-4570 core, 120GB SSD+2TB HD, 8GB RAM, not to mention Windows 10. :rolleyes: I suppose with 2TB there's room to keep that around, though I haven't touched Windows for years. Anyway, we'll see how it all works out...

Good luck! I was going to do something similar but found I couldn't even turn the machine on without having to accept all the MS T&C so went straight to an install that wiped W10.


volvox.biz a very tedious daily account of life during covid,


#19 2018-09-12 08:32:54

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 12,558
Website

Re: Hard disk failure and SMART

Jimbo_G wrote:

^ If you haven't used Windows for a few years, Windows 10 might come as a bit of a shock... It would be interesting to see what you think of it though!

The last Windows I used was W98. Booted up XP a couple of times, that's it, so W10 was... sort of what I expected. The inscrutable error messages after a process has run for 5 min., mysteriously fixed the next time, sudden reboots without warning: some things don't change.

I was thinking of just wiping it right off both drives (SSD and HD) after making an installer, just because, well, I paid for it. Anyway, I tried reinstalling (a somewhat long, convoluted process, with much googling and downloading) and managed to put Windows on the hard disk, leaving the SSD free, without having to set up a Microsoft account. It boots up OK after all that, but it was a tiring day, and installing BL to the SSD (with big data on the HD) will have to wait till tomorrow. I hope W10 doesn't do an update down the road and wipe the whole hard drive. I might change my mind and delete it anyway.

BTW does anyone use LVM on drives as small as this 120GB SSD?



#20 2018-09-12 09:22:33

nobody
The Great
Registered: 2015-08-10
Posts: 3,655

Re: Hard disk failure and SMART

johnraff wrote:

BTW does anyone use LVM on drives as small as this 120GB SSD?

LVM is a win on any disk since you can then forget about disk geometry when working with partitions. If there's even the slightest chance that you'd want to grow/move partitions, then use LVM.
