
#1 2018-12-18 08:20:18

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 5,734

Is this SSD good to go?

New visitor the other day - a ThinkPad T420s. Made in 2011, but with a fairly good spec: 8GB RAM, an Intel i5 CPU and an SSD (120GB), so it's quite usable. :)

But it's clearly been through some traumatic experience, and my question: do you think this SSD is OK to use?

Full Story

1) A friend reported a couple of months ago that his computer had stopped working, wouldn't boot, and he wanted to get his data off it before buying a new one. I suggested some inexpensive options (he's a Windows user) and offered to have a look at it - maybe I could save his data (having recently played with ddrescue). But instead he took it to a shop where a visiting "expert" told him the machine was unfixable, and all he could do was get the data out. That cost $300.

2) A couple of weeks ago, having got a shiny new Windows 10 machine, he loaned me the old computer to "play with", and booting into a BunsenLabs live session I found all his files available on the still-readable hard disk. Made a copy onto a removable drive. 15 minutes total. He could have saved $300. :rolleyes:

3) Windows, however, just refused to boot: after the BIOS screen, just a flashing cursor. Now, I didn't think carefully enough about UEFI and went straight to an MBR repair:

apt-get install syslinux
dd if=/usr/lib/SYSLINUX/mbr.bin of=/dev/sda

Success! Now a pulsating Windows logo appears... but nothing else.
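A hedged aside, not what was run in this thread: before overwriting boot code, it may be worth checking how the live session booted and what kind of partition table the disk has, and keeping a copy of the first sector so the change can be undone. A Windows install that boots via UEFI loads its boot manager from the EFI System Partition and ignores the MBR boot code, so writing `mbr.bin` would not help in that case. Syslinux's `mbr.bin` is meant for MBR ("dos") partition tables; for GPT disks the package ships `gptmbr.bin` instead. The function names below are made up for illustration; `/sys/firmware/efi`, `lsblk` and `dd` are standard on Linux live systems.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names made up for illustration).
# The disk-touching ones need root and are meant for a live session.

# Prints UEFI if the *running* system booted via UEFI, otherwise BIOS.
# This describes the live session, not necessarily how the installed OS boots.
boot_mode() {
    if [ -d /sys/firmware/efi ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

# Prints the partition table type of a whole disk: "dos" (MBR) or "gpt".
partition_table_type() {
    lsblk -dno PTTYPE "${1:?usage: partition_table_type DEVICE}"
}

# Copies the first 512-byte sector (boot code + MBR partition table) to a file,
# so that overwriting it with mbr.bin can be undone later.
backup_first_sector() {
    local dev=${1:?usage: backup_first_sector DEVICE OUTFILE}
    local out=${2:?usage: backup_first_sector DEVICE OUTFILE}
    dd if="$dev" of="$out" bs=512 count=1
}
```

Typical use would be `boot_mode`, then `partition_table_type /dev/sda` and `backup_first_sector /dev/sda sda-sector0.bin` as root; restoring is the same `dd` with `if` and `of` swapped.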

4) Reboot and now Windows detects a problem and offers to repair things. Say yes, reboot, repeat; this went on several times. The repair screens gradually got fancier, and at one point even offered a login, but that didn't achieve much either. Eventually I noticed an option to restore the system from a Lenovo partition at the end of the drive. It would nuke all the settings and data, but they had been abandoned anyway, so OK. The new install (now in Japanese) booted and went through some setup steps, but ran like treacle in winter. Awful. Just trying to find the Control Panel so I could set the system language to English was too much: each window took an eternity to open. About every other reboot hit an error. Clearly things were not well. Maybe the machine really was unfixable?

5) Back in the BL live session, run smartctl to check out the hard disk. Health report is PASSED! However, some details look suspicious. A short test returns success, but a long test fails after 20% with a read error. Try again, and this time the failure comes after 10%. 'badblocks -v' comes back with four bad blocks, with adjacent addresses. Not good.
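For reference, a sketch of the kind of commands behind step 5, wrapped in hypothetical helper functions (the smartctl and badblocks options are the standard ones; run as root). One thing worth knowing when comparing results: badblocks numbers blocks in units of 1024 bytes by default, while SMART reports 512-byte LBAs, so the same spot on the disk shows up with different numbers.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around standard smartctl/badblocks options (run as root).

# Starts a SMART self-test ("short" or "long"); the drive runs it in the
# background, so the result appears later in the self-test log.
start_selftest() {
    local dev=${1:?usage: start_selftest DEVICE [short|long]}
    smartctl -t "${2:-short}" "$dev"
}

# Shows the self-test log (status, remaining %, LBA_of_first_error).
show_selftest_log() {
    smartctl -l selftest "${1:?usage: show_selftest_log DEVICE}"
}

# Read-only surface scan with progress (-s) and verbose output (-v).
readonly_scan() {
    badblocks -sv "${1:?usage: readonly_scan DEVICE}"
}

# Converts a 512-byte LBA (as reported by SMART) to a badblocks block number.
# badblocks uses 1024-byte blocks unless -b says otherwise.
lba_to_badblocks_block() {
    local lba=${1:?usage: lba_to_badblocks_block LBA [BLOCK_SIZE]}
    local block_size=${2:-1024}
    echo $(( $lba * 512 / $block_size ))
}
```

For example, `lba_to_badblocks_block 25825792` prints 12912896: the LBA_of_first_error from the self-test log expressed in default-sized badblocks blocks.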

6) Abandon Windows. Having read Somewhere On The Internet™ that formatting would automatically pass over bad disk blocks, use gparted to reformat the whole SSD as ext4, then again to break that into 4, no, 6 partitions to put some trial Linux systems on. BunsenLabs first, allowing the installer to format the partition yet again. Reboot, error, but the next time is OK. Install Kubuntu 18.04, MX 17, and finally Debian Buster with kde-plasma-desktop (which has had some good reports here). These now all boot OK, and I've seen no further errors.

I did another two long smartctl tests, and this time they both completed with no errors! Likewise 'badblocks -n' (non-destructive read-write).
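On the "formatting passes over bad blocks" idea from step 6: mkfs.ext4 (mke2fs) only looks for bad blocks when asked to, with `-c` (read-only scan via badblocks) or `-cc` (slower read-write scan), and then records them in the filesystem, where `dumpe2fs -b` lists them. A plausible but unproven explanation for the errors going away is that rewriting the whole drive let the SSD controller remap the failing sectors to spare blocks, which would fit the reserved-block counters in the SMART output. The helpers below are hypothetical wrappers; mkfs destroys the partition's contents.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers (run as root). PARTITION is e.g. /dev/sda1.

# Creates ext4 after a read-write bad-block scan (-cc); a single -c would do a
# faster read-only scan. WARNING: erases everything on PARTITION.
mkfs_with_badblock_check() {
    mkfs.ext4 -cc "${1:?usage: mkfs_with_badblock_check PARTITION}"
}

# Lists the blocks an existing ext2/3/4 filesystem has recorded as bad.
recorded_bad_blocks() {
    dumpe2fs -b "${1:?usage: recorded_bad_blocks PARTITION}"
}
```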

Here's the current output of 'smartctl -a':

smartctl 6.6 2016-05-31 r4324 [i686-linux-4.9.0-8-686-pae] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7PA128HMCD-010L1
Serial Number:    S0MUNEAC202412
Firmware Version: AXM08L1Q
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Dec 17 16:41:49 2018 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  840) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  14) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   096   096   ---    Old_age   Always       -       18091
 12 Power_Cycle_Count       0x0032   096   096   ---    Old_age   Always       -       3457
175 Program_Fail_Count_Chip 0x0032   099   099   ---    Old_age   Always       -       1
176 Erase_Fail_Count_Chip   0x0032   100   100   ---    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0013   093   093   ---    Pre-fail  Always       -       221
178 Used_Rsvd_Blk_Cnt_Chip  0x0013   075   075   ---    Pre-fail  Always       -       508
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   077   077   ---    Pre-fail  Always       -       930
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   077   077   ---    Pre-fail  Always       -       3102
181 Program_Fail_Cnt_Total  0x0032   099   099   ---    Old_age   Always       -       1
182 Erase_Fail_Count_Total  0x0032   100   100   ---    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   099   099   ---    Pre-fail  Always       -       1
187 Reported_Uncorrect      0x0032   099   099   ---    Old_age   Always       -       13
190 Airflow_Temperature_Cel 0x0022   075   046   ---    Old_age   Always       -       25
195 Hardware_ECC_Recovered  0x001a   199   199   ---    Old_age   Always       -       13
198 Offline_Uncorrectable   0x0030   100   100   ---    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   ---    Old_age   Always       -       40
233 Media_Wearout_Indicator 0x003a   199   199   ---    Old_age   Always       -       4956
234 Unknown_Attribute       0x0012   100   099   ---    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   ---    Old_age   Always       -       94
236 Unknown_Attribute       0x0012   099   099   ---    Old_age   Always       -       343
237 Unknown_Attribute       0x0012   099   099   ---    Old_age   Always       -       464
238 Unknown_Attribute       0x0012   099   099   ---    Old_age   Always       -       930

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 18070 hours (752 days + 22 hours)
  When the command that caused the error occurred, the device was sleeping.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 72 12 8a e1  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 00      01:05:48.000  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE

Error 5 occurred at disk power-on lifetime: 18070 hours (752 days + 22 hours)
  When the command that caused the error occurred, the device was sleeping.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 72 12 8a e1  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 00      01:05:48.000  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE

Error 4 occurred at disk power-on lifetime: 18070 hours (752 days + 22 hours)
  When the command that caused the error occurred, the device was sleeping.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 78 12 8a e1  Error: 

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 00      01:05:48.000  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE

Error 3 occurred at disk power-on lifetime: 18070 hours (752 days + 22 hours)
  When the command that caused the error occurred, the device was sleeping.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 72 12 8a e1  Error: UNC 6 sectors at LBA = 0x018a1272 = 25825906

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 68 12 8a 01 00      01:05:48.000  READ DMA
  c8 00 00 e8 12 8a 01 00      01:05:48.000  READ DMA
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 00      01:05:48.000  SET FEATURES [Set transfer mode]

Error 2 occurred at disk power-on lifetime: 18070 hours (752 days + 22 hours)
  When the command that caused the error occurred, the device was sleeping.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 76 72 12 8a e1  Error: UNC 118 sectors at LBA = 0x018a1272 = 25825906

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 28 12 8a 01 00      01:05:48.000  READ DMA
  c8 00 20 08 12 8a 01 00      01:05:48.000  READ DMA
  c8 00 08 00 12 8a 01 00      01:05:48.000  READ DMA
  ef 10 02 00 00 00 00 00      01:05:48.000  SET FEATURES [Enable SATA feature]
  ec 00 01 00 00 00 00 00      01:05:48.000  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     18089         -
# 2  Extended offline    Completed without error       00%     18083         -
# 3  Extended offline    Completed without error       00%     18079         -
# 4  Extended offline    Completed: read failure       90%     18070         25825792
# 5  Extended offline    Completed: read failure       80%     18069         25825792
# 6  Short offline       Completed without error       00%     18069         -
# 7  Short offline       Completed without error       00%      6412         -
# 8  Short offline       Completed without error       00%      5739         -
# 9  Short offline       Completed without error       00%      5533         -
#10  Short offline       Completed without error       00%      5309         -
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

This section looks a bit dubious:

178 Used_Rsvd_Blk_Cnt_Chip  0x0013   075   075   ---    Pre-fail  Always       -       508
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   077   077   ---    Pre-fail  Always       -       930
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   077   077   ---    Pre-fail  Always       -       3102

I guess that means that about 1000 of the 4000 reserved blocks have already been used?
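The arithmetic behind that guess: 930 used + 3102 unused = 4032 reserved blocks in total, so 930/4032, or about 23%, are already used, which fits the normalized VALUE of 077 (100 - 23) shown for attributes 179 and 180. A hypothetical bash/awk helper that does the sum from `smartctl -A` output, assuming the Samsung attribute IDs and column layout shown above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `smartctl -A` output on stdin and reports how much
# of the reserved pool is used. Assumes the Samsung layout shown above:
# ID 179 = used reserved blocks, ID 180 = unused, raw value in column 10.
reserved_block_usage() {
    awk '$1 == 179 { used = $10 }
         $1 == 180 { unused = $10 }
         END {
             if (used == "" || unused == "" || used + unused == 0) {
                 print "attributes 179/180 not found or zero"
                 exit 1
             }
             total = used + unused
             printf "%d of %d reserved blocks used (%.1f%%)\n", used, total, 100 * used / total
         }'
}
```

With the values above, `smartctl -A /dev/sda | reserved_block_usage` should print `930 of 4032 reserved blocks used (23.1%)`.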

Anyway, to repeat my question: does it look as if this drive is going to be usable for a while longer, or should I go and buy a new one? They aren't all that expensive these days. This laptop really looks quite usable otherwise. Friend says I can keep it. :cool:

Sub-question: any suggestions about what could have caused the machine to fail like that when it was running Windows? Doesn't Windows have the same kind of disk repair utilities as Linux? Or do they have to be run manually? Or was it a virus? This is for Friend's curiosity as well as mine.


John
--------------------
( a boring Japan blog , Japan Links, idle twitterings  and GitStuff )
In case you forget, the rules.


#2 2018-12-18 08:56:45

earlybird
ほやほや
Registered: 2015-12-16
Posts: 717

Re: Is this SSD good to go?

Error: UNC 118 sectors at LBA = 0x018a1272 = 25825906

Yeah, I wouldn't trust this device with anything. The 'uncorrectable' errors will probably pile up and consume the remaining 'reserved' blocks in no time. SSDs fail differently from spinning disks, too: when they stop working, it can be abrupt, and that's it. The error above indicates an uncorrectable read error at that address; the OS would have read garbage, which may or may not have stopped Windows from working. The error is unlikely to be due to a faulty cable; it points to the storage itself going bad or being bad (see the errors in the self-test log).
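That LBA can be reconstructed from the register dump in the SMART error log. With 28-bit LBA addressing, SN holds bits 0-7, CL bits 8-15, CH bits 16-23 and the low four bits of DH bits 24-27, while SC is the sector count (0x76 = 118 here). A hypothetical helper that does the decoding, plus a `dd` re-read of the range (the function names are made up; the `dd` options are standard GNU ones):

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading a smartctl error-log entry.

# Rebuilds a 28-bit ATA LBA from the SN, CL, CH and DH register bytes
# (pass them as hex, e.g. 0x72): SN = bits 0-7, CL = bits 8-15,
# CH = bits 16-23, low nibble of DH = bits 24-27.
ata_lba28() {
    local sn=$1 cl=$2 ch=$3 dh=$4
    echo $(( ($dh & 0x0f) << 24 | $ch << 16 | $cl << 8 | $sn ))
}

# Re-reads COUNT sectors starting at LBA, bypassing the page cache, and
# discards the data; an I/O error means the range is still unreadable (root).
reread_sectors() {
    local dev=${1:?usage: reread_sectors DEVICE LBA [COUNT]}
    local lba=${2:?usage: reread_sectors DEVICE LBA [COUNT]}
    dd if="$dev" of=/dev/null bs=512 skip="$lba" count="${3:-1}" iflag=direct
}
```

`ata_lba28 0x72 0x12 0x8a 0xe1` prints 25825906 (0x018a1272); at 512 bytes per sector that is roughly 13.2 GB into the drive. `reread_sectors /dev/sda 25825906 118` (as root) would show whether that range is still unreadable.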

It is normal for an SSD to have some bad flash memory due to manufacturing tolerances. The SSD's controller transparently restructures its use of the physical memory to work around such bad blocks and keep them 'hidden'/unused, so the OS should never see those errors. If you have I/O errors in the kernel ring buffer, that's a bad sign: it means the controller can no longer compensate for manufacturing tolerances and blocks previously known to be good are going bad, which is an indicator of failure. The disk seems to have been running for a lot of hours, too, so a failure doesn't seem that unlikely; who knows what load was put on it.
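To look for that kind of kernel I/O error, the kernel log can be filtered for typical libata and block-layer messages. A sketch with a hypothetical function; the patterns are an assumption covering common wording such as "I/O error", "failed command" and "{ UNC }", not an exhaustive list:

```bash
#!/usr/bin/env bash
# Hypothetical filter: prints kernel log lines (read on stdin) that look like
# disk read/write trouble. Patterns are an assumption, not an exhaustive list.
disk_error_lines() {
    grep -E 'I/O error|Medium Error|failed command|exception Emask|\bUNC\b'
}
```

For example `dmesg | disk_error_lines` or `journalctl -k -b | disk_error_lines`; no output means none of those patterns matched.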

If you intend to risk it, I'd suggest writing and reading/verifying data up and down the disk for a couple of times (that is, x times 128G written and read) or so and see if additional errors are encountered.
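One way to do that, sketched with hypothetical helpers: snapshot the raw SMART attribute values, run a full write/read pass, then compare. `badblocks -w` is a real write-mode test that writes four patterns (0xaa, 0x55, 0xff, 0x00) over every block and reads each one back, so a single run writes and reads the whole 128 GB four times; it also erases everything on the disk.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a "stress, then compare SMART counters" loop.

# Reads `smartctl -A` output on stdin and prints "ID RAW_VALUE" pairs.
smart_snapshot() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 { print $1, $10 }'
}

# Compares two snapshot files and prints attributes whose raw value changed.
smart_diff() {
    local before=${1:?usage: smart_diff BEFORE AFTER}
    local after=${2:?usage: smart_diff BEFORE AFTER}
    awk 'NR == FNR { old[$1] = $2; next }
         ($1 in old) && old[$1] != $2 { print $1 ": " old[$1] " -> " $2 }' "$before" "$after"
}

# Destructive write-mode test with 4096-byte blocks (run as root).
# WARNING: erases everything on DEVICE.
destructive_write_test() {
    badblocks -wsv -b 4096 "${1:?usage: destructive_write_test DEVICE}"
}
```

Typical use, as root:

smartctl -A /dev/sda | smart_snapshot > before.txt
destructive_write_test /dev/sda
smartctl -A /dev/sda | smart_snapshot > after.txt
smart_diff before.txt after.txt

Power_On_Hours (9) and the temperature (190) will change anyway; growth in 179, 181, 183 or 187 would be the worrying part.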


#3 2018-12-19 03:46:09

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 5,734

Re: Is this SSD good to go?

earlybird wrote:

If you intend to risk it, I'd suggest writing and reading/verifying data up and down the disk for a couple of times (that is, x times 128G written and read) or so and see if additional errors are encountered.

Thanks for looking at this.

I already ran

badblocks -nsvp2 /dev/sda

from a live session. As I understand it, that's doing non-destructive read-write to the whole disk, twice? No bad blocks were found this time. (Previously, before the ext4 reformat there were some.)
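For what it's worth, the badblocks man page describes the flags like this: `-n` is the non-destructive read-write mode, `-s` shows progress, `-v` is verbose, and `-p N` repeats the scan until N consecutive passes find no new bad blocks, so on a clean disk `-p2` does amount to two passes. badblocks refuses to run the `-n` test on a mounted device, which is one reason to use a live session. The same command as a hypothetical wrapper with the flags commented:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the scan above; run as root on an unmounted disk.
nondestructive_scan() {
    local dev=${1:?usage: nondestructive_scan DEVICE [PASSES]}
    # -n  non-destructive read-write test (existing data is restored)
    # -s  show progress, -v  verbose
    # -p  keep scanning until PASSES consecutive passes find no new bad blocks
    badblocks -nsv -p "${2:-2}" "$dev"
}
```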

Is there some other utility you'd recommend? Or wiping the current (unimportant) disk content with a destructive read/write test?

But SSDs are quite cheap now, so replacing the drive is quite a reasonable option.


John


#4 2018-12-26 08:55:44

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 5,734
Website

Re: Is this SSD good to go?

Since an SSD of twice the current capacity (120GB) is not that expensive now, I'm inclined to get a new one and stop worrying about sudden failure.

But before investing any money in this machine is there any other hardware that I ought to check out for wear? (I've already run a memtest, with no errors.)


John


#5 2018-12-26 11:11:00

twoion
ほやほや
Registered: 2015-08-10
Posts: 2,485

Re: Is this SSD good to go?

johnraff wrote:

Since an SSD of twice the current capacity (120GB) is not that expensive now, I'm inclined to get a new one and stop worrying about sudden failure.

But before investing any money in this machine is there any other hardware that I ought to check out for wear? (I've already run a memtest, with no errors.)

I'd just clean the hell out of the device if you haven't already, and take it apart and put it back together once to catch any loose connectors. But if you don't want to, I guess you don't have to. :) You can check that the RAM is seated correctly in its slot when you replace the SSD, just by taking off the back cover.

There are general instructions for quick and thorough checkouts in the hardware maintenance manual (like running the BIOS diagnostic tools, etc.): https://download.lenovo.com/ibmdl/pub/p … 241_07.pdf.

I would also check whether you can find any BIOS/UEFI/firmware updates for this device and apply them. Go to https://pcsupport.lenovo.com/de/en/prod … /downloads , select BIOS/UEFI as the component, and download the bootable CD image for Linux, or the regular installer for Windows (easier!). The workflow for getting a bootable USB stick out of the image is described at https://www.thinkwiki.org/wiki/BIOS_Upgrade.
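To confirm which BIOS version is installed (before and after flashing), Linux exposes the DMI data in sysfs. A small hypothetical helper; `/sys/class/dmi/id/bios_version` and `bios_date` are standard sysfs files, and `dmidecode -s bios-version` (as root) reports the same information:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the installed BIOS version and date from sysfs,
# falling back to "unknown" where the files are missing (e.g. in a container).
bios_info() {
    local d=/sys/class/dmi/id
    printf 'BIOS version: %s\n' "$(cat "$d/bios_version" 2>/dev/null || echo unknown)"
    printf 'BIOS date:    %s\n' "$(cat "$d/bios_date" 2>/dev/null || echo unknown)"
}
```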


A silent kite against the blue, blue sky


#6 2018-12-27 05:23:09

johnraff
nullglob
From: Nagoya, Japan
Registered: 2015-09-09
Posts: 5,734

Re: Is this SSD good to go?

A good cleanout sounds like a good idea. When I first opened it all kinds of sawdust (or something) fell out!

And thanks for going to the trouble of checking out the Lenovo material. They do seem to be doing quite a good job of supporting older machines.

I have yet to go through that maintenance manual, but as you point out, it's the obvious best place to start.

However, the recommended BIOS upgrade came right on page 3 or 4, so I did do that. The installed BIOS was the original 2011 one and there had been a number of upgrades since then, the most recent this year. There's no longer any Windows, so I made a CD and booted it. Successfully, I'm happy to say; it's all too easy to brick a computer with a messed-up BIOS upgrade...


John
