Viewing HDD SMART information

My PC keeps beeping and then freezing, so I'm checking the HDD just in case.

$ sudo smartctl -A /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   086   085   020    Pre-fail  Always       -       1839
  4 Start_Stop_Count        0x0032   091   091   008    Old_age   Always       -       6237
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   085   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   069   069   001    Old_age   Always       -       20926
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   093   093   008    Old_age   Always       -       4805
 13 Read_Soft_Error_Rate    0x000b   100   100   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   086   081   042    Old_age   Always       -       37
195 Hardware_ECC_Recovered  0x001a   007   002   000    Old_age   Always       -       1346290085
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   194   194   000    Old_age   Always       -       6

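When VALUE is at or below THRESH, the WHEN_FAILED column shows FAILING_NOW (or In_the_past if only WORST has dropped to THRESH or below) instead of "-".
When that happens, a Pre-fail attribute means trouble, and an Old_age attribute means the drive has reached the end of its life.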

To sum up:
Pre-fail and VALUE at or below THRESH: trouble, the drive may fail soon
Old_age and VALUE at or below THRESH: the drive is worn out
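
The rule above can be sketched as a small filter over the smartctl -A output. This is only a sketch: the function name classify_smart_attrs and the two verdict labels are made up for illustration, and it assumes the ten-column attribute table layout shown in the output above.

```shell
# Read `smartctl -A` output on stdin and print one line for every attribute
# whose normalized VALUE is at or below THRESH.
# classify_smart_attrs and the verdict labels are hypothetical names.
classify_smart_attrs() {
  awk '
    # Attribute rows start with a numeric ID and have at least 10 columns:
    # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      value  = $4 + 0
      thresh = $6 + 0
      if (value <= thresh) {
        # Pre-fail: failure may be imminent. Anything else: treated as wear-out.
        verdict = ($7 == "Pre-fail") ? "imminent-failure" : "worn-out"
        printf "%s %s %s\n", $1, $2, verdict
      }
    }
  '
}
```

Usage would look like: sudo smartctl -A /dev/sda | classify_smart_attrs. For the attribute table above it prints nothing, since every VALUE is above its THRESH.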

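No particular problem in this case.
If I had to pick something, it bothers me that VALUE is below WORST (100 vs. 253) for 1 Raw_Read_Error_Rate and 198 Offline_Uncorrectable, but it has been like that for a while.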

Run a self-test just in case

$ sudo smartctl -t short /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
...
Test will complete after Sun Jun  8 14:57:01 2008

Wait about two minutes
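
Rather than guessing the wait, the estimated finish time can be pulled out of the smartctl -t short output. A minimal sketch, assuming the "Test will complete after ..." line looks like the one above; selftest_eta is a made-up helper name.

```shell
# Read `smartctl -t short` output on stdin and print only the timestamp from
# the "Test will complete after ..." line (nothing if the line is absent).
# selftest_eta is a hypothetical helper name.
selftest_eta() {
  sed -n 's/^Test will complete after //p'
}
```

Usage would look like: sudo smartctl -t short /dev/sda | selftest_eta. With GNU date, the printed timestamp could probably be passed to date -d to work out how long to sleep, but -d is a GNU extension and I have not checked it on other systems.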

$ sudo smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20926         -
# 2  Short offline       Completed without error       00%      9003         -

No problems here either.

For comparison, here is the log from an HDD that was making grinding noises right before it died

$ sudo smartctl -l selftest /dev/hdc  # fetch the self-test results

smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     15994         11534400
# 2  Short offline       Completed without error       00%     15749         -
# 3  Short offline       Completed without error       00%     15748         -
# 4  Short offline       Completed without error       00%     15747         -
# 5  Short offline       Completed without error       00%     15747         -
# 6  Short offline       Completed: read failure       90%     15742         4190514
# 7  Short offline       Completed: read failure       90%     15734         7343562

Read failures all over the place. The drive died after I got the data off it.
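
Picking the read failures out of a self-test log like the one above can be sketched as follows. It assumes the log rows keep the layout shown here, with LifeTime(hours) and LBA_of_first_error as the last two columns; list_read_failures is a made-up name.

```shell
# Read `smartctl -l selftest` output on stdin and print the lifetime hours and
# first failing LBA of every self-test that ended with a read failure.
# list_read_failures is a hypothetical helper name.
list_read_failures() {
  awk '
    # Log rows look like "# 1  Short offline  Completed: read failure ..."
    /^# *[0-9]+ / && /read failure/ {
      # The last two fields are LifeTime(hours) and LBA_of_first_error.
      printf "hours=%s lba=%s\n", $(NF-1), $NF
    }
  '
}
```

Usage would look like: sudo smartctl -l selftest /dev/hdc | list_read_failures. For the log above it would list the three failed tests at 15994, 15742 and 15734 hours together with their LBAs.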

Thoughts

The freezes are probably caused by overheating. Summer is coming.