Mark Silinio (mark_silinio) wrote in ru_root,

Two disks dropped out of a RAID-1 array at once

I have an HP ProLiant ML150 G3 with a RAID-1 of two SATA disks built on its HP Embedded SATA RAID Controller.
So I walk into the server room, and the monitor shows roughly this picture (only with a message about two SMART Failed SATA disks):



I poked at them in the RAID utility this way and that, and got nowhere.

So I hooked one of the disks up to another machine and ran smartctl -a on it:


smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     GB0250C8045
Serial Number:    9SF0AFV7
Firmware Version: HPG1
User Capacity:    250,059,350,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Sun Jan  9 11:33:16 2011 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73)	The previous self-test completed having
					a test element that failed and the test
					element that failed is not known.
Total time to complete Offline 
data collection: 		 ( 625) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  56) minutes.
Conveyance self-test routine
recommended polling time: 	 (   3) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   058   051   044    Pre-fail  Always       -       203135795
  3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       244
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       100402424513
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       20064
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       54
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Unknown_Attribute       0x0032   094   094   000    Old_age   Always       -       6
188 Unknown_Attribute       0x0032   098   097   000    Old_age   Always       -       193276477487
189 Unknown_Attribute       0x003a   001   001   000    Old_age   Always       -       610
190 Temperature_Celsius     0x0022   074   062   045    Old_age   Always       -       437780506
194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (Lifetime Min/Max 0/18)
195 Hardware_ECC_Recovered  0x001a   040   027   000    Old_age   Always       -       203135795
196 Reallocated_Event_Count 0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1839 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1839 occurred at disk power-on lifetime: 20061 hours (835 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 40 00      00:12:51.058  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:12:50.026  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:12:49.504  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:12:48.471  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:12:47.949  IDENTIFY DEVICE

Error 1838 occurred at disk power-on lifetime: 20061 hours (835 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 40 00      00:12:49.504  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:12:48.471  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:12:47.949  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:12:40.131  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      00:12:38.053  NOP [Abort queued commands]

Error 1837 occurred at disk power-on lifetime: 20061 hours (835 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 40 00      00:12:47.949  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:12:40.131  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      00:12:38.053  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      00:11:44.815  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      00:06:39.810  NOP [Abort queued commands]

Error 1836 occurred at disk power-on lifetime: 20061 hours (835 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 40 00      00:06:39.288  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:06:38.255  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:06:37.733  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:06:36.700  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:06:36.178  IDENTIFY DEVICE

Error 1835 occurred at disk power-on lifetime: 20061 hours (835 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 40 00      00:06:37.733  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:06:36.700  NOP [Abort queued commands]
  ec 00 00 00 00 00 40 00      00:06:36.178  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff      00:06:24.600  NOP [Abort queued commands]
  00 00 00 00 00 00 00 ff      00:06:22.522  NOP [Abort queued commands]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%     20064         -
# 2  Short offline       Completed: unknown failure    90%     20064         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



- the second disk shows the same thing

So what could have happened? Right over the holidays, and both of them died at exactly the same time. I have never seen anything like it.
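
For reference, the rows that trip the overall FAILED verdict can be picked out of the attribute table mechanically. Below is a minimal sketch, assuming the usual rule that an attribute counts as failing when its normalized VALUE is at or below a nonzero THRESH. The function name `failing_attributes`, the regular expression, and the `SAMPLE` constant are made up for illustration and are not part of smartmontools. The sample rows are copied from the output above.

```python
import re

# Four rows copied verbatim from the smartctl attribute table above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   058   051   044    Pre-fail  Always       -       203135795
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
196 Reallocated_Event_Count 0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""

# Hypothetical pattern for one table row:
# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+\S+\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def failing_attributes(text):
    """Return (id, name, value, thresh) for rows whose VALUE <= THRESH.

    Assumption: a threshold of 0 means the attribute is informational
    and is never reported as failing, so such rows are skipped.
    """
    failing = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row (header, blank line, etc.)
        value = int(m.group("value"))
        thresh = int(m.group("thresh"))
        if thresh > 0 and value <= thresh:
            failing.append((int(m.group("id")), m.group("name"), value, thresh))
    return failing


if __name__ == "__main__":
    for attr_id, name, value, thresh in failing_attributes(SAMPLE):
        print(f"{attr_id:>3} {name}: value {value} <= threshold {thresh}")
```

On this disk only attributes 5 and 196 fall below their threshold (normalized value 1 against a threshold of 36, raw count 2046), which matches the two FAILING_NOW rows. Raw_Read_Error_Rate is still above its threshold despite the large raw number.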