
October 31st, 2012 10:00

Unable to start up a storage node

Hi,

We were wondering if anyone has any tips or clues on this issue. We have an older Gen2 Centera which is now out of support; we are in the process of moving all data onto another device, but one node has gone down and will not start up again.

We have a cluster of 16 nodes, 4 of them access nodes and 12 storage nodes.

One of the storage nodes has been down for a few days, so we tried the usual steps: PXE booting the node from the front panel, powering the node off, leaving it for some time and restarting, and reseating the various cables, but to no avail.

We also videoed the startup sequence by attaching a monitor to the back of the device when it was starting up, and found that some errors were coming up during the node setup process.

Some of the errors are below:

"udevd-event[2424]: run_program:exec of program '/sbin/vol_id' failed

udevd-event[2425]: run_program:exec of program '/lib/udev/mount.sh' failed

udevd-event[2428]: run_program:exec of program '/sbin/vol_id' failed

udevd-event[2429]: run_program:exec of program '/lib/udev/mount.sh' failed" 

The first two lines are repeated several times with different event numbers.

"Linking panel binary to panelsmbus                    done

Checking for watchdog reboot                            failed "

"Executing node setup                   failed"

"ReiserFS: hdh10: Using r5 hash to sort names

"/etc/library: line 24:echo: write error: no space left on device" (this line is repeated several times)

"System boot control               The system has been set up

Failed features                Systat"

"Re adding <<4>hdg8> to <<6>md8 after it was kicked for being out of sync
mdadm: error opening /dev/<6>md8: No such file or directory       failed"

"Re adding <<4>hdg3> to <<6>md3 after it was kicked for being out of sync
mdadm: error opening /dev/<6>md3: No such file or directory   failed"

"Error connecting to NEI service: connect (error 111) Connection Refused

Error on undisplay: (1) SMBUS_HANDLE_NOT_INITIALISED

Error connecting to NEI service: connect (error 111) Connection Refused

Error connecting to NEI service: connect (error 111) Connection Refused

Error on display (1) SMBUS_HANDLE_NOT_INITIALISED             done"

"Starting syslog services                            done

syslogd Can't bind unix unix socket to name: address already in use"

"Start activating blob partition /dev/cstarde10

Start activating blob partition /dev/cstardf10

Finish mounting blob partition /dev/cstarde10         done

Start activating blob partition /dev/cstardg10

Finish mounting blob partition /dev/cstardf10          done"

It appears to complete this process for all 4 disks with a status of "done", but then comes up with the error below:

"cp:writing '/var/local/etc/nodesetup' : no space left on device"

"Starting CRON daemon

Starting fragment counter: /etc/init.d/rc4.d/S12fragmentcounter: line48: echo: write error: No space left on device"

"Starting Filepool software................

Use of uninitialized value in split at ./StartFP.db line 50.

cat: write error:No space left on device

/home/filepool/StartFP: line 239: echo: write error:no space left on device

/home/filepool/StartFP: line 244: echo: write error:no space left on device                done

/sbin/FPgrub-install: line 526: echo: write error: No space left on device"    Final line repeated many times.

"/etc/init.d/rc4.d/S15successful_boot: line34: echo: write error: No space left on device

/etc/init.d/rc4.d/S15successful_boot: line46: echo: write error: No space left on device

Master Resource Control Runlevel 4 has been reached

Failed services in runlevel 4:                       successful boot "

- After the node login command, several lines of "MGETTY FATAL:yS0 cannot write PID to (temp) lock file"

We do understand that the devices are old, so support will be difficult. However, we have some hardware support for the devices, so if we could understand which component may have failed, it may help us keep the cluster going a little longer.

Any ideas or thoughts gratefully received !

Thank you,

124 Posts

October 31st, 2012 10:00

Sounds like /var is full, and possibly /home or /.
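A quick way to confirm that, once you can reach a shell on the node, is to check filesystem usage directly. This is only a sketch (the 95% threshold and the /var starting point are my assumptions, not confirmed from the node):

```shell
# Show usage of all mounted filesystems in portable (POSIX) format:
df -hP

# Flag anything at or above 95% capacity -- the boot errors point at
# /var (md8) and / (md3) as likely culprits:
df -P | awk 'NR > 1 && $5+0 >= 95 {print $6, "is", $5, "full"}'

# Once the full filesystem is known, find the largest directories on
# it, staying on that one filesystem (-x):
du -xh /var 2>/dev/null | sort -rh | head -20
```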

Dennis

124 Posts

October 31st, 2012 11:00

They are partitions 3, 7 and 8 on all 4 disks, and are assembled into raid groups: all 4 partition 3’s make up the root partition, all 4 partition 7’s make up /home, and all 4 partition 8’s make up /var. These raids need to be assembled and started, then mounted, to see if they are full.
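As a rough sketch of that procedure, run from a rescue environment on the node itself: I'm assuming the four data disks are hde through hdh (hdg and hdh appear in the boot errors; hde and hdf are an assumption), so confirm the actual members with mdadm --examine before assembling anything.

```shell
# Assumed device names -- verify with 'mdadm --examine /dev/hd?3' first.
mdadm --assemble /dev/md3 /dev/hde3 /dev/hdf3 /dev/hdg3 /dev/hdh3   # root
mdadm --assemble /dev/md7 /dev/hde7 /dev/hdf7 /dev/hdg7 /dev/hdh7   # /home
mdadm --assemble /dev/md8 /dev/hde8 /dev/hdf8 /dev/hdg8 /dev/hdh8   # /var

# Mount each array read-only and check whether it is full:
mkdir -p /mnt/md8
mount -o ro /dev/md8 /mnt/md8
df -h /mnt/md8
```

Since these commands only make sense against the node's own disks, this is exactly the kind of work a service engagement would cover.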

These are O/S partitions, so EMC Service will need to fix this. Since this is Gen2 and out of service, this will likely be a paid engagement.

Dennis

3 Posts

October 31st, 2012 11:00

Thank you, Dennis, for the quick reply. Do you know if the directories you mention are actually stored on the disks themselves, or is there another area on the storage node where they are stored?

3 Posts

November 1st, 2012 02:00

OK, thank you Dennis. That is pretty much what we thought would be the case, but we appreciate the info, as at least we can better understand where the issue is.
