Dell ObjectScale 1.3 Administration Guide

Place a failed node into permanent maintenance mode (ObjectScale on OpenShift)

Use kubectl to manually place a failed node (that is powered off, dead, or otherwise inaccessible) into permanent maintenance mode. Use this process for ObjectScale instances on a Red Hat OpenShift cluster.

About this task

NOTE: The removal process for a failed node is largely manual and does not involve the ObjectScale Operator in the same way that standard permanent maintenance mode (PMM) does. However, during the node removal process, the ObjectScale Operator starts recovery procedures for certain non-SS stateful pods, such as bookie, influxdb, and zookeeper. Recovery of the non-SS stateful pods occurs automatically and does not affect the procedure workflow.

Place the failed node to be removed into permanent maintenance mode:

Steps

  1. Mark the failed node as unschedulable so that it is no longer available to run pods.
    kubectl cordon <NODE_NAME>
  2. Delete the pods from the node.
    kubectl drain <NODE_NAME> --force --delete-local-data --ignore-daemonsets
  3. Collect the UUID for the node to be removed from the cluster:
    kubectl get csibmnodes
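    For example, you can filter the listing by the failed node's hostname to find its UUID (a minimal sketch; the exact columns shown depend on your CSI Bare-Metal driver version):
    kubectl get csibmnodes | grep <NODE_NAME>
    Record the UUID that is reported for the node; it is used as <NODE_UUID> in the cleanup steps that follow.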
  4. Remove the node from the OpenShift cluster.
    kubectl delete node <NODE_NAME>
    NOTE: After you delete the node, it is no longer listed in the kubectl get nodes output.
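    For example, the following check should return no output after the node has been removed (a minimal verification sketch):
    kubectl get nodes | grep <NODE_NAME>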
  5. Manually delete the PVCs bound to the failed node:
    1. Get the names of the PVCs:
      kubectl get pvc
    2. Get the details for each of the PVCs:
      kubectl describe pvc <PVC_NAME>
    3. Get the node for each of the PVCs. The volume.kubernetes.io/selected-node annotation identifies the node that a PVC is bound to:
      for i in `kubectl get pvc --no-headers -o jsonpath="{.items[*].metadata.name}"`; do echo "=== $i"; kubectl get pvc $i -o json | grep selected-node | grep -v "{}"; done
      The following example output from kubectl describe pvc shows the selected-node annotation for a PVC bound to node worker3.ocp4.cmo.com:
      user1@hw-and-os-96:~> kubectl describe pvc data-0-ecs-cluster-ss-0
      Name:          data-0-ecs-cluster-ss-0
      Namespace:     default
      StorageClass:  csi-baremetal-sc-hdd
      Status:        Bound
      Volume:        pvc-8f28acda-f7e5-4efc-9b4b-d8e50e21e72f
      Labels:        app=ecs-cluster-ss
                     app.kubernetes.io/component=ss
                     app.kubernetes.io/name=ecs-cluster
                     app.kubernetes.io/namespace=default
                     component=ss
                     objectscale.dellemc.com/logging-inject=true
                     objectscale.dellemc.com/logging-release-name=ecs-cluster
                     operator=objectscale-operator
                     release=ecs-cluster
      Annotations:   pv.attach.kubernetes.io/ignore-if-inaccessible: yes
                     pv.kubernetes.io/bind-completed: yes
                     pv.kubernetes.io/bound-by-controller: yes
                     volume.beta.kubernetes.io/storage-provisioner: csi-baremetal
                     volume.kubernetes.io/selected-node: worker3.ocp4.cmo.com
                     volumehealth.storage.kubernetes.io/health: accessible
                     volumerelease.csi-baremetal/support: yes
      Finalizers:    [kubernetes.io/pvc-protection]
      Capacity:      11176Gi
      Access Modes:  RWO
      VolumeMode:    Filesystem
      Mounted By:    ecs-cluster-ss-0
      Events:        <none>
    4. Once all PVC names for the node have been gathered, delete the PVCs:
      kubectl delete pvc <PVC_NAME_1> <PVC_NAME_2> <PVC_NAME_N>
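      If the cluster has many PVCs, the following loop lists only the PVCs whose selected-node annotation points at the failed node (an illustrative sketch based on the commands above; review the list before deleting anything):
      for i in `kubectl get pvc --no-headers -o jsonpath="{.items[*].metadata.name}"`; do kubectl get pvc $i -o json | grep selected-node | grep -q "<NODE_NAME>" && echo $i; done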
  6. Patch and delete the volumes of the failed node:
    1. Patch the volumes:
      kubectl get volume | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch volume --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the volumes:
      kubectl get volume | grep <NODE_UUID>| awk '{print $1}' | xargs kubectl delete volume
  7. Patch and delete the LVGs of the failed node:
    1. Patch the LVGs:
      kubectl get lvg | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch lvg --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the LVGs:
      kubectl get lvg | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete lvg
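      To confirm that the volume and LVG resources for the node are gone, the following checks should return no output (a minimal verification sketch):
      kubectl get volume | grep <NODE_UUID>
      kubectl get lvg | grep <NODE_UUID>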
  8. Clean up all the CSI resources for the failed node:
    1. Patch the csibmnode resources:
      kubectl get csibmnode | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl patch csibmnode --type merge -p '{"metadata":{"finalizers":null}}'
    2. Remove the csibmnode resources:
      kubectl get csibmnode | grep <NODE_UUID>| awk '{print $1}' | xargs kubectl delete csibmnode
    3. Delete the drive CRs:
      kubectl get drive | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete drive
    4. Delete the available capacity:
      kubectl get ac | grep <NODE_UUID> | awk '{print $1}' | xargs kubectl delete ac
  9. Remove the pending pods in all namespaces that are associated with ObjectScale and object stores:
    1. Identify the pods to be deleted:
      kubectl get pods | grep Pending
    2. Delete each returned pod that is associated with the removed node:
      kubectl delete pods <PODS>
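      Because ObjectScale and object store pods can run in several namespaces, you can also list the Pending pods cluster-wide (a minimal sketch; add -n <NAMESPACE> to the delete command for pods outside the current namespace):
      kubectl get pods -A --field-selector=status.phase=Pending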
  10. Verify that all the resources for the failed node have been successfully removed:
    1. Check for Bare-Metal nodes:
      kubectl get csibmnode | grep <NODE_UUID>
    2. Check for available capacity:
      kubectl get ac | grep <NODE_UUID>
    3. Check for drive CRs:
      kubectl get drive | grep <NODE_UUID>
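      As a convenience, the three checks can be combined into one loop; each resource type should return no output if the cleanup succeeded (an illustrative sketch of the checks above):
      for res in csibmnode ac drive; do echo "=== $res"; kubectl get $res | grep <NODE_UUID>; done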
  11. Optional: Monitor the automatic recovery of non-SS pods:
    To ensure data protection, certain non-SS pods, such as the bookie, influxdb, and zookeeper pods, require recovery after they are relocated. The ObjectScale Operator initiates recovery for these pods automatically after they are removed from the PMM node and started on another available node in the cluster.
    kubectl get serviceprocedures -A -o custom-columns=Name:metadata.name,Node:spec.nodeInfo.name,Type:spec.type,Time:metadata.managedFields[0].time,Reason:status.reason,Message:status.message
